upload

Description

Uploads files to DataLake. All files in a directory can be uploaded at once by specifying the directory.

Synopsis

$ abeja datalake upload [--help]
Usage: abeja datalake upload [OPTIONS] [PATHS]...

  Upload file or directory

Options:
  -c, --channel_id TEXT           Channel identifier  [required]
  --dry-run, --dry_run            Dry run, only shows upload candidate files
  -r, --recursive                 Recursively upload directory
  -m, --metadata, --meta-data, --meta_data METADATASTRING
                                  Metadata to add all upload files
  -l, --file-list, --file_list PATH
                                  JSON file which list files and metadata
  --retry [ask|no|force]          Retry to upload files if there are files
                                  couldn't be uploaded (default: 'ask')
  --save-result FILENAME          Save uploaded file info as JSON at the
                                  specified path.
  --skip-duplicate-files          Don't upload file if the file whose name is
                                  same already exists in the channel.
  --help                          Show this message and exit.

Argument

PATH

Specify the absolute or relative path of a file or directory to upload. Multiple paths can be specified.

Options

-c, --channel_id

Specify the ID of the channel to upload to. This option is required.

--dry-run, --dry_run

Shows the list of files that would be uploaded. No files are actually uploaded when this option is used.

-r, --recursive

Searches the specified directory recursively, so that files in its subdirectories are uploaded as well.

-m, --metadata, --meta-data, --meta_data

Adds metadata to all uploaded files. Specify each metadata item in <name>:<value> format, as below.

--metadata label:cat

To specify multiple metadata items, repeat the --metadata option, as below.

--metadata label:cat --metadata category:animal
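When the command is invoked from a script, the repeated --metadata options can be generated from a dictionary. The following is a minimal sketch; the helper name metadata_args is hypothetical, it assumes that neither names nor values need special escaping for the CLI, and it only prints the resulting command line (the list could instead be passed to subprocess.run).

```python
import shlex


def metadata_args(metadata: dict[str, str]) -> list[str]:
    """Hypothetical helper: expand a dict into repeated --metadata <name>:<value> options."""
    args: list[str] = []
    for name, value in metadata.items():
        # One --metadata option per item, as the CLI expects.
        args += ["--metadata", f"{name}:{value}"]
    return args


cmd = ["abeja", "datalake", "upload", "./upload_file.txt",
       "--channel_id", "1234567890123"]
cmd += metadata_args({"label": "cat", "category": "animal"})
# Prints the equivalent shell command; nothing is uploaded here.
print(shlex.join(cmd))
```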

-l, --file-list, --file_list

Specify a JSON file that lists the files to upload. The file contains an array, and each element of the array has the following properties.

  • file - (Required) The relative or absolute path of the file to upload, as a string
  • metadata - (Optional) An object specifying metadata for the file

For example, the following file list specifies three files, test1.txt, test2.txt, and test3.txt.

[{ "file": "test1.txt" }, { "file": "test2.txt" }, { "file": "test3.txt" }]

Different metadata can be specified for each file, as follows.

[
  {
    "file": "images/sample1.jpg",
    "metadata": {
      "weight": 50,
      "size": 150,
      "kind": "W5"
    }
  },
  {
    "file": "images/sample2.jpg",
    "metadata": {
      "weight": 45,
      "size": 300,
      "kind": "E3"
    }
  }
]
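For many files, the file list can be generated rather than written by hand. The sketch below builds entries in the format described above from the files in a directory. The function names, the *.jpg pattern, and the rule that derives a hypothetical "kind" metadata value from the file-name prefix are illustrative assumptions, not part of the CLI.

```python
import json
from pathlib import Path


def build_file_list(directory: str, pattern: str = "*.jpg") -> list[dict]:
    """Return one {"file", "metadata"} entry per file matching pattern."""
    entries = []
    for path in sorted(Path(directory).glob(pattern)):
        entries.append({
            "file": str(path),
            # Hypothetical metadata: the part of the file name before the first "_".
            "metadata": {"kind": path.stem.split("_")[0]},
        })
    return entries


def write_file_list(entries: list[dict], out_path: str) -> None:
    """Write the entries as a JSON array suitable for --file-list."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
```

The resulting file would then be passed as, for example, `abeja datalake upload --channel_id 1234567890123 --file-list file_list.json`, assuming that no PATHS argument is needed when --file-list is given.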

--retry <value>

Specifies whether to retry uploading files that failed to upload. Possible values are ask, no, and force; the default is ask.

--save-result <filename>

Specifies the path of a file in which to save information about the uploaded files. The information is saved in JSON format, as below.

[
  {
    "file": "/path/to/filename.jpg",
    "file_id": "20180101T000000-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "metadata": {
      "x-abeja-meta-filename": "filename.jpg",
      "x-abeja-meta-foo": "2",
      "x-abeja-sys-meta-organizationid": "1234567890123"
    }
  },
  ...
]
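The saved result can be read back, for example to look up the file_id assigned to each local file. This is a minimal sketch based only on the sample output above; the function names are hypothetical, and the assumption that user-level metadata carries the x-abeja-meta- prefix (while system metadata uses x-abeja-sys-meta-) is inferred from that sample rather than from a specification.

```python
import json


def load_file_ids(result_path: str) -> dict[str, str]:
    """Map each uploaded local path to the file_id recorded in the result file."""
    with open(result_path, encoding="utf-8") as f:
        results = json.load(f)
    return {entry["file"]: entry["file_id"] for entry in results}


def user_metadata(entry: dict) -> dict[str, str]:
    """Keep only x-abeja-meta-* keys, with the prefix stripped (assumed convention)."""
    prefix = "x-abeja-meta-"
    return {
        key[len(prefix):]: value
        for key, value in entry.get("metadata", {}).items()
        if key.startswith(prefix)
    }
```

With the sample above, user_metadata would return both filename and foo, because the file name is also stored under the x-abeja-meta- prefix.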

--skip-duplicate-files

Skips uploading a file if a file with the same name already exists in the channel.

Example

Upload a file to datalake

Upload the specified file to the channel with ID 1234567890123, setting the metadata key sample-key to sample-value.

Command:

$ abeja datalake upload ./upload_file.txt --channel_id 1234567890123 --metadata sample-key:sample-value

Upload a directory to datalake at once

Upload all files in the directory ./upload_dir to the channel with ID 1234567890123, setting the metadata key sample-key to sample-value.

Command:

$ abeja datalake upload ./upload_dir --channel_id 1234567890123 --metadata sample-key:sample-value --recursive