import-from-datalake

Description

Creates training data from files stored in a datalake channel and their metadata.

Synopsis

$ abeja dataset import-from-datalake --help
Usage: abeja dataset import-from-datalake [OPTIONS]

  Imports dataset items from datalake. You can import dataset items from a
  datalake channel with properties. You have to prepare a datalake channel
  that has files you want to use as dataset. `x-abeja-meta-label` and
  `x-abeja-meta-label_id` is used as properties by default. For example,
  `cat00001.jpg` with `x-abeja-meta-label:cat` and `x-abeja-meta-label_id:1`
  is registered to a dataset with label `cat` and label_id 1. You can
  specify the multiple metadata names for properties with `--property-
  metadata-key` option.

Options:
  -c, --channel_id, --channel-id TEXT
                                  DataLake channel id  [required]
  -d, --dataset_id, --dataset-id TEXT
                                  Dataset id  [required]
  --property-metadata-key TEXT    DataLake metadata that is used as property
                                  of dataset. label and label_id is used as
                                  default
  --category_id, --category-id TEXT
                                  category id of the property. default is 1
  --type TEXT                     dataset type. default is classification
  --max-size-for-label INTEGER    Max number of items for each labels that is
                                  uploaded to dataset API
  --help                          Show this message and exit.

Options

-c, --channel_id

Specify the datalake channel_id (Required)

-d, --dataset_id

Specify the dataset_id of the dataset that defines the training data (Required)

--property-metadata-key

Specify the metadata key (the string after x-abeja-meta-) that is used to extract labels from files in the datalake. By default, x-abeja-meta-label and x-abeja-meta-label_id are used.
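
As a small illustration of the "string after x-abeja-meta-" rule, the sketch below strips that prefix from a full metadata name to get the value to pass to `--property-metadata-key`. The function name `key_from_metadata_name` is hypothetical and is not part of the abeja CLI.

```shell
# Hypothetical helper (not part of the abeja CLI).
# Prints the key for --property-metadata-key, given a full metadata name.
key_from_metadata_name() {
  # POSIX parameter expansion: remove the leading "x-abeja-meta-" if present.
  printf '%s\n' "${1#x-abeja-meta-}"
}

key_from_metadata_name x-abeja-meta-category   # prints: category
```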

--category-id

Specify the category_id of the properties to register. The default is 1.

--type

Specify the type of dataset item to register. The default is classification.

--max-size-for-label

Specify the maximum number of items per label to upload as training data.

Format of training data

The current version can create training data for the classification, detection, and segmentation-image types. A classification item looks like this:

{
  "source_data": [{
    "data_uri": "datalake://130985764897/20170704T062222-cb6750bf-e679-48a6-ab96-0f4292e09f76",
    "data_type": "image/jpeg"
  }],
  "attributes": {
    "classification": {
      "category_id": 1,
      "label_id": 1
    }
  }
}
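
To make the mapping from inputs to this structure concrete, the following shell sketch prints one classification item in the same shape. The function `make_classification_item` and its arguments are hypothetical, and the data type is fixed to `image/jpeg` as an assumption. The CLI builds these items itself, so this is only an illustration of the format, not its implementation.

```shell
# Hypothetical helper (not part of the abeja CLI).
# Prints one classification dataset item as a single line of JSON.
# Arguments: channel_id file_id category_id label_id
# Assumption: the file is a JPEG image, so data_type is hard-coded.
make_classification_item() {
  printf '{"source_data": [{"data_uri": "datalake://%s/%s", "data_type": "image/jpeg"}], "attributes": {"classification": {"category_id": %d, "label_id": %d}}}\n' \
    "$1" "$2" "$3" "$4"
}

# Reproduces the item shown above (on one line).
make_classification_item 130985764897 \
  20170704T062222-cb6750bf-e679-48a6-ab96-0f4292e09f76 1 1
```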

Example

Specify x-abeja-meta-category as the metadata name used for labeling.

Command:

$ abeja dataset import-from-datalake --channel_id 1234567890123 \
                                     --dataset_id 1375869849573 \
                                     --property-metadata-key category
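
The same command can be combined with `--max-size-for-label` to limit how many items are uploaded per label. The sketch below only prints the command so the arguments can be reviewed before running it. The function `print_import_command` is hypothetical, and the limit of 100 is an arbitrary value chosen for illustration.

```shell
# Hypothetical wrapper (not part of the abeja CLI).
# Prints the import command instead of running it.
# Arguments: channel_id dataset_id metadata_key max_items_per_label
print_import_command() {
  printf 'abeja dataset import-from-datalake --channel_id %s --dataset_id %s --property-metadata-key %s --max-size-for-label %s\n' \
    "$1" "$2" "$3" "$4"
}

# 100 is an arbitrary example limit, not a recommended value.
print_import_command 1234567890123 1375869849573 category 100
```

Copy the printed line into the terminal to run the import once the IDs and the key look right.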