Dataset

Model training requires annotated metadata (labels that give the “correct answer” in supervised learning) in addition to the input data. The input data and its annotations together are generally called a “data set”. ABEJA Platform provides two resources, datasets and dataset items, to make them easy to handle.

Dataset

To register the annotated metadata used to train a model on ABEJA Platform, first create a dataset in one of your organizations. You can then register as many items in the dataset as you want.

A dataset holds its name and information such as the inference type, but its most important role is to define the labels used in the training data. The following example defines one category, cats_dogs, with three labels:

{
  "categories": [
    {
      "labels": [
        {
          "label_id": 1,
          "label": "dog"
        },
        {
          "label_id": 2,
          "label": "cat"
        },
        {
          "label_id": 3,
          "label": "other"
        }
      ],
      "category_id": 1,
      "name": "cats_dogs"
    }
  ]
}
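
Since each label is identified by a category ID and a label ID, training code usually needs a lookup from those IDs to the label name. The following is a minimal sketch in plain Python that builds such a lookup from the properties shown above. The helper name `build_label_index` is hypothetical and is not part of the ABEJA Platform SDK.

```python
import json

# Dataset properties, copied from the example above.
PROPS_JSON = """
{
  "categories": [
    {
      "labels": [
        {"label_id": 1, "label": "dog"},
        {"label_id": 2, "label": "cat"},
        {"label_id": 3, "label": "other"}
      ],
      "category_id": 1,
      "name": "cats_dogs"
    }
  ]
}
"""


def build_label_index(props):
    """Map (category_id, label_id) to the label name.

    Hypothetical helper for illustration; not an ABEJA SDK function.
    """
    index = {}
    for category in props["categories"]:
        for label in category["labels"]:
            key = (category["category_id"], label["label_id"])
            index[key] = label["label"]
    return index


label_index = build_label_index(json.loads(PROPS_JSON))
print(label_index[(1, 2)])  # cat
```

Keying the lookup on both IDs keeps it unambiguous if a dataset later defines more than one category.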

Dataset Item

A dataset item combines a reference to a piece of data with the labels for that data. You can register as many dataset items in a dataset as you want.

{
  "source_data": [{
    "data_uri": "datalake://130985764897/20170704T062222-cb6750bf-e679-48a6-ab96-0f4292e09f76",
    "data_type": "image/jpeg"
  }],
  "attributes": {
    "classification": {
      "category_id": 1,
      "id": 2
    }
  }
}
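
To illustrate how such an item can be read, here is a minimal sketch in plain Python that resolves the item's label name and splits its data reference. It rests on two assumptions that are not stated in this document: that the `id` field under `classification` refers to a `label_id` of the dataset, and that a `datalake://` URI has the form `datalake://<channel_id>/<file_id>`. The helper `describe_item` and the `LABELS` table are hypothetical and are not part of the ABEJA Platform SDK.

```python
import json
from urllib.parse import urlparse

# Dataset item, copied from the example above (as valid JSON).
ITEM_JSON = """
{
  "source_data": [{
    "data_uri": "datalake://130985764897/20170704T062222-cb6750bf-e679-48a6-ab96-0f4292e09f76",
    "data_type": "image/jpeg"
  }],
  "attributes": {
    "classification": {
      "category_id": 1,
      "id": 2
    }
  }
}
"""

# (category_id, label_id) -> label name, matching the cats_dogs dataset.
LABELS = {(1, 1): "dog", (1, 2): "cat", (1, 3): "other"}


def describe_item(item, labels):
    """Summarize one dataset item. Hypothetical helper, not an SDK call."""
    source = item["source_data"][0]
    uri = urlparse(source["data_uri"])
    cls = item["attributes"]["classification"]
    return {
        # Assumption: URI layout is datalake://<channel_id>/<file_id>.
        "channel_id": uri.netloc,
        "file_id": uri.path.lstrip("/"),
        "data_type": source["data_type"],
        # Assumption: "id" is the label_id within the given category.
        "label": labels[(cls["category_id"], cls["id"])],
    }


info = describe_item(json.loads(ITEM_JSON), LABELS)
print(info["label"], info["data_type"])  # cat image/jpeg
```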

During model training, you can retrieve both the data itself and its labels using the SDK provided by ABEJA Platform.