Upload your dataset
Introduction
This page explains how to upload publicly available benchmark datasets and previously annotated data to ABEJA Platform.
Advance preparation
Create a Datalake channel and upload data
First, create a Datalake channel.
from abeja.datalake import Client as DatalakeClient
from abeja.datalake.storage_type import StorageType
ABEJA_ORGANIZATION_ID = 'XXXXXXXXXXXXXX'
ABEJA_PLATFORM_USER_ID = 'XXXXXXXXXXXXXX'
ABEJA_PLATFORM_TOKEN = 'XXXXXXXXXXXXXX'
credential = {
    'user_id': ABEJA_PLATFORM_USER_ID,
    'personal_access_token': ABEJA_PLATFORM_TOKEN
}
datalake_client = DatalakeClient(organization_id=ABEJA_ORGANIZATION_ID, credential=credential)
name = 'XXXXXXXXXXXXXXXXXXX'
description = 'XXXXXXXXXXXXXXXXXXXXXXX'
channel = datalake_client.channels.create(name, description, StorageType.DATALAKE.value)
Next, upload the data.
channel = datalake_client.get_channel(channel.channel_id)
file = channel.upload_file('cat.jpg')
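When a dataset consists of many images, the single upload above can be repeated in a loop. The following is a minimal sketch, not part of the SDK: the helper name upload_images and the IMAGE_SUFFIXES set are hypothetical, and the only thing assumed about channel is that it offers the upload_file(path) method used above.

```python
from pathlib import Path

# Assumption: file extensions treated as images; adjust to your data.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def upload_images(channel, directory):
    """Upload every image file directly under `directory`.

    `channel` is any object with an `upload_file(path)` method, such as
    the Datalake channel obtained above. Returns a dict that maps each
    local path to whatever `upload_file` returned for it.
    """
    uploaded = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            uploaded[str(path)] = channel.upload_file(str(path))
    return uploaded
```

Keeping the returned objects makes it possible to build the data_uri of each dataset item later from the uploaded file's file_id.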
Create Dataset (Classification)
Configure the dataset as follows. First, define the schema (class information). As an example, consider a two-class dog/cat classification problem. For multi-class classification, add multiple categories to the categories list.
from abeja.datasets import Client as DatasetClient
datasets_client = DatasetClient(organization_id=ABEJA_ORGANIZATION_ID, credential=credential)
labels = [{"label_id": 0, "label": "dog"}, {"label_id": 1, "label": "cat"}]
category = {'labels': labels, 'category_id': 0, 'name': 'cats_dogs'}
props = {"categories": [category]}
dataset = datasets_client.datasets.create(name='XXXXXXXXXXXXX', type='classification', props=props)
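The props structure above can also be generated from a plain list of label names, which helps when there are more than a handful of classes. This is a sketch under the assumption that label IDs are simply assigned in list order starting from 0; build_props is a hypothetical helper, not an SDK function.

```python
def build_props(category_name, label_names, category_id=0):
    """Build a `props` dict with one category from a list of label names.

    Label IDs are assigned in list order starting at 0 (an assumption
    that matches the dog/cat example above).
    """
    labels = [
        {'label_id': label_id, 'label': name}
        for label_id, name in enumerate(label_names)
    ]
    category = {
        'labels': labels,
        'category_id': category_id,
        'name': category_name,
    }
    return {'categories': [category]}


# Reproduces the schema defined by hand above.
props = build_props('cats_dogs', ['dog', 'cat'])
```

The resulting props can be passed to datasets_client.datasets.create in the same way as the hand-written version.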
Upload annotation data as follows.
source_data = [
    {
        'data_type': 'image/jpeg',
        'data_uri': 'datalake://{}/{}'.format(channel.channel_id, file.file_id),
    }
]
data = {
    'category_id': 0,
    'label_id': 1
}
attributes = {'classification': [data]}
dataset_item = dataset.dataset_items.create(source_data=source_data, attributes=attributes)
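When registering many items, it can be convenient to build the source_data and attributes payloads with a small function. The sketch below only assembles the same dictionaries shown above; the names datalake_uri and classification_item are hypothetical, and the default data_type of 'image/jpeg' is taken from the example rather than from any SDK default.

```python
def datalake_uri(channel_id, file_id):
    """Return the datalake:// URI format used in the example above."""
    return 'datalake://{}/{}'.format(channel_id, file_id)


def classification_item(channel_id, file_id, label_id,
                        category_id=0, data_type='image/jpeg'):
    """Return (source_data, attributes) for one classification item."""
    source_data = [
        {
            'data_type': data_type,
            'data_uri': datalake_uri(channel_id, file_id),
        }
    ]
    attributes = {
        'classification': [
            {'category_id': category_id, 'label_id': label_id}
        ]
    }
    return source_data, attributes


# Usage with the objects created earlier (not executed here):
# source_data, attributes = classification_item(
#     channel.channel_id, file.file_id, label_id=1)
# dataset.dataset_items.create(source_data=source_data, attributes=attributes)
```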
Create Dataset (Detection)
For detection, proceed as follows. The schema is the same two-class dog/cat schema as before. Change type to detection.
from abeja.datasets import Client as DatasetClient
datasets_client = DatasetClient(organization_id=ABEJA_ORGANIZATION_ID, credential=credential)
labels = [{"label_id": 0, "label": "dog"}, {"label_id": 1, "label": "cat"}]
category = {'labels': labels, 'category_id': 0, 'name': 'cats_dogs'}
props = {"categories": [category]}
dataset = datasets_client.datasets.create(name='XXXXXXXXXXXXX', type='detection', props=props)
Upload annotation data as follows.
source_data = [
    {
        'data_type': 'image/jpeg',
        'data_uri': 'datalake://{}/{}'.format(channel.channel_id, file.file_id),
    }
]
rect = {'xmin': 200, 'ymin': 0, 'xmax': 1000, 'ymax': 900}
det1 = {
    'category_id': 0,
    'label_id': 1,
    'rect': rect
}
attributes = {'detection': [det1]}
dataset_item = dataset.dataset_items.create(source_data=source_data, attributes=attributes)
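Existing annotations often store boxes as (x, y, width, height) rather than as corner coordinates. The following sketch converts such a box into the rect dictionary used above. It assumes that xmin/ymin/xmax/ymax are pixel coordinates measured from the top-left corner of the image, which this page does not state explicitly; to_rect and its image_size parameter are hypothetical.

```python
def to_rect(x, y, width, height, image_size=None):
    """Convert an (x, y, width, height) box to an xmin/ymin/xmax/ymax dict.

    If `image_size` is given as (image_width, image_height), the box is
    also checked to lie inside the image.
    """
    if width <= 0 or height <= 0:
        raise ValueError('width and height must be positive')
    rect = {'xmin': x, 'ymin': y, 'xmax': x + width, 'ymax': y + height}
    if image_size is not None:
        image_width, image_height = image_size
        inside = (0 <= rect['xmin'] and rect['xmax'] <= image_width
                  and 0 <= rect['ymin'] and rect['ymax'] <= image_height)
        if not inside:
            raise ValueError('box lies outside the image: {}'.format(rect))
    return rect


# Reproduces the rect used in the example above.
rect = to_rect(200, 0, 800, 900)
```

Validating the box before upload catches swapped or out-of-range coordinates early, before they end up in the dataset.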
Create Dataset (Custom)
In addition to classification and detection, free-form annotations can be used. In this case, set type to custom.
from abeja.datasets import Client as DatasetClient
datasets_client = DatasetClient(organization_id=ABEJA_ORGANIZATION_ID, credential=credential)
labels = [{"label_id": 0, "label": "dog"}, {"label_id": 1, "label": "cat"}]
category = {'labels': labels, 'category_id': 0, 'name': 'cats_dogs'}
props = {"categories": [category]}
dataset = datasets_client.datasets.create(name='XXXXXXXXXXXXX', type='custom', props=props)
Upload annotation data as follows.
source_data = [
    {
        'data_type': 'image/jpeg',
        'data_uri': 'datalake://{}/{}'.format(channel.channel_id, file.file_id),
    }
]
d = {
    'category_id': 0,
    'label_id': 1,
    'text': 'nyaan'
}
attributes = {'custom': [d]}
dataset_item = dataset.dataset_items.create(source_data=source_data, attributes=attributes)
Check the data
Finally, let’s check the uploaded data.
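import io
from PIL import Image  # Pillow, used to decode the image bytes
from abeja.datasets import Client as DatasetClient
client = DatasetClient(organization_id=ABEJA_ORGANIZATION_ID, credential=credential)
dataset = client.get_dataset('XXXXXXXXXXXX')  # placeholder for your dataset ID
# Fetch the dataset items and take the first one
dataset_list = list(dataset.dataset_items.list(prefetch=False))
d = dataset_list[0]
# Load the source image into memory and open it
file_content = d.source_data[0].get_content()
file_like_object = io.BytesIO(file_content)
img = Image.open(file_like_object)
# The annotation registered for this item
annotation = d.attributes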
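Beyond inspecting a single item, a quick sanity check is to count how often each label occurs across the dataset. The sketch below works on plain attributes dictionaries shaped like the ones uploaded above; count_labels is a hypothetical helper, and the key argument selects the annotation type ('classification', 'detection' or 'custom').

```python
from collections import Counter


def count_labels(attributes_list, key='classification'):
    """Count (category_id, label_id) pairs over many attributes dicts.

    Items that have no annotations under `key` are skipped.
    """
    counts = Counter()
    for attributes in attributes_list:
        for annotation in attributes.get(key, []):
            counts[(annotation['category_id'], annotation['label_id'])] += 1
    return counts


# Usage with the dataset loaded above (not executed here):
# counts = count_labels(item.attributes for item in dataset_list)
```

A label that never appears in the result, or a strongly skewed distribution, usually points to a mistake in the upload step.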