Upload files to Datalake using the SDK

Introduction

This guide explains how to upload files to Datalake using the ABEJA Platform SDK.

Install SDK

Check the installation instructions for the ABEJA Platform SDK, then install it.
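
After installing, a quick check from Python can confirm that the SDK is importable in the current environment. The sketch below is only an illustration: `is_module_installed` is a hypothetical helper name, and the module name `abeja` is taken from the import used in the upload program in this guide.

```python
import importlib.util


def is_module_installed(module_name):
    """Return True if the top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'abeja' is the top-level package imported by the upload program in this guide.
    if is_module_installed("abeja"):
        print("ABEJA Platform SDK is importable.")
    else:
        print("ABEJA Platform SDK was not found; install it first.")
```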

Upload files in a directory

The examples in this section assume the following directory tree:

$ tree
.
├── cats
│   ├── cat-1.jpeg
│   ├── cat-2.jpeg
│   └── cat-3.jpeg
└── dogs
    ├── dog-1.jpeg
    ├── dog-2.jpeg
    └── dog-3.jpeg
2 directories, 6 files
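
Before uploading, it can help to confirm that each directory actually contains the files you expect. The following is a minimal sketch using only the Python standard library; `count_files` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path


def count_files(root):
    """Map each immediate subdirectory of root to the number of files directly inside it."""
    counts = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(1 for p in sub.iterdir() if p.is_file())
    return counts


if __name__ == "__main__":
    # Run from the directory that contains cats/ and dogs/.
    for name, n in count_files(".").items():
        print(f"{name}: {n} files")
```

For the tree above, it would report three files for `cats` and three for `dogs`.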

Confirm and configure authentication information

The ABEJA Platform SDK authenticates as a user. Check your user ID and personal access token as described here.

Set the authentication information for the SDK in environment variables as follows:

$ export ABEJA_ORGANIZATION_ID='0123456789123'
$ export ABEJA_PLATFORM_USER_ID='user-1111111111111'
$ export ABEJA_PLATFORM_PERSONAL_ACCESS_TOKEN='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
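
A missing variable tends to surface only later as an authentication error, so a small pre-flight check can be useful. This sketch is an illustration under assumptions: `missing_credentials` is a hypothetical helper, and it only checks that the three variables shown above are set, not that their values are valid.

```python
import os

# Names of the environment variables set in the export commands above.
REQUIRED_VARS = (
    "ABEJA_ORGANIZATION_ID",
    "ABEJA_PLATFORM_USER_ID",
    "ABEJA_PLATFORM_PERSONAL_ACCESS_TOKEN",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All authentication variables are set.")
```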

You can also configure user authentication inside the source code. For details on setting authentication information from source code, refer to the [ABEJA Platform SDK Documentation](http://sdk-spec.abeja.io/#client-parameter).
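
As a rough sketch of what in-code configuration might look like: the `organization_id` and `credential` keyword arguments below, and the `user_id` and `personal_access_token` keys, are assumptions that should be checked against the linked client-parameter section. `build_client_kwargs` and `make_client` are hypothetical helper names.

```python
def build_client_kwargs(organization_id, user_id, personal_access_token):
    """Assemble keyword arguments for Client (argument names are assumed, not confirmed)."""
    return {
        "organization_id": organization_id,
        "credential": {
            "user_id": user_id,
            "personal_access_token": personal_access_token,
        },
    }


def make_client(organization_id, user_id, personal_access_token):
    """Create a Datalake client from explicit credentials instead of environment variables."""
    # Imported here so build_client_kwargs can be used without the SDK installed.
    from abeja.datalake import Client

    return Client(**build_client_kwargs(organization_id, user_id, personal_access_token))
```

If you take this route, load the token from a secrets store or configuration file rather than hard-coding it in source that is committed to version control.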

Run the Python program

Run the following program to upload the files in each directory to Datalake.

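from abeja.datalake import Client

# credentials are read from the environment variables set earlier
client = Client()
# ID of the Datalake channel to upload to (replace with your own)
channel_id = '1234567890123'

channel = client.get_channel(channel_id)

# upload the files in ./cats, attaching the label 'cat' as metadata
cat_metadata = {
    'label': 'cat'
}
channel.upload_dir('./cats', metadata=cat_metadata)

# upload the files in ./dogs, attaching the label 'dog' as metadata
dog_metadata = {
    'label': 'dog'
}
channel.upload_dir('./dogs', metadata=dog_metadata)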
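
When there are more than a couple of directories, the repeated blocks can be folded into a loop. The sketch below relies only on the `upload_dir(path, metadata=...)` call shown above; `upload_labeled_dirs` and the `label_by_dir` mapping are hypothetical names introduced for illustration.

```python
def upload_labeled_dirs(channel, label_by_dir):
    """Upload each directory to the channel, tagging its files with a 'label' metadata value."""
    for dir_path, label in label_by_dir.items():
        # Same call as in the program above, made once per directory.
        channel.upload_dir(dir_path, metadata={"label": label})


def main():
    # Requires the SDK and the authentication environment variables.
    from abeja.datalake import Client

    channel = Client().get_channel("1234567890123")  # placeholder channel ID
    upload_labeled_dirs(channel, {"./cats": "cat", "./dogs": "dog"})
```

Call `main()` to run the upload. Passing the channel in as an argument keeps the loop easy to exercise with a stand-in object before touching a real channel.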

Confirm uploaded files

You can check uploaded files on the Datalake details page.
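
The same check can also be approximated from Python. This sketch assumes the channel object exposes a `list_files()` method that yields file objects with a `metadata` dictionary keyed by `label`; both are assumptions to verify against the SDK documentation. `count_by_label` is a hypothetical helper.

```python
from collections import Counter


def count_by_label(files):
    """Count files per 'label' metadata value; files without a label are counted under None."""
    return Counter((f.metadata or {}).get("label") for f in files)


def main():
    # Requires the SDK and the authentication environment variables.
    from abeja.datalake import Client

    channel = Client().get_channel("1234567890123")  # placeholder channel ID
    # list_files() is assumed here; check the SDK reference for the exact name and options.
    for label, n in count_by_label(channel.list_files()).items():
        print(f"{label}: {n}")
```

Call `main()` after an upload to print the number of files per label.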
