train-local
This page explains how to use train-local.
※ABEJA Platform CLI and Docker must be installed before you use train-local. Please refer to here for ABEJA Platform CLI installation.
What is train-local
train-local lets you run training jobs in your own machine-learning environment (for example, on local GPUs) while still using the model management and API features provided by the ABEJA Platform, making use of resources you already have.
Create training.yaml
「training.yaml」 is required for local training. Run the following command to create training.yaml.
$ abeja training init
training.yaml will be created in the current directory after the command is executed.
Edit it as needed, using the following sample as a reference.
Please refer to here for how to write the file.
■training.yaml (sample)
name: train-local-demo
handler: train:handler
image: abeja-inc/all-cpu:18.10
params:
  NUM_EPOCHS: '1'
  C: '1'
  MODEL_FILENAME: model.pkls
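The values under `params` in the sample are strings (`'1'`, `model.pkls`). As a minimal sketch, assuming these params reach the training code as environment variables of the same name (which the `--environment NUM_EPOCHS:100` overrides in the command samples suggest), the training code would convert them to the types it needs. The helper `load_params`, the `TrainParams` type, and the fallback defaults are hypothetical.

```python
import os
from typing import Mapping, NamedTuple


class TrainParams(NamedTuple):
    """Typed view of the `params` section of the sample training.yaml."""
    num_epochs: int
    c: float
    model_filename: str


def load_params(env: Mapping[str, str] = os.environ) -> TrainParams:
    # Assumption: each key under `params` is exposed as a string-valued
    # environment variable, e.g. NUM_EPOCHS='1'.
    # Hypothetical fallback defaults mirror the sample training.yaml.
    return TrainParams(
        num_epochs=int(env.get("NUM_EPOCHS", "1")),
        c=float(env.get("C", "1")),
        model_filename=env.get("MODEL_FILENAME", "model.pkls"),
    )


if __name__ == "__main__":
    # Simulate the overrides used in the command samples on this page.
    print(load_params({"NUM_EPOCHS": "100", "C": "3"}))
```

Converting in one place keeps type errors (for example a non-numeric `NUM_EPOCHS`) at the start of the job rather than partway through training.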
Create job definition
Run the following command in the directory that contains training.yaml:
$ abeja training create-job-definition
A job definition will be created from training.yaml in the organization configured in “~/.abeja/config”.
Create a version of the job definition
Create the handler function in your training job code. Refer to here for a description of the handler.
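The `handler: train:handler` entry in training.yaml refers to a function named `handler` in `train.py`. Below is a minimal, hypothetical sketch of such a file. It assumes the function receives a single `context` argument and that the output directory is given by an `ABEJA_TRAINING_RESULT_DIR` environment variable; please check both assumptions against the handler description linked above. The "training" loop is only a placeholder.

```python
import os
import pickle


def handler(context):
    """Hypothetical entry point referenced as `train:handler` in training.yaml.

    `context` is accepted but not used in this sketch.
    """
    # Assumption: params from training.yaml arrive as string env vars.
    num_epochs = int(os.environ.get("NUM_EPOCHS", "1"))
    model_filename = os.environ.get("MODEL_FILENAME", "model.pkls")
    # Assumption: the job's output directory is provided via this variable;
    # fall back to the current directory when it is not set.
    result_dir = os.environ.get("ABEJA_TRAINING_RESULT_DIR", ".")

    # Placeholder "training": just count the epochs that were run.
    model = {"epochs_run": 0}
    for _ in range(num_epochs):
        model["epochs_run"] += 1

    # Save the "model" so it ends up with the job's results.
    os.makedirs(result_dir, exist_ok=True)
    path = os.path.join(result_dir, model_filename)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Returning the path is for convenience in this sketch only.
    return path
```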
Once the training job code is ready, create a version of the job definition:
$ abeja training create-version
Run the job definition in the local environment
$ abeja training train-local --help
Usage: abeja training train-local [OPTIONS]

  Local train commands

Options:
  -o, --organization_id, --organization-id TEXT
                                  Organization ID, organization_id of current
                                  credential organization is used by default
                                  [required]
  --name TEXT                     Training Job Definition Name  [required]
  --version TEXT                  Training Job Definition Version  [required]
  --description TEXT              Training Job description
  -d, --datasets DATASETPARAMSTRING
                                  Datasets name
  -e, --environment ENVIRONMENTSTRING
                                  Environment variables
  -v, --volume VOLUMEPARAMSTRING  Volume driver options, ex) /path/source/on/host:/path/destination/on/container
  --v1                            Specify if you use old custom runtime image
  --runtime TEXT                  Runtime, equivalent to docker run `--runtime` option
  --config PATH                   Read Configuration from PATH. By default read from `training.yaml`
  --help                          Show this message and exit.
(When training.yaml is in the directory where you run the command, the values defined in it are used for these options.)
■ Command sample
■ When running with an already defined training.yaml
$ abeja training train-local --version 1 --environment NUM_EPOCHS:100 --environment C:3
After running the above command, you can check the running job, its logs, and the training results in the management console.
(Use --environment to override the values defined in the job definition version.)
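The `--environment` option takes `NAME:VALUE` pairs, as in `NUM_EPOCHS:100`. The hypothetical sketch below illustrates the override behavior described above: pairs given on the command line replace the values from the job definition version, and all other values are kept. It is not the CLI's actual implementation; in particular, splitting each pair on its first colon is an assumption, and `parse_environment` / `effective_environment` are illustrative names.

```python
from typing import Dict, Iterable


def parse_environment(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ["NUM_EPOCHS:100", "C:3"] into {"NUM_EPOCHS": "100", "C": "3"}."""
    result: Dict[str, str] = {}
    for pair in pairs:
        # Assumption: split on the first colon only, so values may contain ':'.
        name, sep, value = pair.partition(":")
        if not sep or not name:
            raise ValueError(f"expected NAME:VALUE, got {pair!r}")
        result[name] = value
    return result


def effective_environment(defined: Dict[str, str], overrides: Iterable[str]) -> Dict[str, str]:
    # Command-line pairs win over the values defined in the job definition.
    merged = dict(defined)
    merged.update(parse_environment(overrides))
    return merged


if __name__ == "__main__":
    defined = {"NUM_EPOCHS": "1", "C": "1", "MODEL_FILENAME": "model.pkls"}
    print(effective_environment(defined, ["NUM_EPOCHS:100", "C:3"]))
```

With the sample training.yaml and the command above, `NUM_EPOCHS` and `C` become `100` and `3`, while `MODEL_FILENAME` keeps its defined value.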
■ When using local data
$ abeja training train-local --version 1 --volume `pwd`:/data --environment NUM_EPOCHS:100 --environment C:3
For instance, if your data is in the current directory and you want it available directly at `/data` inside the training job, the above command mounts the current directory into the container that runs the job in the local environment.
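Inside the container, the mounted directory behaves like any other path. The sketch below assumes the `--volume `pwd`:/data` mount from the example above and lists the files the training code would see under the container path. The function name `list_data_files` is hypothetical, and `/data` is only the destination chosen in that example.

```python
import os
from typing import List


def list_data_files(data_dir: str = "/data") -> List[str]:
    """Return paths (relative to data_dir) of all files under the mount point.

    /data matches the container path in the --volume example; pass another
    directory if you mount somewhere else. A missing directory yields [].
    """
    found: List[str] = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, data_dir))
    return sorted(found)


if __name__ == "__main__":
    for path in list_data_files():
        print(path)
```

Because the mounted directory is not read-only, code like this that only reads from `/data` avoids modifying the original files on the host.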
Note:
- Logs are currently not sent in real time (as of Aug. 2019).
- TensorBoard cannot be used with local training.
- Directories mounted with the --volume option are not set to read-only.
- The command name is planned to change from train-local to debug-local.