POST /job

Start a new LLM/VLM training job
curl --request POST \
  --url https://training-suite.app.simplismart.ai/job/ \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form org=0bf00b43-430a-4ca3-a8b3-b13cc8dc6d4f \
  --form experiment_name=launch-simplismart-causal_lm-lora \
  --form 'dataset_config={
  "preprocessing": {
    "lazy_tokenize": true,
    "streaming": false,
    "prompt": {
      "system": null,
      "max_length": 4096,
      "template": null
    }
  },
  "split": {
    "type": "random",
    "ratios": [0.9, 0.1]
  }
}
' \
  --form 'model_details={
  "base_model": "meta-llama/Llama-3.2-1B-Instruct",
  "ownership": "public",
  "source_type": "hf",
  "model_type": "llm",
  "quantization": {
    "quant_bits": 4
  }
}
' \
  --form 'train_config={
  "type": "sft",
  "torch_dtype": "bfloat16",
  "task_type": "causal_lm",
  "train_type": "lora",
  "tuner_backend": "simplismart",
  "hyperparameters": {
    "num_epochs": 1,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_checkpointing": true,
    "save_steps": 500,
    "save_total_limit": 2,
    "eval_steps": 500,
    "logging_steps": 5,
    "learning_rate": 0.0001,
    "dataloader_num_workers": 1
  },
  "adapter_config": {
    "r": 16,
    "alpha": 16,
    "dropout": 0.1,
    "targets": ["all-linear"]
  },
  "distributed": {
    "type": "ddp"
  }
}
' \
  --form 'dataset_details={
  "dataset_name": "dataset-name",
  "dataset_path": "s3://training-dev-datasets/ds/sharegpt_ds_half.jsonl",
  "dataset_description": "",
  "dataset_type": "jsonl",
  "dataset_format": "sharegpt",
  "source_type": "s3",
  "ownership": "private",
  "secret_id": "<your-secret-key>",
  "region": "us-west-2"
}
' \
  --form 'infra_config={
  "gpu_type": "h100",
  "gpu_count": 2,
  "infra_type": "simplismart",
  "node_count": 2
}
'
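The curl request above can also be assembled programmatically. The sketch below builds the multipart form fields in Python; `build_job_form` is an illustrative helper, not part of any SDK, and only two of the five config fields are shown (the others are serialized the same way). The commented-out submission uses the third-party `requests` package.

```python
import json

def build_job_form(org: str, experiment_name: str, **configs) -> dict:
    """Assemble the multipart form fields: plain strings pass through,
    dict-valued configs are serialized to the JSON strings the API expects."""
    form = {"org": org, "experiment_name": experiment_name}
    for name, cfg in configs.items():
        form[name] = json.dumps(cfg)
    return form

form = build_job_form(
    "0bf00b43-430a-4ca3-a8b3-b13cc8dc6d4f",
    "launch-simplismart-causal_lm-lora",
    dataset_config={
        "preprocessing": {
            "lazy_tokenize": True,
            "streaming": False,
            "prompt": {"system": None, "max_length": 4096, "template": None},
        },
        "split": {"type": "random", "ratios": [0.9, 0.1]},
    },
    infra_config={
        "gpu_type": "h100",
        "gpu_count": 2,
        "infra_type": "simplismart",
        "node_count": 2,
    },
)

# Submitting (requires the `requests` package and a valid JWT):
# import requests
# resp = requests.post(
#     "https://training-suite.app.simplismart.ai/job/",
#     headers={"Authorization": "Bearer <token>"},
#     files={k: (None, v) for k, v in form.items()},  # forces multipart/form-data
# )
```

Passing each value as a `(None, value)` tuple in `files` makes `requests` encode the body as `multipart/form-data`, matching curl's `--form` behavior.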
Example response:

{
  "request_id": "<string>",
  "status": "QUEUED",
  "message": "<string>",
  "experiment_name": "<string>"
}

Authorizations

Authorization
string
header
required

JWT token for authentication.

Headers

Authorization
string
required

Bearer token for authentication and authorization.

Body

multipart/form-data
org
string
required

Organization ID associated with the training job.

Example:

"0bf00b43-430a-4ca3-a8b3-b13cc8dc6d4f"

experiment_name
string
required

Name assigned to the training experiment.

Example:

"launch-simplismart-causal_lm-lora"

dataset_config
string
required

JSON-formatted string containing dataset preprocessing and split configuration.

Example:

{
  "preprocessing": {
    "lazy_tokenize": true,
    "streaming": false,
    "prompt": {
      "system": null,
      "max_length": 4096,
      "template": null
    }
  },
  "split": {
    "type": "random",
    "ratios": [0.9, 0.1]
  }
}
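Since `dataset_config` travels as a JSON-formatted string, it is worth parsing and sanity-checking it locally before submitting. A minimal sketch; the specific checks are this example's own assumptions, as the server-side validation rules are not documented on this page.

```python
import json
import math

def check_dataset_config(raw: str) -> dict:
    """Parse the dataset_config string and run illustrative sanity checks."""
    cfg = json.loads(raw)  # the field must be valid JSON text
    ratios = cfg["split"]["ratios"]
    if not math.isclose(sum(ratios), 1.0):
        raise ValueError(f"split ratios should sum to 1, got {sum(ratios)}")
    if cfg["preprocessing"]["prompt"]["max_length"] <= 0:
        raise ValueError("max_length must be positive")
    return cfg

cfg = check_dataset_config(json.dumps({
    "preprocessing": {
        "lazy_tokenize": True,
        "streaming": False,
        "prompt": {"system": None, "max_length": 4096, "template": None},
    },
    "split": {"type": "random", "ratios": [0.9, 0.1]},
}))
```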

model_details
string
required

JSON-formatted string containing model configuration including base model, quantization, and ownership details.

Example:

{
  "base_model": "meta-llama/Llama-3.2-1B-Instruct",
  "ownership": "public",
  "source_type": "hf",
  "model_type": "llm",
  "quantization": {
    "quant_bits": 4
  }
}

train_config
string
required

JSON-formatted string containing training configuration including hyperparameters, adapter settings, and distributed training options.

Example:

{
  "type": "sft",
  "torch_dtype": "bfloat16",
  "task_type": "causal_lm",
  "train_type": "lora",
  "tuner_backend": "simplismart",
  "hyperparameters": {
    "num_epochs": 1,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_checkpointing": true,
    "save_steps": 500,
    "save_total_limit": 2,
    "eval_steps": 500,
    "logging_steps": 5,
    "learning_rate": 0.0001,
    "dataloader_num_workers": 1
  },
  "adapter_config": {
    "r": 16,
    "alpha": 16,
    "dropout": 0.1,
    "targets": ["all-linear"]
  },
  "distributed": {
    "type": "ddp"
  }
}
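In `adapter_config`, `r` is the LoRA rank and `alpha` the scaling numerator. Under the standard LoRA convention (an assumption here; the tuner backend's exact convention is not documented on this page), the adapter update is scaled by `alpha / r`, and a rank-r adapter on a linear layer adds two small factor matrices. A quick arithmetic sketch:

```python
adapter_config = {"r": 16, "alpha": 16, "dropout": 0.1, "targets": ["all-linear"]}

# alpha == r gives a neutral scaling of 1.0 on the adapter update.
scaling = adapter_config["alpha"] / adapter_config["r"]

def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A rank-r adapter on a d_in x d_out linear layer adds two factors:
    A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

# e.g. a hypothetical 2048 x 2048 projection at r=16 adds 16 * 4096 weights
extra = lora_trainable_params(2048, 2048, adapter_config["r"])
```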

dataset_details
string
required

JSON-formatted string containing dataset information including path, format, and access credentials.

Example:

{
  "dataset_name": "dataset-name",
  "dataset_path": "s3://training-dev-datasets/ds/sharegpt_ds_half.jsonl",
  "dataset_description": "",
  "dataset_type": "jsonl",
  "dataset_format": "sharegpt",
  "source_type": "s3",
  "ownership": "private",
  "secret_id": "<your-secret-key>",
  "region": "us-west-2"
}
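For `dataset_format: "sharegpt"`, each JSONL line conventionally holds a `conversations` list of `{"from", "value"}` turns. That layout is the common ShareGPT convention, not something this reference confirms, so treat the check below as an illustrative pre-upload sanity pass rather than the backend's actual schema.

```python
import json

def validate_sharegpt_jsonl(lines):
    """Parse each JSONL line and check for a 'conversations' list of turns."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        rec = json.loads(line)  # raises on malformed JSON
        if not isinstance(rec.get("conversations"), list):
            raise ValueError(f"line {lineno}: missing 'conversations' list")
        records.append(rec)
    return records

sample = [
    '{"conversations": [{"from": "human", "value": "hi"},'
    ' {"from": "gpt", "value": "hello"}]}',
]
records = validate_sharegpt_jsonl(sample)
```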

infra_config
string
required

JSON-formatted string containing infrastructure requirements including GPU type, count, and node configuration.

Example:

{
  "gpu_type": "h100",
  "gpu_count": 2,
  "infra_type": "simplismart",
  "node_count": 2
}
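The hyperparameters in `train_config` interact with `infra_config`: assuming one data-parallel rank per GPU under DDP (a standard assumption, not stated by this reference), the global batch size is the per-device batch multiplied by GPUs per node and node count.

```python
# Values from the train_config and infra_config examples above:
per_device_train_batch_size = 8
gpu_count = 2    # GPUs per node
node_count = 2

world_size = gpu_count * node_count                      # data-parallel ranks
global_batch = per_device_train_batch_size * world_size  # samples per optimizer step
```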

Response

Training job submitted successfully.

request_id
string

Unique identifier for the training job request.

status
enum<string>

Initial status of the training job.

Available options:
QUEUED, RUNNING, COMPLETED, FAILED, CANCELED

message
string

Additional information about the job submission.

experiment_name
string

Name of the training experiment.
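Given the status enum above, a client can distinguish terminal states from in-flight ones. The grouping below is this sketch's own reading of the documented options; the endpoint for polling a job's status is not shown on this page.

```python
# Statuses copied from the response enum above.
TERMINAL = {"COMPLETED", "FAILED", "CANCELED"}
ACTIVE = {"QUEUED", "RUNNING"}

def is_done(status: str) -> bool:
    """Return True once the job has reached a terminal state."""
    if status not in TERMINAL | ACTIVE:
        raise ValueError(f"unknown job status: {status}")
    return status in TERMINAL
```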