This updated guide provides an overview of our enhanced UI for training large language models and vision language models, supporting both Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). For each training type, you can choose between full-model fine-tuning or parameter-efficient approaches like LoRA. While full-model fine-tuning is fully supported across both SFT and RLHF, we recommend LoRA for most use cases due to its faster convergence, lower GPU memory usage, and simplified checkpointing.
Note: This guide refers to the latest version (v2). If you’re using the older interface, please refer to the Legacy v1 Guide.

Starting a Training Experiment

Experiment Name: A unique identifier for each training job within your organization.

Model Details

  • Base Model: Select a supported model from the list below.
  • Source Type: Currently supports models from Hugging Face.
  • Model Type: Auto-filled based on the selected base model.

Supported Models

  • meta-llama/Llama-3.1-8B-Instruct
  • meta-llama/Llama-3.2-1B-Instruct
  • meta-llama/Llama-3.2-3B-Instruct
  • meta-llama/Llama-3.2-11B-Vision-Instruct
  • Qwen/Qwen2.5-3B-Instruct
  • Qwen/Qwen2.5-14B-Instruct
  • Qwen/Qwen2.5-VL-7B-Instruct
  • tiiuae/falcon-7b-instruct
Note:
  • If you select LLM as the model type for a VLM base model, only the language component will be trained.
  • To train the vision component, ensure both the base model and the model type are set to VLM.

Dataset Selection

Configure your dataset for training using the following fields:
  • Source Options: Select the source of your dataset. Supported options include:
    • Hugging Face (public Hub)
    • AWS S3
    • GCP Storage (GCS)
  • Dataset Name
    This should be unique within your organization to help with organizing and reusing datasets.
  • Dataset Path
    Specify the dataset location. For AWS S3 and GCP GCS, use the full object path,
    e.g., s3://your-bucket/your-file.jsonl
  • Dataset Description (Optional)
    Provide a brief description of the dataset’s contents or purpose. Optional but useful for reference.
  • Secret (Required for AWS S3 or GCP GCS)
    Provide your cloud credentials to enable secure access to private storage buckets.
  • Region (Required for AWS S3 or GCP GCS)
    Select the region where your storage bucket is located.

Dataset Format

We support JSONL format for all training data.
  • For VLMs, use a ZIP file containing both the image files and a train.jsonl file (the master training file).
    The directory should be archived as a .zip file and stored in object storage.
    Example zip command: cd path/to/dataset_dir && zip -r dataset_dir.zip ./*
Each line in a .jsonl file should represent a complete training example. The supported format styles are:
  1. ShareGPT Format
    {
      "system": "<system>",
      "conversation": [
        {"human": "<query1>", "assistant": "<response1>"},
        {"human": "<query2>", "assistant": "<response2>"}
      ]
    }
    
  2. OpenAI SFT Format
    {
      "messages": [
        {"role": "system", "content": "<system>"},
        {"role": "user", "content": "<query1>"},
        {"role": "assistant", "content": "<response1>"},
        {"role": "user", "content": "<query2>"},
        {"role": "assistant", "content": "<response2>"}
      ]
    }
    
  3. OpenAI DPO Format (for preference training)
    {
      "messages": [
        {"role": "system", "content": "You are a useful and harmless assistant"},
        {"role": "user", "content": "Tell me tomorrow's weather"},
        {"role": "assistant", "content": "Tomorrow's weather will be sunny"}
      ],
      "rejected_response": "I don't know"
    }    
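Before uploading, you can sanity-check that every line of your .jsonl parses as JSON and matches one of the supported styles. The snippet below is only an illustrative sketch (it is not part of the platform); the key names are taken from the formats above, and the file name train.jsonl is a placeholder.

  import json

  def detect_format(example: dict) -> str:
      # Classify one record against the supported format styles.
      if "conversation" in example:
          return "ShareGPT"
      if "messages" in example and "rejected_response" in example:
          return "OpenAI DPO"
      if "messages" in example:
          return "OpenAI SFT"
      raise ValueError(f"Unrecognized record keys: {sorted(example)}")

  with open("train.jsonl", encoding="utf-8") as f:
      for line_no, line in enumerate(f, start=1):
          if not line.strip():
              continue                       # skip blank lines
          record = json.loads(line)          # every line must be valid JSON
          print(f"line {line_no}: {detect_format(record)}")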
    


Dataset Configuration

  • Lazy Tokenize: Delay tokenization until needed. Speeds up dataset loading for large files.
  • Streaming: Enable only for public HF Datasets to load records on-the-fly, reducing local storage needs.
  • Prompt Max Length: Maximum token length for the prompt. Longer sequences will be truncated.
    Recommended: 2048
  • System Prompt: (Optional) A global prefix to every example, e.g., You are a helpful assistant.
  • Prompt Template: (Optional) If your data needs wrapping in a custom template, e.g., <system> {system_prompt} <user> {prompt}.
  • Train/Validation Split: Percentage (fraction) for splitting your .jsonl into training and validation sets.
    • Split Type
      Currently, only random split is supported. The dataset will be randomly divided into training and validation sets.
    • Train Split Ratio
      Enter the ratio of data to be used for training (e.g., 0.9 for 90%).
    • Validation Split Ratio
      Enter the ratio of data to be used for validation (e.g., 0.1 for 10%).
      Train Split Ratio should be greater than 0.8
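For illustration, the random split described above behaves roughly like the sketch below. The platform performs the split for you; the file name and seed here are placeholders.

  import json
  import random

  def random_split(path: str, train_ratio: float = 0.9, seed: int = 42):
      # Randomly divide a JSONL dataset into training and validation sets.
      with open(path, encoding="utf-8") as f:
          records = [json.loads(line) for line in f if line.strip()]
      random.Random(seed).shuffle(records)
      cut = int(len(records) * train_ratio)
      return records[:cut], records[cut:]    # train set, validation set

  train_set, val_set = random_split("train.jsonl", train_ratio=0.9)
  print(len(train_set), "training examples,", len(val_set), "validation examples")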

Infrastructure Configuration

  • GPU Type: Select the GPU type for the instance, e.g., H100, L40S.
  • GPU Count: Number of GPUs to allocate for this job.
Adjust based on model size and dataset scale; more GPUs reduce training time.

Training Configuration

  1. Core Options
| Parameter | Description | Example |
|---|---|---|
| Train Type | Select the tuning algorithm | SFT |
| Adapter Type | Choose the adapter method | LoRA, Full |
| Torch DType | Precision setting for training | bfloat16 |
Adapter Type
  • Full – Use this option for full-model fine-tuning, where all model parameters are updated.
  • LoRA – Use this for parameter-efficient fine-tuning using Low-Rank Adapters (LoRA), which updates a small subset of weights for faster training and lower resource usage.
Note: LoRA is generally recommended for efficiency and ease of deployment.
  2. Tuner Backend (Applicable only for SFT Training type)
    The Tuner Backend defines the framework used to run fine-tuning and enables faster performance through efficient training strategies.
    • PEFT (Parameter-Efficient Fine-Tuning) Backend
      Standard backend widely used for LoRA-based fine-tuning.
      • Supports distributed training with either DDP (Distributed Data Parallel) or DeepSpeed.
    • Simplismart Backend
      Optimized backend designed for more efficient GPU compute and memory utilization.
      • Currently supports only DDP for distributed training, which ensures consistent and predictable scaling across multiple GPUs.
    DDP replicates the model across GPUs and synchronizes gradients at each step, providing stable multi-GPU training. DeepSpeed adds advanced features such as optimizer state partitioning, gradient sharding, and memory offloading, enabling the training of larger models on limited hardware.
  3. RLHF Configuration (Applicable only for RLHF Training type)
    When selecting Training Type = RLHF, additional configuration fields appear under RLHF Config. These vary depending on the chosen RLHF Type. The platform supports the following RLHF variants:
    • DPO (Direct Preference Optimization)
      • Beta
        Controls the trade-off between the preference loss and KL regularization (see the sketch at the end of this RLHF section).
        Default: 0.3
        Optional: Yes, but recommended.
    • GRPO (Group Relative Policy Optimization)
      • Beta
        Similar to DPO, this governs the preference vs. KL loss balance.
        Default: 0.0001
      • Max Num Seqs
        Number of sequences to use during rollout.
        Default: 1
        Recommended Value: 1
      • Enforce Eager
        If enabled, forces rollouts to run in eager mode rather than compiled mode. Useful for debugging or compatibility issues.
        Default: Unchecked
        Recommended: Enable Enforce Eager during GRPO training.
    • Common Parameters:
      | Field | Description | Required | Default |
      |---|---|---|---|
      | RLHF Type | Select the RLHF variant to use |  | - |
      | Reference Model | Path to the baseline model used for KL regularization | Optional | - |
      | Reward Model | Path to the reward model | Optional | - |
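    For intuition about the Beta parameter, the sketch below shows the standard DPO objective, in which beta scales the policy-vs-reference log-probability margin and therefore controls how strongly the policy is kept close to the reference model. This is illustrative only and is not the platform's internal implementation; the tensor inputs are assumed to be per-example log-probabilities.

      import torch
      import torch.nn.functional as F

      def dpo_loss(policy_chosen_logps: torch.Tensor,
                   policy_rejected_logps: torch.Tensor,
                   ref_chosen_logps: torch.Tensor,
                   ref_rejected_logps: torch.Tensor,
                   beta: float = 0.3) -> torch.Tensor:
          # Beta scales the log-ratio margin, trading off preference fitting
          # against staying close to the reference model (KL regularization).
          chosen_margin = policy_chosen_logps - ref_chosen_logps
          rejected_margin = policy_rejected_logps - ref_rejected_logps
          logits = beta * (chosen_margin - rejected_margin)
          return -F.logsigmoid(logits).mean()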
  4. Optimization Hyperparameters
    | Parameter | Description | Default | Recommended Values | Permissible Range |
    |---|---|---|---|---|
    | Num Epochs | Number of full passes through the dataset | 1 | 2-5 | < 50 |
    | Train Batch Size | Samples per device for training | 8 | 8 | < 16 |
    | Eval Batch Size | Samples per device for evaluation | 1 | 8 | < 16 |
    | Learning Rate | Initial learning rate for the optimizer | 0.0001 | 1×10⁻⁵ to 2×10⁻⁵ | < 5×10⁻⁵ |
    | Dataloader Num Workers | Parallel data-loading threads per device | 1 | 4 | < 10 |
  5. Checkpointing & Monitoring
    | Parameter | Description | Default | Recommended Values | Permissible Range |
    |---|---|---|---|---|
    | Save Steps | Interval (in steps) between saving model checkpoints | 100 | 100 | <= 100 |
    | Save Total Limit | Max number of checkpoints to keep locally | 2 | 2-5 | < 10 |
    | Eval Steps | Interval (in steps) between running the evaluation loop | 100 | 100 | 100 - 200 |
    | Logging Steps | Interval (in steps) between logging metrics to the dashboard | 5 | 5 | < 20 |
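For context, the optimization and checkpointing fields above correspond to standard trainer settings. The sketch below shows how the defaults might map onto a Hugging Face transformers-style trainer; this mapping is an assumption for illustration and may differ from the platform's internals.

  from transformers import TrainingArguments

  # Illustrative mapping of the UI fields to trainer arguments.
  args = TrainingArguments(
      output_dir="./checkpoints",        # placeholder path
      num_train_epochs=1,                # Num Epochs
      per_device_train_batch_size=8,     # Train Batch Size
      per_device_eval_batch_size=1,      # Eval Batch Size
      learning_rate=1e-5,                # Learning Rate (recommended 1e-5 to 2e-5)
      dataloader_num_workers=1,          # Dataloader Num Workers
      save_steps=100,                    # Save Steps
      save_total_limit=2,                # Save Total Limit
      eval_steps=100,                    # Eval Steps
      logging_steps=5,                   # Logging Steps
  )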

LoRA Adapter Configuration

| Parameter | Description | Default | Recommended Value | Permissible Range |
|---|---|---|---|---|
| Rank (r) | Dimensionality of the low-rank decomposition | 16 | 16 | <= 64 |
| Alpha | Scaling factor for the adapter output | 16 | 32 | <= 64 |
| Dropout | Dropout probability for adapter layers | 0.1 | 0.1 | <= 1 |
| Targets | Which modules to apply adapters to (e.g., all-linear) | all-linear | all-linear | NA |
These settings control the LoRA injection into your base model. A higher rank increases capacity but uses more memory.
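For reference, the recommended values above expressed with the Hugging Face peft library look roughly like the sketch below. This is only an illustration under the assumption of a peft-style configuration; the platform applies these settings for you.

  from peft import LoraConfig

  # Illustrative LoRA settings matching the recommended values above.
  lora_config = LoraConfig(
      r=16,                          # Rank (r)
      lora_alpha=32,                 # Alpha
      lora_dropout=0.1,              # Dropout
      target_modules="all-linear",   # Targets
      task_type="CAUSAL_LM",
  )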

Distributed Training Configuration

| Parameter | Description | Default | Recommended Value | Available Options |
|---|---|---|---|---|
| Type | Choose your distributed backend | DeepSpeed | DeepSpeed | DeepSpeed, DDP |
| Strategy | Only available for DeepSpeed | zero3_offload | zero3_offload | zero1, zero2, zero2_offload, zero3, zero3_offload |
Set Type to DeepSpeed to enable ZeRO optimizations, or DDP for native PyTorch distributed training.
When using DeepSpeed, select the zero3_offload strategy to maximize memory savings by offloading optimizer states to the CPU.
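For context, a ZeRO-3 offload setup in a typical DeepSpeed configuration looks roughly like the sketch below (illustrative only; the platform generates its own configuration when you pick the strategy):

  # Illustrative DeepSpeed ZeRO-3 offload settings, expressed as a Python dict.
  deepspeed_config = {
      "zero_optimization": {
          "stage": 3,                              # ZeRO stage 3: partition parameters, gradients, and optimizer states
          "offload_optimizer": {"device": "cpu"},  # move optimizer states to CPU memory
          "offload_param": {"device": "cpu"},      # move parameters to CPU when not in use
      },
      "bf16": {"enabled": True},                   # matches the bfloat16 Torch DType above
  }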

Launching Your Job

  1. Review all settings.
  2. Click Create Job.
  3. Monitor progress under My Trainings > Your Training Job > Metrics.
  4. Compile the model and deploy when training completes.