Initiate a New Training Job

  • Navigate to the My Trainings section in the platform.

Click on Add a Training Job to create a new job.


Basic Details and Model Selection

  • Provide a name for your experiment.
  • Enter the Hugging Face model path of the base model you wish to fine-tune.

Click here to explore the list of models supported for training.

Upload Your Dataset

Upload your training dataset. The maximum file size is 100 MB, and the supported file format is CSV.

Download a sample CSV here.
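Before uploading, it can help to check the file locally against the limits above. The following is a minimal sketch, not a platform tool: the function name check_dataset is hypothetical, and it assumes that "100 MB" means 100 × 1024 × 1024 bytes and that the first row of the CSV is a header.

```python
import csv
import os

# Assumption: the 100 MB limit is counted as 100 MiB; the platform may count differently.
MAX_BYTES = 100 * 1024 * 1024


def check_dataset(path):
    """Return a list of problems found with a candidate training CSV."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 100 MB")
    # Read only the first row, which is assumed to hold the column names.
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), None)
    if not header:
        problems.append("file has no header row")
    return problems
```

An empty list means the file passed these local checks; it does not guarantee that the platform will accept it.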

Select Training Parameters

Update the training parameters based on your requirements for the training job.


Here is a short explanation of these parameters:

  • Gradient Accumulation Steps: Number of steps to accumulate gradients before updating model weights.

  • Learning Rate: Controls how much the model adjusts its weights during training.

  • Batch Size: Number of samples processed together in one training iteration.

  • Trainer Epochs: Total number of times the model trains on the complete dataset.

  • Adapter Alpha: The scaling factor for the adapter's weight updates. Before the updates are added to the original model weights, they are scaled by alpha divided by the rank, which balances adaptation against the original model's knowledge.

  • Adapter R: The integer rank of the update matrices. A lower rank creates smaller update matrices, reducing the number of trainable parameters.

    It is recommended that Alpha be set to twice the value of Rank.

  • Adapter Dropout: The dropout probability for the LoRA layers, used to prevent overfitting.

    A value of 0.1 (10%) indicates a 10% chance for each neuron to be dropped during training.
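The arithmetic behind several of these parameters can be sketched in a few lines. This is an illustration based on the standard LoRA formulation, not the platform's code: the function names and the example values (a 4096 × 4096 layer, rank 8, alpha 16) are hypothetical.

```python
def effective_batch_size(batch_size, gradient_accumulation_steps):
    """Samples that contribute to each weight update on a single device."""
    return batch_size * gradient_accumulation_steps


def lora_scaling(alpha, rank):
    """Factor applied to the adapter update: alpha divided by the rank."""
    return alpha / rank


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in one adapted layer: one d_out x rank and one rank x d_in matrix."""
    return rank * (d_out + d_in)


# Hypothetical values chosen for illustration only.
print(effective_batch_size(4, 8))            # 32
print(lora_scaling(alpha=16, rank=8))        # 2.0
print(lora_trainable_params(4096, 4096, 8))  # 65536
```

For comparison, the full 4096 × 4096 weight matrix in this example has 16,777,216 parameters, so a rank of 8 trains well under 1% of them for that layer.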


Advanced Configurations

Update Training Configuration

The training configuration provides a flexible way to define the inputs, outputs, and other advanced settings for your model. Below is a sample configuration:

{
  "input_features": [
    {
      "name": "<input column name>",
      "type": "text"
    }
  ],
  "output_features": [
    {
      "name": "<output column name>",
      "type": "text"
    }
  ],
  "quantization": {
    "bits": 4,
    "llm_int8_threshold": 6,
    "llm_int8_has_fp16_weight": false,
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_use_double_quant": true,
    "bnb_4bit_quant_type": "nf4"
  },
  "trainer": {
    "learning_rate_scheduler": {
      "warmup_fraction": 0.01,
      "decay": "linear"
    }
  }
}

Key Components

  • Input Features: Attributes from your dataset used as input for the model.
  • Output Features: The predictions or results generated by the model.
  • Quantization: Reduces the numerical precision of the model weights to optimize memory usage and inference speed.
  • Trainer: Configures training behavior, such as the learning rate scheduler and its decay strategy.
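To give a feel for the scheduler settings in the sample configuration, here is a minimal sketch of one common interpretation of "warmup_fraction" with "linear" decay: the learning rate ramps up linearly over the first fraction of training steps, then falls linearly to zero. This is an assumption about the convention, and the platform's exact schedule may differ. The function name learning_rate_at is hypothetical.

```python
def learning_rate_at(step, total_steps, base_lr, warmup_fraction=0.01):
    """Linear warmup to base_lr, then linear decay to zero (assumed convention)."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Ramp up: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    remaining = total_steps - warmup_steps
    if remaining <= 0:
        return base_lr
    # Decay: starts at base_lr right after warmup and reaches 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / remaining)


# Hypothetical run of 1000 steps with a base learning rate of 2e-4.
for step in (0, 9, 10, 505, 1000):
    print(step, learning_rate_at(step, total_steps=1000, base_lr=2e-4))
```

With a warmup fraction of 0.01 and 1000 steps, only the first 10 steps are spent warming up, so the setting mainly protects the earliest updates from a full-size learning rate.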

Configuring Input Features

Input features define how your dataset’s content is presented to the model. There are two methods to define input features:

Single Column Input

  • Use this if the model input directly corresponds to a single column in your dataset.
  • Example:
"input_features": [
  {
    "name": "<input column name>",
    "type": "text"
  }
]

Prompt Template Input

  • Use this if the input requires additional static text or needs formatting.
  • Define a prompt as a top-level key with a template specifying how multiple columns are combined.
  • Example (from the Alpaca dataset linked in the sample CSV above):
"input_features": [
  {
    "name": "prompt",
    "type": "text"
  }
],
"prompt": {
  "template": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
}
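To preview what the model would receive for a given row, the template can be filled in locally. This sketch assumes that each {column} placeholder is replaced with the value of that column, in the same way as Python's str.format; the platform's own templating may treat edge cases such as literal braces differently. The row contents and the function name render_prompt are hypothetical.

```python
template = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"


def render_prompt(template, row):
    """Substitute dataset column values into the template placeholders."""
    return template.format(**row)


# Hypothetical dataset row; the "output" column is not referenced by the template.
row = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
print(render_prompt(template, row))
```

Columns that the template does not mention, such as the output column here, are simply ignored when the prompt is rendered.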

Configuring Output Features

Output features represent the model’s predictions. Typically, this is a single column in your dataset that the model aims to generate or predict.

Example:

"output_features": [
  {
    "name": "<output column name>",
    "type": "text"
  }
]
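A quick local check can confirm that every column named in the configuration exists in the dataset before the job is started. This is a sketch rather than a platform feature: the helper names are hypothetical, and it assumes that when a prompt template is present, the columns to look for are the template's placeholders rather than the input feature name.

```python
from string import Formatter


def required_columns(config):
    """Dataset columns that a training configuration refers to."""
    columns = [feature["name"] for feature in config["output_features"]]
    template = config.get("prompt", {}).get("template")
    if template:
        # Placeholders such as {instruction} are assumed to name dataset columns.
        columns += [field for _, field, _, _ in Formatter().parse(template) if field]
    else:
        columns += [feature["name"] for feature in config["input_features"]]
    return columns


def missing_columns(config, header):
    """Columns the configuration needs that are absent from the CSV header."""
    return [name for name in required_columns(config) if name not in header]


# Hypothetical configuration and CSV header for illustration.
config = {
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "prompt": {"template": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"},
}
print(missing_columns(config, ["instruction", "output"]))  # ['input']
```

An empty result means every referenced column was found in the header.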

Start and Monitor the Training Job

Once the configuration is updated, start the training job and monitor its progress in the Recent Jobs section of the UI. Keep track of metrics, logs, and any intermediate results to ensure the training meets your requirements.

The beta version of the training suite allows only one training job to run at a time. Additional training jobs will be queued and automatically begin once the current job is finished.

After the model is trained, you can also deploy the LoRA adapter via Simplismart. Click here to learn how.