Prerequisites
Before starting, ensure you have:
- A dataset formatted according to the Sequence Classification Training Requirements
Supported models
LLM/VLM Architectures
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- Qwen/Qwen2.5-3B-Instruct
- Qwen/Qwen2.5-14B-Instruct
Creating a Training Job
To create a new training job, navigate to My Trainings > LLM/VLM Model > Add a Training Job.
1. Configure Basic Settings

- Experiment Name: Enter a descriptive name for your training experiment
- Model Details:
  - Base Model – Select the base model you want to fine-tune. Supported models (e.g., meta-llama/Llama-3.1-8B-Instruct) are available in the dropdown.
  - Source Type – Automatically filled based on the selected model source (e.g., Hugging Face).
When the base model is selected, the remaining parameters are updated automatically with recommended defaults for that model and training type.
2. Dataset Details
You can either create a new dataset or select an existing one.

Create New Dataset
- Source – Choose the dataset source (e.g., AWS S3, GCP).
- Dataset Name – Provide a friendly name for your dataset.
- Dataset Path – Specify the full path to your dataset (e.g., s3://bucket/file.jsonl).
- Dataset Description – Optional field for describing your dataset.
- Secret – For AWS/GCP sources, select the credential secret required to access private buckets. Learn how to configure cloud credentials.
- Region – For AWS/GCP sources, choose the region where your bucket is located.
- Dataset Type – Specify the data format, such as JSONL (see the sample record sketch at the end of this step).
Use Existing Dataset
- In the Dataset Details section, select Use Existing Dataset.
- A dropdown will appear listing all datasets available under your organization.
- Choose the dataset you want to attach to this training job.
- Once selected, key information such as Dataset Name, Source, Path, and Region will auto-populate based on the saved configuration.
- Review the prefilled values to ensure the dataset is still valid and accessible.
- After selection, proceed to configure Dataset Configuration parameters.
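For orientation, a sequence-classification dataset in JSONL format stores one JSON object per line, typically pairing a text field with an integer label. The field names below (content, label) are assumptions used only for illustration; follow the Sequence Classification Training Requirements for the exact schema.

```python
import json

# Hypothetical records; the field names ("content", "label") are assumptions --
# consult the Sequence Classification Training Requirements for the real schema.
records = [
    {"content": "The product arrived on time and works great.", "label": 1},
    {"content": "Support never answered my ticket.", "label": 0},
]

# JSONL = one JSON object per line, which is what the JSONL dataset type expects.
with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```

The resulting file can then be uploaded to your bucket and referenced via the Dataset Path (e.g., s3://bucket/file.jsonl).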
3. Dataset Configuration

- Lazy Tokenize – Tokenizes text during training rather than upfront, reducing memory usage and initial load time.
- System Prompt – Optional instruction prepended to each input sequence (e.g., “Classify the sentiment of the following text:”).
- Prompt Template – Template for formatting inputs consistently (supports variables like {content}); see the sketch after this list.
- Split Type – Method for dividing data into train/validation sets. Currently supports random splitting.
- Train Split Ratio – Proportion of data used for training (default: 0.9, i.e., 90%).
- Validation Split Ratio – Proportion reserved for validation to monitor overfitting (default: 0.1, i.e., 10%).
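To make the System Prompt and Prompt Template options concrete, the sketch below shows how a single record could be rendered before tokenization. The exact rendering performed by the platform may differ; this only illustrates how the {content} variable is substituted.

```python
# Illustrative only: how a system prompt and a prompt template with a {content}
# placeholder might combine into the final input sequence for one record.
system_prompt = "Classify the sentiment of the following text:"
prompt_template = "{content}\nSentiment:"

record = {"content": "The battery lasts all day.", "label": 1}

# The {content} variable is filled from the dataset record.
rendered = system_prompt + "\n" + prompt_template.format(content=record["content"])
print(rendered)
# Classify the sentiment of the following text:
# The battery lasts all day.
# Sentiment:
```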
4. Infrastructure Configuration

- Infrastructure Type – Choose where to run training:
- Simplismart Cloud – Fully managed infrastructure
- Bring Your Own Compute – Use your own cloud resources
- Imported Cluster – Use a pre-configured standalone cluster
- GPU Type – Select GPU hardware based on your performance needs
- Node Count – Number of machines to use
- GPU Count per Node – GPUs per machine
5. Set Training Parameters


Basic Training Configuration
| Parameter | Description | Default Value |
|---|---|---|
| Training Type | Training methodology. Auto-selected as SFT (Supervised Fine-Tuning) for encoder models. | SFT |
| Torch Dtype | Numerical precision for model weights and activations. bfloat16 or float32 | bfloat16 |
| Adapter Type | Fine-tuning method: LoRA (parameter-efficient) or Full (full fine-tuning). | LoRA |
Tuner Configuration
| Parameter | Description | Default Value |
|---|---|---|
| Tuner Backend | Framework for parameter-efficient fine-tuning. PEFT (Parameter-Efficient Fine-Tuning) is recommended. | PEFT |
| Task Type | Defines the model’s objective. For encoder training, use Sequence Classification. | Sequence Classification |
| Number of Labels | Total number of classes in your dataset (e.g., 2 for binary classification, 5 for 5-class). | Required |
Extra Parameters for Tuner Backend
For the PEFT backend, the following extra parameters need to be configured based on your use case:

Task Type (for PEFT Training)
- Causal Language Modeling: For LLM/VLM fine-tuning jobs
- Sequence Classification: For sequence classification training jobs

Number of Labels
- Total number of classes in your dataset (e.g., 2 for binary classification, 5 for 5-class).
Hyperparameters
| Parameter | Description | Default Value |
|---|---|---|
| Num Epochs | Number of complete passes through the training dataset. | 1 |
| Train Batch Size | Number of samples processed together per GPU during training. | 8 |
| Eval Batch Size | Batch size during validation. | 8 |
| Save Steps | Checkpoint frequency. Model is saved every N training steps for recovery and evaluation. | 100 |
| Save Total Limit | Maximum checkpoints to keep. Older checkpoints are deleted to save storage. | 2 |
| Eval Steps | Validation frequency. Model performance is evaluated on validation set every N steps. | 100 |
| Logging Steps | How often metrics (loss, accuracy) are recorded to tracking systems. | 5 |
| Learning Rate | Initial learning rate for the optimizer. | 0.00001 |
| Dataloader Num Workers | Parallel data-loading threads per device. | 1 |
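As a back-of-the-envelope illustration (not platform-specific behaviour), the per-GPU Train Batch Size combines with the Infrastructure Configuration to give an effective global batch size, which determines how many optimizer steps one epoch takes; the GPU count, node count, and dataset size below are assumed example values.

```python
# Rough sketch; assumes no gradient accumulation.
train_batch_size_per_gpu = 8    # "Train Batch Size"
gpu_count_per_node = 4          # example value from Infrastructure Configuration
node_count = 1                  # example value from Infrastructure Configuration
dataset_size = 10_000           # example sample count (after the 0.9 train split)

global_batch_size = train_batch_size_per_gpu * gpu_count_per_node * node_count
steps_per_epoch = dataset_size // global_batch_size
print(global_batch_size, steps_per_epoch)   # 32 samples per step, 312 steps per epoch
```

Knowing the steps per epoch helps you pick sensible values for Save Steps, Eval Steps, and Logging Steps.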
Adapter Configuration
Configure fine-tuning parameters based on your selected Adapter Type (LoRA or Full). Different parameters apply depending on your choice. Learn more about adapter configuration.

| Parameter | Description | Default Value | Applies To |
|---|---|---|---|
| Rank (r) | Adapter rank determines capacity. Higher rank = more expressive but slower. 16-64 works for most tasks. | 16 | LoRA only |
| Alpha | Scaling factor for adapter updates. Typically set equal to rank. Higher alpha = stronger influence. | 16 | LoRA only |
| Dropout | Regularization to prevent overfitting. Randomly drops adapter weights during training. | 0.1 | LoRA & Full |
| Targets | Which model layers to fine-tune. all-linear targets all linear/attention layers for maximum adaptation. | all-linear | LoRA & Full |
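To see what these defaults correspond to outside the platform, they map roughly onto a Hugging Face PEFT configuration like the sketch below. This uses the open-source transformers/peft libraries and is not the platform's internal training code; the base model and label count are example values.

```python
import torch
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Example base model and label count; substitute your own.
base_model = "meta-llama/Llama-3.1-8B-Instruct"
num_labels = 2                       # "Number of Labels"

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=num_labels,
    torch_dtype=torch.bfloat16,      # "Torch Dtype"
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,      # "Task Type": Sequence Classification
    r=16,                            # "Rank (r)"
    lora_alpha=16,                   # "Alpha"
    lora_dropout=0.1,                # "Dropout"
    target_modules="all-linear",     # "Targets"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the adapter weights are trainable
```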
Distributed Configuration
Configure multi-GPU or multi-node execution for large-scale training.

| Parameter | Description | Default Value |
|---|---|---|
| Type | Distributed training framework. DeepSpeed enables memory-efficient training across GPUs. | DeepSpeed |
| Strategy | Memory optimization strategy. zero3_offload splits model states across GPUs and CPU for large models. | zero3_offload |
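For reference, the zero3_offload strategy corresponds conceptually to a DeepSpeed ZeRO stage-3 configuration with parameters and optimizer states offloaded to CPU memory, along the lines of the sketch below. The platform generates the actual configuration for you; this is only an approximation of what the strategy implies.

```python
import json

# Approximate DeepSpeed config implied by "zero3_offload" (illustrative only).
ds_config = {
    "bf16": {"enabled": True},                   # matches Torch Dtype = bfloat16
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, optimizer states
        "offload_optimizer": {"device": "cpu"},  # move optimizer states to CPU RAM
        "offload_param": {"device": "cpu"},      # move parameters to CPU RAM
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("ds_zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```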
6. Create and Monitor Training
- Review all settings carefully
- Click Create Job to start training.
- Monitor training progress in the My Trainings > Your Training Job > Metrics tab.
Deployment and Inference
Once training completes successfully, you can compile and deploy your fine-tuned model for inference.

During model compilation, if your use case is text-classification, add a "task": "text-classification" field under extra_params. By default, the pipeline uses fill-mask as the task. A sample Pipeline Configuration is shared below for reference.

Text Classification Example
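Only the "task": "text-classification" entry under extra_params is prescribed here; the surrounding structure in the sketch below is a hypothetical placeholder showing where the field sits, not the full Pipeline Configuration schema.

```python
# Hypothetical fragment of a pipeline configuration. Only the "task" entry under
# "extra_params" comes from this guide; every other key is a placeholder.
pipeline_config_fragment = {
    "extra_params": {
        "task": "text-classification",
    },
}
```

If the field is omitted, compilation falls back to the default fill-mask task.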
Fill Mask Example
You can refer to the Hugging Face page of the respective model for more information about the <MASK> token.
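If you want to see what the default fill-mask task does locally, the open-source transformers pipeline API illustrates it; distilbert-base-uncased is only an example fill-mask model, and the exact mask token varies per model.

```python
from transformers import pipeline

# Fill-mask: the model predicts the token hidden behind its mask token.
# "distilbert-base-uncased" is only an example model; the mask token (e.g.,
# [MASK] or <mask>) depends on the model -- check its Hugging Face page.
fill = pipeline("fill-mask", model="distilbert-base-uncased")
print(fill(f"Paris is the capital of {fill.tokenizer.mask_token}."))
```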