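Your dataset must be in JSONL format: each line is a single JSON object containing the input text and its corresponding label for text classification. The two entries below are pretty-printed for readability; in the actual file, each entry must occupy exactly one line.
Example JSONL Entries: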
{
  "messages": [
    {
      "role": "user",
      "content": "The weather is really nice today"
    }
  ],
  "label": 1
}
{
  "messages": [
    {
      "role": "user",
      "content": "Today is really unlucky"
    }
  ],
  "label": 0
}
Field Descriptions:
  • messages – Array containing the input text for classification. Each message has a role and content.
  • role – Indicates the message source. Use "user" for encoder model training.
  • content – The text sequence to be classified (e.g., product review, customer query, email content).
  • label – Integer representing the target class (e.g., 0 for negative sentiment, 1 for positive sentiment).
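The field descriptions above can be turned into a quick per-line check. The following is a minimal sketch, not an official validator: the function name `validate_record` and the exact error messages are hypothetical, and the rule that every message must use the "user" role is an assumption drawn from the role description above.

```python
import json


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none).

    Hypothetical helper; it only checks the fields described above.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []

    # messages: non-empty array of objects with role and content
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"messages[{i}] is not an object")
                continue
            # Assumption: only the "user" role is expected for this format
            if msg.get("role") != "user":
                problems.append(f"messages[{i}].role should be 'user'")
            if not isinstance(msg.get("content"), str):
                problems.append(f"messages[{i}].content must be a string")

    # label: non-negative integer (bool is a subclass of int, so exclude it)
    label = record.get("label")
    if not isinstance(label, int) or isinstance(label, bool) or label < 0:
        problems.append("'label' must be a non-negative integer")

    return problems
```

Calling `validate_record` on each line of a file before uploading gives a list of readable problems per line, which is usually easier to act on than a failed training job.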

Example JSONL File

Here’s a complete example for sentiment analysis (binary classification with 2 labels):
{"messages": [{"role": "user", "content": "The weather is really nice today"}], "label": 1}
{"messages": [{"role": "user", "content": "Today is really unlucky"}], "label": 0}
{"messages": [{"role": "user", "content": "So happy"}], "label": 1}
{"messages": [{"role": "user", "content": "This product is terrible"}], "label": 0}
{"messages": [{"role": "user", "content": "Excellent service and quality"}], "label": 1}
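A file like the one above can be generated from plain (text, label) pairs. This is a minimal sketch using only the Python standard library; the function name `write_jsonl` and the output filename `train.jsonl` are hypothetical choices for illustration, not names required by the platform.

```python
import json
from pathlib import Path


def write_jsonl(examples, path):
    """Write (text, label) pairs as one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for text, label in examples:
            record = {
                "messages": [{"role": "user", "content": text}],
                "label": label,
            }
            # json.dumps without `indent` emits a single line; newlines
            # inside `text` are escaped, so each record stays on one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    examples = [
        ("The weather is really nice today", 1),
        ("Today is really unlucky", 0),
    ]
    write_jsonl(examples, "train.jsonl")  # hypothetical output path
```

Serializing with `json.dumps` rather than string formatting avoids hand-escaping quotes and line breaks in the input text.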
Label Guidelines:
  • Labels must be integers starting from 0
  • For binary classification: use 0 and 1
  • For multi-class: use 0, 1, 2, etc. (e.g., 5 classes = labels 0-4)
  • Ensure every label in your dataset appears at least once in the training data
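The label guidelines above can be checked mechanically before training. The sketch below is a hedged illustration: the function name `check_labels` is hypothetical, and inferring the number of classes from the largest label seen is an assumption (if the true class count is known, compare against that instead).

```python
import json
from collections import Counter


def check_labels(lines):
    """Summarize label usage across JSONL lines.

    Returns (num_classes, missing, counts), where `missing` lists every
    label in 0..num_classes-1 that never appears in the data.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        label = json.loads(line)["label"]
        if not isinstance(label, int) or isinstance(label, bool) or label < 0:
            raise ValueError(f"label must be a non-negative integer, got {label!r}")
        counts[label] += 1

    if not counts:
        raise ValueError("no records found")

    # Assumption: classes are numbered 0..max_label with no intended gaps
    num_classes = max(counts) + 1
    missing = [k for k in range(num_classes) if k not in counts]
    return num_classes, missing, dict(counts)
```

A non-empty `missing` list means the labels skip a value (for example, 0 and 2 with no 1), which conflicts with the guideline that labels run consecutively from 0.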