messages– Array containing the input text for classification. Each message has aroleandcontent.role– Indicates the message source. Use"user"for encoder model training.content– The text sequence to be classified (e.g., product review, customer query, email content).label– Integer representing the target class (e.g.,0for negative sentiment,1for positive sentiment).
Example JSONL File
Here’s a complete example for sentiment analysis (binary classification with 2 labels):Label Guidelines:
- Labels must be integers starting from
0 - For binary classification: use
0and1 - For multi-class: use
0,1,2, etc. (e.g., 5 classes = labels 0-4) - Ensure all labels in your dataset are represented in the training data