Dataset Format
Choose the file type for your dataset. Currently supported types are:jsonl
(JSON Lines)zip
The directory should be archived in a.zip
file and stored in an object storage.
Example zip command:cd path/to/dataset_dir && zip -r dataset_dir.zip ./*
.jsonl
file should represent a complete training example. The supported format styles are:
ShareGPT Format
Example JSONL File
Message fields
system
: The initial system instruction that sets the behavior or tone for the assistant.conversation:
A list of human-assistant message pairs forming the dialogue history.human
: A user query or input in the conversation.assistant:
The assistant’s response to the corresponding human input.
OpenAI SFT Format
Example JSONL File
Message fields
messages:
A sequential list of role-based messages representing a full conversation.role:
The identity of the message sender (e.g., system, user, assistant).content:
The actual text of the message corresponding to the role.