Dataset Format
Choose the file type for your dataset. Currently supported types are:jsonl(JSON Lines)zipThe directory should be archived in a.zipfile and stored in an object storage.
Example zip command:cd path/to/dataset_dir && zip -r dataset_dir.zip ./*
.jsonlfile should represent a complete training example. The supported format styles are:
ShareGPT Format
Example JSONL File
Message fields
system: The initial system instruction that sets the behavior or tone for the assistant.conversation:A list of human-assistant message pairs forming the dialogue history.human: A user query or input in the conversation.assistant:The assistant’s response to the corresponding human input.
OpenAI SFT Format
Example JSONL File
Message fields
messages:A sequential list of role-based messages representing a full conversation.role:The identity of the message sender (e.g., system, user, assistant).content:The actual text of the message corresponding to the role.