OpenAI DPO Format (for preference training)
Example JSONL File
Message fields
messages:
A list of role-based conversational turns that provide context for the model.content:
The text of each message exchanged between roles in the conversation.rejected_response:
The assistant’s response that is considered less helpful or incorrect for training purposes.