OpenAI DPO Format (for preference training)
Example JSONL File
Message fields
messages:A list of role-based conversational turns that provide context for the model.content:The text of each message exchanged between roles in the conversation.rejected_response:The assistant’s response that is considered less helpful or incorrect for training purposes.