OpenAI DPO Format (for preference training)

{
  "messages": [
    {"role": "system", "content": "You are a useful and harmless assistant"},
    {"role": "user", "content": "Tell me tomorrow's weather"},
    {"role": "assistant", "content": "Tomorrow's weather will be sunny"}
  ],
  "rejected_response": "I don't know"
}

Example JSONL File

{"messages": [{"role": "...", "content": "..."}], "rejected_response": "..."}
{"messages": [{"role": "...", "content": "..."}], "rejected_response": "..."}
{"messages": [{"role": "...", "content": "..."}], "rejected_response": "..."}
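
A file in this shape can be produced by serializing one record per line. The following is a minimal Python sketch using only the standard json module; the helper name to_jsonl and the sample record are illustrative assumptions, not part of any official tooling.

```python
import json


def to_jsonl(records):
    """Serialize each record as one JSON object per line (JSONL)."""
    # json.dumps emits no raw newlines, so each record stays on a single line.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Hypothetical sample record, mirroring the example at the top of this section.
records = [
    {
        "messages": [
            {"role": "system", "content": "You are a useful and harmless assistant"},
            {"role": "user", "content": "Tell me tomorrow's weather"},
            {"role": "assistant", "content": "Tomorrow's weather will be sunny"},
        ],
        "rejected_response": "I don't know",
    },
]

jsonl_text = to_jsonl(records)
```

Writing jsonl_text to a file opened with encoding="utf-8" yields a file with one preference record per line.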

Message fields

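  • messages: A list of role-based conversational turns that provide context for the model. In the example above, the last assistant message is the preferred response that rejected_response is compared against.
  • role: The speaker of a message. The example above uses system, user, and assistant.
  • content: The text of a single message.
  • rejected_response: The assistant’s response that is considered less helpful or incorrect, used as the dispreferred side of the preference pair.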
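
The field descriptions above can be turned into a simple pre-upload check. The sketch below is a hedged illustration, not an official validator: the function name validate_record, the set of allowed roles (taken only from the example in this section), and the rule that the last message must come from the assistant are all assumptions.

```python
# Roles seen in the example above; assumed (not confirmed) to be the full set.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems found in one record; an empty list means it passed."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]

    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        messages = []

    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role is missing or not recognized")
        if not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}].content must be a string")

    # Assumption: the final message holds the preferred assistant response.
    if messages and isinstance(messages[-1], dict):
        if messages[-1].get("role") != "assistant":
            problems.append("last message should be the assistant's preferred response")

    if not isinstance(record.get("rejected_response"), str):
        problems.append("rejected_response must be a string")

    return problems
```

For example, calling validate_record on each parsed line of a JSONL file and printing the line number alongside any returned problems gives a quick report of malformed records before training.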