DPO (LLM)

OpenAI DPO Format (for preference training)

{
  "messages": [
    {"role": "system", "content": "You are a useful and harmless assistant"},
    {"role": "user", "content": "Tell me tomorrow's weather"},
    {"role": "assistant", "content": "Tomorrow's weather will be sunny"}
  ],
  "rejected_response": "I don't know"
}

Example JSONL File

{"messages": [{"role": "...", "content": "..."}], "rejected_response": "..."}
{"messages": [{"role": "...", "content": "..."}], "rejected_response": "..."}
{"messages": [{"role": "...", "content": "..."}], "rejected_response": "..."}
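
As a rough illustration of how such a file could be produced, the sketch below serializes preference records to JSONL text, one JSON object per line. The helper names `to_jsonl` and `write_jsonl` and the file name `dpo_train.jsonl` are hypothetical and are not part of the platform; only the `messages` and `rejected_response` keys come from the format shown above.

```python
import json


def to_jsonl(records):
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def write_jsonl(records, path):
    """Write records to `path` as UTF-8 encoded JSONL (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))


# One record, reusing the example from this page.
records = [
    {
        "messages": [
            {"role": "system", "content": "You are a useful and harmless assistant"},
            {"role": "user", "content": "Tell me tomorrow's weather"},
            {"role": "assistant", "content": "Tomorrow's weather will be sunny"},
        ],
        "rejected_response": "I don't know",
    },
]

jsonl_text = to_jsonl(records)
```

Calling `write_jsonl(records, "dpo_train.jsonl")` would then save the file. Using `json.dumps` per record, rather than hand-writing the lines, keeps each line valid JSON and avoids stray commas or unescaped quotes.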

Message fields

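messages: A list of role-based conversational turns that provide context for the model. In the example above, the assistant turn inside messages is the response that rejected_response is contrasted with.
role: The speaker of each message. The example above uses system, user, and assistant.
content: The text of each message exchanged between roles in the conversation.
rejected_response: The assistant’s response that is considered less helpful or incorrect, used as the less-preferred answer during preference training.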
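
The field descriptions above can be turned into a quick pre-upload check. The following is a minimal sketch, not the platform's own validation: the function names are hypothetical, and the set of allowed roles is an assumption taken from the roles that appear in the example on this page.

```python
import json

# Assumption: only the roles seen in the example on this page are accepted.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"messages[{i}] must be an object")
                continue
            role = msg.get("role")
            if not isinstance(role, str) or role not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role is missing or unknown")
            if not isinstance(msg.get("content"), str):
                problems.append(f"messages[{i}].content must be a string")
    if not isinstance(record.get("rejected_response"), str):
        problems.append("rejected_response must be a string")
    return problems


def validate_jsonl(text):
    """Check JSONL text; return {line_number: [problems]} for bad lines."""
    report = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[line_no] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_record(record)
        if problems:
            report[line_no] = problems
    return report
```

An empty dictionary from `validate_jsonl` means every line parsed and carried the expected fields. Lines with syntax errors, such as a trailing comma inside the `messages` list, are reported as invalid JSON.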
