Specifies how to structure conversational data for DPO-based fine-tuning.
messages:
A list of role-based conversational turns that provide context for the model.content:
The text of each message exchanged between roles in the conversation.rejected_response:
The assistant’s response that is considered less helpful or incorrect for training purposes.