Dataset Preparation

Overview

This dataset format is designed to represent multi-turn conversations between a user and an assistant, with optional associated images.

It is suited to fine-tuning dialogue models, as well as multimodal models that require both text and image context.

Directory Structure

.
└── dataset_dir/
    ├── images/
    │   ├── img_1.png
    │   ├── ...
    │   └── img_i.png
    └── dataset.jsonl
  • images/ – Directory containing the image files. Image paths referenced in dataset.jsonl must be absolute.
  • dataset.jsonl – Main dataset file in JSONL format.
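As a minimal Python sketch of how absolute paths for the files in images/ might be gathered, the hypothetical helper collect_image_paths below resolves each file with pathlib. The function name and the default *.png pattern are assumptions for illustration, not part of the format.

```python
from pathlib import Path


def collect_image_paths(dataset_dir: str, pattern: str = "*.png") -> list[str]:
    """Return sorted absolute paths of the files under <dataset_dir>/images/.

    Hypothetical helper: the name and the "*.png" default are assumptions.
    """
    images_dir = Path(dataset_dir) / "images"
    # resolve() turns each path into an absolute one, as the format requires
    return [str(p.resolve()) for p in sorted(images_dir.glob(pattern)) if p.is_file()]
```

For example, collect_image_paths("dataset_dir") returns the sorted absolute paths of the PNG files in dataset_dir/images/, ready to be placed in an entry's images list.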

Dataset Structure

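Each line of dataset.jsonl holds one dataset entry, stored as a single JSON object. The entry below is spread over several lines for readability; in the file it must occupy a single line.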
{
  "messages": [
    {"role": "user", "content": "...."},
    {"role": "assistant", "content": "...."}
  ],
  "images": ["/absolute/path/to/image1.png", "/absolute/path/to/image2.png"]
}
  • Multiple turns can be added by extending the messages list.
  • Multiple images can be included by extending the images list.
  • images must always be a list (even if only one image is present).
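Since each entry occupies one line of dataset.jsonl, entries can be written with Python's standard json module. In this sketch, write_entries is a hypothetical helper and the sample entry reuses the placeholder content from the structure above; ensure_ascii=False is an optional choice that keeps non-ASCII text readable, not a requirement of the format.

```python
import json


def write_entries(path: str, entries: list[dict]) -> None:
    """Hypothetical helper: write each entry as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            # json.dumps without indent keeps the whole entry on a single line
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


# Placeholder entry mirroring the structure above
entry = {
    "messages": [
        {"role": "user", "content": "...."},
        {"role": "assistant", "content": "...."},
    ],
    # images is always a list, even when it holds a single path
    "images": ["/absolute/path/to/image1.png"],
}
```

Calling write_entries("dataset_dir/dataset.jsonl", [entry]) would produce a file with one line per entry.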

Field Definitions

Key       Type        Description
messages  List[Dict]  Ordered dialogue turns between user and assistant.
role      str         "user" or "assistant"; identifies the speaker.
content   str         The text content of the message.
images    List[str]   List of absolute paths to the associated image files.
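The field definitions translate into a few type checks. The hypothetical validate_entry function below is one possible reading of the table: it assumes images is required (since it must always be a list) and treats any role other than "user" or "assistant" as an error.

```python
import os
from typing import Any

VALID_ROLES = {"user", "assistant"}


def validate_entry(entry: dict[str, Any]) -> list[str]:
    """Hypothetical checker: return problems found in one entry (empty if none)."""
    errors: list[str] = []
    messages = entry.get("messages")
    if not isinstance(messages, list):
        errors.append("messages must be a list")
        messages = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"messages[{i}] must be an object")
            continue
        role = msg.get("role")
        # role must be exactly one of the two speaker labels
        if not isinstance(role, str) or role not in VALID_ROLES:
            errors.append(f"messages[{i}].role must be 'user' or 'assistant'")
        if not isinstance(msg.get("content"), str):
            errors.append(f"messages[{i}].content must be a string")
    images = entry.get("images")
    if not isinstance(images, list):
        # assumption: a missing images field counts as an error
        errors.append("images must be a list")
        images = []
    for j, img in enumerate(images):
        if not isinstance(img, str) or not os.path.isabs(img):
            errors.append(f"images[{j}] must be an absolute path string")
    return errors
```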

Example Entry

{
  "messages": [
    {"role": "user", "content": "<image>Which of these states is farthest north?"},
    {"role": "assistant", "content": "The answer is: A. West Virginia"},
    {"role": "user", "content": "<image>Which of these states is farthest south?"},
    {"role": "assistant", "content": "The answer is: C. Texas"}
  ],
  "images": ["/home/datasets/images/000000.png","/home/datasets/images/000001.png"]
}
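In this example, each user turn contains one <image> placeholder and the images list holds two paths. Many multimodal training pipelines require the number of <image> placeholders to equal the number of images, consumed in order; this document does not state that rule, so treat the check below (placeholder_count_matches is a hypothetical name) as an assumption to confirm against your training framework.

```python
IMAGE_TOKEN = "<image>"  # placeholder string as it appears in the example entry


def placeholder_count_matches(entry: dict) -> bool:
    """Check that <image> placeholders across all messages equal len(images).

    Assumption: one placeholder per image, matched in order.
    """
    n_tokens = sum(m["content"].count(IMAGE_TOKEN) for m in entry["messages"])
    return n_tokens == len(entry["images"])
```

Applied to the example entry above, the check passes: two placeholders, two image paths.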

Usage Notes

  • Conversations can contain any number of turns.
  • The images field can contain multiple entries or be empty ([]).
  • Always use absolute paths for image references to avoid path resolution errors.
  • Store images in a consistent directory (e.g., /home/datasets/images/).
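The path-related notes above can be checked before training. This sketch (find_bad_image_refs is a hypothetical name) scans dataset.jsonl line by line and reports image references that are relative or that point to files which do not exist.

```python
import json
import os


def find_bad_image_refs(jsonl_path: str) -> list[tuple[int, str, str]]:
    """Hypothetical checker: report (line_number, path, reason) for bad image paths.

    Line numbers are 1-based, matching the line in dataset.jsonl.
    """
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            for path in entry.get("images", []):
                if not os.path.isabs(path):
                    problems.append((lineno, path, "not absolute"))
                elif not os.path.isfile(path):
                    problems.append((lineno, path, "file not found"))
    return problems
```

An empty result means every image reference is absolute and resolves to an existing file.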