# SFT (VLM)

# Dataset Preparation

## Overview

This dataset format is designed to represent multi-turn conversations between a **user** and an **assistant**, with optional associated images.

It is particularly suited for fine-tuning dialogue or multimodal models that require both text and image context.

## Directory Structure

```text theme={null}
.
└── dataset_dir/
    ├── images/
    │   ├── /absolute/path/to/img_1.png
    │   ├── ...
    │   └── /absolute/path/to/img_i.png
    └── dataset.jsonl
```

* **images/** – Directory containing the image files. Reference each image from `dataset.jsonl` by its absolute path.
* **dataset.jsonl** – Main dataset file in JSONL format.

## Dataset Structure

Each dataset entry is a single JSON object on its own line of `dataset.jsonl`. The entry below is pretty-printed for readability.

```json theme={null}
{
  "messages": [
    {"role": "user", "content": "...."},
    {"role": "assistant", "content": "...."}
  ],
  "images": ["/absolute/path/to/image1.png", "/absolute/path/to/image2.png"]
}
```

* Multiple turns can be added by extending the `messages` list.
* Multiple images can be included by extending the `images` list.
* `images` must always be a **list** (even if only one image is present).
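
The structure above can be produced programmatically. The following is a minimal Python sketch, not an official tool: the helper names `make_entry` and `write_jsonl` are hypothetical, and it assumes each entry is written as one compact JSON object per line, as the JSONL format requires.

```python
import json
from pathlib import Path


def make_entry(turns, image_paths=()):
    """Build one dataset entry from (role, content) pairs and image paths.

    Hypothetical helper for illustration only.
    """
    return {
        "messages": [{"role": role, "content": content} for role, content in turns],
        # `images` is always a list, even for zero or one image.
        "images": [str(p) for p in image_paths],
    }


def write_jsonl(entries, out_path):
    """Write each entry as a single JSON object on its own line."""
    with Path(out_path).open("w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

For example, `write_jsonl([make_entry([("user", "...."), ("assistant", "....")], ["/absolute/path/to/image1.png"])], "dataset.jsonl")` would write a one-line file in the shape shown above.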

## Field Definitions

| **Key**    | **Type**     | **Description**                                           |
| ---------- | ------------ | --------------------------------------------------------- |
| `messages` | `List[Dict]` | Ordered dialogue turns between user and assistant.        |
| `role`     | `str`        | `"user"` or `"assistant"` — identifies the speaker.       |
| `content`  | `str`        | The text content of the message.                          |
| `images`   | `List[str]`  | List of **absolute paths** to the associated image files. |
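
The field definitions in the table can be checked before training starts. The sketch below is one possible validator, assuming only the rules listed in the table; `validate_entry` is a hypothetical name, and the checks the platform itself performs may differ.

```python
import os


def validate_entry(entry):
    """Return a list of problems found in one dataset entry (empty if none)."""
    problems = []

    messages = entry.get("messages")
    if not isinstance(messages, list):
        problems.append("`messages` must be a list")
        messages = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        if msg.get("role") not in ("user", "assistant"):
            problems.append(f"messages[{i}].role must be 'user' or 'assistant'")
        if not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}].content must be a string")

    images = entry.get("images")
    if not isinstance(images, list):
        problems.append("`images` must be a list")
    else:
        for j, path in enumerate(images):
            # Only checks the path form, not whether the file exists.
            if not isinstance(path, str) or not os.path.isabs(path):
                problems.append(f"images[{j}] must be an absolute path string")

    return problems
```

An entry that follows the table yields an empty list; anything else yields one message per violated rule, which makes it easy to report the offending line of the JSONL file.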

## Example Entry

```json theme={null}
{
  "messages": [
    {"role": "user", "content": "<image>Which of these states is farthest north?"},
    {"role": "assistant", "content": "The answer is: A. West Virginia"},
    {"role": "user", "content": "<image>Which of these states is farthest south?"},
    {"role": "assistant", "content": "The answer is: C. Texas"}
  ],
  "images": ["/home/datasets/images/000000.png","/home/datasets/images/000001.png"]
}
```
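
In this example, the two `<image>` placeholders in the user turns line up with the two paths in `images`. The page does not state this as a rule, so the following sketch treats it as an assumption: it counts `<image>` placeholders across all messages and compares the total with the number of image paths. The function names are hypothetical.

```python
def count_image_placeholders(entry, token="<image>"):
    """Count occurrences of the placeholder token across all message contents."""
    return sum(msg["content"].count(token) for msg in entry["messages"])


def placeholders_match_images(entry, token="<image>"):
    """Assumed convention: one placeholder per path in `images`."""
    return count_image_placeholders(entry, token) == len(entry["images"])
```

Applied to the example entry, both counts are 2, so `placeholders_match_images` returns `True`.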

## Usage Notes

* Conversations can contain **any number of turns**.
* The `images` field can contain multiple entries or be empty (`[]`).
* Always use **absolute paths** for image references to avoid path resolution errors.
* Store images in a consistent directory (e.g., `/home/datasets/images/`).
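
If an existing dataset uses relative image paths, they can be converted before upload. This is a minimal sketch under the assumption that relative paths are meant to resolve against a single base directory; `absolutize_images` and `missing_images` are hypothetical helper names.

```python
from pathlib import Path


def absolutize_images(entry, base_dir):
    """Return a copy of the entry with relative image paths made absolute."""
    base = Path(base_dir).resolve()
    resolved = []
    for p in entry["images"]:
        path = Path(p)
        if not path.is_absolute():
            path = base / path  # resolve relative paths against base_dir
        resolved.append(str(path))
    return {**entry, "images": resolved}


def missing_images(entry):
    """List image paths in the entry that do not point to an existing file."""
    return [p for p in entry["images"] if not Path(p).is_file()]
```

Running `missing_images` on every entry after conversion is a cheap way to catch path resolution errors before a training job starts.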
