VLMs
Dataset Preparation
Dataset formatting guide for image-text model training with JSON examples.
Required format for Dataset
- image : Points to the location of the image (URL) that the model will access during training.
- conversations: Contains the interaction where the “human” requests a caption and the “gpt” provides the caption or description.
The <image>
tag in the conversation will be replaced by the actual image during training, allowing the model to generate the caption based on the image.