Required format for Dataset

[
{
  "from": "human",
  "value": "<image>\n your prompt here"
},
{
  "from": "gpt",
  "value": "the ground truth here"
}
]
  • image : Points to the location of the image (URL) that the model will access during training.
  • conversations: Contains the interaction where the “human” requests a caption and the “gpt” provides the caption or description.

The <image> tag in the conversation will be replaced by the actual image during training, allowing the model to generate the caption based on the image.


Sample Dataset containing a single reference datapoint

{
  "id": "allava_laion_cap_124558",
  "image": "allava_laion/images/124558.jpeg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\nCould you please express the image in a descriptive format?"
    },
    {
      "from": "gpt",
      "value": "The image displays a colorful promotional graphic or poster with a photograph of a seafood meal prominently featured. The food consists of crab legs with bright red and white shells, served with slices of lemon, arranged on a paper-lined tray. There are condiment cups and a bread roll also visible on the tray, suggesting a ready-to-eat meal. The background behind the food is not clear but seems to be a casual outdoor setting, possibly a dining area near water, as suggested by the reflections and light quality.\n\nThe top and bottom portions of the image contain text blocks overlaying a solid color background. The top section has a deep red banner with stylized white script text reading \"Martie On the Road.\" The bottom section has a mustard yellow background with three lines of text in a plain, bold, sans-serif font. These lines read \"Where to GO,\" \"Where to EAT,\" and \"Where to STAY,\" implying advice or recommendations for travelers. The overall design suggests that this is a cover or advertisement for travel and dining guidance, possibly for a blog or publication."
    }
  ],
  "url": "https://images.squarespace-cdn.com/content/5361bc9fe4b00f49de2aa41e/1420519215970-TQR3CEI7YIYYLTXCBSEL/where+to+eat+food+network.jpg?content-type=image%2Fjpeg",
  "caption": "The image displays a colorful promotional graphic or poster with a photograph of a seafood meal prominently featured. The food consists of crab legs with bright red and white shells, served with slices of lemon, arranged on a paper-lined tray. There are condiment cups and a bread roll also visible on the tray, suggesting a ready-to-eat meal. The background behind the food is not clear but seems to be a casual outdoor setting, possibly a dining area near water, as suggested by the reflections and light quality.\n\nThe top and bottom portions of the image contain text blocks overlaying a solid color background. The top section has a deep red banner with stylized white script text reading \"Martie On the Road.\" The bottom section has a mustard yellow background with three lines of text in a plain, bold, sans-serif font. These lines read \"Where to GO,\" \"Where to EAT,\" and \"Where to STAY,\" implying advice or recommendations for travelers. The overall design suggests that this is a cover or advertisement for travel and dining guidance, possibly for a blog or publication.",
  "llava-1.5-7b-PPL": 6.8742
}