Dataset formatting guide for image-text model training with JSON examples.
[
{
"from": "human",
"value": "<image>\n your prompt here"
},
{
"from": "gpt",
"value": "the ground truth here"
}
]
<image>
tag in the conversation will be replaced by the actual image during training, allowing the model to generate the caption based on the image.
{
"id": "allava_laion_cap_124558",
"image": "allava_laion/images/124558.jpeg",
"conversations": [
{
"from": "human",
"value": "<image>\nCould you please express the image in a descriptive format?"
},
{
"from": "gpt",
"value": "The image displays a colorful promotional graphic or poster with a photograph of a seafood meal prominently featured. The food consists of crab legs with bright red and white shells, served with slices of lemon, arranged on a paper-lined tray. There are condiment cups and a bread roll also visible on the tray, suggesting a ready-to-eat meal. The background behind the food is not clear but seems to be a casual outdoor setting, possibly a dining area near water, as suggested by the reflections and light quality.\n\nThe top and bottom portions of the image contain text blocks overlaying a solid color background. The top section has a deep red banner with stylized white script text reading \"Martie On the Road.\" The bottom section has a mustard yellow background with three lines of text in a plain, bold, sans-serif font. These lines read \"Where to GO,\" \"Where to EAT,\" and \"Where to STAY,\" implying advice or recommendations for travelers. The overall design suggests that this is a cover or advertisement for travel and dining guidance, possibly for a blog or publication."
}
],
"url": "https://images.squarespace-cdn.com/content/5361bc9fe4b00f49de2aa41e/1420519215970-TQR3CEI7YIYYLTXCBSEL/where+to+eat+food+network.jpg?content-type=image%2Fjpeg",
"caption": "The image displays a colorful promotional graphic or poster with a photograph of a seafood meal prominently featured. The food consists of crab legs with bright red and white shells, served with slices of lemon, arranged on a paper-lined tray. There are condiment cups and a bread roll also visible on the tray, suggesting a ready-to-eat meal. The background behind the food is not clear but seems to be a casual outdoor setting, possibly a dining area near water, as suggested by the reflections and light quality.\n\nThe top and bottom portions of the image contain text blocks overlaying a solid color background. The top section has a deep red banner with stylized white script text reading \"Martie On the Road.\" The bottom section has a mustard yellow background with three lines of text in a plain, bold, sans-serif font. These lines read \"Where to GO,\" \"Where to EAT,\" and \"Where to STAY,\" implying advice or recommendations for travelers. The overall design suggests that this is a cover or advertisement for travel and dining guidance, possibly for a blog or publication.",
"llava-1.5-7b-PPL": 6.8742
}