[Question] Clarification on JSON Format for Object Detection Task #1512

makemecker · 2024-05-20T07:10:11Z

Question

Hi,

Thank you for the comprehensive guidelines on fine-tuning LLaVA on custom datasets! The provided example for generating tag-style captions for Stable Diffusion has been very helpful.

I am currently working on fine-tuning LLaVA for object detection tasks and have created the following JSON format:

{
  "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "Find objects of the classes vehicle, human, swimming pool, landfill, building, tree, satellite antenna and street light in the image. Provide the coordinates of the bounding boxes in the format class: [bounding box coordinates]."
    },
    {
      "from": "gpt",
      "value": "vehicle: [0.85, 0.139, 0.024, 0.049], vehicle: [0.603, 0.158, 0.029, 0.055], human: [0.945, 0.018, 0.014, 0.036], human: [0.83, 0.028, 0.017, 0.041], human: [0.85, 0.123, 0.024, 0.067]."
    }
  ]
}
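
For illustration, a record of this shape could be generated from raw annotations with a short script. This is only a sketch under assumptions: the helper names (`format_box`, `build_record`) are hypothetical, the `id` field and the `<image>` placeholder in the first human turn follow what I understand the LLaVA custom-data examples to use and are not part of the snippet above, and the four box values are written out in whatever order they arrive, since the coordinate convention is not fixed here.

```python
import json

def format_box(label, box, ndigits=3):
    """Render one annotation as 'label: [a, b, c, d]' with rounded values."""
    coords = ", ".join(str(round(v, ndigits)) for v in box)
    return f"{label}: [{coords}]"

def build_record(sample_id, image_path, classes, annotations):
    """Build one conversation-style training entry (hypothetical helper).

    annotations is a list of (label, [v1, v2, v3, v4]) pairs with
    normalized values; their meaning (corner vs. center) is an assumption
    left to the caller.
    """
    class_list = ", ".join(classes[:-1]) + " and " + classes[-1]
    prompt = (
        "<image>\n"  # assumed image placeholder, not present in the snippet above
        f"Find objects of the classes {class_list} in the image. "
        "Provide the coordinates of the bounding boxes in the format "
        "class: [bounding box coordinates]."
    )
    answer = ", ".join(format_box(lbl, box) for lbl, box in annotations) + "."
    return {
        "id": sample_id,  # assumed field, not present in the snippet above
        "image": image_path,
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": answer},
        ],
    }

if __name__ == "__main__":
    classes = ["vehicle", "human", "swimming pool", "landfill", "building",
               "tree", "satellite antenna", "street light"]
    record = build_record(
        "0",
        "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
        classes,
        [("vehicle", [0.85, 0.139, 0.024, 0.049]),
         ("human", [0.945, 0.018, 0.014, 0.036])],
    )
    # A training file would typically hold a JSON list of such records.
    print(json.dumps([record], indent=2))
```

Generating the answer string from structured annotations, rather than writing it by hand, keeps the separator and rounding consistent across the whole dataset.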

Could you please confirm if this JSON structure is correct for fine-tuning LLaVA on object detection tasks? Specifically, I would like to know:

Is the structure of the JSON file appropriate for object detection?
Are the metadata fields correctly defined?
Is the format for bounding box coordinates accurate?
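
To make these three checks concrete, here is a minimal sanity-check sketch. It is not an official LLaVA validator: `parse_boxes` and `validate_record` are hypothetical helpers, and the rules they enforce (alternating human/gpt turns, four values per box, every value normalized to [0, 1]) are assumptions about what a well-formed record would look like. It deliberately does not decide whether a box means [x, y, width, height] or something else.

```python
import re

# Matches 'class name: [v1, v2, v3, v4]' segments in an answer string.
BOX_PATTERN = re.compile(r"([A-Za-z ]+?):\s*\[([^\]]*)\]")

def parse_boxes(answer):
    """Extract (label, [floats]) pairs from a 'class: [a, b, c, d]' string."""
    pairs = []
    for label, coords in BOX_PATTERN.findall(answer):
        values = [float(v) for v in coords.split(",") if v.strip()]
        pairs.append((label.strip(), values))
    return pairs

def validate_record(record):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not isinstance(record.get("image"), str):
        problems.append("'image' should be a path string")
    turns = record.get("conversations", [])
    if not turns or len(turns) % 2:
        problems.append("'conversations' should hold human/gpt pairs")
    for i, turn in enumerate(turns):
        expected = "human" if i % 2 == 0 else "gpt"
        if turn.get("from") != expected:
            problems.append(f"turn {i}: expected 'from' to be '{expected}'")
        if expected == "gpt":
            for label, box in parse_boxes(turn.get("value", "")):
                if len(box) != 4:
                    problems.append(f"turn {i}: '{label}' box needs 4 values")
                elif any(not 0.0 <= v <= 1.0 for v in box):
                    problems.append(f"turn {i}: '{label}' box is not normalized")
    return problems

if __name__ == "__main__":
    record = {
        "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
        "conversations": [
            {"from": "human", "value": "Find objects ..."},
            {"from": "gpt", "value": "vehicle: [0.85, 0.139, 0.024, 0.049], "
                                     "human: [0.945, 0.018, 0.014, 0.036]."},
        ],
    }
    print(validate_record(record))
```

Running a check like this over the whole dataset before training catches malformed answers early, and the same `parse_boxes` function can be reused to read model predictions back at evaluation time.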

Any additional insights or corrections would be greatly appreciated.

Thank you for your assistance!
