How to form JSON from CSV

Time:11-25

I'm struggling to build a JSON from a CSV file. My CSV file looks as follows:

Shoot ID,Photo ID,Photo Name,Category,X1,Y1,X2,Y2
224,942,dsc_0001.jpg,0,3672,1271,3956,1417
224,942,dsc_0001.jpg,0,352,1401,497,1551
224,942,dsc_0001.jpg,0,181,1581,322,1690
224,943,dsc_0002.jpg,0,3073,1031,3351,1231
224,943,dsc_0002.jpg,0,3626,1811,3765,1901
224,943,dsc_0002.jpg,0,4784,1830,4900,1967
224,943,dsc_0002.jpg,0,1769,1714,1953,1872
224,943,dsc_0002.jpg,0,3173,1755,3305,1854
224,945,dsc_0004.jpg,0,1512,2012,1948,2304
224,945,dsc_0004.jpg,0,1488,1823,1766,2007
224,946,dsc_0005.jpg,0,3843,1812,4134,2029

I need to convert this to JSON that looks like this:

[{
        "image_id": 942,
        "file_name": "dsc_0001",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [3672.0, 1271.0, 3956.0, 1417.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [352.0, 1401.0, 497.0, 1551.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [181.0, 1581.0, 322.0, 1690.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }, {
        "image_id": 943,
        "file_name": "dsc_0002",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [3073.0, 1031.0, 3351.0, 1231.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [3626.0, 1811.0, 3765.0, 1901.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [4784.0, 1830.0, 4900.0, 1967.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [1769.0, 1714.0, 1953.0, 1872.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [3173.0, 1755.0, 3305.0, 1854.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }, {
        "image_id": 945,
        "file_name": "dsc_0004",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [1512.0, 2012.0, 1948.0, 2304.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [1488.0, 1823.0, 1766.0, 2007.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }, {
        "image_id": 946,
        "file_name": "dsc_0005",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [3843.0, 1812.0, 4134.0, 2029.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }
]

I have tried the examples from the post "How to convert csv to json in python?", but I struggled to add several "bbox" elements to the annotations list for the same image. Can anyone help me with this?

CodePudding user response:

You could loop through each photo and then through its annotations. The solution below assumes that a photo ID can have only one name and that height, width and bbox_mode are constant (they were not provided in the data sample). I put your data into a CSV file called "photo_csv_to_dict.csv" and loaded it with pandas.

import pandas as pd 
import numpy as np
df = pd.read_csv('photo_csv_to_dict.csv')

# get id name set (this assumes photo ID could only have one unique name)
photo_id_name = df[['Photo ID', 'Photo Name']].drop_duplicates().reset_index(drop=True)

# convert numeric measures to a numpy matrix
measures = np.array(df[['Photo ID', 'Category', 'X1', 'Y1', 'X2', 'Y2']])

# initiate list for results
result_list = []
# loop through each photo
for i in range(len(photo_id_name)):
    # get photo ID from id name set
    photo_id = photo_id_name['Photo ID'][i]
    # get photo Name from id name set
    photo_name = photo_id_name['Photo Name'][i]
    # keep only the rows for this photo ID and drop the ID column
    filtered_measures = measures[measures[:, 0] == photo_id][:, 1:]
    # initiate list for picture annotations
    annotations_list = []
    # loop through each row of measures
    for row in filtered_measures:
        # create a dictionary for an annotation
        # cast numpy values to plain Python types so the json module can serialize them
        annotations_dict = {"bbox": [float(v) for v in row[1:]],
                            "bbox_mode": 1,
                            "category_id": int(row[0])
                           }
        # add annotation dictionary to a list
        annotations_list.append(annotations_dict)
    
    # create a dictionary for a photo with its id, name and list of annotations
    # (int() turns the numpy integer into a plain int that json can serialize)
    result_dict = {"image_id": int(photo_id),
                  "file_name": photo_name,
                  "height": 2880,
                  "width": 5760,
                  "annotations": annotations_list}
    
    # add picture results to a list
    result_list.append(result_dict)
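
If you would rather not depend on pandas, the same grouping can be sketched with only the standard library, including the final step of turning the list into a JSON string. This is a minimal sketch, not a definitive implementation. Like the code above, it assumes height, width and bbox_mode are constant. The function name csv_to_records and the constant names are hypothetical, and the sample rows are copied from the question. It also strips the file extension so that file_name matches the desired output.

```python
import csv
import io
import json

# Sample rows copied from the question; a real script would open the CSV file instead.
CSV_TEXT = """Shoot ID,Photo ID,Photo Name,Category,X1,Y1,X2,Y2
224,942,dsc_0001.jpg,0,3672,1271,3956,1417
224,942,dsc_0001.jpg,0,352,1401,497,1551
224,943,dsc_0002.jpg,0,3073,1031,3351,1231
"""

# Assumed constants: the CSV does not contain these values.
HEIGHT, WIDTH, BBOX_MODE = 2880, 5760, 1


def csv_to_records(csv_file):
    """Group CSV rows by Photo ID into a list of image records (hypothetical helper)."""
    images = {}  # Photo ID -> image record; dicts preserve insertion order
    for row in csv.DictReader(csv_file):
        photo_id = int(row["Photo ID"])
        # Reuse the record for this Photo ID, or create it on first sight.
        record = images.setdefault(photo_id, {
            "image_id": photo_id,
            "file_name": row["Photo Name"].rsplit(".", 1)[0],  # drop the extension
            "height": HEIGHT,
            "width": WIDTH,
            "annotations": [],
        })
        # Every CSV row contributes one bbox to its image's annotations list.
        record["annotations"].append({
            "bbox": [float(row[key]) for key in ("X1", "Y1", "X2", "Y2")],
            "bbox_mode": BBOX_MODE,
            "category_id": int(row["Category"]),
        })
    return list(images.values())


records = csv_to_records(io.StringIO(CSV_TEXT))
json_text = json.dumps(records, indent=4)
print(json_text)
```

To read from a file instead, pass an open file object, for example open("photo_csv_to_dict.csv", newline=""), to csv_to_records. Keying the dictionary on the photo ID is what lets several bbox entries end up in the same annotations list.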