How to form JSON from CSV

I'm struggling to build a JSON from a CSV file. My CSV file looks as follows:

Shoot ID,Photo ID,Photo Name,Category,X1,Y1,X2,Y2
224,942,dsc_0001.jpg,0,3672,1271,3956,1417
224,942,dsc_0001.jpg,0,352,1401,497,1551
224,942,dsc_0001.jpg,0,181,1581,322,1690
224,943,dsc_0002.jpg,0,3073,1031,3351,1231
224,943,dsc_0002.jpg,0,3626,1811,3765,1901
224,943,dsc_0002.jpg,0,4784,1830,4900,1967
224,943,dsc_0002.jpg,0,1769,1714,1953,1872
224,943,dsc_0002.jpg,0,3173,1755,3305,1854
224,945,dsc_0004.jpg,0,1512,2012,1948,2304
224,945,dsc_0004.jpg,0,1488,1823,1766,2007
224,946,dsc_0005.jpg,0,3843,1812,4134,2029

I need to convert this to JSON that looks like this:

[{
        "image_id": 942,
        "file_name": "dsc_0001",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [3672.0, 1271.0, 3956.0, 1417.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [352.0, 1401.0, 497.0, 1551.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [181.0, 1581.0, 322.0, 1690.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }, {
        "image_id": 943,
        "file_name": "dsc_0002",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [3073.0, 1031.0, 3351.0, 1231.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [3626.0, 1811.0, 3765.0, 1901.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [4784.0, 1830.0, 4900.0, 1967.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [1769.0, 1714.0, 1953.0, 1872.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [3173.0, 1755.0, 3305.0, 1854.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }, {
        "image_id": 945,
        "file_name": "dsc_0004",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [1512.0, 2012.0, 1948.0, 2304.0],
                "bbox_mode": 1,
                "category_id": 0
            }, {
                "bbox": [1488.0, 1823.0, 1766.0, 2007.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }, {
        "image_id": 946,
        "file_name": "dsc_0005",
        "height": 2880,
        "width": 5760,
        "annotations": [{
                "bbox": [3843.0, 1812.0, 4134.0, 2029.0],
                "bbox_mode": 1,
                "category_id": 0
            }
        ]
    }
]

I have tried the examples from the post "How to convert csv to json in python?", but I struggled to add several "bbox" elements to the "annotations" list for the same image. Can anyone help me with this?
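The tricky part is collecting several CSV rows under a single image. A minimal standard-library sketch of that grouping step (using `csv.DictReader` and a dict keyed by Photo ID instead of pandas) could look like this. It assumes, as the expected output suggests, that `height`, `width` and `bbox_mode` are the constants 2880, 5760 and 1, and that `file_name` is `Photo Name` without its extension; `CSV_TEXT` and `csv_to_images` are illustrative names, not from the question.

```python
import csv
import io
import json

# A few sample rows copied from the question (header included).
CSV_TEXT = """Shoot ID,Photo ID,Photo Name,Category,X1,Y1,X2,Y2
224,942,dsc_0001.jpg,0,3672,1271,3956,1417
224,942,dsc_0001.jpg,0,352,1401,497,1551
224,943,dsc_0002.jpg,0,3073,1031,3351,1231
"""

# Assumed constants: the CSV has no height/width/bbox_mode columns,
# so these values are taken from the expected output in the question.
HEIGHT, WIDTH, BBOX_MODE = 2880, 5760, 1


def csv_to_images(f):
    """Group CSV rows by Photo ID into one dict per image."""
    images = {}  # Photo ID -> image dict; dicts keep insertion order (Python 3.7+)
    for row in csv.DictReader(f):
        photo_id = int(row["Photo ID"])
        # Create the image entry the first time this Photo ID is seen,
        # otherwise return the entry that already exists.
        image = images.setdefault(photo_id, {
            "image_id": photo_id,
            "file_name": row["Photo Name"].rsplit(".", 1)[0],  # drop ".jpg"
            "height": HEIGHT,
            "width": WIDTH,
            "annotations": [],
        })
        # Every row, including repeats of the same Photo ID, adds one bbox.
        image["annotations"].append({
            "bbox": [float(row[k]) for k in ("X1", "Y1", "X2", "Y2")],
            "bbox_mode": BBOX_MODE,
            "category_id": int(row["Category"]),
        })
    return list(images.values())


if __name__ == "__main__":
    result = csv_to_images(io.StringIO(CSV_TEXT))
    print(json.dumps(result, indent=4))
```

For a real file, open it with `open(path, newline='')` and pass the file object to `csv_to_images`. The `setdefault` call is what lets several bboxes accumulate in the same `annotations` list: it creates the image entry on the first row for a Photo ID and hands back that same entry on every later row.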

Answer:

You could loop through each photo and then through each of its annotations. The solution below assumes that a photo ID can only have one name and that height, width and bbox_mode are static (they were not provided in the data sample). I put your data into a CSV file called "photo_csv_to_dict.csv" and loaded it with pandas.
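The first assumption is easy to check before relying on it. A small standard-library sketch, with a hypothetical helper name `conflicting_names`, that reports any Photo ID listed with more than one Photo Name:

```python
import csv
import io
from collections import defaultdict

# A few sample rows copied from the question (header included).
CSV_TEXT = """Shoot ID,Photo ID,Photo Name,Category,X1,Y1,X2,Y2
224,942,dsc_0001.jpg,0,3672,1271,3956,1417
224,942,dsc_0001.jpg,0,352,1401,497,1551
224,943,dsc_0002.jpg,0,3073,1031,3351,1231
"""


def conflicting_names(f):
    """Return {Photo ID: set of names} for every ID that has more than one Photo Name."""
    names_by_id = defaultdict(set)
    for row in csv.DictReader(f):
        names_by_id[row["Photo ID"]].add(row["Photo Name"])
    # Keep only the IDs that violate the one-name-per-ID assumption.
    return {pid: names for pid, names in names_by_id.items() if len(names) > 1}


if __name__ == "__main__":
    problems = conflicting_names(io.StringIO(CSV_TEXT))
    print(problems or "each Photo ID has exactly one Photo Name")
```

An empty result means the assumption holds for that file. With pandas, `df.groupby('Photo ID')['Photo Name'].nunique()` gives the number of distinct names per ID.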

import json

import numpy as np
import pandas as pd

df = pd.read_csv('photo_csv_to_dict.csv')

# get id name set (this assumes photo ID could only have one unique name)
photo_id_name = df[['Photo ID', 'Photo Name']].drop_duplicates().reset_index(drop=True)

# convert numeric measures to a numpy matrix
measures = np.array(df[['Photo ID', 'Category', 'X1', 'Y1', 'X2', 'Y2']])

# initiate list for results
result_list = []
# loop through each photo
for i in range(len(photo_id_name)):
    # get photo ID from id name set (cast from numpy int64 so json can serialize it)
    photo_id = int(photo_id_name['Photo ID'][i])
    # get photo name from id name set and drop the extension ("dsc_0001.jpg" -> "dsc_0001")
    photo_name = photo_id_name['Photo Name'][i].rsplit('.', 1)[0]
    # keep only the rows for this photo ID and drop the ID column
    filtered_measures = measures[measures[:, 0] == photo_id][:, 1:]
    # initiate list for picture annotations
    annotations_list = []
    # loop through each row of measures: [Category, X1, Y1, X2, Y2]
    for row in filtered_measures:
        # create a dictionary for an annotation (bbox as floats, category as int)
        annotations_dict = {"bbox": [float(v) for v in row[1:]],
                            "bbox_mode": 1,
                            "category_id": int(row[0])
                           }
        # add annotation dictionary to a list
        annotations_list.append(annotations_dict)

    # create a dictionary for a photo with its id, name and list of annotations
    result_dict = {"image_id": photo_id,
                   "file_name": photo_name,
                   "height": 2880,
                   "width": 5760,
                   "annotations": annotations_list}

    # add picture results to a list
    result_list.append(result_dict)

# write the result to a JSON file (the output file name is just an example)
with open('photo_annotations.json', 'w') as f:
    json.dump(result_list, f, indent=4)
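Keep in mind that `np.array(df[...])` holds numpy integers, and `json.dump` raises a `TypeError` on `numpy.int64` values, so numpy scalars have to become plain Python numbers before the list can be written out. One option is to cast each value with `int()`/`float()` while building the dicts; another is a `default=` hook that converts numpy values at serialization time. A sketch of the hook, where `to_builtin` and the sample `record` are illustrative:

```python
import json

import numpy as np


def to_builtin(obj):
    """Fallback for json.dumps: turn numpy scalars/arrays into plain Python types."""
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# A record shaped like one photo entry, still holding numpy integers
# (list(np.array(...)) mirrors list(row[1:]) from the loop above).
record = {"image_id": np.int64(942),
          "annotations": [{"bbox": list(np.array([3672, 1271, 3956, 1417])),
                           "bbox_mode": 1,
                           "category_id": np.int64(0)}]}

print(json.dumps(record, default=to_builtin))
```

Integers converted this way are written as `3672`, not `3672.0`; if you want the float form shown in the question, convert the bbox values with `float()` when building each annotation.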