Nested Dictionary from CSV

I have a CSV formatted as follows. Note that the same image can appear in several rows, each with a different name:

image	id	name	xMin	xMax	yMin	yMax
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp	1	Scratch	604	893	230	413
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp	2	Dent	921	1146	720	857
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp	1	Dent	343	2323	334	343

I'm trying to write a function that reads this CSV into a nested dictionary that uses the column names as keys. If several rows share the same image, they should be nested together under that image. What I have so far is this:

import csv
import itertools
import operator
import json
with open('out1.csv', 'r') as fp:
    reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
    new_dict = {}
    for group, records in itertools.groupby(reader, key=operator.itemgetter('image')):
        new_dict[group] = list(records)
json_object = json.dumps(new_dict, indent = 4)
print(json_object)

What I am getting is as follows:

{
"858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
    {
        "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp",
        "id": "1",
        "name": "Scratch",
        "xMin": "604",
        "xMax": "893",
        "yMin": "230",
        "yMax": "413"
    },
    {
        "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp",
        "id": "2",
        "name": "Dent",
        "xMin": "921",
        "xMax": "1146",
        "yMin": "720",
        "yMax": "857"
    }
],
"858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
    {
        "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp",
        "id": "1",
        "name": "Dent",
        "xMin": "343",
        "xMax": "2323",
        "yMin": "334",
        "yMax": "343"
    }
]

}

The output should look like this instead, with the rows for the same image nested under it:

{
  "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
    {
      "id": 1,
      "name": "Scratch",
      "xMin": 604,
      "xMax": 893,
      "yMin": 230,
      "yMax": 413
    },
    {
      "id": 2,
      "name": "Dent",
      "xMin": 921,
      "xMax": 1146,
      "yMin": 720,
      "yMax": 857
    }
  ],
  "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
    {
      "id": 1,
      "name": "Dent",
      "xMin": 343,
      "xMax": 2323,
      "yMin": 334,
      "yMax": 343
    }
  ]
}

CodePudding user response：

Based on your description, I think the code below produces the output you are looking for.

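import csv

# Use a context manager so the file is closed automatically.
with open('out1.csv', newline='') as file:
    csvreader = csv.reader(file)  # pass delimiter='\t' if the file is tab-separated
    header = next(csvreader)      # first row holds the column names
    dic = {}
    for row in csvreader:
        # Start a new list the first time an image name is seen.
        if row[0] not in dic:
            dic[row[0]] = []
        # Map every column except the image name to its header.
        dic[row[0]].append({header[i]: row[i] for i in range(1, len(row))})
print(dic)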

Hope this solution helps. If not, please feel free to comment. Thanks.

PS: you can wrap the numeric values in int() to get integers instead of strings.
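As a rough sketch of that int() suggestion, the snippet below converts every field that parses as an integer and leaves the rest as strings. The helper names to_number and nest_by_image, the shortened file names a.webp and b.webp, and the in-memory sample are assumptions made for illustration, not part of the original code.

```python
import csv
import io
import json


def to_number(value):
    """Return int(value) when the text is an integer, else the original string."""
    try:
        return int(value)
    except ValueError:
        return value


def nest_by_image(fp, delimiter=","):
    """Group CSV rows under their 'image' value, converting numeric fields."""
    nested = {}
    for row in csv.DictReader(fp, delimiter=delimiter):
        image = row.pop("image")  # the image name becomes the outer key
        nested.setdefault(image, []).append(
            {key: to_number(value) for key, value in row.items()}
        )
    return nested


# Hypothetical in-memory sample standing in for out1.csv.
sample = io.StringIO(
    "image,id,name,xMin,xMax,yMin,yMax\n"
    "a.webp,1,Scratch,604,893,230,413\n"
    "a.webp,2,Dent,921,1146,720,857\n"
    "b.webp,1,Dent,343,2323,334,343\n"
)
result = nest_by_image(sample)
print(json.dumps(result, indent=4))
```

If the real file is tab-separated, as the sample in the question appears to be, calling nest_by_image(fp, delimiter="\t") on the opened file should work the same way.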

Edit: added the uniqueness check for the image key.

CodePudding user response：

Mods to your code (assuming the input file is comma-delimited, which is the default for the 'excel' dialect)

Code

import csv
import itertools
import operator
import json

def removekey(d, key):
    ' Returns dictionary with entry key removed '
    r = dict(d)
    del r[key]
    return r

with open('out1.csv', 'r') as fp:
    reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
    #headers = next(reader)         # Edit -- handled by reader for dict
    new_dict = {}
    for group, records in itertools.groupby(reader, key=operator.itemgetter('image')):
        # Remove image key from each dictionary in records
        new_dict[group] = [removekey(d, 'image') for d in records]
json_object = json.dumps(new_dict, indent = 4)
print(json_object)

Output

{
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
        {
            "id": "1",
            "name": "Scratch",
            "xMin": "604",
            "xMax": "893",
            "yMin": "230",
            "yMax": "413"
        },
        {
            "id": "2",
            "name": "Dent",
            "xMin": "921",
            "xMax": "1146",
            "yMin": "720",
            "yMax": "857"
        }
    ],
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
        {
            "id": "1",
            "name": "Dent",
            "xMin": "343",
            "xMax": "2323",
            "yMin": "334",
            "yMax": "343"
        }
    ]
}

Note: the input file used here (comma-delimited values):

image,id,name,xMin,xMax,yMin,yMax
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp,1,Scratch,604,893,230,413
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp,2,Dent,921,1146,720,857
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp,1,Dent,343,2323,334,343
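
One caveat about the groupby approach: itertools.groupby only groups consecutive rows with the same key, so it relies on all rows for an image being adjacent in the file, as they are in the sample above. The sketch below uses a hypothetical, shortened input (a.webp, b.webp) where they are not adjacent, and shows that sorting by image first keeps the groups intact. The helper group_rows is an assumption made for illustration.

```python
import csv
import io
import itertools
import operator

# Hypothetical input where the rows for a.webp are NOT adjacent.
sample = (
    "image,id,name\n"
    "a.webp,1,Scratch\n"
    "b.webp,1,Dent\n"
    "a.webp,2,Dent\n"
)

get_image = operator.itemgetter("image")


def group_rows(rows):
    """Group row dicts by image with groupby, dropping the image key."""
    grouped = {}
    for image, records in itertools.groupby(rows, key=get_image):
        grouped[image] = [
            {k: v for k, v in record.items() if k != "image"}
            for record in records
        ]
    return grouped


rows = list(csv.DictReader(io.StringIO(sample)))

# Without sorting, the second a.webp group overwrites the first one.
unsorted_result = group_rows(rows)

# sorted() is stable, so rows keep their file order within each image.
sorted_result = group_rows(sorted(rows, key=get_image))

print(len(unsorted_result["a.webp"]), len(sorted_result["a.webp"]))
```

Sorting loads the whole file into memory. For large files, appending to a plain dictionary of lists while reading avoids both the sort and the adjacency requirement.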