Nested Dictionary from CSV

Time:07-16

I have a CSV formatted as follows. Note that the same image can appear on several rows, each with a different name:

image id name xMin xMax yMin yMax
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp 1 Scratch 604 893 230 413
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp 2 Dent 921 1146 720 857
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp 1 Dent 343 2323 334 343

I'm trying to write a function that reads this CSV into a nested dictionary keyed by image, with the remaining column names as the keys of each inner dictionary. If several rows share the same image, their dictionaries should be collected under that image. What I have so far is this:

import csv
import itertools
import operator
import json
with open('out1.csv', 'r') as fp:
    reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
    new_dict = {}
    for group, records in itertools.groupby(reader, key=operator.itemgetter('image')):
        new_dict[group] = list(records)
json_object = json.dumps(new_dict, indent = 4)
print(json_object)

What I am getting is as follows:

{
"858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
    {
        "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp",
        "id": "1",
        "name": "Scratch",
        "xMin": "604",
        "xMax": "893",
        "yMin": "230",
        "yMax": "413"
    },
    {
        "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp",
        "id": "2",
        "name": "Dent",
        "xMin": "921",
        "xMax": "1146",
        "yMin": "720",
        "yMax": "857"
    }
],
"858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
    {
        "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp",
        "id": "1",
        "name": "Dent",
        "xMin": "343",
        "xMax": "2323",
        "yMin": "334",
        "yMax": "343"
    }
]

}

The output I want looks like this, with the rows for the same image nested under it, no repeated image key, and the numbers as integers:

{
  "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
    {
      "id": 1,
      "name": "Scratch",
      "xMin": 604,
      "xMax": 893,
      "yMin": 230,
      "yMax": 413
    },
    {
      "id": 2,
      "name": "Dent",
      "xMin": 921,
      "xMax": 1146,
      "yMin": 720,
      "yMax": 857
    }
  ],
  "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
    {
      "id": 1,
      "name": "Dent",
      "xMin": 343,
      "xMax": 2323,
      "yMin": 334,
      "yMax": 343
    }
  ]
}


CodePudding user response:

I think this is what you are looking for. The code below groups the rows by image and leaves the image column out of each inner dictionary. The values stay strings unless you convert them.

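import csv

# Open with newline='' as the csv module recommends; 'with' closes the file.
with open('out1.csv', newline='') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)  # first row holds the column names
    dic = {}
    for row in csvreader:
        # The first column (image) is the outer key; create its list on first sight.
        if row[0] not in dic:
            dic[row[0]] = []
        # Map the remaining columns to their header names.
        dic[row[0]].append({header[i]: row[i] for i in range(1, len(row))})
print(dic)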

Hope this solution helps. If not, please feel free to comment. Thanks.

PS: you can wrap the numeric values in int() to get integers instead of strings.
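A minimal sketch of that int() conversion, under some assumptions: the sample rows are inlined through io.StringIO so the snippet runs on its own, the image names are shortened placeholders, and the helper names `to_int_if_numeric` and `nest_rows` are made up for illustration. The `isdigit()` check only catches non-negative whole numbers, which is all the sample data contains.

```python
import csv
import io
import json

# Placeholder data with shortened image names; with a real file, pass
# open('out1.csv', newline='') to nest_rows instead of io.StringIO(SAMPLE).
SAMPLE = """image,id,name,xMin,xMax,yMin,yMax
a.webp,1,Scratch,604,893,230,413
a.webp,2,Dent,921,1146,720,857
b.webp,1,Dent,343,2323,334,343
"""

def to_int_if_numeric(value):
    """Return int(value) when the string is all digits, else the string unchanged."""
    return int(value) if value.isdigit() else value

def nest_rows(fp):
    """Group rows by the first column, converting digit-only fields to int."""
    reader = csv.reader(fp)
    header = next(reader)
    nested = {}
    for row in reader:
        entry = {header[i]: to_int_if_numeric(row[i]) for i in range(1, len(row))}
        nested.setdefault(row[0], []).append(entry)
    return nested

result = nest_rows(io.StringIO(SAMPLE))
print(json.dumps(result, indent=4))
```

Going through json.dumps rather than print(dic) also gives double-quoted JSON, which matches the format shown in the question.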

Edit: added the uniqueness check for the image key.

CodePudding user response:

Modifications to your code (assuming the input file is comma-delimited, which is what the 'excel' dialect expects)

Code

import csv
import itertools
import operator
import json

def removekey(d, key):
    ' Returns dictionary with entry key removed '
    r = dict(d)
    del r[key]
    return r

with open('out1.csv', 'r') as fp:
    reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
    #headers = next(reader)         # Edit -- handled by reader for dict
    new_dict = {}
    for group, records in itertools.groupby(reader, key=operator.itemgetter('image')):
        # Remove image key from each dictionary in records
        new_dict[group] = [removekey(d, 'image') for d in records]
json_object = json.dumps(new_dict, indent = 4)
print(json_object)

Output

{
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
        {
            "id": "1",
            "name": "Scratch",
            "xMin": "604",
            "xMax": "893",
            "yMin": "230",
            "yMax": "413"
        },
        {
            "id": "2",
            "name": "Dent",
            "xMin": "921",
            "xMax": "1146",
            "yMin": "720",
            "yMax": "857"
        }
    ],
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
        {
            "id": "1",
            "name": "Dent",
            "xMin": "343",
            "xMax": "2323",
            "yMin": "334",
            "yMax": "343"
        }
    ]
}
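One caveat with this approach: itertools.groupby only merges consecutive rows that share a key, so if rows for the same image are not adjacent in the file, a later group overwrites the earlier one in new_dict. The following is a rough sketch of an order-independent variant using collections.defaultdict. The inlined sample (with shortened placeholder image names and deliberately interleaved rows) and the function name `group_by_image` are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Placeholder data: the two a.webp rows are deliberately not adjacent.
SAMPLE = """image,id,name,xMin,xMax,yMin,yMax
a.webp,1,Scratch,604,893,230,413
b.webp,1,Dent,343,2323,334,343
a.webp,2,Dent,921,1146,720,857
"""

def group_by_image(fp):
    """Collect rows under their image key regardless of row order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(fp):
        image = row.pop('image')   # remove the key so it is not repeated inside
        grouped[image].append(row)
    return dict(grouped)

grouped = group_by_image(io.StringIO(SAMPLE))
print(grouped)
```

Sorting the rows by image before calling groupby would be another option; the defaultdict version just avoids the extra sort.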

Note: the input file used above (comma-delimited values):

image,id,name,xMin,xMax,yMin,yMax
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp,1,Scratch,604,893,230,413
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp,2,Dent,921,1146,720,857
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp,1,Dent,343,2323,334,343
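If the real file turns out to be tab-separated instead (the sample in the question does not make the delimiter clear, so this is only an assumption), the same DictReader can be told so with the `delimiter` argument. A small sketch with an inlined one-row placeholder sample:

```python
import csv
import io

# Placeholder tab-separated data with a shortened image name.
TAB_SAMPLE = (
    "image\tid\tname\txMin\txMax\tyMin\tyMax\n"
    "a.webp\t1\tScratch\t604\t893\t230\t413\n"
)

# delimiter='\t' is forwarded to the underlying csv reader;
# dialect='excel-tab' would have the same effect here.
reader = csv.DictReader(io.StringIO(TAB_SAMPLE), delimiter='\t')
rows = list(reader)
print(rows[0]['name'], rows[0]['xMin'])
```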