I have a CSV formatted as shown below. Notice that the same image can appear in several rows, each with a different name:
image | id | name | xMin | xMax | yMin | yMax |
---|---|---|---|---|---|---|
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp | 1 | Scratch | 604 | 893 | 230 | 413 |
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp | 2 | Dent | 921 | 1146 | 720 | 857 |
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp | 1 | Dent | 343 | 2323 | 334 | 343 |
I'm trying to write a function that reads this CSV into a nested dictionary using the column names as keys. If several rows share the same image, their records should be nested under that image. What I have so far is this:
import csv
import itertools
import operator
import json

with open('out1.csv', 'r') as fp:
    reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
    new_dict = {}
    for group, records in itertools.groupby(reader, key=operator.itemgetter('image')):
        new_dict[group] = list(records)
    json_object = json.dumps(new_dict, indent = 4)
    print(json_object)
What I am getting is as follows: `
{
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
        {
            "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp",
            "id": "1",
            "name": "Scratch",
            "xMin": "604",
            "xMax": "893",
            "yMin": "230",
            "yMax": "413"
        },
        {
            "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp",
            "id": "2",
            "name": "Dent",
            "xMin": "921",
            "xMax": "1146",
            "yMin": "720",
            "yMax": "857"
        }
    ],
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
        {
            "image": "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp",
            "id": "1",
            "name": "Dent",
            "xMin": "343",
            "xMax": "2323",
            "yMin": "334",
            "yMax": "343"
        }
    ]
}
`
The output should look like this instead. Each image should map to a list of nested dictionaries that leave out the image key, with the numbers as integers:
`
{
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
        {
            "id": 1,
            "name": "Scratch",
            "xMin": 604,
            "xMax": 893,
            "yMin": 230,
            "yMax": 413
        },
        {
            "id": 2,
            "name": "Dent",
            "xMin": 921,
            "xMax": 1146,
            "yMin": 720,
            "yMax": 857
        }
    ],
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
        {
            "id": 1,
            "name": "Dent",
            "xMin": 343,
            "xMax": 2323,
            "yMin": 334,
            "yMax": 343
        }
    ]
}
`
CodePudding user response:
I think this is what you are looking for. The code below groups the rows by image and leaves the image column out of each nested dictionary:
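import csv

dic = {}
# newline='' is the recommended way to open files for the csv module
with open('out1.csv', newline='') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)  # first row holds the column names
    for row in csvreader:
        # Start a new list the first time an image is seen
        if row[0] not in dic:
            dic[row[0]] = []
        # Map every column except the first (image) to its header
        dic[row[0]].append({header[i]: row[i] for i in range(1, len(row))})
print(dic)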
Hope this helps. If not, please feel free to comment. Thanks.
PS: you can wrap the numeric values in int() to get integers instead of strings.
Edit: added the check for whether the image is already in the dictionary.
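The int() suggestion in the PS could be sketched as follows. This is a minimal version, assuming every field that parses as a whole number should become an integer. The helper names `to_number` and `read_grouped` and the shortened image names are made up for illustration, and the sample data is inlined with io.StringIO so the snippet runs without out1.csv.

```python
import csv
import io
import json

# Hypothetical sample data with shortened image names
SAMPLE = """image,id,name,xMin,xMax,yMin,yMax
a.webp,1,Scratch,604,893,230,413
a.webp,2,Dent,921,1146,720,857
b.webp,1,Dent,343,2323,334,343
"""

def to_number(value):
    """Return int(value) if the string is a whole number, else the string unchanged."""
    try:
        return int(value)
    except ValueError:
        return value

def read_grouped(fp):
    """Group CSV rows by the first column, converting numeric fields to int."""
    reader = csv.reader(fp)
    header = next(reader)  # first row holds the column names
    grouped = {}
    for row in reader:
        # Skip column 0 (image); it becomes the outer key instead
        record = {header[i]: to_number(row[i]) for i in range(1, len(row))}
        grouped.setdefault(row[0], []).append(record)
    return grouped

result = read_grouped(io.StringIO(SAMPLE))
print(json.dumps(result, indent=4))
```

To use it on the real file, pass an open file object instead, for example `read_grouped(open('out1.csv', newline=''))`.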
CodePudding user response:
Modifications to your code (assuming the input file is comma-delimited, which is what the csv 'excel' dialect expects)
Code
import csv
import itertools
import operator
import json

def removekey(d, key):
    ' Returns dictionary with entry key removed '
    r = dict(d)
    del r[key]
    return r

with open('out1.csv', 'r') as fp:
    reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
    #headers = next(reader) # Edit -- handled by reader for dict
    new_dict = {}
    for group, records in itertools.groupby(reader, key=operator.itemgetter('image')):
        # Remove image key from each dictionary in records
        new_dict[group] = [removekey(d, 'image') for d in records]
    json_object = json.dumps(new_dict, indent = 4)
    print(json_object)
Output
{
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp": [
        {
            "id": "1",
            "name": "Scratch",
            "xMin": "604",
            "xMax": "893",
            "yMin": "230",
            "yMax": "413"
        },
        {
            "id": "2",
            "name": "Dent",
            "xMin": "921",
            "xMax": "1146",
            "yMin": "720",
            "yMax": "857"
        }
    ],
    "858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp": [
        {
            "id": "1",
            "name": "Dent",
            "xMin": "343",
            "xMax": "2323",
            "yMin": "334",
            "yMax": "343"
        }
    ]
}
Note: the input file used (comma-delimited values):
image,id,name,xMin,xMax,yMin,yMax
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp,1,Scratch,604,893,230,413
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_0_6.webp,2,Dent,921,1146,720,857
858a0246-2f2d-40a9-9bcb-01ab8a93c7f5_BU26844_1630586024_23.webp,1,Dent,343,2323,334,343
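One thing to keep in mind: itertools.groupby only groups consecutive rows that share a key, so the code above relies on all rows for an image being adjacent in the file. A minimal sketch, assuming the rows might not be in order, sorts them by image first; it also converts the numeric columns to int to match the requested output. The function name `group_by_image`, the `INT_FIELDS` tuple, and the shortened image names are illustrative assumptions, not part of the original post.

```python
import csv
import io
import itertools
import json
import operator

# Hypothetical sample where the rows for a.webp are deliberately not adjacent
SAMPLE = """image,id,name,xMin,xMax,yMin,yMax
a.webp,1,Scratch,604,893,230,413
b.webp,1,Dent,343,2323,334,343
a.webp,2,Dent,921,1146,720,857
"""

# Columns assumed to hold integers
INT_FIELDS = ("id", "xMin", "xMax", "yMin", "yMax")

def group_by_image(fp):
    """Return {image: [record, ...]} with the image key removed from each record."""
    reader = csv.DictReader(fp, skipinitialspace=True)
    get_image = operator.itemgetter("image")
    # sorted() is stable, so rows keep their file order within each image
    rows = sorted(reader, key=get_image)
    grouped = {}
    for image, records in itertools.groupby(rows, key=get_image):
        grouped[image] = [
            {k: int(v) if k in INT_FIELDS else v
             for k, v in rec.items() if k != "image"}
            for rec in records
        ]
    return grouped

result = group_by_image(io.StringIO(SAMPLE))
print(json.dumps(result, indent=4))
```

Without the sort, the second a.webp group would overwrite the first in the dictionary, leaving only one record for that image.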