I am having a problem counting the total number of unique ids in a nested list.
Nested list:
[
    [
        {
            "id": "a",
            "label": "Truck",
            "annotation": "vehicle",
            "visible": "No",
            "label2": "Truck",
            "shape": "rectangle",
            "x": 4,
            "y": 500,
            "height": 200,
            "width": 300
        },
        {
            "id": "b",
            "label": "Truck",
            "annotation": "vehicle",
            "visible": "No",
            "label2": "Truck",
            "shape": "rectangle",
            "x": 3,
            "y": 400,
            "height": 250,
            "width": 360
        },
        ...
    ],
    [
        {
            "id": "a",
            "label": "Truck",
            "annotation": "vehicle",
            "visible": "No",
            "label2": "Truck",
            "shape": "rectangle",
            "x": 4,
            "y": 500,
            "height": 200,
            "width": 300
        },
        {
            "id": "b",
            "label": "Truck",
            "annotation": "vehicle",
            "visible": "No",
            "label2": "Truck",
            "shape": "rectangle",
            "x": 3,
            "y": 400,
            "height": 250,
            "width": 360
        },
        ...
    ],
    ...
]
Currently, my code prints the result below for every dictionary, which is not what I want:
id: 1,
label: 1,
annotation: 1,
visible: 1,
label2: 1,
shape: 1,
x: 1,
y: 1,
height: 1,
width: 1
...
id: 1,
label: 1,
annotation: 1,
visible: 1,
label2: 1,
shape: 1,
x: 1,
y: 1,
height: 1,
width: 1
How can I count each id in this nested list of dictionaries only once (so "a" and "b" are each counted a single time), without using pandas?
The output I want:
Unique id: 2
Code:
import json
import os
import pandas as pd
from itertools import chain
path = 'mypath/json_name.json'
size = os.path.getsize(path)
def func1(data):
    c = {}
    for key, value in data.items():
        try:
            c[key].append(value)
        except KeyError:
            c[key] = [value]
    for key, value in c.items():
        print("{0}:{1}".format(key, len(set(value))))
def totalUniqueId(data):
    for inner_list in data:
        for inner_dict in inner_list:
            func1(inner_dict)
with open('json_name.json') as json_file:
    if size > 13000:
        json_file.seek(0)
    test_data = json.load(json_file)
    totalUniqueId(test_data)
Resources I used:
- Python - List of unique dictionaries
- How can I create a histogram of appearances of values in a dictionary?
CodePudding user response:
If I understand correctly what you need, one solution is to collect all the ids in a temporary list and then use Counter to count the occurrences of each id in that list.
Something like this:
from collections import Counter
# l is the nested list (a list of lists of dictionaries)
ids = []
for x in l:
    for y in x:
        ids.append(y['id'])
print(Counter(ids))
This is the output you would get if you run that code with an example of a nested list:
l = [
    [
        {
            "id": "a",
            "label": "Truck",
            "annotation": "vehicle",
        },
        {
            "id": "b",
            "label": "Truck",
            "annotation": "vehicle",
        },
    ],
    [
        {
            "id": "a",
            "label": "Truck",
            "annotation": "vehicle",
        },
        {
            "id": "b",
            "label": "Truck",
            "annotation": "vehicle",
        },
    ],
]
from collections import Counter
ids = []
for x in l:
    for y in x:
        ids.append(y['id'])
print(Counter(ids))
Will get you:
Counter({'a': 2, 'b': 2})
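If you only need the number of distinct ids rather than the per-id counts, the length of the Counter should give it, since a Counter keeps one key per distinct element. A minimal sketch, assuming every dictionary has an "id" key (the names nested, id_counts and unique_total are mine, for illustration only):

```python
from collections import Counter

# Hypothetical sample data mirroring the shortened example list.
nested = [
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
]

# Count how often each id occurs across all inner lists.
id_counts = Counter(obj["id"] for sublist in nested for obj in sublist)

# A Counter holds one key per distinct id, so its length is the unique count.
unique_total = len(id_counts)
print(f"Unique id: {unique_total}")
```

This keeps the per-id occurrences available in id_counts in case you need them later.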
CodePudding user response:
The simplest way would be to put the ids in a set and use its length:
import json
with open('json_name.json') as json_file:
    data = json.load(json_file)
unique_ids = set()
for sublist in data:
    for obj in sublist:
        unique_ids.add(obj['id'])
print(f'Unique ids: {len(unique_ids)}')
You could do the same thing in a single line with a set comprehension:
unique_ids = {obj['id'] for sublist in data for obj in sublist}
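For reuse, the set approach can be wrapped in a small function. The sketch below is only one way to do it: the helper name count_unique_ids, its key parameter, and the choice to skip dictionaries that lack the key are my assumptions, not something the question requires. It uses in-memory sample data instead of reading json_name.json so that it runs on its own.

```python
from itertools import chain


def count_unique_ids(data, key="id"):
    """Return the number of distinct values stored under `key`.

    `data` is assumed to be a list of lists of dicts. Dicts that lack
    `key` are skipped instead of raising KeyError (an assumption).
    """
    flat = chain.from_iterable(data)  # flatten one level of nesting
    return len({obj[key] for obj in flat if key in obj})


# Hypothetical sample data; the last dict has no "id" and is ignored.
sample = [
    [{"id": "a", "x": 4}, {"id": "b", "x": 3}],
    [{"id": "a", "x": 4}, {"id": "b", "x": 3}, {"x": 9}],
]
result = count_unique_ids(sample)
print(f"Unique id: {result}")
```

With data loaded from the file via json.load, the call would be count_unique_ids(data). If a missing "id" should be treated as an error instead, drop the `if key in obj` filter.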