I have a JSON file with around 50k items, each of which has an id and a name, as follows (I cut the data):
[
{
"id": 2,
"name": "Cannonball"
},
{
"id": 6,
"name": "Cannon base"
},
{
"id": 8,
"name": "Cannon stand"
},
{
"id": 10,
"name": "Cannon barrels"
},
{
"id": 12,
"name": "Cannon furnace"
},
{
"id": 28,
"name": "Insect repellent"
},
{
"id": 30,
"name": "Bucket of wax"
}]
Now, I have an array of item names, and I want to find the corresponding id for each name and add it to an id array.
For example, I have itemName = ['Cannonball', 'Cannon furnace', 'Bucket of wax']
I would like to search inside the JSON and return id_array = [2, 12, 30]
I wrote the following code, which does the job, but it seems like a huge waste of energy:
import json

file_name = "database.json"
with open(file_name, 'r') as f:
    document = json.loads(f.read())
items = ['Cannonball', 'Cannon furnace','Bucket of wax']
id_array = []
# For every wanted name, scan the whole document for a matching entry
for item_name in items:
    for entry in document:
        if item_name == entry['name']:
            id_array.append(entry['id'])
Is there any faster method that can do it?
The example above shows only 3 results, but I'm talking about a few thousand, and it feels like a waste to iterate over 1k results.
Thank you
CodePudding user response:
Build a lookup dictionary mapping names to ids, and then look up the names in that dictionary:
lookup = { d["name"] : d["id"] for d in document}
items = ['Cannonball', 'Cannon furnace','Bucket of wax']
result = [lookup[item] for item in items]
print(result)
Output
[2, 12, 30]
The time complexity of this approach is O(n + m), where n is the number of elements in the document (len(document)) and m is the number of items (len(items)). In contrast, your approach is O(nm).
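To make the difference concrete, here is a rough sketch that counts the work done by each approach. The data is synthetic (the `item-N` names and the sizes are made up for illustration, not taken from the real database), and the step count for the dictionary version is an estimate of one operation per insertion and per lookup.

```python
# Synthetic stand-in for the JSON document (hypothetical data).
document = [{"id": i, "name": f"item-{i}"} for i in range(1000)]
items = [f"item-{i}" for i in range(0, 1000, 100)]  # 10 names to resolve

# Nested loops: every wanted name is compared against every entry.
comparisons = 0
nested_ids = []
for item_name in items:
    for entry in document:
        comparisons += 1
        if item_name == entry["name"]:
            nested_ids.append(entry["id"])

# Dictionary: one pass to build the lookup, then one hash lookup per name.
lookup = {d["name"]: d["id"] for d in document}
dict_ids = [lookup[name] for name in items]
dict_steps = len(document) + len(items)

print(comparisons)             # 10000, i.e. n * m
print(dict_steps)              # 1010, i.e. n + m
print(nested_ids == dict_ids)  # True: both approaches find the same ids
```

With 50k entries and a few thousand names, the gap between n * m and n + m grows accordingly.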
An alternative approach that uses less space is to filter out those names that are not in items while building the dictionary:
items = ['Cannonball', 'Cannon furnace', 'Bucket of wax']
item_set = set(items)
lookup = {d["name"]: d["id"] for d in document if d["name"] in item_set}
result = [lookup[item] for item in items]
This approach has the same time complexity as the previous one.
CodePudding user response:
You could first generate a dict which maps name to id:
import json

file_name = "database.json"
with open(file_name, 'r') as f:
    document = json.loads(f.read())
# Build the name -> id mapping once, in a single pass over the document
name_to_id = {item["name"]:item["id"] for item in document}
Now you can just iterate over items:
items = ['Cannonball', 'Cannon furnace','Bucket of wax']
id_array = [ name_to_id[name] for name in items]
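One caveat: indexing the dict directly raises a KeyError if a name is not in the document. If that can happen in your data, a possible variant is to collect the missing names separately. This is only a sketch; the inline `document` is a small excerpt of the question's data, and "Dragon egg" is a made-up name used to show the missing case.

```python
# Small inline sample instead of reading database.json (assumption for illustration).
document = [
    {"id": 2, "name": "Cannonball"},
    {"id": 12, "name": "Cannon furnace"},
    {"id": 30, "name": "Bucket of wax"},
]
name_to_id = {item["name"]: item["id"] for item in document}

# "Dragon egg" is a hypothetical name that does not exist in the document.
items = ["Cannonball", "Dragon egg", "Bucket of wax"]

# Keep ids for known names and record the unknown ones instead of failing.
id_array = [name_to_id[name] for name in items if name in name_to_id]
missing = [name for name in items if name not in name_to_id]

print(id_array)  # [2, 30]
print(missing)   # ['Dragon egg']
```

Note also that if two entries share the same name, the dict keeps only the id of the last one it sees.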