Fastest way to find multiple entries from json using Python

Time:10-11

I have a JSON file with around 50k items, each with an id and a name, as follows (data truncated):

[
    {
      "id": 2,
      "name": "Cannonball"
    },
    {
      "id": 6,
      "name": "Cannon base"
    },
    {
      "id": 8,
      "name": "Cannon stand"
    },
    {
      "id": 10,
      "name": "Cannon barrels"
    },
    {
      "id": 12,
      "name": "Cannon furnace"
    },
    {
      "id": 28,
      "name": "Insect repellent"
    },
    {
      "id": 30,
      "name": "Bucket of wax"
    }]

Now, I have an array of item names, and I want to find the corresponding id for each name and add it to an id array.

For example, I have itemName = ['Cannonball', 'Cannon furnace', 'Bucket of wax']

I would like to search inside the JSON and return id_array = [2, 12, 30]

I wrote the following code, which does the job, but it seems like a huge waste of effort:

import json

file_name = "database.json"
with open(file_name, 'r') as f:
    document = json.load(f)

items = ['Cannonball', 'Cannon furnace', 'Bucket of wax']
id_array = []  # collects the matching ids
for item_name in items:
    for entry in document:
        if item_name == entry['name']:
            id_array.append(entry['id'])

Is there a faster way to do this?

The example above shows only 3 results, but I'm talking about a few thousand, and it feels like a waste to iterate over the whole document for each of 1k results.

Thank you

CodePudding user response:

Build a lookup dictionary mapping names to ids and then look up the names on that dictionary:

lookup = {d["name"]: d["id"] for d in document}

items = ['Cannonball', 'Cannon furnace','Bucket of wax']

result = [lookup[item] for item in items]
print(result)

Output

[2, 12, 30]

The time complexity of this approach is O(n + m), where n is the number of elements in the document (len(document)) and m is the number of items (len(items)). In contrast, your approach is O(nm).
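To make the difference concrete, here is a rough sketch that runs both approaches on synthetic data and checks that they agree. The 50,000 generated entries, the "item N" names, and the two function names are made up for illustration and only stand in for the real file; the timings will vary by machine.

```python
import time

# Synthetic stand-in for the parsed JSON: 50,000 entries with unique names.
document = [{"id": i, "name": f"item {i}"} for i in range(50_000)]
# Hypothetical query list: 100 names spread across the document.
items = [f"item {i}" for i in range(0, 50_000, 500)]

def nested_loops(document, items):
    """Scan the whole document once per name."""
    ids = []
    for item_name in items:
        for entry in document:
            if item_name == entry["name"]:
                ids.append(entry["id"])
    return ids

def dict_lookup(document, items):
    """Build the name -> id dictionary once, then look up each name."""
    lookup = {d["name"]: d["id"] for d in document}
    return [lookup[item] for item in items]

start = time.perf_counter()
slow = nested_loops(document, items)
middle = time.perf_counter()
fast = dict_lookup(document, items)
end = time.perf_counter()

# Both approaches must return the same ids in the same order.
assert slow == fast
print(f"nested loops: {middle - start:.3f}s, dict lookup: {end - middle:.3f}s")
```

The nested loops do one comparison per (name, entry) pair, while the dictionary version touches each entry once and each name once, so the gap should widen as the list of names grows.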

An alternative approach that uses less space is to filter out those names that are not in items:

items = ['Cannonball', 'Cannon furnace', 'Bucket of wax']
item_set = set(items)

lookup = {d["name"]: d["id"] for d in document if d["name"] in item_set}
result = [lookup[item] for item in items]

This approach has the same time complexity as the previous one.

CodePudding user response:

You could generate a dict which maps name to id first:

import json

file_name = "database.json"
with open(file_name, 'r') as f:
    document = json.load(f)

name_to_id = {item["name"]: item["id"] for item in document}

Now you can just iterate over items:

items = ['Cannonball', 'Cannon furnace', 'Bucket of wax']
id_array = [name_to_id[name] for name in items]
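One thing to watch for: indexing the dictionary directly raises a KeyError if a name is not in the JSON. Here is a minimal sketch of one way to handle that, assuming you want to collect the missing names separately instead of failing. The helper name ids_for_names and the name 'No such item' are made up for illustration.

```python
def ids_for_names(document, names):
    """Return (ids found, names not found), keeping the order of names."""
    name_to_id = {entry["name"]: entry["id"] for entry in document}
    found, missing = [], []
    for name in names:
        if name in name_to_id:
            found.append(name_to_id[name])
        else:
            missing.append(name)
    return found, missing

# Small sample taken from the question's data.
document = [
    {"id": 2, "name": "Cannonball"},
    {"id": 12, "name": "Cannon furnace"},
    {"id": 30, "name": "Bucket of wax"},
]

found, missing = ids_for_names(document, ["Cannonball", "No such item", "Bucket of wax"])
print(found)    # [2, 30]
print(missing)  # ['No such item']
```

Also note that if two entries in the JSON share the same name, the dictionary keeps only the id of the last one.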