Retrieving data from JSON with Python

Time:10-15

I have nested JSON data. I want to get the value of the key "name" inside the "value" dictionary, selected by the "id" inside the "key" dictionary (the user enters the id). I don't want to use positional indexing, because the position of each attribute differs from URL to URL. The data is also large, so I would like a one-line solution (without a for loop).

Code

import requests, re, json

r = requests.get('https://www.trendyol.com/apple/macbook-air-13-m1-8gb-256gb-ssd-altin-p-67940132').text

json_data1 = json.loads(re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r).group(1))

print(json_data1)

print('json_data1:',json_data1['product']['attributes'][0]['value']['name'])

Output

{'product': {'attributes': [{'key': {'name': 'İşlemci Tipi', 'id': 168}, 'value': {'name': 'Apple M1', 'id': 243383}, 'starred': True, 'description': '', 'mediaUrls': []}, {'key': {'name': 'SSD Kapasitesi', 'id': 249}..........
json_data1: Apple M1

JSON Data

{
"product": {
  "attributes": [
    {
      "key": { "name": "İşlemci Tipi", "id": 168 },
      "value": { "name": "Apple M1", "id": 243383 },
      "starred": true,
      "description": "",
      "mediaUrls": []
    },
    {
      "key": { "name": "SSD Kapasitesi", "id": 249 },
      "value": { "name": "256 GB", "id": 3376 },
      "starred": true,
      "description": "",
      "mediaUrls": []
    },
    .
    .
    .
    ]
}
}

Expected output: look up the value by key id (the result must be of type str)

input >> id: 168

output >> name: Apple M1

CodePudding user response:

You originally didn't want a for loop, but since speed is now the concern, here is a solution with a for loop. You can test it and see whether it is faster than the one you already had.

import json

with open("file.json") as f:
    data = json.load(f)

search_key = int(input("Enter id: "))

for i in range(0, len(data['product']['attributes'])):
    if search_key == data['product']['attributes'][i]['key']['id']:
        print(data['product']['attributes'][i]['value']['name'])

Input >> Enter id: 168

Output >> Apple M1
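
If several ids have to be looked up against the same data, one possible alternative is to build an id-to-name mapping once and then query it. This is only a sketch: the name `name_by_id` and the inline sample data (the two attributes shown in the question) are assumptions for illustration.

```python
# Sample data: the first two attributes from the question's JSON.
data = {
    "product": {
        "attributes": [
            {"key": {"name": "İşlemci Tipi", "id": 168},
             "value": {"name": "Apple M1", "id": 243383}},
            {"key": {"name": "SSD Kapasitesi", "id": 249},
             "value": {"name": "256 GB", "id": 3376}},
        ]
    }
}

# One pass over the attributes builds the mapping.
# If two attributes share a key id, the later one overwrites the earlier one.
name_by_id = {
    attribute["key"]["id"]: attribute["value"]["name"]
    for attribute in data["product"]["attributes"]
}

# dict.get returns None for an id that is not present.
print(name_by_id.get(168))
```

Building the mapping still iterates over every attribute once, so it only pays off when more than one lookup is needed.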

CodePudding user response:

I found a solution with a for loop. It runs fast, so I went with it.

for i in json_data1['product']['attributes']:
    cpu = list(list(i.values())[0].values())[1]
    if cpu == 168:
        print(list(list(i.values())[1].values())[0])
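
The positional `list(i.values())[0]` lookups above depend on the order of the keys inside each attribute. A minimal sketch of the same loop using the key names from the question's JSON instead; the function name `find_name` and the inline sample data are assumptions for illustration.

```python
# Sample data: the first two attributes from the question's JSON.
json_data1 = {
    "product": {
        "attributes": [
            {"key": {"name": "İşlemci Tipi", "id": 168},
             "value": {"name": "Apple M1", "id": 243383}},
            {"key": {"name": "SSD Kapasitesi", "id": 249},
             "value": {"name": "256 GB", "id": 3376}},
        ]
    }
}


def find_name(data, search_id):
    """Return value.name of the first attribute whose key.id matches, else None."""
    for attribute in data["product"]["attributes"]:
        if attribute["key"]["id"] == search_id:
            return attribute["value"]["name"]  # stop at the first match
    return None


print(find_name(json_data1, 168))  # prints: Apple M1
```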

CodePudding user response:

Iteration is unavoidable if the index is unknown, but the cost can be reduced by using a generator expression with Python's built-in next function, which stops at the first match:

next((x["value"]["name"] for x in data["product"]["attributes"] if x["key"]["id"] == 168), None)
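
To cover the "user enters the id" part of the question, the one-liner could be wrapped in a small helper. This is a sketch under assumptions: the helper name `name_for_id`, the choice to return None for non-numeric input, and the inline sample data are all illustrative.

```python
# Sample data: the first two attributes from the question's JSON.
data = {
    "product": {
        "attributes": [
            {"key": {"name": "İşlemci Tipi", "id": 168},
             "value": {"name": "Apple M1", "id": 243383}},
            {"key": {"name": "SSD Kapasitesi", "id": 249},
             "value": {"name": "256 GB", "id": 3376}},
        ]
    }
}


def name_for_id(data, raw_id):
    """Look up value.name for an id given as text; None if absent or not a number."""
    try:
        wanted = int(raw_id)  # input() returns a string, the JSON ids are ints
    except ValueError:
        return None
    name = next(
        (x["value"]["name"] for x in data["product"]["attributes"]
         if x["key"]["id"] == wanted),
        None,  # default when no attribute matches
    )
    # The question asks for a str result; leave None untouched.
    return None if name is None else str(name)


# Interactive use would look like: name_for_id(data, input("id: "))
print(name_for_id(data, "168"))  # prints: Apple M1
```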

Edit:

To verify that a generator expression is in fact faster than a for loop, here is a comparison of the running time of xFranco's solution and the above:

import time

def time_func(func):
        
    def timer(*args):
        
        time1 = time.perf_counter()
        func(*args)
        time2 = time.perf_counter()
        
        return (time2 - time1) * 1000
    
    return timer

number_of_attributes = 100000

data = {
    "product": {
        "attributes": [
            {
                "key": { "name": "İşlemci Tipi", "id": i },
                "value": { "name": "name" + str(i), "id": 243383 },
                "starred": True,
                "description": "",
                "mediaUrls": []
            } for i in range(number_of_attributes)
        ]
    }
}

def getName_generator(id):
    return next((x["value"]["name"] for x in data["product"]["attributes"] if x["key"]["id"] == id), None)

def getName_for_loop(id):
    return_value = None
    
    for i in range(0, len(data['product']['attributes'])):
        if id == data['product']['attributes'][i]['key']['id']:
            return_value = data['product']['attributes'][i]['value']['name']
    
    return return_value

print("Generator:", time_func(getName_generator)(0))
print("For loop:", time_func(getName_for_loop)(0))

print()

print("Generator:", time_func(getName_generator)(number_of_attributes - 1))
print("For loop:", time_func(getName_for_loop)(number_of_attributes - 1))

My results:

Generator: 0.0075999999999964984
For loop: 43.73920000000003

Generator: 23.633300000000023
For loop: 49.839699999999986

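In this test, the generator expression is faster even when it has to traverse the entire data set. Note, though, that getName_for_loop never stops after a match and indexes into data['product']['attributes'] on every iteration, so part of the gap comes from how that loop is written rather than from for loops as such.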
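
A single `perf_counter` measurement can be noisy, and the loop being timed does not exit at the first match. One way to check the comparison more carefully is sketched below with the standard `timeit` module and a loop that returns early. The function names `get_name_generator` and `get_name_loop_break` and the repeat counts are assumptions for illustration, and no particular outcome is claimed; timings will vary by machine and Python version.

```python
import timeit

N = 100_000  # same number of attributes as in the test above

data = {
    "product": {
        "attributes": [
            {"key": {"name": "İşlemci Tipi", "id": i},
             "value": {"name": "name" + str(i), "id": 243383}}
            for i in range(N)
        ]
    }
}


def get_name_generator(wanted):
    """Generator expression + next: stops at the first match."""
    return next((x["value"]["name"] for x in data["product"]["attributes"]
                 if x["key"]["id"] == wanted), None)


def get_name_loop_break(wanted):
    """Plain for loop that also stops at the first match."""
    for x in data["product"]["attributes"]:
        if x["key"]["id"] == wanted:
            return x["value"]["name"]
    return None


# Time the best case (first element) and the worst case (last element).
for wanted in (0, N - 1):
    for func in (get_name_generator, get_name_loop_break):
        # Best of 3 repeats of 5 calls each, reported in ms per call.
        best = min(timeit.repeat(lambda: func(wanted), number=5, repeat=3))
        print(f"{func.__name__}(id={wanted}): {best / 5 * 1000:.4f} ms")
```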
