I'm learning Python because I need to collect a bunch of data for my project, and I'm stuck. I'm using Scrapy to scrape a JSON response from an API that looks like this:
```json
{
  "status": "ok",
  "status_message": "Query was successful",
  "data": {
    "product_count": 40993,
    "limit": 20,
    "page_number": 1,
    "products": [
      {
        "id": 41789,
        "url": "https://anything1.com",
        "product_name": "product1",
        "manufacturing_date": "19.12.2014",
        "rating": 5.3,
        "material": "something",
        "description": "",
        "cover_image": "anycover1.com",
        "state": "ok",
        "variants": [
          {
            "url": "https://anyvariant1.com",
            "product_code": "55BEF7",
            "material": "something",
            "size": "small",
            "dimensions": ""
          },
          {
            "url": "https://anyvariant2.com",
            "product_code": "55BEF8",
            "material": "something",
            "size": "medium",
            "dimensions": ""
          },
          {
            "url": "https://anyvariant3.com",
            "product_code": "55BEF9",
            "material": "something",
            "size": "large",
            "dimensions": ""
          }
        ]
      },
      {
        "id": 41790,
        "url": "https://anything2.com",
        "product_name": "product2",
        "manufacturing_date": "02.10.2014",
        "rating": 7.2,
        "material": "something",
        "description": "",
        "cover_image": "anycover2.com",
        "state": "ok",
        "variants": [
          {
            "url": "https://anyvariant4.com",
            "product_code": "55BEG7",
            "material": "something",
            "size": "small",
            "dimensions": ""
          },
          {
            "url": "https://anyvariant5.com",
            "product_code": "55BEG8",
            "material": "something",
            "size": "medium",
            "dimensions": ""
          },
          {
            "url": "https://anyvariant6.com",
            "product_code": "55BEG9",
            "material": "something",
            "size": "large",
            "dimensions": ""
          }
        ]
      },
      {
        _______
      },
      {
        _______
      }
    ]
  },
  "@meta": {
    "server_time": 1651288705,
    "execution_time": "0.01 ms"
  }
}
```
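To make the nesting concrete: `data['data']['products']` is a list of product dicts, and each product's `variants` is another list of dicts inside it. Here is a small sketch that indexes into it step by step, using a trimmed copy of the response above (only one product and one variant kept, purely for brevity):

```python
import json

# Trimmed copy of the API response above: one product, one variant.
raw = '''
{
  "status": "ok",
  "data": {
    "product_count": 40993,
    "products": [
      {
        "id": 41789,
        "product_name": "product1",
        "variants": [
          {"url": "https://anyvariant1.com", "product_code": "55BEF7", "size": "small"}
        ]
      }
    ]
  }
}
'''

data = json.loads(raw)
products = data['data']['products']   # a list of product dicts
first = products[0]                   # one product (a dict)
variants = first['variants']          # a list of dicts nested inside the product
print(variants[0]['product_code'])    # prints 55BEF7
```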
And this is what my scraper code looks like:
```python
import json

# Callback method inside the Scrapy spider class
def parse(self, response):
    data = json.loads(response.body)
    data_main = data['data']['products']
    product_list = []
    for item in data_main:
        id = item['id']
        url = item['url']
        product_name = item['product_name']
        rating = item['rating']
        cover_image = item['cover_image']
        description = item['description']
        product = {
            'id': id,
            'url': url,
            'name': product_name,
            'image': cover_image,
            'rating': rating,
            'description': description
        }
        product_list.append(product)
    # Return once, after every product has been collected
    return product_list
```
With this, the id, url, name, image, rating and description keys and values are accessible. But I'm unable to access and modify the nested keys and their values all at once (while ignoring some keys and values). How can I do that? And if there's a better way to achieve what I need, please suggest it. Thanks a lot.
Answer:
By nested keys and values, I assume you mean the ones under `variants`. You can access those in much the same way as you iterated through the products, with an inner loop inside your `for item in data_main:` loop:
```python
variant_list = []
for variant in item['variants']:
    url = variant['url']
    # and so on... for whatever other keys you're interested in
    new_variant = {'url': url}  # and whatever other keys you want
    variant_list.append(new_variant)
# then attach it to the product, e.g. 'variants': variant_list
```
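Putting the two loops together, a minimal sketch could look like this. It assumes each product should carry its own list of variants, keeps only `product_code` and `size` from each variant (an illustrative choice), and runs on a trimmed sample of the response rather than a live Scrapy `response`. `extract_products` is a hypothetical helper name:

```python
import json

# Trimmed sample of the API response (assumption: one product, two variants).
raw = '''{"data": {"products": [
  {"id": 41789, "url": "https://anything1.com", "product_name": "product1",
   "rating": 5.3, "cover_image": "anycover1.com", "description": "",
   "state": "ok",
   "variants": [
     {"url": "https://anyvariant1.com", "product_code": "55BEF7",
      "material": "something", "size": "small", "dimensions": ""},
     {"url": "https://anyvariant2.com", "product_code": "55BEF8",
      "material": "something", "size": "medium", "dimensions": ""}
   ]}
]}}'''


def extract_products(data):
    """Build product dicts, each with a nested list of trimmed-down variants."""
    product_list = []
    for item in data['data']['products']:
        # Inner loop: keep only the variant keys we care about.
        variant_list = []
        for variant in item['variants']:
            variant_list.append({
                'code': variant['product_code'],
                'size': variant['size'],
            })
        # Outer dict: pick and rename the product keys, skipping e.g. 'state'.
        product_list.append({
            'id': item['id'],
            'url': item['url'],
            'name': item['product_name'],
            'image': item['cover_image'],
            'rating': item['rating'],
            'description': item['description'],
            'variants': variant_list,
        })
    return product_list


products = extract_products(json.loads(raw))
print(products[0]['variants'])
# prints [{'code': '55BEF7', 'size': 'small'}, {'code': '55BEF8', 'size': 'medium'}]
```

Inside the spider you would pass it `json.loads(response.body)` as in the question and return (or `yield`) the results; recent Scrapy versions also provide `response.json()` for the parsing step.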
I have to wonder, though, why you're reconstructing dictionaries that are so similar to the ones JSON already gave you. For many purposes, you might as well stick with the dictionaries that `json.loads` returned.
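If you do stick with the parsed dictionaries, one way to "ignore some keys" is to drop the unwanted ones instead of copying the wanted ones. Here is a sketch of that approach; the key sets `PRODUCT_DROP` and `VARIANT_DROP` and the helper `strip_keys` are hypothetical names, and the keys chosen are only examples:

```python
import json

# Keys to discard at each level (example choice; adjust to your needs).
PRODUCT_DROP = {'state', 'material'}
VARIANT_DROP = {'material', 'dimensions'}

raw = '''{"data": {"products": [
  {"id": 41790, "product_name": "product2", "material": "something",
   "state": "ok",
   "variants": [
     {"url": "https://anyvariant4.com", "product_code": "55BEG7",
      "material": "something", "size": "small", "dimensions": ""}
   ]}
]}}'''


def strip_keys(product):
    """Return a copy of product without the unwanted keys, at both levels."""
    cleaned = {k: v for k, v in product.items() if k not in PRODUCT_DROP}
    # Replace the nested list with filtered copies of each variant.
    cleaned['variants'] = [
        {k: v for k, v in variant.items() if k not in VARIANT_DROP}
        for variant in product['variants']
    ]
    return cleaned


products = [strip_keys(p) for p in json.loads(raw)['data']['products']]
print(products[0])
```

Because the dict comprehensions build new dicts, the original parsed data is left untouched, and any new keys the API adds later will pass through automatically.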