I have a json file with the following structure
[
{
"name": "Collection 1",
"details": [
{
"id": 302,
"description":"Book destined for kids",
},
{
"id": 304,
"description":"Book destined for Teen",
},
{
"id": 305,
"description":"Only for teen under the age of 13",
},
]
},
{
"name": "Collection 1",
"details": [
{
"id": 400,
"description":"books for adults to read",
},
]
},
]
I need to add a new key/value which value should be a substring of description Something like [teen,kids,adults]
Expected output:
[
{
"name": "Collection 1",
"details": [
{
"id": 302,
"description":"Book destined for kids",
"age range":"kids"
},
{
"id": 304,
"description":"Book destined for Teen",
"age range":"teen"
},
{
"id": 305,
"description":"Only for teen under the age of 13",
"age range":"teen"
},
]
},
{
"name": "Collection 2",
"details": [
{
"id": 400,
"description":"books for adults to read",
"age range":"adults"
},
]
},
]
Anyone know a way How to do it in effective way using pandas please(I need to keep the same structure)
CodePudding user response:
I would do it like this:
import json
age_keywords = ["kids", "adults", "teen"] #Extend if needed
#json string to json object
json_string = '[{"name": "Collection 1","details": [{"id": 302,"description":"Book destined for kids"},{"id": 304,"description":"Book destined for Teen"},{"id": 305,"description":"Only for teen under the age of 13"}]},{"name": "Collection 1","details": [{"id": 400,"description":"books for adults to read"}]}]'
json = json.loads(json_string)
#iterate details of every book
for collection in json:
for detail in collection['details']:
description = detail['description'] # get description
for keyword in age_keywords: #iterate every keyword
if keyword in description.lower(): #check if keyword is in description
detail['age_range'] = keyword #if keyword in description --> add age_range to json
print(json)
#save json to file --> data.json = filename
with open('data.json', 'w') as f:
json.dump(data, f)
But i didn't really understand where you want to use pandas?
This is my json output:
[
{
"name":"Collection 1",
"details":[
{
"id":302,
"description":"Book destined for kids",
"age_range":"kids"
},
{
"id":304,
"description":"Book destined for Teen",
"age_range":"teen"
},
{
"id":305,
"description":"Only for teen under the age of 13",
"age_range":"teen"
}
]
},
{
"name":"Collection 1",
"details":[
{
"id":400,
"description":"books for adults to read",
"age_range":"adults"
}
]
}
]
CodePudding user response:
Using pandas
and regular expressions, you can parse this data, clean it up, and then apply a regular expression to extract the mention of your keywords:
out = (
pd.json_normalize(data, 'details', 'name')
.assign(
description=lambda df: df['description'].str.lower(),
age_range=lambda df: df['description'].str.extract(r'(kid|teen|adult)')
)
)
print(out)
id description name age_range
0 302 book destined for kids Collection 1 kid
1 304 book destined for teen Collection 1 teen
2 305 only for teen under the age of 13 Collection 1 teen
3 400 books for adults to read Collection 1 adult