I have a python script which can take a JSON file and pulls out specific columns I want with the Pandas json_normalize function. But I have a nested JSON set of values within the JSON that I am trying to pull out but cant get the code to properly get those values.
Below is the JSON value. The top tier is "cardEditions" within that tier is "cardDetails". I want to grab some of the displayName and value details from this nested json and put them into the csv with the cardEditions and the editionNo value.
Looking for the output to be in a CSV pipe delimited like the following with the displayValues as the headers from the nested json.
editionNo | Name | Edition | Position |
---|---|---|---|
666 | Matt Hu | 1st Edition | Center Field |
{
"cardEditions": [{
"editionNo": 666,
"id": 1111,
"cardDetails": [{
"valueType": "Text",
"displayValueType": "Text",
"displayName": "Name",
"value": "Matt Hu"
},
{
"valueType": "Text",
"displayValueType": "Text",
"displayName": "Edition",
"value": "1st Edition"
},
{
"valueType": "Text",
"displayValueType": "Text",
"displayName": "Position",
"value": "Center Field"
}
],
"cardStatus": "NA"
}]
}
CodePudding user response:
import pandas as pd
d = {
"cardEditions": [{
"editionNo": 666,
"id": 1111,
"cardDetails": [{
"valueType": "Text",
"displayValueType": "Text",
"displayName": "Name",
"value": "Matt Hu"
},
{
"valueType": "Text",
"displayValueType": "Text",
"displayName": "Edition",
"value": "1st Edition"
},
{
"valueType": "Text",
"displayValueType": "Text",
"displayName": "Position",
"value": "Center Field"
}
],
"cardStatus": "NA"
}]
}
df = pd.DataFrame(columns=['editionNo', 'Name', 'Edition', 'Position'])
for i, edition in enumerate(d['cardEditions']):
no = edition['editionNo']
vals = [details['value'] for details in edition['cardDetails']]
df.loc[i, :] = (no, *vals)
print(df)
prints
index | editionNo | Name | Edition | Position |
---|---|---|---|---|
0 | 666 | Matt Hu | 1st Edition | Center Field |
CodePudding user response:
Here is another solution with json_normalize
and pivot
.
# where "j" is your json data
out = (
pd.json_normalize(json.loads(j)['cardEditions'], record_path=['cardDetails'], meta='editionNo')
.drop(['valueType', 'displayValueType'],axis=1)
.pivot(index='editionNo', columns='displayName', values='value')
.rename_axis(columns=None)
.reset_index()
)
print(out)
Output:
editionNo Edition Name Position
0 666 1st Edition Matt Hu Center Field
If somebody knows how to use json_normalize
with record_path
of which we don't want all fields, I'd really like to know. I had to drop to columns from the record_path because I don't know how to skip them in the first place.