I have the following json:
{
"meta": {
"collection": 0
},
"series": {
"default": {
"data": {
"columns": [
{
"columnName": "A",
"columnType": "STRING"
},
{
"columnName": "B",
"columnType": "STRING"
}
],
"rows": [
[
{
"columnName": "A",
"value": "X1"
},
{
"columnName": "B",
"value": "Y1"
}
],
[
{
"columnName": "A",
"value": "X2"
},
{
"columnName": "B",
"value": "Y2"
}
]
]
}
}
}
}
I'm trying to parse this json into a dataframe which should look like this:
A B
--------
X1 Y1
X2 Y2
Here's what I have tried so far:
import pandas as pd
results = {"meta":{"collection":0},"series":{"default":{"data":{"columns":[{"columnName":"A","columnType":"STRING"},{"columnName":"B","columnType":"STRING"}],"rows":[[{"columnName":"A","value":"X1"},{"columnName":"B","value":"Y1"}],[{"columnName":"A","value":"X2"},{"columnName":"B","value":"Y2"}]]}}}}
s = results["series"]["default"]["data"]
df = pd.json_normalize(s, record_path=["rows"])
The problem is the columns in the df contain the json array, and not the values
0 1
0 {'columnName': 'A', 'value': 'X1'} {'columnName': 'B', 'value': 'Y1'}
1 {'columnName': 'A', 'value': 'X2'} {'columnName': 'B', 'value': 'Y2'}
Is there anyway to use json_normalize to achieve the result I want, or is parsing through json myself the way to go?
CodePudding user response:
One option is to stack
it and create a DataFrame with the stacked Series and pivot
it:
out = (pd.DataFrame(my_data['series']['default']['data']['rows'])
.stack()
.pipe(lambda x: pd.DataFrame(x.tolist(), index=x.index))
.reset_index(level=0)
.pivot('level_0','columnName','value')
.rename_axis(index=[None], columns=[None]))
An easier (and probably more efficient) solution is to reformat the data in Python using dict.setdefault
in a nested loop; then build the DataFrame:
tmp = {}
for lst in my_data['series']['default']['data']['rows']:
for d in lst:
tmp.setdefault(d['columnName'], []).append(d['value'])
out = pd.DataFrame(tmp)
Output:
A B
0 X1 Y1
1 X2 Y2