I'm trying to get the metadata out from a json using pandas json_normalize, but it does not work as expected.
I have a json fine with the following structure
data=[
{'a':'aa',
'b':{'b1':'bb1','b2':'bb2'},
'c':[{
'ca':[{'ca1':'caa1'
}]
}]
}]
I'd like to get the following
ca1 | a | b.b1 |
---|---|---|
caa1 | aa | bb1 |
I would expect this to work
pd.json_normalize(data, record_path=['c','ca'], meta = ['a',['b','b1']])
but it doesn't find the key b1. Strangely enough if my record_path is 'c' alone it does find the key. I feel I'm missing something here, but I can't figure out what. I appreciate any help!
CodePudding user response:
Going down first level you grab the meta as a list of columns you want to keep. Record path use a list to map levels that you want to go down. Finally column b is a dict you can apply to a Series concat back into df and pop to remove unpacked dict column.
df = pd.json_normalize(
data=data,
meta=['a', 'b'],
record_path=['c', 'ca']
)
df = pd.concat([df.drop(['b'], axis=1), df['b'].apply(pd.Series)], axis=1)
print(df)
Output:
ca1 a b1 b2
0 caa1 aa bb1 bb2
CodePudding user response:
This is a workaround I used eventually
data=[
{'a':'aa',
'b':{'b1':'bb1','b2':'bb2'},
'c':[{
'ca':[{'ca1':'caa1'
}]
}]
}]
df = pd.json_normalize(data, record_path=['c','ca'], meta = ['a',['b']]
)
df = pd.concat([df,pd.json_normalize(df['b'])],axis = 1)
df.drop(columns='b',inplace = True)
I still think there should be a better way, but it works