I have a json with nested lists @ "ManyActionDateTimes" and @ "Comments" like this:
jframe = [{"LoadRef": 0,
"BookedDate": "2021-10-13T01:15:54.287Z",
"EndDateTime": "2021-10-13T01:15:54.287Z",
"Duration": "string",
"Mp": true,
"ManyActionDateTimes": [
{
"EventTime": "2021-10-13T01:15:54.287Z",
"Type": "string",
"Vehicle": "string"
}
],
"Finished": true,
"Comments": [
{
"Comment": "string",
"DateInserted": "2021-10-13T01:15:54.287Z"
}
],
"Company": "string",
"SiteCode": 0,
"Source": "string"
}]
i'm trying to flatten this into a dataframe, i know for one nested list i can use the 'record_path' argument but it doesnt work for multiples as when i use the code:
cols = []
[d] = jframe
drop = ['Comments','ManyActionDateTimes']
for k, v in d.items():
if k in drop:
pass
else:
cols.append(k)
jdf = pd.json_normalize(jframe,
record_path=['Comments','ManyActionDateTimes'],meta=cols, errors='ignore')
i get the error 'KeyError: 'ManyActionDateTimes':
KeyError: 'ManyActionDateTimes'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-33-55851049e36a> in <module>
4 cols.append(k)
5
----> 6 jdf = pd.json_normalize(jframe,
7 record_path=['Comments','ManyActionDateTimes'],meta=cols, errors='ignore')
8 #meta=[:], errors='ignore')
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
502 records.extend(recs)
503
--> 504 _recursive_extract(data, record_path, {}, level=0)
505
506 result = DataFrame(records)
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _recursive_extract(data, path, seen_meta, level)
472 seen_meta[key] = _pull_field(obj, val[-1])
473
--> 474 _recursive_extract(obj[path[0]], path[1:], seen_meta, level=level 1)
475 else:
476 for obj in data:
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _recursive_extract(data, path, seen_meta, level)
475 else:
476 for obj in data:
--> 477 recs = _pull_records(obj, path[0])
478 recs = [
479 nested_to_record(r, sep=sep, max_level=max_level)
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _pull_records(js, spec)
397 if has non iterable value.
398 """
--> 399 result = _pull_field(js, spec)
400
401 # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _pull_field(js, spec)
388 result = result[field]
389 else:
--> 390 result = result[spec]
391 return result
392
KeyError: 'ManyActionDateTimes'
and none of my other solution attempts have worked either and im out of ideas,
many thanks
CodePudding user response:
You can use this instead:
df = pd.json_normalize(jframe)
df = df.join(df['ManyActionDateTimes'].explode().apply(pd.Series)).join(df['Comments'].explode().apply(pd.Series))
df.drop(columns=['ManyActionDateTimes', 'Comments'], inplace=True)
Output:
LoadRef BookedDate EndDateTime Duration Mp Finished Company SiteCode Source EventTime Type Vehicle Comment DateInserted
0 0 2021-10-13T01:15:54.287Z 2021-10-13T01:15:54.287Z string True True string 0 string 2021-10-13T01:15:54.287Z string string string 2021-10-13T01:15:54.287Z
CodePudding user response:
flatten_json works well with nested json structures
from flatten_json import flatten
dic_flattened = (flatten(d, '.') for d in jframe)
df = pd.DataFrame(dic_flattened)
Output
LoadRef BookedDate EndDateTime Duration Mp ManyActionDateTimes.0.EventTime ... Finished Comments.0.Comment Comments.0.DateInserted Company SiteCode Source
0 0 2021-10-13T01:15:54.287Z 2021-10-13T01:15:54.287Z string True 2021-10-13T01:15:54.287Z ... True string 2021-10-13T01:15:54.287Z string 0 string
[1 rows x 14 columns]