Home > Software engineering >  flattening json with mulitple nested lists
flattening json with mulitple nested lists

Time:10-14

I have a json with nested lists @ "ManyActionDateTimes" and @ "Comments" like this:

jframe = [{"LoadRef": 0,
            "BookedDate": "2021-10-13T01:15:54.287Z",
            "EndDateTime": "2021-10-13T01:15:54.287Z",
            "Duration": "string",
            "Mp": true,
            "ManyActionDateTimes": [
            {
            "EventTime": "2021-10-13T01:15:54.287Z",
            "Type": "string",
            "Vehicle": "string"
            }
            ],
            "Finished": true,
            "Comments": [
            {
            "Comment": "string",
            "DateInserted": "2021-10-13T01:15:54.287Z"
            }
            ],
            "Company": "string",
            "SiteCode": 0,
            "Source": "string"
            }]

i'm trying to flatten this into a dataframe, i know for one nested list i can use the 'record_path' argument but it doesnt work for multiples as when i use the code:

cols = []
[d] = jframe
drop = ['Comments','ManyActionDateTimes']
for k, v in d.items():
    if k in drop:
        pass
    else:
        cols.append(k)

jdf = pd.json_normalize(jframe,
record_path=['Comments','ManyActionDateTimes'],meta=cols, errors='ignore')

i get the error 'KeyError: 'ManyActionDateTimes':

KeyError: 'ManyActionDateTimes'
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-33-55851049e36a> in <module>
      4     cols.append(k)
      5 
----> 6 jdf = pd.json_normalize(jframe,
      7 record_path=['Comments','ManyActionDateTimes'],meta=cols, errors='ignore')
      8 #meta=[:], errors='ignore')

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    502                 records.extend(recs)
    503 
--> 504     _recursive_extract(data, record_path, {}, level=0)
    505 
    506     result = DataFrame(records)

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _recursive_extract(data, path, seen_meta, level)
    472                         seen_meta[key] = _pull_field(obj, val[-1])
    473 
--> 474                 _recursive_extract(obj[path[0]], path[1:], seen_meta, level=level   1)
    475         else:
    476             for obj in data:

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _recursive_extract(data, path, seen_meta, level)
    475         else:
    476             for obj in data:
--> 477                 recs = _pull_records(obj, path[0])
    478                 recs = [
    479                     nested_to_record(r, sep=sep, max_level=max_level)

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _pull_records(js, spec)
    397         if has non iterable value.
    398         """
--> 399         result = _pull_field(js, spec)
    400 
    401         # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _pull_field(js, spec)
    388                 result = result[field]
    389         else:
--> 390             result = result[spec]
    391         return result
    392 

KeyError: 'ManyActionDateTimes'

and none of my other solution attempts have worked either and im out of ideas,

many thanks

CodePudding user response:

You can use this instead:

df = pd.json_normalize(jframe)
df = df.join(df['ManyActionDateTimes'].explode().apply(pd.Series)).join(df['Comments'].explode().apply(pd.Series))
df.drop(columns=['ManyActionDateTimes', 'Comments'], inplace=True)

Output:

LoadRef                BookedDate               EndDateTime Duration    Mp  Finished Company  SiteCode  Source                 EventTime      Type  Vehicle Comment              DateInserted
0        0  2021-10-13T01:15:54.287Z  2021-10-13T01:15:54.287Z   string  True      True  string         0  string  2021-10-13T01:15:54.287Z  string  string  string  2021-10-13T01:15:54.287Z

CodePudding user response:

flatten_json works well with nested json structures

from flatten_json import flatten
dic_flattened = (flatten(d, '.') for d in jframe)
df = pd.DataFrame(dic_flattened)

Output

   LoadRef                BookedDate               EndDateTime Duration    Mp ManyActionDateTimes.0.EventTime  ... Finished Comments.0.Comment   Comments.0.DateInserted Company SiteCode  Source
0        0  2021-10-13T01:15:54.287Z  2021-10-13T01:15:54.287Z   string  True        2021-10-13T01:15:54.287Z  ...     True             string  2021-10-13T01:15:54.287Z  string        0  string

[1 rows x 14 columns]
  • Related