Home > Net >  Is there a pandas function to unpack a column that's a dict into a long format?
Is there a pandas function to unpack a column that's a dict into a long format?

Time:05-12

So the data has the column [outbreakMap] as a pandas series. I recreated a snippet of the data as a df below:

Initial Data: enter image description here

data = {0: {'id': '53545',
  'latitude': -1.5168500000000003,
  'longitude': 105.2985,
  'outbreakMap': {'ob_102795': {'nationalObReference': '403/KPTS/05/22',
    'oieReference': 'ob_102795',
    'outbreakStartDate': '2022-04-12'},
   'ob_102796': {'nationalObReference': '404/KPTS/05/22',
    'oieReference': 'ob_102796',
    'outbreakStartDate': '2022-04-22'}}},
 1: {'id': '53709',
  'latitude': 36.42056,
  'longitude': 139.9085,
  'outbreakMap': {'ob_101531': {'oieReference': 'ob_101531',
    'outbreakStartDate': '2022-04-12'},
   'ob_101644': {'oieReference': 'ob_101644',
    'outbreakStartDate': '2022-04-14'},
   'ob_102290': {'oieReference': 'ob_102290',
    'outbreakStartDate': '2022-04-21'},
   'ob_100121': {'oieReference': 'ob_100121',
    'outbreakStartDate': '2022-03-24'},
   'ob_102949': {'oieReference': 'ob_102949',
    'outbreakStartDate': '2022-05-09'}}}}

import pandas as pd
df = pd.DataFrame.from_dict(data, orient='index')

Goal: Unpack the dictionary into a long format like-so:

enter image description here

Thank you.

CodePudding user response:

[EDIT]

I think this is what you want.

def into_long(outbreakMap):
    '''
    Extracting key, oieReference, and outbreakStartDate
    from outbreakMap
    '''
    keys = outbreakMap.keys()
    # return list of tuples
    return [(
        key,
        outbreakMap[key]['oieReference'],
        outbreakMap[key]['outbreakStartDate']
    ) for key in keys]

df['outbreakMap_tuple'] = df['outbreakMap'].apply(lambda x: into_long(x))

# list to multiple rows
# further reading: https://stackoverflow.com/questions/39954668/how-to-convert-column-with-list-of-values-into-rows-in-pandas-dataframe
df = df.explode('outbreakMap_tuple')

# tuples to multiple columns
# further reading: https://stackoverflow.com/questions/29550414/how-can-i-split-a-column-of-tuples-in-a-pandas-dataframe
df[['outbreakMap', 'oieReference', 'outbreakStartDate']] = pd.DataFrame(df['outbreakMap_tuple'].tolist(), index=df.index)

df.drop('outbreakMap_tuple', axis=1, inplace=True)

# this is optional
df.reset_index(drop=True, inplace=True)

The outcome looks like

enter image description here

I am sure that there are better ways. This is just one of them.

CodePudding user response:

This will return a list of some of the columns. Hope this helps you get an idea how you can unpack it into dataframe:

[
[
    [data[n]['id'], 
    data[n]['outbreakMap'][i]['latitude'], 
    data[n]['outbreakMap'][i]['longitude']
] 
    for i in data[n]['outbreakMap']]
for n in data]

CodePudding user response:

You can modify data to include the desired keys to the inner dicts, then construct the DataFrame and explode the added columns:

for v in data.values():
    outbreakMap = v.pop('outbreakMap')
    v['outbreak'] = list(outbreakMap)
    v['outbreak_information'] = list(outbreakMap.values())
    
out = pd.DataFrame.from_dict(data, orient='index').explode(['outbreak','outbreak_information'])

Output:

      id  latitude  longitude   outbreak                               outbreak_information
0  53545  -1.51685   105.2985  ob_102795  {'nationalObReference': '403/KPTS/05/22', 'oie...
0  53545  -1.51685   105.2985  ob_102796  {'nationalObReference': '404/KPTS/05/22', 'oie...
1  53709  36.42056   139.9085  ob_101531  {'oieReference': 'ob_101531', 'outbreakStartDa...
1  53709  36.42056   139.9085  ob_101644  {'oieReference': 'ob_101644', 'outbreakStartDa...
1  53709  36.42056   139.9085  ob_102290  {'oieReference': 'ob_102290', 'outbreakStartDa...
1  53709  36.42056   139.9085  ob_100121  {'oieReference': 'ob_100121', 'outbreakStartDa...
1  53709  36.42056   139.9085  ob_102949  {'oieReference': 'ob_102949', 'outbreakStartDa...
  • Related