So the data has the column [outbreakMap] as a pandas series. I recreated a snippet of the data as a df below:
data = {0: {'id': '53545',
'latitude': -1.5168500000000003,
'longitude': 105.2985,
'outbreakMap': {'ob_102795': {'nationalObReference': '403/KPTS/05/22',
'oieReference': 'ob_102795',
'outbreakStartDate': '2022-04-12'},
'ob_102796': {'nationalObReference': '404/KPTS/05/22',
'oieReference': 'ob_102796',
'outbreakStartDate': '2022-04-22'}}},
1: {'id': '53709',
'latitude': 36.42056,
'longitude': 139.9085,
'outbreakMap': {'ob_101531': {'oieReference': 'ob_101531',
'outbreakStartDate': '2022-04-12'},
'ob_101644': {'oieReference': 'ob_101644',
'outbreakStartDate': '2022-04-14'},
'ob_102290': {'oieReference': 'ob_102290',
'outbreakStartDate': '2022-04-21'},
'ob_100121': {'oieReference': 'ob_100121',
'outbreakStartDate': '2022-03-24'},
'ob_102949': {'oieReference': 'ob_102949',
'outbreakStartDate': '2022-05-09'}}}}
import pandas as pd
df = pd.DataFrame.from_dict(data, orient='index')
Goal: Unpack the dictionary into a long format like-so:
Thank you.
CodePudding user response:
[EDIT]
I think this is what you want.
def into_long(outbreakMap):
'''
Extracting key, oieReference, and outbreakStartDate
from outbreakMap
'''
keys = outbreakMap.keys()
# return list of tuples
return [(
key,
outbreakMap[key]['oieReference'],
outbreakMap[key]['outbreakStartDate']
) for key in keys]
df['outbreakMap_tuple'] = df['outbreakMap'].apply(lambda x: into_long(x))
# list to multiple rows
# further reading: https://stackoverflow.com/questions/39954668/how-to-convert-column-with-list-of-values-into-rows-in-pandas-dataframe
df = df.explode('outbreakMap_tuple')
# tuples to multiple columns
# further reading: https://stackoverflow.com/questions/29550414/how-can-i-split-a-column-of-tuples-in-a-pandas-dataframe
df[['outbreakMap', 'oieReference', 'outbreakStartDate']] = pd.DataFrame(df['outbreakMap_tuple'].tolist(), index=df.index)
df.drop('outbreakMap_tuple', axis=1, inplace=True)
# this is optional
df.reset_index(drop=True, inplace=True)
The outcome looks like
I am sure that there are better ways. This is just one of them.
CodePudding user response:
This will return a list of some of the columns. Hope this helps you get an idea how you can unpack it into dataframe:
[
[
[data[n]['id'],
data[n]['outbreakMap'][i]['latitude'],
data[n]['outbreakMap'][i]['longitude']
]
for i in data[n]['outbreakMap']]
for n in data]
CodePudding user response:
You can modify data
to include the desired keys to the inner dicts, then construct the DataFrame and explode
the added columns:
for v in data.values():
outbreakMap = v.pop('outbreakMap')
v['outbreak'] = list(outbreakMap)
v['outbreak_information'] = list(outbreakMap.values())
out = pd.DataFrame.from_dict(data, orient='index').explode(['outbreak','outbreak_information'])
Output:
id latitude longitude outbreak outbreak_information
0 53545 -1.51685 105.2985 ob_102795 {'nationalObReference': '403/KPTS/05/22', 'oie...
0 53545 -1.51685 105.2985 ob_102796 {'nationalObReference': '404/KPTS/05/22', 'oie...
1 53709 36.42056 139.9085 ob_101531 {'oieReference': 'ob_101531', 'outbreakStartDa...
1 53709 36.42056 139.9085 ob_101644 {'oieReference': 'ob_101644', 'outbreakStartDa...
1 53709 36.42056 139.9085 ob_102290 {'oieReference': 'ob_102290', 'outbreakStartDa...
1 53709 36.42056 139.9085 ob_100121 {'oieReference': 'ob_100121', 'outbreakStartDa...
1 53709 36.42056 139.9085 ob_102949 {'oieReference': 'ob_102949', 'outbreakStartDa...