Home > Software design >  Flatten list of dictionaries in dataframe
Flatten list of dictionaries in dataframe

Time:04-30

I'm pulling data with Facebook Insights API and there are nested columns in the data I pull. I tried separating them by index but failed.

column I want to split:

[{'action_type': 'link_click', 'value': '1'}, {'action_type': 'video_view', 'value': '1'}]

the state i want to translate:

actions_video_view  actions_link_click
 1                   1
xx = dataframe['actions'].apply(pd.Series).merge(dataframe["index"],
        right_index=True,
        left_index=True).melt(id_vars=['index'],
            value_name='actions')
xx2 = xx['action_type'].apply(pd.Series).merge(xx["index"],
        right_index=True, 
        left_index=True)
xx2 = xx2.loc[xx2['action_type'] == 'video_view', ["value", "index"]]

when i run this code i get the following error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc      
    raise KeyError(key) from err
KeyError: 'action_type'

I want to separate the column according to the keys and add it as a dataframe column, which way can I use for this?

An example of how it looks in the data:

actions
[{'action_type': 'link_click', 'value': '1'}, {'action_type': 'video_view', 'value': '1'}]
[{'action_type': 'link_click', 'value': '3'}, {'action_type': 'video_view', 'value': '3'}]
[{'action_type': 'link_click', 'value': '5'}, {'action_type': 'video_view', 'value': '5'}]
[{'action_type': 'link_click', 'value': '6'}, {'action_type': 'video_view', 'value': '6'}]
[{'action_type': 'link_click', 'value': '7'}, {'action_type': 'video_view', 'value': '7'}]

if i want to apply:

actions_link_click  actions_video_view
1                    1
3                    3
5                    5
6                    6
7                    7

CodePudding user response:

I think you should have a look on how that data is generated like that to each row of a dataframe. I think that is not straightforward. But for your current problem, here is a solution:

import pandas as pd
import json

def convert_df(col):
    tmp = col.apply(pd.Series)
    out = (tmp.assign(idx=tmp.groupby('action_type').cumcount())
    .pivot(index='idx', columns='action_type', values='value')
    .add_prefix('actions_').rename_axis(columns=None).reset_index(drop=True))
    return out

rows = [[{'action_type': 'link_click', 'value': '1'}, {'action_type': 'video_view', 'value': '1'}],
[{'action_type': 'link_click', 'value': '3'}, {'action_type': 'video_view', 'value': '3'}],
[{'action_type': 'link_click', 'value': '5'}, {'action_type': 'video_view', 'value': '5'}],
[{'action_type': 'link_click', 'value': '6'}, {'action_type': 'video_view', 'value': '6'}],
[{'action_type': 'link_click', 'value': '7'}, {'action_type': 'video_view', 'value': '7'}],]


df = pd.DataFrame({'actions' : rows})
df = pd.json_normalize(df['actions'])

df = pd.concat([
    convert_df(df[0]),
    convert_df(df[1])
], axis=1)
print(df)

  actions_link_click actions_video_view
0                  1                  1
1                  3                  3
2                  5                  5
3                  6                  6
4                  7                  7

CodePudding user response:

This does the job,

grouped_df = df.groupby("action_type")
new_df = pd.DataFrame([])

for action in df["action_type"].unique():
  new_df[f"actions_{action}"] = list(grouped_df.get_group(action)["value"])

Output -

actions_link_click actions_video_view
0 1 1
1 3 3
2 5 5
3 6 6
4 7 7
  • Related