I'm pulling data with Facebook Insights API and there are nested columns in the data I pull. I tried separating them by index but failed.
column I want to split:
[{'action_type': 'link_click', 'value': '1'}, {'action_type': 'video_view', 'value': '1'}]
the state i want to translate:
actions_video_view actions_link_click
1 1
xx = dataframe['actions'].apply(pd.Series).merge(dataframe["index"],
right_index=True,
left_index=True).melt(id_vars=['index'],
value_name='actions')
xx2 = xx['action_type'].apply(pd.Series).merge(xx["index"],
right_index=True,
left_index=True)
xx2 = xx2.loc[xx2['action_type'] == 'video_view', ["value", "index"]]
when i run this code i get the following error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'action_type'
I want to separate the column according to the keys and add it as a dataframe column, which way can I use for this?
An example of how it looks in the data:
actions
[{'action_type': 'link_click', 'value': '1'}, {'action_type': 'video_view', 'value': '1'}]
[{'action_type': 'link_click', 'value': '3'}, {'action_type': 'video_view', 'value': '3'}]
[{'action_type': 'link_click', 'value': '5'}, {'action_type': 'video_view', 'value': '5'}]
[{'action_type': 'link_click', 'value': '6'}, {'action_type': 'video_view', 'value': '6'}]
[{'action_type': 'link_click', 'value': '7'}, {'action_type': 'video_view', 'value': '7'}]
if i want to apply:
actions_link_click actions_video_view
1 1
3 3
5 5
6 6
7 7
CodePudding user response:
I think you should have a look on how that data is generated like that to each row of a dataframe. I think that is not straightforward. But for your current problem, here is a solution:
import pandas as pd
import json
def convert_df(col):
tmp = col.apply(pd.Series)
out = (tmp.assign(idx=tmp.groupby('action_type').cumcount())
.pivot(index='idx', columns='action_type', values='value')
.add_prefix('actions_').rename_axis(columns=None).reset_index(drop=True))
return out
rows = [[{'action_type': 'link_click', 'value': '1'}, {'action_type': 'video_view', 'value': '1'}],
[{'action_type': 'link_click', 'value': '3'}, {'action_type': 'video_view', 'value': '3'}],
[{'action_type': 'link_click', 'value': '5'}, {'action_type': 'video_view', 'value': '5'}],
[{'action_type': 'link_click', 'value': '6'}, {'action_type': 'video_view', 'value': '6'}],
[{'action_type': 'link_click', 'value': '7'}, {'action_type': 'video_view', 'value': '7'}],]
df = pd.DataFrame({'actions' : rows})
df = pd.json_normalize(df['actions'])
df = pd.concat([
convert_df(df[0]),
convert_df(df[1])
], axis=1)
print(df)
actions_link_click actions_video_view
0 1 1
1 3 3
2 5 5
3 6 6
4 7 7
CodePudding user response:
This does the job,
grouped_df = df.groupby("action_type")
new_df = pd.DataFrame([])
for action in df["action_type"].unique():
new_df[f"actions_{action}"] = list(grouped_df.get_group(action)["value"])
Output -
actions_link_click | actions_video_view | |
---|---|---|
0 | 1 | 1 |
1 | 3 | 3 |
2 | 5 | 5 |
3 | 6 | 6 |
4 | 7 | 7 |