Hello Stackoverflowers!
I have a DataFrame, obtained from Facebook Marketing API, and I'd like to "unnest" a column into several rows.
This is the a sample of the data I obtained through the API:
ad_name video_play_curve_actions
ad_1 [{'action_type': 'video_view', 'value': [100, 40, 16, 10, 7, 5, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]
ad_2 [{'action_type': 'video_view', 'value': [100, 51, 22, 13, 9, 7, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]
What I am looking for, is an outcome that looks like illustrated below
I'd need a for loop for this because the solution needs to work for many 100 rows.
I've added example data and desired output to this sheet: https://docs.google.com/spreadsheets/d/1jjbtJlfBNZV_wyyAoPY_scyn_jCNFD04XO1-JsztKAg/edit?usp=sharing
Really hope someone here can help me out.
Thanks in advance
CodePudding user response:
Use ast.literal_eval
to convert the strings to a python data structure (list of dict here) then explode and extract 'value' key:
import ast
out = df[['ad_name']].join(
df['video_play_curve_actions'].apply(ast.literal_eval).explode()
.apply(lambda x: x['value']).explode()
).reset_index(drop=True)
Output:
>>> out
ad_name video_play_curve_actions
0 ad_1 100
1 ad_1 40
2 ad_1 16
3 ad_1 10
4 ad_1 7
.. ... ...
105 ad_5 0
106 ad_5 0
107 ad_5 0
108 ad_5 0
109 ad_5 0
[110 rows x 2 columns]
Note: The best way is probably to use directly the response from Facebook Marketing API rather than load data from an excel file.
CodePudding user response:
You're looking for pandas.explode
: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html, plus some necessary preprocessing for your video_play_curve_actions
column.
import pandas as pd
import argparse
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--filepath')
args = parser.parse_args()
df = pd.read_csv(args.filepath)
df['video_play_curve_actions'] = df['video_play_curve_actions'].apply(lambda x: eval(x)[0]['value'])
df_exploded = df.explode('video_play_curve_actions').rename(columns={'video_play_curve_actions': 'value'})
print(df_exploded.head())
output:
ad_name value
0 ad_1 100
0 ad_1 40
0 ad_1 16
0 ad_1 10
0 ad_1 7
Note that I used eval
here to process the values in video_play_curve_actions
which isn't always considered best practice. If the input contained double quotes "
instead of single quotes '
we could have used json.loads
instead.
CodePudding user response:
One way would be to grab the values of your video_play_curve_actions using str.split()
, and then explode()
:
(
df.set_index('ad_name')\
.video_play_curve_actions.str.split('[').str[-1].str[:-3]\
.str.split(',').explode().str.strip().reset_index()
)
prints:
ad_name video_play_curve_actions
0 ad_1 100
1 ad_1 40
2 ad_1 16
3 ad_1 10
4 ad_1 7
.. ... ...
105 ad_5 0
106 ad_5 0
107 ad_5 0
108 ad_5 0
109 ad_5 0