Home > database >  Convert single column in dataframe to rows | Pandas Facebook Ads
Convert single column in dataframe to rows | Pandas Facebook Ads

Time:09-21

Hello Stackoverflowers!

I have a DataFrame, obtained from Facebook Marketing API, and I'd like to "unnest" a column into several rows.

This is the a sample of the data I obtained through the API:

ad_name      video_play_curve_actions
ad_1         [{'action_type': 'video_view', 'value': [100, 40, 16, 10, 7, 5, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]
ad_2         [{'action_type': 'video_view', 'value': [100, 51, 22, 13, 9, 7, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]

What I am looking for, is an outcome that looks like illustrated below

enter image description here

I'd need a for loop for this because the solution needs to work for many 100 rows.

I've added example data and desired output to this sheet: https://docs.google.com/spreadsheets/d/1jjbtJlfBNZV_wyyAoPY_scyn_jCNFD04XO1-JsztKAg/edit?usp=sharing

Really hope someone here can help me out.

Thanks in advance

CodePudding user response:

Use ast.literal_eval to convert the strings to a python data structure (list of dict here) then explode and extract 'value' key:

import ast

out = df[['ad_name']].join(
          df['video_play_curve_actions'].apply(ast.literal_eval).explode()
                                        .apply(lambda x: x['value']).explode()
      ).reset_index(drop=True)

Output:

>>> out
    ad_name video_play_curve_actions
0      ad_1                      100
1      ad_1                       40
2      ad_1                       16
3      ad_1                       10
4      ad_1                        7
..      ...                      ...
105    ad_5                        0
106    ad_5                        0
107    ad_5                        0
108    ad_5                        0
109    ad_5                        0

[110 rows x 2 columns]

Note: The best way is probably to use directly the response from Facebook Marketing API rather than load data from an excel file.

CodePudding user response:

You're looking for pandas.explode: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html, plus some necessary preprocessing for your video_play_curve_actions column.

import pandas as pd
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--filepath')
    args = parser.parse_args()

    df = pd.read_csv(args.filepath)
    df['video_play_curve_actions'] = df['video_play_curve_actions'].apply(lambda x: eval(x)[0]['value'])
    df_exploded = df.explode('video_play_curve_actions').rename(columns={'video_play_curve_actions': 'value'})
    print(df_exploded.head())

output:

  ad_name value
0    ad_1   100
0    ad_1    40
0    ad_1    16
0    ad_1    10
0    ad_1     7

Note that I used eval here to process the values in video_play_curve_actions which isn't always considered best practice. If the input contained double quotes " instead of single quotes ' we could have used json.loads instead.

CodePudding user response:

One way would be to grab the values of your video_play_curve_actions using str.split(), and then explode():

(
  df.set_index('ad_name')\
        .video_play_curve_actions.str.split('[').str[-1].str[:-3]\
            .str.split(',').explode().str.strip().reset_index()
            )

prints:

    ad_name video_play_curve_actions
0      ad_1                      100
1      ad_1                       40
2      ad_1                       16
3      ad_1                       10
4      ad_1                        7
..      ...                      ...
105    ad_5                        0
106    ad_5                        0
107    ad_5                        0
108    ad_5                        0
109    ad_5                        0
  • Related