Home > OS >  Pandas DataFrame from nested list of dict
Pandas DataFrame from nested list of dict

Time:11-06

I'm recieving data from facebook api to list of dictionaries. I want to create pandas DataFrame from this list.

data = [{'account_id': '1234', 'account_name': 'account1', 'adset_id': '238', 'adset_name': 'Audio-Video', 'impressions': '24140', 'actions': [{'action_type': 'onsite_conversion.post_save', 'value': '1'}, {'action_type': 'link_click', 'value': '1081'}], 'date_start': '2022-11-01', 'date_stop': '2022-11-01'},
{'account_id': '1234', 'account_name': 'account2', 'adset_id': '591', 'adset_name': 'Moto', 'impressions': '16657', 'actions': [{'action_type': 'link_click', 'value': '620'}, {'action_type': 'post', 'value': '1'}], 'date_start': '2022-11-01', 'date_stop': '2022-11-01'} ]

But one of the column (actions) is nested. I want to "unpack" this to new columns.

I'm trying

df = pandas.DataFrame.from_dict(res)

The result is:

  account_id account_name adset_id   adset_name impressions                                                                                                                  actions  date_start   date_stop
0       1234     account1      238  Audio-Video       24140  [{'action_type': 'onsite_conversion.post_save', 'value': '1'}, {'action_type': 'link_click', 'value': '1081'}]  2022-11-01  2022-11-01
1       1234     account2      591         Moto       16657  [{'action_type': 'link_click', 'value': '620'}, {'action_type': 'post', 'value': '1'}]                          2022-11-01  2022-11-01

But the desired result should be:

  account_id account_name adset_id   adset_name impressions  onsite_conversion.post_save  link_click         post  date_start   date_stop
0       1234     account1      238  Audio-Video       24140                            1        1081            0  2022-11-01  2022-11-01
1       1234     account2      591         Moto       16657                            0         620            1  2022-11-01  2022-11-01

What is the most effective way to do it with pandas? Thank you!

CodePudding user response:

IIUC, you can try to json_normalize and pivot:

df = pd.DataFrame(data)

s = df.pop('actions').explode()

df = df.join(pd
   .json_normalize(s).set_axis(s.index)
   .reset_index()
   .pivot_table(index='index', columns='action_type', values='value', fill_value=0)
 )

print(df)

Output:

  account_id account_name adset_id   adset_name impressions  date_start  \
0       1234     account1      238  Audio-Video       24140  2022-11-01   
1       1234     account2      591         Moto       16657  2022-11-01   

    date_stop  link_click  onsite_conversion.post_save  post  
0  2022-11-01        1081                            1     0  
1  2022-11-01         620                            0     1  
  • Related