Home > Software design >  how to split pandas column contain list of dictionaries into keys as columns and fill the values?
how to split pandas column contain list of dictionaries into keys as columns and fill the values?

Time:10-16

I have a dataset that has a column containing a list of dictionaries. I want to split the column into keys as columns and fill in the values

edit: every cell in the column contains a list of dictionaries, I want to split them into the same row

sample of the data:

[{'genre': 'Adventure, Indie, RPG',
  'item_id': '326950',
  'discounted_price': '$8.99',
  'item_url': 'http://store.steampowered.com/app/326950',
  'item_name': 'Sword of Asumi'},
 {'genre': 'Adventure, Indie, RPG',
  'item_id': '331490',
  'discounted_price': '$2.99',
  'item_url': 'http://store.steampowered.com/app/331490',
  'item_name': 'Sword of Asumi - Soundtrack'}]

The output that I want to be like :

genre_0 item_id_0 discounted_price_0 item_url_0 item_name_0 genre_1 item_id_1 ....etc
One Two Three One Two Three One ----

enter image description here

CodePudding user response:

You can use pd.DataFrame.from_dict like this:

import pandas as pd

df = pd.DataFrame()

df['data'] = [[{'genre': 'Adventure, Indie, RPG',
  'item_id': '326950',
  'discounted_price': '$8.99',
  'item_url': 'http://store.steampowered.com/app/326950',
  'item_name': 'Sword of Asumi'},
 {'genre': 'Adventure, Indie, RPG',
  'item_id': '331490',
  'discounted_price': '$2.99',
  'item_url': 'http://store.steampowered.com/app/331490',
  'item_name': 'Sword of Asumi - Soundtrack'}]]

df2 = df.copy()
for _, row in df.iterrows():
    for i, dict_ in enumerate(row['data']):
        new_cols = pd.DataFrame.from_dict([dict_]).rename(columns={key: f"{key}_{i}" for key in dict_.keys()})
        new_row = new_cols if i==0 else pd.concat([new_row, new_cols], axis=1)
    df2 = pd.concat([df2, new_row], axis=1)

print(df2.columns)

Index(['data', 'genre_0', 'item_id_0', 'discounted_price_0', 'item_url_0',
       'item_name_0', 'genre_1', 'item_id_1', 'discounted_price_1',
       'item_url_1', 'item_name_1'],
      dtype='object')

CodePudding user response:

I created a function to solve this problem

function to split the list of dictionaries in a column

def split_list(df, column):
    for _, row in df.iterrows():
        for i , dict_ in enumerate(row[column]):
            for key, value in dict_.items():
                df.loc[_, column   '_'   key   '_'   str(i)] = value
    # drop columns with more than 50% missing values
    df.dropna(axis=1, thresh=df.shape[0]*0.5, inplace=True)
    return df

CodePudding user response:

You can use concat of dataframes created from list comprehension like below:

lst = [{'genre': 'Adventure, Indie, RPG',
  'item_id': '326950',
  'discounted_price': '$8.99',
  'item_url': 'http://store.steampowered.com/app/326950',
  'item_name': 'Sword of Asumi'},
 {'genre': 'Adventure, Indie, RPG',
  'item_id': '331490',
  'discounted_price': '$2.99',
  'item_url': 'http://store.steampowered.com/app/331490',
  'item_name': 'Sword of Asumi - Soundtrack'}]
pd.concat([pd.DataFrame(d, index=[0]).rename(('{}_'   str(i)).format,
          axis=1) for i, d in enumerate(lst)], axis=1)

output:

                 genre_0    item_id_0   discounted_price_0                                item_url_0       item_name_0                genre_1      item_id_1    discounted_price_1  item_url_1  item_name_1
0   Adventure, Indie, RPG      326950                $8.99  http://store.steampowered.com/app/326950    Sword of Asumi  Adventure, Indie, RPG         331490                 $2.99  http://store.steampowered.com/app/331490    Sword of Asumi - Soundtrack
  • Related