I have a dataset that has a column containing a list of dictionaries. I want to split the column into keys as columns and fill in the values
edit: every cell in the column contains a list of dictionaries, I want to split them into the same row
sample of the data:
[{'genre': 'Adventure, Indie, RPG',
'item_id': '326950',
'discounted_price': '$8.99',
'item_url': 'http://store.steampowered.com/app/326950',
'item_name': 'Sword of Asumi'},
{'genre': 'Adventure, Indie, RPG',
'item_id': '331490',
'discounted_price': '$2.99',
'item_url': 'http://store.steampowered.com/app/331490',
'item_name': 'Sword of Asumi - Soundtrack'}]
The output that I want to be like :
genre_0 | item_id_0 | discounted_price_0 | item_url_0 | item_name_0 | genre_1 | item_id_1 | ....etc |
---|---|---|---|---|---|---|---|
One | Two | Three | One | Two | Three | One | ---- |
CodePudding user response:
You can use pd.DataFrame.from_dict
like this:
import pandas as pd
df = pd.DataFrame()
df['data'] = [[{'genre': 'Adventure, Indie, RPG',
'item_id': '326950',
'discounted_price': '$8.99',
'item_url': 'http://store.steampowered.com/app/326950',
'item_name': 'Sword of Asumi'},
{'genre': 'Adventure, Indie, RPG',
'item_id': '331490',
'discounted_price': '$2.99',
'item_url': 'http://store.steampowered.com/app/331490',
'item_name': 'Sword of Asumi - Soundtrack'}]]
df2 = df.copy()
for _, row in df.iterrows():
for i, dict_ in enumerate(row['data']):
new_cols = pd.DataFrame.from_dict([dict_]).rename(columns={key: f"{key}_{i}" for key in dict_.keys()})
new_row = new_cols if i==0 else pd.concat([new_row, new_cols], axis=1)
df2 = pd.concat([df2, new_row], axis=1)
print(df2.columns)
Index(['data', 'genre_0', 'item_id_0', 'discounted_price_0', 'item_url_0',
'item_name_0', 'genre_1', 'item_id_1', 'discounted_price_1',
'item_url_1', 'item_name_1'],
dtype='object')
CodePudding user response:
I created a function to solve this problem
function to split the list of dictionaries in a column
def split_list(df, column):
for _, row in df.iterrows():
for i , dict_ in enumerate(row[column]):
for key, value in dict_.items():
df.loc[_, column '_' key '_' str(i)] = value
# drop columns with more than 50% missing values
df.dropna(axis=1, thresh=df.shape[0]*0.5, inplace=True)
return df
CodePudding user response:
You can use concat
of dataframes created from list comprehension like below:
lst = [{'genre': 'Adventure, Indie, RPG',
'item_id': '326950',
'discounted_price': '$8.99',
'item_url': 'http://store.steampowered.com/app/326950',
'item_name': 'Sword of Asumi'},
{'genre': 'Adventure, Indie, RPG',
'item_id': '331490',
'discounted_price': '$2.99',
'item_url': 'http://store.steampowered.com/app/331490',
'item_name': 'Sword of Asumi - Soundtrack'}]
pd.concat([pd.DataFrame(d, index=[0]).rename(('{}_' str(i)).format,
axis=1) for i, d in enumerate(lst)], axis=1)
output:
genre_0 item_id_0 discounted_price_0 item_url_0 item_name_0 genre_1 item_id_1 discounted_price_1 item_url_1 item_name_1
0 Adventure, Indie, RPG 326950 $8.99 http://store.steampowered.com/app/326950 Sword of Asumi Adventure, Indie, RPG 331490 $2.99 http://store.steampowered.com/app/331490 Sword of Asumi - Soundtrack