I have a dataframe which looks something like this:
ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]
df = pd.DataFrame(ex)
A word explode1 explode2
Germany Danke [[berlin, frankfurt], [Paris], [London, Glassgow]] [[food, cars], [fashion], [pound, cricket]]
French merci [[berlin, frankfurt], [Paris], [London, Glassgow]] [[food, cars], [fashion], [pound, cricket]]
English Thank you [[berlin, frankfurt], [Paris], [London, Glassgow]] [[food, cars], [fashion], [pound, cricket]]
I want to explode the dataframe based on the columns "explode1" and "explode2". If you notice they have the same number of list items ie:3, but the list contains all the items for the rows. So I want to explode the dataframe to look something like this:
A word explode1 explode2
Germany Danke [berlin] [food]
Germany Danke [frankfurt] [cars]
French merci [Paris] [fashion]
English Thank you [London] [pound]
English Thank you [Glassgow] [cricket]
How do I do this? Was looking at the pandas.explode option but how we handle the complete list in every row because for eg: I only want the first row to expand based on the first item of the list in the "explode1" and "explode2" column not on the others items in the list.
CodePudding user response:
explode
won't really work since the list items have to merged by index first. A possible workflow it to do this first, then explode on to_explode
, finally creating a new df using concat
:
import pandas as pd
ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]
df = pd.DataFrame(ex)
df['to_explode'] = df.apply(lambda row: list(zip(row['explode1'][row.name], row['explode2'][row.name])), axis=1)
df = df[['A', 'word', 'to_explode']].explode('to_explode').reset_index(drop=True)
result_df = pd.concat([df[['A', 'word']], pd.DataFrame(df['to_explode'].values.tolist(), columns=['explode1', 'explode2'])], axis=1)
Result:
A | word | explode1 | explode2 | |
---|---|---|---|---|
0 | Germany | Danke | berlin | food |
1 | Germany | Danke | frankfurt | cars |
2 | French | merci | Paris | fashion |
3 | English | Thank you | London | pound |
4 | English | Thank you | Glassgow | cricket |