Home > database >  How to explode columns in a dataframe which has the complete list for every row which needs to be ex
How to explode columns in a dataframe which has the complete list for every row which needs to be ex

Time:11-02

I have a dataframe which looks something like this:

ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]

df = pd.DataFrame(ex)

A       word        explode1                                            explode2
Germany Danke       [[berlin, frankfurt], [Paris], [London, Glassgow]]  [[food, cars], [fashion], [pound, cricket]]
French  merci       [[berlin, frankfurt], [Paris], [London, Glassgow]]  [[food, cars], [fashion], [pound, cricket]]
English Thank you   [[berlin, frankfurt], [Paris], [London, Glassgow]]  [[food, cars], [fashion], [pound, cricket]]

I want to explode the dataframe based on the columns "explode1" and "explode2". If you notice they have the same number of list items ie:3, but the list contains all the items for the rows. So I want to explode the dataframe to look something like this:

A       word        explode1    explode2
Germany Danke       [berlin]    [food]
Germany Danke       [frankfurt] [cars]
French  merci       [Paris]     [fashion]
English Thank you   [London]    [pound]
English Thank you   [Glassgow]  [cricket]

How do I do this? Was looking at the pandas.explode option but how we handle the complete list in every row because for eg: I only want the first row to expand based on the first item of the list in the "explode1" and "explode2" column not on the others items in the list.

CodePudding user response:

explode won't really work since the list items have to merged by index first. A possible workflow it to do this first, then explode on to_explode, finally creating a new df using concat:

import pandas as pd 

ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]

df = pd.DataFrame(ex)

df['to_explode'] = df.apply(lambda row: list(zip(row['explode1'][row.name], row['explode2'][row.name])), axis=1)
df = df[['A', 'word', 'to_explode']].explode('to_explode').reset_index(drop=True)

result_df = pd.concat([df[['A', 'word']], pd.DataFrame(df['to_explode'].values.tolist(), columns=['explode1', 'explode2'])], axis=1)

Result:

A word explode1 explode2
0 Germany Danke berlin food
1 Germany Danke frankfurt cars
2 French merci Paris fashion
3 English Thank you London pound
4 English Thank you Glassgow cricket
  • Related