I have a DataFrame that looks somewhat like this:
df = pd.DataFrame({'A': ['a', 'b', 'x', 'y'],
'B': ['{c,d}', '{e,f,g}', '', '{}']})
I want to remove the braces/curly brackets and explode each of the elements into its own row. So it would look something like this in the end:
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'x', 'y'],
'B': ['c', 'd', 'e', 'f', 'g', '', '']})
I have tried to first eliminate the curly brackets with
df['B'] = df['B'].str[1:-1] #this works
then expand/explode the elements with
df.set_index('A').B.str.split(',', expand=True).stack().reset_index('A') #this doesn't work
I have tried several other ways to fix the latter part. However, even if it worked, I suspect this approach is inefficient: it takes about 2 minutes on my dataset of around 10k rows. Is there a better approach to this?
CodePudding user response:
You can just use a combination of the apply and explode functions:
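import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': ['{c,d}', '{e,f,g}']})
# Drop the first and last character (the braces), then split on commas into a list
df["B"] = df["B"].apply(lambda x: x[1:-1].split(","))
# Give every list element its own row; ignore_index=True renumbers the rows from 0
df = df.explode("B", ignore_index=True)
print(df)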
[Out]:
   A  B
0  a  c
1  a  d
2  b  e
3  b  f
4  b  g
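The example above leaves out the empty-string and '{}' rows from the question. Here is a minimal sketch of the same idea applied to the full sample data. It assumes pandas 1.1 or newer (for ignore_index) and swaps apply for the vectorized .str accessor. The name result is only illustrative.

```python
import pandas as pd

# Sample data from the question, including the '' and '{}' edge cases
df = pd.DataFrame({'A': ['a', 'b', 'x', 'y'],
                   'B': ['{c,d}', '{e,f,g}', '', '{}']})

# str.strip('{}') removes any leading/trailing brace characters,
# so both '' and '{}' become '' and then split into the one-element list ['']
lists = df['B'].str.strip('{}').str.split(',')

# explode turns each list element into its own row; [''] stays a single '' row
result = df.assign(B=lists).explode('B', ignore_index=True)
print(result)
```

Because '' and '{}' both end up as a list holding one empty string, the 'x' and 'y' rows are kept with an empty B rather than being dropped or turned into NaN. Whether this is noticeably faster than apply on 10k rows is something you would have to measure on your own data.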
EDIT: Figured out that explode has an ignore_index keyword that can be useful in this case.
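To see what ignore_index changes, here is a small sketch comparing the two settings on the two-row example. The names kept and renumbered are only illustrative, and ignore_index assumes pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [['c', 'd'], ['e', 'f', 'g']]})

# Default: every exploded row keeps the index label of the row it came from
kept = df.explode('B')

# ignore_index=True: the result gets a fresh 0..n-1 index instead
renumbered = df.explode('B', ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

With the default, the index repeats (0, 0, 1, 1, 1), which can be surprising in later label-based lookups. With ignore_index=True you get the same result as calling reset_index(drop=True) afterwards.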