Explode elements in braces/curly brackets separated by comma and no space (e.g. {a,b})

Time:06-17

I have a DataFrame that looks somewhat like this:

df = pd.DataFrame({'A': ['a', 'b', 'x', 'y'], 
                   'B': ['{c,d}', '{e,f,g}', '', '{}']})

I want to remove the braces/curly brackets and explode each of the elements into its own row. So it would look something like this in the end:

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'x', 'y'], 
                   'B': ['c', 'd', 'e', 'f', 'g', '', '']})

I have tried to first eliminate the curly brackets with

df['B'] = df['B'].str[1:-1] #this works

then expand/explode the elements with

df.set_index('A').B.str.split(',', expand=True).stack().reset_index('A') #this doesn't work

I have tried several other ways to fix the second step. Even if it worked, though, I suspect this approach is inefficient: it takes about 2 minutes on my dataset of roughly 10k rows. Is there a better approach?

CodePudding user response:

You can combine apply (to strip the braces and split on the commas) with explode (to give each element its own row):

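import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': ['{c,d}', '{e,f,g}']})
# Drop the first and last character (the braces), then split on commas into a list.
df["B"] = df["B"].apply(lambda x: x[1:-1].split(","))
# Give each list element its own row and renumber the index from 0.
df = df.explode("B", ignore_index=True)
print(df)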
[Out]:
   A  B
0  a  c
1  a  d
2  b  e
3  b  f
4  b  g
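
The same idea can be written with the vectorized .str accessor and run on the full data from the question, including the empty string and the empty braces '{}'. This is a minimal sketch, not a benchmarked solution. The helper name explode_braces is made up for illustration, and it assumes the column holds only strings (no NaN) and a pandas version where explode accepts ignore_index.

```python
import pandas as pd


def explode_braces(frame, column):
    """Strip surrounding braces, split on commas, one element per row.

    Hypothetical helper for illustration; assumes the column holds only strings.
    """
    out = frame.copy()
    # '{c,d}' -> 'c,d'; both '' and '{}' become ''
    out[column] = out[column].str.strip("{}").str.split(",")
    # ''.split(',') gives [''], so empty cells survive as one empty-string row
    return out.explode(column, ignore_index=True)


df = pd.DataFrame({"A": ["a", "b", "x", "y"],
                   "B": ["{c,d}", "{e,f,g}", "", "{}"]})
result = explode_braces(df, "B")
print(result)
```

Rows whose B is '' or '{}' are kept as a single row with an empty string, which matches the desired output in the question.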

EDIT: explode has an ignore_index keyword, which is useful here: with ignore_index=True the result is renumbered from 0 instead of repeating the original row labels.
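
To see what ignore_index changes, here is a small sketch comparing the index with and without it. The variable names kept and fresh are only illustrative, and the lists in B are written out directly instead of being parsed from brace strings.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b"], "B": [["c", "d"], ["e", "f", "g"]]})

# Default: every new row keeps the index label of the row it came from.
kept = df.explode("B")
# ignore_index=True: the result is relabelled 0, 1, ..., n-1.
fresh = df.explode("B", ignore_index=True)

print(kept.index.tolist())   # [0, 0, 1, 1, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3, 4]
```

Without the keyword, the same renumbering can be done afterwards with reset_index(drop=True).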
