I have this data
df = pd.DataFrame(data={'var1':['A','B','C', 'D'], 'var2':['something','something#else','something#else','something']})
var1 var2
0 A something
1 B something#else
2 C something#else
3 D something
I need to split the cells by '#' in two different rows. I also need to update de var1 column so the dataframe ends like this:
var1 var2
0 A something
1 B.1 something
1 B.2 else
1 C.1 something
2 C.2 else
3 D something
What I have tryed so far is working around str.split() method. I have splited correctly the string but I don't know how to continue in orther to achive the rest.
CodePudding user response:
IIUC, you could do:
(df
.assign(var2=df['var2'].str.split('#'))
.explode('var2')
.assign(var1=lambda d: d['var1'].mask(d['var1'].duplicated(keep=False),
d['var1'] '.' d.groupby('var1').cumcount().add(1).astype(str)))
)
Alternative syntax:
df['var2'] = df['var2'].str.split('#')
df = df.explode('var2')
g = df.groupby('var1')['var1']
suffix = '.' g.cumcount().add(1).astype(str)
df['var1'] = suffix.where(g.transform('size').gt(1), '')
output:
var1 var2
0 A something
1 B.1 something
1 B.2 else
2 C.1 something
2 C.2 else
3 D something