Home > Net >  Split a cell in two rows by delimiter
Split a cell in two rows by delimiter

Time:03-15

I have this data

df = pd.DataFrame(data={'var1':['A','B','C', 'D'], 'var2':['something','something#else','something#else','something']})
    var1   var2
0   A      something
1   B      something#else
2   C      something#else
3   D      something

I need to split the cells by '#' in two different rows. I also need to update de var1 column so the dataframe ends like this:

    var1   var2
0   A      something
1   B.1    something
1   B.2    else
1   C.1    something
2   C.2    else
3   D      something

What I have tryed so far is working around str.split() method. I have splited correctly the string but I don't know how to continue in orther to achive the rest.

CodePudding user response:

IIUC, you could do:

(df
 .assign(var2=df['var2'].str.split('#'))
 .explode('var2')
 .assign(var1=lambda d: d['var1'].mask(d['var1'].duplicated(keep=False),
                                       d['var1'] '.' d.groupby('var1').cumcount().add(1).astype(str)))
)

Alternative syntax:

df['var2'] = df['var2'].str.split('#')
df = df.explode('var2')
g = df.groupby('var1')['var1']
suffix = '.' g.cumcount().add(1).astype(str)
df['var1']  = suffix.where(g.transform('size').gt(1), '')

output:

  var1       var2
0    A  something
1  B.1  something
1  B.2       else
2  C.1  something
2  C.2       else
3    D  something
  • Related