Split a column with two different delimiters into three columns in one line of code

Time:05-17

Can I split a column into three columns in one line of code when the values use two different delimiters?

Example


ColA
chr2:000001-000002

Expected


Chr    Start      End
chr2   000001     000002

The code I am looking for should be something like this

df[['Chr','Start','End']] = ...

I have been told that this is impossible, and I have been trying for part of the day without luck.

CodePudding user response:

Try this using pd.Series.str.split:

import pandas as pd

df = pd.DataFrame({'ColA':'chr2:000001-000002'}, index=[0])

df[['Chr', 'Start', 'End']] = df['ColA'].str.split(':|-', expand=True)

Output:

                 ColA   Chr   Start     End
0  chr2:000001-000002  chr2  000001  000002

CodePudding user response:

df[['Chr', 'Start', 'End']] = df['ColA'].str.split('[:-]',expand = True)

df 
                 ColA   Chr   Start     End
0  chr2:000001-000002  chr2  000001  000002

CodePudding user response:

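# Normalise ':' to '-' first, then split on the single remaining delimiter.
# df2 is built as a new frame here, since it is not defined anywhere else.
df2 = df['ColA'].str.replace(":", "-", regex=False).str.split("-", expand=True).set_axis(['Chr', 'Start', 'End'], axis=1)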

out:

    Chr   Start     End
0  chr2  000001  000002

CodePudding user response:

Not directly using split, but if your format is always consistent (which seems to be your case), a nice approach might be to use str.extract with named capturing groups:

df['ColA'].str.extract(r'(?P<Chr>\w+):(?P<Start>\d+)-(?P<End>\d+)')

Output:

    Chr   Start     End
0  chr2  000001  000002
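
To get the one-line assignment the question asks for, the extract result can be assigned straight back to the frame. Below is a minimal, self-contained sketch. The second and third rows are made-up sample values that are not in the question; the third is there only to show that a row which does not match the pattern comes back as NaN instead of raising.

```python
import pandas as pd

# First row is from the question; the other two are hypothetical test values.
df = pd.DataFrame({'ColA': ['chr2:000001-000002',
                            'chrX:000100-000250',
                            'not-a-region']})

# Named groups become the column names of the extracted frame.
pattern = r'(?P<Chr>\w+):(?P<Start>\d+)-(?P<End>\d+)'

# The one-liner: extract all three parts and assign them as new columns.
df[['Chr', 'Start', 'End']] = df['ColA'].str.extract(pattern)

print(df)
```

The extracted values stay as strings, so leading zeros such as 000001 are preserved.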