Can I split a column into three columns in one line even when the delimiter is different?
Example
ColA
chr2:000001-000002
Expected
Chr Start End
chr2 000001 000002
The code I am looking for should be something like this
df[['Chr','Start','End']] = ...
I have been told that this is impossible. I have been trying part of the day without luck.
CodePudding user response:
Try this using pd.Series.str.split:
df = pd.DataFrame({'ColA':'chr2:000001-000002'}, index=[0])
df[['Chr', 'Start', 'End']] = df['ColA'].str.split(':|-', expand=True)
Output:
ColA Chr Start End
0 chr2:000001-000002 chr2 000001 000002
CodePudding user response:
df[['Chr', 'Start', 'End']] = df['ColA'].str.split('[:-]',expand = True)
df
ColA Chr Start End
0 chr2:000001-000002 chr2 000001 000002
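For anyone who wants to paste and run it, here is a minimal self-contained sketch of the regex split on more than one row. The second row (chrX) is made-up sample data, not from the question. The character class `[:-]` matches either delimiter, and the resulting values stay strings, so the leading zeros are preserved.

```python
import pandas as pd

# Sample data; the second row is an invented example for illustration.
df = pd.DataFrame({"ColA": ["chr2:000001-000002", "chrX:000010-000250"]})

# A multi-character pattern is interpreted as a regular expression,
# so "[:-]" splits on either ':' or '-'.
parts = df["ColA"].str.split(r"[:-]", expand=True)

# Assign the three resulting columns in one statement.
df[["Chr", "Start", "End"]] = parts

print(df)
```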
CodePudding user response:
df2 = df['ColA'].str.replace(":", "-").str.split("-", expand=True)
df2.columns = ['Chr', 'Start', 'End']
out:
Chr Start End
0 chr2 000001 000002
CodePudding user response:
Not directly using split, but if your format is always consistent (which seems to be the case here), a nice approach is to use str.extract with named capturing groups:
df['ColA'].str.extract(r'(?P<Chr>\w+):(?P<Start>\d+)-(?P<End>\d+)')
Output:
Chr Start End
0 chr2 000001 000002
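As a rough sketch of how the str.extract result could be attached back to the original frame: the extra rows below (including the "malformed" one) are invented for illustration. Rows that do not match the pattern come back as NaN in all three columns, which makes inconsistent input easy to spot.

```python
import pandas as pd

# Sample data; the second and third rows are invented for illustration.
df = pd.DataFrame(
    {"ColA": ["chr2:000001-000002", "chrX:000010-000250", "malformed"]}
)

# Named groups become the column names of the extracted frame.
pattern = r"(?P<Chr>\w+):(?P<Start>\d+)-(?P<End>\d+)"
extracted = df["ColA"].str.extract(pattern)

# Join on the index to keep ColA next to the new columns.
df = df.join(extracted)

print(df)
```

If the rows with NaN should be dropped afterwards, something like df.dropna(subset=['Chr']) would do it.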