I have a column of strings that include two unique columns - a column of two letters, and then a column of Y or E. In both cases they're not always filled. I'm hoping to split them into two new columns:
import pandas as pd
d={'col1':['AD','Y','E','AD E','WZ E'],
'col1_new':['AD','','','AD','WZ'],
'col2_new':['','Y','E','E','E']}
pd.DataFrame(d)
Any tips?
CodePudding user response:
You can use a regex to extract the two parts:
df[['col1_new', 'col2_new']] = (df['col1'].str.extract('(\w{2})?\s*(\w)?')
.fillna('')
)
output:
col1 col1_new col2_new
0 AD AD
1 Y Y
2 E E
3 AD E AD E
4 WZ E WZ E
regex
(\w{2})? # capture 2 ({2}) "word" characters (\w), optionally (?)
\s* # match zero or more (*) spaces (\s)
(\w)? # capture 1 "word" character, optionally