I load in data from a CSV, one of the columns has this format:
!Color1:Color2:Color3!
!White:Green:Black!
!Green:Blue:Yellow!!Red:Brown:Blue!!White:Green:Black!
!Green:Blue:Yellow!!White:Green:Black!
!Red:Brown:Blue!!White:Green:Black!
I want to discard all of the other columns, pick this one out, then split this one into this:
0 1 2
0 White:Green:Black None None
1 Green:Blue:Yellow Red:Brown:Blue White:Green:Black
2 Green:Blue:Yellow White:Green:Black None
3 Red:Brown:Blue White:Green:Black None
Below is how I tried to do it:
df = pd.read_csv(csv_path, index_col=False)
new_df = df['!Color1:Color2:Color3!'].str.split('!', expand=True)
But it ends up like this:
0 1 2 3 4 5
0 None White:Green:Black None None None None
1 None Green:Blue:Yellow None Red:Brown:Blue None White:Green:Black
2 None Green:Blue:Yellow None White:Green:Black None None
3 None Red:Brown:Blue None White:Green:Black None None
So it interprets the first "!" as a field of its own, and so it adds empty fields between the "parts".
Bonus question:
After that is achieved, how do I pick out the middle color in each column, like this?:
0 1 2
0 Green None None
1 Blue Brown Green
2 Blue Green None
3 Brown Green None
CodePudding user response:
Add Series.str.strip
for avoid last and first columns filled by empty strings and regex !{1,}
for split 1 or multiple !
:
new_df = df['!Color1:Color2:Color3!'].str.strip('!').str.split('!{1,}', expand=True)
print (new_df)
0 1 2
0 White:Green:Black None None
1 Green:Blue:Yellow Red:Brown:Blue White:Green:Black
2 Green:Blue:Yellow White:Green:Black None
3 Red:Brown:Blue White:Green:Black None
Also if need second splitted values by :
use custom lambda function:
new_df = (df['!Color1:Color2:Color3!']
.str.strip('!')
.str.split('!{1,}', expand=True)
.apply(lambda x: x.str.split(':').str[1]))
print (new_df)
0 1 2
0 Green None None
1 Blue Brown Green
2 Blue Green None
3 Brown Green None