Say I have df as follows:
MyCol
Red Motor
Blue Taxi
Green Taxi-1
Light blue small Taxi-1
Light blue big Taxi-2
I would like to split the color and the vehicle into two columns. I used this command to split the last word (could be any character).
The last word (could be any character, like taxi or taxi-1) refers to the vehicle. Sometimes, there is a 'big' or 'small' associated with the car name. The first few words (can be one or more than one words) refers to the color.
This is what I have tried. It only works when the last word is a word without special characters. How can I include the case when special characters in the last word too?
df['MyCol'].str.extract('^(.*?)\s((?:small|big)?\s?\w ).*$')
CodePudding user response:
df['MyCol'].str.extract('^(.*?)\s((?:small|big|)\s?\S )$')
resulting in:
0 | 1 | |
---|---|---|
0 | Red | Motor |
1 | Blue | Taxi |
2 | Green | Taxi-1 |
3 | Light blue | small Taxi-1 |
4 | Light blue | big Taxi |