Splitting dataframe with not just words

Time:02-04

Say I have a DataFrame df as follows:

MyCol
Red Motor
Blue Taxi
Green Taxi-1
Light blue small Taxi-1 
Light blue big Taxi-2

I would like to split the color and the vehicle into two columns.

The last word refers to the vehicle and can contain any non-space characters, for example Taxi or Taxi-1. Sometimes 'big' or 'small' comes before the vehicle name and belongs with it. The leading words (one or more) refer to the color.

This is what I have tried. It only works when the last word has no special characters. How can I also handle special characters in the last word?

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+).*$')

CodePudding user response:

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big|)\s?\S+)$')

resulting in:

            0             1
0         Red         Motor
1        Blue          Taxi
2       Green        Taxi-1
3  Light blue  small Taxi-1
4  Light blue    big Taxi-2
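
To check the pattern outside pandas, here is a minimal sketch using only Python's re module. The helper name split_color_vehicle and the strip() call are my own additions for illustration, not part of the answer above; the pattern string itself is the one passed to str.extract.

```python
import re

# Same pattern as in the answer, written as a raw string.
# Group 1: the color (lazy, so it stops as early as possible).
# Group 2: an optional 'small'/'big' plus the last run of non-space characters.
PATTERN = re.compile(r'^(.*?)\s((?:small|big|)\s?\S+)$')

# Sample values from the question.
rows = [
    "Red Motor",
    "Blue Taxi",
    "Green Taxi-1",
    "Light blue small Taxi-1",
    "Light blue big Taxi-2",
]


def split_color_vehicle(text):
    """Return (color, vehicle), or (None, None) if the pattern does not match.

    Hypothetical helper: it strips surrounding whitespace first, because the
    trailing '$' anchor would otherwise reject a value that ends in a space.
    """
    match = PATTERN.match(text.strip())
    if match is None:
        return (None, None)
    return match.groups()


results = [split_color_vehicle(row) for row in rows]
for row, (color, vehicle) in zip(rows, results):
    print(f"{row!r:28} -> color={color!r}, vehicle={vehicle!r}")
```

The key change from the question's pattern is \S+ (any non-space characters) in place of \w+, so a hyphen and digit such as in Taxi-1 stay part of the vehicle. If the column may contain trailing spaces, calling df['MyCol'].str.strip() before str.extract should have the same effect as the strip() in this sketch.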