Say I have df as follows:
MyCol
Red Motor
Green Taxi
Light blue small Taxi
Light blue big Taxi
I would like to split the color and the vehicle into two columns. I used the command below to split off the last word, but sometimes there is a 'big' or 'small' attached to the vehicle name. How can I do the splitting with conditions?
df[['color','vehicle']] = df.MyCol.str.rsplit(pat=' ', n=1, expand=True)
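For reference, here is a small reproduction of the problem on the sample data above. It is only a sketch built from the four rows shown in the question; it shows that splitting on the last space leaves 'small' and 'big' in the color column.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({'MyCol': ['Red Motor', 'Green Taxi',
                             'Light blue small Taxi', 'Light blue big Taxi']})

# split once, starting from the right, on a single space
df[['color', 'vehicle']] = df['MyCol'].str.rsplit(pat=' ', n=1, expand=True)

# the size word ends up in 'color' instead of 'vehicle'
print(df[['color', 'vehicle']])
```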
CodePudding user response:
I think the best approach is to use extract with a regex pattern:
df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+)$')
            0           1
0         Red       Motor
1       Green        Taxi
2  Light blue  small Taxi
3  Light blue    big Taxi
Regex details:
^ : matches the start of the string
(.*?) : first capturing group
.*? : matches any character zero or more times, but as few times as possible (lazy match)
\s : matches a whitespace character
((?:small|big)?\s?\w+) : second capturing group
(?:small|big)? : matches small or big zero or one time
\s? : matches a whitespace character zero or one time
\w+ : matches word characters one or more times
$ : matches the end of the string
Series.str.extract is used here to extract two groups with a regular expression. The first group is everything before the separating whitespace and the second group is everything after it; the second group may start with the word "small" or "big". The call returns a new DataFrame with two columns containing the extracted groups.
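To get the named columns the question asks for, the result of extract can be assigned straight to two new columns. A minimal sketch, assuming the column is called MyCol as in the question and that every row contains at least one space (rows without one would not match the pattern and would come back as NaN):

```python
import pandas as pd

df = pd.DataFrame({'MyCol': ['Red Motor', 'Green Taxi',
                             'Light blue small Taxi', 'Light blue big Taxi']})

# raw string so that \s and \w reach the regex engine unchanged
pattern = r'^(.*?)\s((?:small|big)?\s?\w+)$'

# extract returns one column per capturing group; assign them by position
df[['color', 'vehicle']] = df['MyCol'].str.extract(pattern)
print(df)
```

Any other size words would have to be added to the (?:small|big) alternation by hand.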
CodePudding user response:
import pandas as pd
# create the dataframe
data = {'MyCol': ['Red Motor', 'Green Taxi', 'Light blue small Taxi', 'Light blue big Taxi']}
df = pd.DataFrame(data)
# create new columns for color and vehicle
df['color'] = ''
df['vehicle'] = ''
# iterate through rows of the dataframe
for i, row in df.iterrows():
    words = row['MyCol'].split()
    if len(words) > 1 and words[-2] in ('big', 'small'):
        # if the second-to-last word is 'big' or 'small', keep it with the vehicle
        df.at[i, 'color'] = ' '.join(words[:-2])
        df.at[i, 'vehicle'] = words[-2] + ' ' + words[-1]
    else:
        # otherwise only the last word is the vehicle
        df.at[i, 'color'] = ' '.join(words[:-1])
        df.at[i, 'vehicle'] = words[-1]
# print the resulting dataframe
print(df)
This uses the str.split() method to split each string into words, then checks whether the second-to-last word is "big" or "small" and assigns the color and vehicle accordingly.
I could have done it with a regular expression, but that has some annoying edge cases.
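The word-based rule can also be wrapped in a small helper and applied to the whole column, which avoids writing into the frame row by row. This is only a sketch: the names split_color_vehicle and SIZES are made up for illustration, and it assumes a size word always sits directly before the vehicle name, as in the sample data.

```python
import pandas as pd

SIZES = ('big', 'small')  # hypothetical list of size words kept with the vehicle

def split_color_vehicle(text):
    """Return (color, vehicle) for one string such as 'Light blue small Taxi'."""
    words = text.split()
    # take two trailing words when a size word precedes the vehicle, else one
    n = 2 if len(words) > 1 and words[-2] in SIZES else 1
    return ' '.join(words[:-n]), ' '.join(words[-n:])

df = pd.DataFrame({'MyCol': ['Red Motor', 'Green Taxi',
                             'Light blue small Taxi', 'Light blue big Taxi']})

# build both columns in one pass from the list of (color, vehicle) tuples
pairs = [split_color_vehicle(s) for s in df['MyCol']]
df[['color', 'vehicle']] = pd.DataFrame(pairs, index=df.index)
print(df)
```

A single-word row such as 'Taxi' comes back with an empty color rather than raising an error.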