I have a dataset looks like this:
Car | Make | Model | Engine |
---|---|---|---|
Toyota Rav 4 8cyl6L | Toyota | 8cyl6L | |
Mitsubishi Eclipse 2.1T | Mitsubishi | 2.1T | |
Monster Gravedigger 25Lsc | Monster | 25Lsc |
The data was clearly concatenated from Make Model Engine at some point but the car Model was not provided to me.
I've been trying to use Pandas to say that if we take Car, replace instances of Make with a nothing, replace instances of Engine with nothing, then trim the spaces around the result, we will have Model.
Car | Make | Model | Engine |
---|---|---|---|
Toyota Rav 4 8cyl6L | Toyota | Rav 4 | 8cyl6L |
Mitsubishi Eclipse 2.1T | Mitsubishi | Eclipse | 2.1T |
Monster Gravedigger 25Lsc | Monster | Gravedigger | 25Lsc |
There's something I'm doing wrong when I'm trying to reference another column in this manner.
df['Model'] = df['Car'].str.replace(df['Make'],'')
gives me an error of "unhashable type: 'Series'". I'm guessing I'm accidentally inputting the entire 'Make' column.
At every row I want to make a different substitution using data from other columns in that row. How would I accomplish this?
CodePudding user response:
you can use:
df['Model']=df.apply(lambda x: x['Car'].replace(x['Make'],"").replace(x['Engine'],""),axis=1)
print(df)
'''
Car Make Model Engine
0 Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
1 Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
2 Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc
'''
CodePudding user response:
A regex
proposition using re.sub
:
import re
df['Model'] = [re.sub(f'{b}|{c}', '', a) for a,b,c in zip(df['Car'], df['Make'], df["Engine"])]
# Output :
print(df)
Car Make Model Engine
0 Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
1 Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
2 Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc