I have a list of five car brands (BMW, Ford, Porsche, Skoda, Mazda), and the data in my dataframe will only ever come from these five brands.
The Model column always contains the brand, but the Brand column is sometimes not populated. Is there a way to fill in Brand based on which brand name the Model string contains? The brand may not always be at the start of the string.
My Dataframe:
   Brand    Model
0  nan      Mazda CX-5
1  BMW      BMW X5
2  BMW      BMW 2 Series
3  Ford     Ford Focus
4  Porsche  Porsche Cayenne
5  Skoda    Skoda Octavia
6  nan      911 Porsche
7  nan      8 Series BMW
My Desired Dataframe:
   Brand    Model
0  Mazda    Mazda CX-5
1  BMW      BMW X5
2  BMW      BMW 2 Series
3  Ford     Ford Focus
4  Porsche  Porsche Cayenne
5  Skoda    Skoda Octavia
6  Porsche  911 Porsche
7  BMW      8 Series BMW
CodePudding user response:
You can use a regex to extract the first letters-only word from Model and pass the result to fillna:
df['Brand'] = df['Brand'].fillna(df['Model'].str.extract('([a-zA-Z]+)', expand=False))
output:
   Brand    Model
0  Mazda    Mazda CX-5
1  BMW      BMW X5
2  BMW      BMW 2 Series
3  Ford     Ford Focus
4  Porsche  Porsche Cayenne
5  Skoda    Skoda Octavia
6  Porsche  911 Porsche
NB. If the brand name is not always the first letters-only word, you need to give more details on how to identify it.
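To illustrate that caveat, here is a minimal sketch that applies the letters-only regex to three of the Model values from the question. The names `models` and `first_word` are only for illustration. For "8 Series BMW" the first letters-only word is "Series", not the brand, so this approach would not cover row 7.

```python
import pandas as pd

# Three Model values taken from the question's dataframe
models = pd.Series(["Mazda CX-5", "911 Porsche", "8 Series BMW"])

# First run of ASCII letters in each string
first_word = models.str.extract(r"([a-zA-Z]+)", expand=False)

print(first_word.tolist())
# ['Mazda', 'Porsche', 'Series']
```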
Using a known list of brands:
brands = ['BMW', 'Ford', 'Porsche', 'Skoda', 'Mazda']
regex = '(%s)' % '|'.join(brands)
df['Brand'] = df['Brand'].fillna(df['Model'].str.extract(regex, expand=False))
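For completeness, a self-contained sketch of the known-list approach run against the dataframe from the question. It assumes the missing values are real NaN (not the string 'nan'), and it adds `re.escape` as a precaution in case a brand name ever contains regex metacharacters; that escaping is my addition and is not needed for these five names.

```python
import re

import numpy as np
import pandas as pd

# Dataframe rebuilt from the question; missing brands are assumed to be NaN
df = pd.DataFrame({
    "Brand": [np.nan, "BMW", "BMW", "Ford", "Porsche", "Skoda", np.nan, np.nan],
    "Model": ["Mazda CX-5", "BMW X5", "BMW 2 Series", "Ford Focus",
              "Porsche Cayenne", "Skoda Octavia", "911 Porsche", "8 Series BMW"],
})

brands = ["BMW", "Ford", "Porsche", "Skoda", "Mazda"]

# One capture group that matches any of the known brands, wherever it appears
regex = "(%s)" % "|".join(re.escape(b) for b in brands)

# Only rows where Brand is NaN are filled; existing values are left alone
df["Brand"] = df["Brand"].fillna(df["Model"].str.extract(regex, expand=False))

print(df)
```

Because the pattern searches anywhere in the string, rows such as "911 Porsche" and "8 Series BMW" are handled the same way as rows where the brand comes first. The match is case-sensitive as written.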