Python/Pandas: Update a field in same row if another field in row contains criteria in string

Time:04-13

I have a list of five car brands (BMW, Ford, Porsche, Skoda, Mazda). Every row in my dataframe belongs to one of these five brands.

The Model column always contains the brand, but the Brand column is sometimes not populated. Is there a way to fill in Brand based on which brand name appears in Model? Note that the brand is not always at the start of the Model string.

My Dataframe:

  Brand    Model
0 nan      Mazda CX-5
1 BMW      BMW X5
2 BMW      BMW 2 Series
3 Ford     Ford Focus
4 Porsche  Porsche Cayenne
5 Skoda    Skoda Octavia
6 nan      911 Porsche
7 nan      8 Series BMW

My Desired Dataframe:

  Brand    Model
0 Mazda    Mazda CX-5
1 BMW      BMW X5
2 BMW      BMW 2 Series
3 Ford     Ford Focus
4 Porsche  Porsche Cayenne
5 Skoda    Skoda Octavia
6 Porsche  911 Porsche
7 BMW      8 Series BMW

Answer:

You can use a regex to extract the first letters-only word from Model and pass it to fillna, so that only the missing Brand values are filled:

# fill missing Brand values with the first run of letters found in Model
df['Brand'] = df['Brand'].fillna(df['Model'].str.extract(r'([a-zA-Z]+)',
                                                         expand=False))

output:

     Brand            Model
0    Mazda       Mazda CX-5
1      BMW           BMW X5
2      BMW     BMW 2 Series
3     Ford       Ford Focus
4  Porsche  Porsche Cayenne
5    Skoda    Skoda Octavia
6  Porsche      911 Porsche

NB. This only works when the brand is the first letters-only word in Model. If it can appear elsewhere, you need another way to identify it, such as a known list of brands.
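A self-contained run of the first-word approach on all eight rows from the question makes the limitation concrete. The DataFrame construction below simply re-creates the question's sample data. For "911 Porsche" the first letters-only word is still "Porsche", but for "8 Series BMW" it is "Series", so that row is filled with the wrong brand.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing brands are NaN
df = pd.DataFrame({
    'Brand': [np.nan, 'BMW', 'BMW', 'Ford', 'Porsche', 'Skoda', np.nan, np.nan],
    'Model': ['Mazda CX-5', 'BMW X5', 'BMW 2 Series', 'Ford Focus',
              'Porsche Cayenne', 'Skoda Octavia', '911 Porsche', '8 Series BMW'],
})

# First run of ASCII letters in each Model string
first_word = df['Model'].str.extract(r'([a-zA-Z]+)', expand=False)

# Only the NaN entries in Brand are replaced
df['Brand'] = df['Brand'].fillna(first_word)
print(df)
```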

Using a known list of brands, which finds the brand wherever it appears in Model:
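import re
brands = ['BMW', 'Ford', 'Porsche', 'Skoda', 'Mazda']
# build an alternation like '(BMW|Ford|...)'; re.escape guards against regex metacharacters in names
regex = '(%s)' % '|'.join(map(re.escape, brands))
df['Brand'] = df['Brand'].fillna(df['Model'].str.extract(regex, expand=False))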
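Here is a runnable sketch of the known-list approach on the full sample data from the question. One addition that is not in the answer: the pattern is wrapped in `\b` word boundaries so that a brand name cannot match inside a longer word. That is an assumption about what counts as a match, and it makes no difference for this data. Because `str.extract` returns the first match found anywhere in the string, "8 Series BMW" is filled correctly.

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question; missing brands are NaN
df = pd.DataFrame({
    'Brand': [np.nan, 'BMW', 'BMW', 'Ford', 'Porsche', 'Skoda', np.nan, np.nan],
    'Model': ['Mazda CX-5', 'BMW X5', 'BMW 2 Series', 'Ford Focus',
              'Porsche Cayenne', 'Skoda Octavia', '911 Porsche', '8 Series BMW'],
})

brands = ['BMW', 'Ford', 'Porsche', 'Skoda', 'Mazda']

# One capture group holding an escaped alternation of the brands;
# \b (added here, not in the answer) requires whole-word matches
pattern = r'\b(%s)\b' % '|'.join(map(re.escape, brands))

# expand=False returns a Series, which fillna aligns on the index
df['Brand'] = df['Brand'].fillna(df['Model'].str.extract(pattern, expand=False))
print(df)
```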