Home > Net >  Efficiently using np.where
Efficiently using np.where

Time:11-17

I am using pandas and np.where to fill a new column if multiple conditions are met.

For this I am using the following database (but then 100 times bigger).

enter image description here

What I am doing now is:

df['new_column'] = np.where((df['year'] == 2018) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 12, 10)
df['new_column'] = np.where((df['year'] == 2019) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 15, 10)
df['new_column'] = np.where((df['year'] == 2020) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 18, 10)
df['new_column'] = np.where((df['year'] == 2021) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 21, 10)
df['new_column'] = np.where((df['year'] == 2022) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 24, 10)

As you can see I am only changing the condition for the column: "year".

I am looking for an efficient way to use the other two conditions (price and fuel description) because I am just copying them now.

Looking forward to your answers!

CodePudding user response:

Try this,

condition = (df['year'] >= 2018) & (df['year'] <= 2022) & (df['price'] > 30000) \
            & (df['fuel description'] == Petrol)
df['new_column'] = np.where(condition, 12   (df['year']-2018)*3, 10)

CodePudding user response:

You can always wrap your duplicate code into a function to avoid repeating yourself, something like this:

def get_year_condition(df, year):
    return df['year'] == year & df['price'] > 30000 & df['fuel description'] == 'Petrol'

And then use like this:

df['new_column'] = np.where(get_year_condition(df, 2021), 21, 10)
df['new_column'] = np.where(get_year_condition(df, 2022), 24, 10)
...
  • Related