I am using pandas and np.where to fill a new column if multiple conditions are met.
For this I am using the following database (but then 100 times bigger).
What I am doing now is:
df['new_column'] = np.where((df['year'] == 2018) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 12, 10)
df['new_column'] = np.where((df['year'] == 2019) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 15, 10)
df['new_column'] = np.where((df['year'] == 2020) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 18, 10)
df['new_column'] = np.where((df['year'] == 2021) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 21, 10)
df['new_column'] = np.where((df['year'] == 2022) & (df['price'] > 30000) & (df['fuel description'] == Petrol), 24, 10)
As you can see I am only changing the condition for the column: "year".
I am looking for an efficient way to use the other two conditions (price and fuel description) because I am just copying them now.
Looking forward to your answers!
CodePudding user response:
Try this,
condition = (df['year'] >= 2018) & (df['year'] <= 2022) & (df['price'] > 30000) \
& (df['fuel description'] == Petrol)
df['new_column'] = np.where(condition, 12 (df['year']-2018)*3, 10)
CodePudding user response:
You can always wrap your duplicate code into a function to avoid repeating yourself, something like this:
def get_year_condition(df, year):
return df['year'] == year & df['price'] > 30000 & df['fuel description'] == 'Petrol'
And then use like this:
df['new_column'] = np.where(get_year_condition(df, 2021), 21, 10)
df['new_column'] = np.where(get_year_condition(df, 2022), 24, 10)
...