Home > Software design >  Np.where with optional condition
Np.where with optional condition

Time:12-03

For my database I need to create a new column based on a condition. In a seperate file I have add all the conditions like this:

conditions = [{year: 2016, price: 30000, fuel: Petrol, result: 12},
              {year: 2017, price: 45000, fuel: Elektricity, result: 18},
              {year: 2018, price: None, fuel: Petrol, result: 14},

I am using the following code to add a new column

df['new_column'] = np.where((df['year'] == dict[year]) & (df['price'] > dict[price]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'

As you see in the conditions, it is possible that one of the values is none. This means (in this case price: none) that the formula as follows:

df['new_column'] = np.where((df['year'] == dict[year]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'

It is possible that multiple conditions are none, I want to prevent working with to much if and else statements but I can't figure out a way to do something in the formula that when it is none it needs to remove the condition.

Is this possible or should I just work with:

If dict[price] != None:
    df['new_column'] = np.where((df['year'] == dict[year]) & (df['price'] > dict[price]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'
else:
    df['new_column'] = np.where((df['year'] == dict[year]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'

CodePudding user response:

IIUC, you can use a merge_sof:

df['update_result'] = (
 pd.merge_asof(df.fillna({'price': -1}).reset_index()
                 .sort_values(by='price').drop(columns='result'),
               pd.DataFrame(conditions).fillna({'price': -1})
                 .sort_values(by='price'),
               by=['year', 'fuel'], on='price')
   .set_index('index')['result'].fillna(df['result'])
)

Used input:

   year     price         fuel  result
0  2016  456789.0       Petrol       1
1  2017   20000.0  Elektricity       2
2  2018       NaN       Petrol       3

Output:

   year     price         fuel  result
0  2016  456789.0       Petrol    12.0
1  2017   20000.0  Elektricity     2.0
2  2018       NaN       Petrol    14.0

CodePudding user response:

you can reduce the conditions to separate the creation of each condition from their application, you will need an if conditions for each optional condition, but not their combinations.

from operator import and_
from functools import reduce

conditions = []
if dict['year'] != None:
    conditions.append(df['year'] == dict['year'])
if dict['price'] != None:
    conditions.append(df['price'] > dict['price'])
...

if len(conditions) != 0:
    # similar to conditions[0] & conditions[1] & ...
    df['new_column'] = np.where(reduce(and_, conditions), dict['result'], df['new_column'])  
else:
    ...
  • Related