For my database I need to create a new column based on a condition. In a seperate file I have add all the conditions like this:
conditions = [{year: 2016, price: 30000, fuel: Petrol, result: 12},
{year: 2017, price: 45000, fuel: Elektricity, result: 18},
{year: 2018, price: None, fuel: Petrol, result: 14},
I am using the following code to add a new column
df['new_column'] = np.where((df['year'] == dict[year]) & (df['price'] > dict[price]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'
As you see in the conditions, it is possible that one of the values is none. This means (in this case price: none) that the formula as follows:
df['new_column'] = np.where((df['year'] == dict[year]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'
It is possible that multiple conditions are none, I want to prevent working with to much if and else statements but I can't figure out a way to do something in the formula that when it is none it needs to remove the condition.
Is this possible or should I just work with:
If dict[price] != None:
df['new_column'] = np.where((df['year'] == dict[year]) & (df['price'] > dict[price]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'
else:
df['new_column'] = np.where((df['year'] == dict[year]) & (df['fuel description'] == dict[fuel])), dict[result], df['new_column'
CodePudding user response:
IIUC, you can use a merge_sof
:
df['update_result'] = (
pd.merge_asof(df.fillna({'price': -1}).reset_index()
.sort_values(by='price').drop(columns='result'),
pd.DataFrame(conditions).fillna({'price': -1})
.sort_values(by='price'),
by=['year', 'fuel'], on='price')
.set_index('index')['result'].fillna(df['result'])
)
Used input:
year price fuel result
0 2016 456789.0 Petrol 1
1 2017 20000.0 Elektricity 2
2 2018 NaN Petrol 3
Output:
year price fuel result
0 2016 456789.0 Petrol 12.0
1 2017 20000.0 Elektricity 2.0
2 2018 NaN Petrol 14.0
CodePudding user response:
you can reduce
the conditions to separate the creation of each condition from their application, you will need an if conditions for each optional condition, but not their combinations.
from operator import and_
from functools import reduce
conditions = []
if dict['year'] != None:
conditions.append(df['year'] == dict['year'])
if dict['price'] != None:
conditions.append(df['price'] > dict['price'])
...
if len(conditions) != 0:
# similar to conditions[0] & conditions[1] & ...
df['new_column'] = np.where(reduce(and_, conditions), dict['result'], df['new_column'])
else:
...