I have a dataframe df, need to create a new column, which is a product of price
with metric
(int calculated before).
df['cost'] = df['price'] * metric if (df['status'] == 'online')
df['cost'] = 0 if df['status'] == 'offline'
CodePudding user response:
You can use np.where
:
import numpy as np
df['cost'] = np.where(df['status']=='online', df['price']*metric, 0)
CodePudding user response:
We can leverage the point that True
is 1 and False
is 0 when used in multiplication.
3 * True -> 3
3 * False -> 0
We have to check if values are equal to online
in the status
column.
df['cost'] = df['price'] * df['status'].eq('online') * metric
Wherever, status
is offline
cost
value is 0.
The above solution relies on the fact you want to set offline
values to 0. If you want to set offline
to let's 999
. Then we can use Series.where
here.
df['cost'] = df['price'].mul(metric).where(df['status'].eq('online'), 999)
Now, every offline
value to set to 999
.
Useful links:
CodePudding user response:
If you want plain pandas
you can use loc
:
df['cost'] = 0
df.loc[df['status']== "online" , 'cost'] = df['price'] * metric
Or with a list comprehension:
df['cost'] = [df['price']*metric if status == 'online' else 0 for status in df['status']]
CodePudding user response:
Although, all of the answers here solve your problem but using np.where or np.select (already showed in above answers) is best to use because it can solve multiple conditions.
However, above solutions with np.where and np.select can give wrong output if data contains nan values.
Let me simplify and explain these two (np.select and pd.cut) for future cases also.
df = pd.DataFrame({'status':['online','offline','away','busy',np.nan]})
df['price'] = 100
df
status price
0 online 100
1 offline 100
2 away 100
3 busy 100
4 NaN 100
Let's say you have 4 conditions:
status = online then df['cost'] =df['price']*metric
status = offline then df['cost'] = 0
status = away then df['cost'] = 1
status = busy then df['cost'] = 2
then you can just put all these conditions simply in one segment like this:
col = 'status'
conditions = [ df[col] == 'online' , (df[col] == 'offline') , (df[col]== 'away'), df[col] == 'busy']
options = [df['price']*metric, 0, 1, 2 ]
df['cost'] = np.select(conditions, options)
after running this.
status price cost
0 online 100 500
1 offline 100 0
2 away 100 1
3 busy 100 2
4 NaN 100 0
you can observe the value is 0 in front of nan. To tackle this problem, use:
df['cost'] = np.select(conditions, options,default = np.nan)