I have a dataframe with multiple text columns and some values and i'm sure there is a better way to do this with built in functions but i haven't got the answer yet.
ID A B C q value
1 Day N Wed 20 30
2 Noon N Wed 12 50
3 Day S Mon 30 13
4 Night W Sun 5 6
5 Night E Sun 20 20
6 Day S Wed 40 20
I want to make an operation over q and value over the different [A,B,C] keys, like q * value if value < 30, else q * value / 2
for uniqueA in df['A'].unique():
for uniqueB in df.query('A == @uniqueA')['B'].unique():
for uniqueC in df.query('A== @uniqueA and B == @uniqueB')['C'].unique():
aux = df.query('A== @uniqueA and B == @uniqueB and C == @uniqueC')[['A','B','C','q',value]]
aux = df.groupby(['A','B','C','q'])['value'].sum().reset_index()
if value < 30:
result = df.query('A== @uniqueA and B == @uniqueB and C == @uniqueC')['q'] * value
else:
result = df.query('A== @uniqueA and B == @uniqueB and C == @uniqueC')['q'] * (value/2)
# Save on results another df
CodePudding user response:
res = df.apply(lambda r: r['q'] * r['value'] if r['value'] < 30 else r['q'] * r['value'] /2, axis=1)
CodePudding user response:
There are many ways to do so.
One way is using df.loc method:
condition = df['q'] < 30 # condition
df.loc[condition, "ans"] = df['q'] * df['value'] # when true
df.loc[~condition, "ans"] = df['q'] * df['value'] / 2 # when false
Another way is using Numpy's where method:
import numpy as np
df['ans'] = np.where(df['q']<30, df['q'] * df['value'], df['q'] * df['value'] / 2)
The syntax so you understand what's happening is
np.where(condition, operation when true, operation when false)
CodePudding user response:
For a vectorised operation you can use numpy.where
, which returns an array:
import numpy as np
np.where(df['value'].lt(30),
df['q'].mul(df['value']),
df['q'].mul(df['value']).div(2))
Out[162]: array([300., 300., 390., 30., 400., 800.])
To assign your result back to a new column in your existing dataframe:
df['result'] = np.where(df['value'].lt(30),
df['q'].mul(df['value']),
df['q'].mul(df['value']).div(2))
>>> df
ID A B C q value result
0 1 Day N Wed 20 30 300.0
1 2 Noon N Wed 12 50 300.0
2 3 Day S Mon 30 13 390.0
3 4 Night W Sun 5 6 30.0
4 5 Night E Sun 20 20 400.0
5 6 Day S Wed 40 20 800.0