I'm trying to calculate the summation of some elements in my data frame, if another column in my data has a non-zero value. I came up with this:
data = {'Type': ['A', 'A', 'B', 'C', 'C'] ,
'Vol' : [10, 20, 15, 15, 15] ,
'Cost' : [500, 300, 200, 400, 400] ,
'IsSold' : [1, 0, 1, 1, 0]}
totalA = 0
totalB = 0
totalC = 0
totalD = 0
for idx, el in enumerate(df.iloc[:,:]):
if df['Type'][idx] == "A" and df['IsSold'][idx] == 1 :
totalA = df['Vol'][idx]
It's working but I'm wondering if there is a better way to calculate desired parameters. because the number of different groups in my data (based on Type column) is high and I think it's not the optimum solution. I also tried using groupby function as below:
df.groupby('Type')[['Vol']].sum()
but I don't know how to check the condition in this method.
CodePudding user response:
You can do a filtering before the .groupby
:
out = df[df.IsSold == 1].groupby("Type")["Vol"].sum()
print(out)
Prints:
Type
A 10
B 15
C 15
Name: Vol, dtype: int64
CodePudding user response:
You can either groupby Type
and IsSold
or you filter first the rows where InSold == 1 and do your groupby. In code that could look like as
df.query('IsSold==1').groupby('Type')['Vol'].sum()