I am working on a groupby operation using the attribute
column but I want to exclude the desc_type 1 and 2
that will be used to calculate total discount inside each attrib.
pd.DataFrame({'ID':[10,10,10,20,30,30],'attribute':['attrib_1','desc_type1','desc_type2','attrib_1','attrib_2','desc_type1'],'value':[100,0,0,100,30,0],'discount':[0,6,2,0,0,13.3]})
output:
ID attribute value discount
10 attrib_1 100 0
10 desc_type1 0 6
10 desc_type2 0 2
20 attrib_1 100 0
30 attrib_2 30 0
30 desc_type1 0 13.3
I want to groupby this dataframe by attribute
but excluding the desc_type1 and desc_type2
.
The desired output:
attribute ID_count value_sum discount_sum
attrib_1 2 200 8
attrib_2 1 30 13.3
explanations:
attrib_1
has discount_sum=8 because ID 30 that belongs to attrib_1
has two desc_type
attrib_2
has discount_sum=13.3 because ID 10 has one desc_type
ID=20
has no discounts types.
What I did so far:
df.groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})
But the line above does not exclude the desc_type 1 and 2
from the groupby
Important: an ID may have a discount or not.
CodePudding user response:
Hello I think this helps :
df.loc[(df['attribute'] != 'desc_type1') &( df['attribute'] != 'desc_type2')].groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})
Output :
ID value discount
attribute
attrib_1 2 200 0.0
attrib_2 1 30 0.0
CodePudding user response:
You can use any method to filter, then groupby:
(df[df['attribute'].str.startswith('attrib')]
.groupby('attribute', as_index=False)
.agg({'ID':'count','value':'sum','discount':'sum'})
)
or:
(df[df['attribute'].isin(['attrib_1', 'attrib_1'])]
.groupby('attribute', as_index=False)
.agg({'ID':'count','value':'sum','discount':'sum'})
)
output:
attribute ID value discount
0 attrib_1 2 200 0.0
1 attrib_2 1 30 0.0