Home > Enterprise >  How to exclude rows from a groupby operation
How to exclude rows from a groupby operation

Time:10-06

I am working on a groupby operation using the attribute column but I want to exclude the desc_type 1 and 2 that will be used to calculate total discount inside each attrib.

pd.DataFrame({'ID':[10,10,10,20,30,30],'attribute':['attrib_1','desc_type1','desc_type2','attrib_1','attrib_2','desc_type1'],'value':[100,0,0,100,30,0],'discount':[0,6,2,0,0,13.3]})

output:

ID       attribute       value      discount
10       attrib_1        100          0
10       desc_type1       0           6
10       desc_type2       0           2
20       attrib_1         100         0
30       attrib_2         30          0
30       desc_type1       0           13.3

I want to groupby this dataframe by attribute but excluding the desc_type1 and desc_type2.

The desired output:

attribute     ID_count    value_sum   discount_sum
attrib_1         2          200          8
attrib_2         1          30          13.3

explanations:

attrib_1 has discount_sum=8 because ID 30 that belongs to attrib_1has two desc_type

attrib_2 has discount_sum=13.3 because ID 10 has one desc_type

ID=20 has no discounts types.

What I did so far:

df.groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})

But the line above does not exclude the desc_type 1 and 2 from the groupby

Important: an ID may have a discount or not.

CodePudding user response:

Hello I think this helps :

df.loc[(df['attribute'] != 'desc_type1') &( df['attribute'] != 'desc_type2')].groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})

Output :

    ID  value   discount
attribute           
attrib_1    2   200 0.0
attrib_2    1   30  0.0

CodePudding user response:

You can use any method to filter, then groupby:

(df[df['attribute'].str.startswith('attrib')]
    .groupby('attribute', as_index=False)
    .agg({'ID':'count','value':'sum','discount':'sum'})
)

or:

(df[df['attribute'].isin(['attrib_1', 'attrib_1'])]
    .groupby('attribute', as_index=False)
    .agg({'ID':'count','value':'sum','discount':'sum'})
)

output:

  attribute  ID  value  discount
0  attrib_1   2    200       0.0
1  attrib_2   1     30       0.0
  • Related