panda groupby agg and calculated function together-CodePudding

I have below contents in csv file

key1    key2    Key3    key4    key5

Val1    A        51     True    25
Val1    A        50     False   25
Val1    A        49     True    25
Val1    A        48     True    25
Val2    A        47     False   25
Val2    A        46     True    25
Val2    A        45     False   25
Val2    A        44     True    25
Val2    A        43     True    25

I have to group by key1 and key2 then find the max of key5 and count of total rows of key4 and count of total true rows of key4

What I am trying

print(json_data.groupby(["key1","key2"]).agg(
        max_key5=('key5','max'),
        total_key4=('key4','count'),
        total_true_key4 = ("key4", lambda x: count(x == TRUE))   
    ))

I also tried like belwo

print(json_data.groupby(["section","row"]).agg(
    max_key5=('key5','max'),
    total_skey4=('key4','count'),
    total_true_key4 = pd.NamedAgg(
         column="key4", 
         aggfunc=lambda x: (x == True))    
     ))

I am not getting total_true_key4 proper value

Any suggestion highly appreciated..

thanks

CodePudding user response：

An easy way to count the number of Trues in any array is to take the sum of that array (since they're typically based on the integer values 0 & 1; True == 1, and False == 0).

out = (
    json_data.groupby(['key1', 'key2'])
    .agg(
        maxkey5=('key5', 'max'), 
        total_key4=('key4', 'count'), 
        total_true_key4=('key4', 'sum')
    )
)

print(out)
           maxkey5  total_key4  total_true_key4
key1 key2                                      
Val1 A          25           4                3
Val2 A          25           5                3

CodePudding user response：

I would probably simply use the sum function:

print(json_data.groupby(["key1","key2"]).agg(
    max_key5=('key5','max'),
    total_key4=('key4','count'),
    total_true_key4 = ("key4", "sum")   
))

It counts the True values as 1 and the False values as 0.