I have below contents in csv file
key1 key2 Key3 key4 key5
Val1 A 51 True 25
Val1 A 50 False 25
Val1 A 49 True 25
Val1 A 48 True 25
Val2 A 47 False 25
Val2 A 46 True 25
Val2 A 45 False 25
Val2 A 44 True 25
Val2 A 43 True 25
I have to group by key1 and key2 then find the max of key5 and count of total rows of key4 and count of total true rows of key4
What I am trying
print(json_data.groupby(["key1","key2"]).agg(
max_key5=('key5','max'),
total_key4=('key4','count'),
total_true_key4 = ("key4", lambda x: count(x == TRUE))
))
I also tried like belwo
print(json_data.groupby(["section","row"]).agg(
max_key5=('key5','max'),
total_skey4=('key4','count'),
total_true_key4 = pd.NamedAgg(
column="key4",
aggfunc=lambda x: (x == True))
))
I am not getting total_true_key4 proper value
Any suggestion highly appreciated..
thanks
CodePudding user response:
An easy way to count the number of True
s in any array is to take the sum of that array (since they're typically based on the integer values 0 & 1; True == 1, and False == 0).
out = (
json_data.groupby(['key1', 'key2'])
.agg(
maxkey5=('key5', 'max'),
total_key4=('key4', 'count'),
total_true_key4=('key4', 'sum')
)
)
print(out)
maxkey5 total_key4 total_true_key4
key1 key2
Val1 A 25 4 3
Val2 A 25 5 3
CodePudding user response:
I would probably simply use the sum function:
print(json_data.groupby(["key1","key2"]).agg(
max_key5=('key5','max'),
total_key4=('key4','count'),
total_true_key4 = ("key4", "sum")
))
It counts the True values as 1 and the False values as 0.