I have a excel like this:
year | a | b |
---|---|---|
2021 | 12 | 23 |
2021 | 31 | 0 |
2021 | 15 | 21 |
2021 | 14 | 0 |
2022 | 32 | 0 |
2022 | 24 | 15 |
2022 | 28 | 29 |
2022 | 33 | 0 |
I wanna get count of condition: a>=30 and b==0 group by year the final output like this:
2021 1
2022 2
I wanna use pandas dataframe to implement this, can anyone help? I'm quite new to python
CodePudding user response:
For count matched rows chain both conditions by &
for bitwise AND
and aggregate sum
, True
s are processing like 1
and False
like 0
:
df1 = ((df.a>=30) & (df.b==0)).astype(int)
.groupby(df['year']).sum().reset_index(name='count')
print (df1)
year count
0 2021 1
1 2022 2
Similar idea with helper column:
df1 = (df.assign(count = ((df.a>=30) & (df.b==0)).astype(int))
.groupby('year', as_index=False)['count']
.sum())