I have a pandas dataframe that looks something like this:
Category | Score | ID |
---|---|---|
A | nan | 1 |
A | 95 | 1 |
A | nan | 2 |
B | nan | 2 |
B | nan | 2 |
B | nan | 3 |
C | 95 | 3 |
C | nan | 3 |
I want to find the count of those IDs with only nan values across each category. Based on the table above, my result would look like:
Category | Count |
---|---|
A | 1 |
B | 2 |
C | 0 |
Here's what I have so far but I'm not sure this is providing the correct numbers:
out = (df[df['score'].isna()]
.groupby('category')
.apply(lambda g: g.groupby('id')
.filter(lambda x: len(x) >= 1)['id'].nunique())
.to_frame('Count')
.reset_index())
print(out)
Thanks for your help
CodePudding user response:
Try this:
First you groupby Category
and ID
and check with isna()
and all()
if every value in a group is nan
, which will return True and False. Converting True and False to integer will change the values to 1 and 0. Then groupby again, but only the Categorys
and sum the values (sum the 1's which came from True)
out = (df
.groupby(['Category', 'ID'])['Score']
.apply(lambda x: x.isna().all().astype(int))
.groupby('Category')
.sum()
.reset_index(name='count')
)
print(out)
Category count
0 A 1
1 B 2
2 C 0