Count only NAs across categories-CodePudding

I have a pandas dataframe that looks something like this:

Category	Score	ID
A	nan	1
A	95	1
A	nan	2
B	nan	2
B	nan	2
B	nan	3
C	95	3
C	nan	3

I want to find the count of those IDs with only nan values across each category. Based on the table above, my result would look like:

Category	Count
A	1
B	2
C	0

Here's what I have so far but I'm not sure this is providing the correct numbers:

out = (df[df['score'].isna()]
        .groupby('category')
        .apply(lambda g: g.groupby('id')
        .filter(lambda x: len(x) >= 1)['id'].nunique())
        .to_frame('Count')
        .reset_index())
print(out)

Thanks for your help

CodePudding user response：

Try this:

First you groupby Category and ID and check with isna() and all() if every value in a group is nan, which will return True and False. Converting True and False to integer will change the values to 1 and 0. Then groupby again, but only the Categorys and sum the values (sum the 1's which came from True)

out = (df
       .groupby(['Category', 'ID'])['Score']
       .apply(lambda x: x.isna().all().astype(int))
       .groupby('Category')
       .sum()
       .reset_index(name='count')
      )
print(out)

  Category  count
0        A      1
1        B      2
2        C      0