Home > Back-end >  Count only NAs across categories
Count only NAs across categories

Time:09-23

I have a pandas dataframe that looks something like this:

Category Score ID
A nan 1
A 95 1
A nan 2
B nan 2
B nan 2
B nan 3
C 95 3
C nan 3

I want to find the count of those IDs with only nan values across each category. Based on the table above, my result would look like:

Category Count
A 1
B 2
C 0

Here's what I have so far but I'm not sure this is providing the correct numbers:

out = (df[df['score'].isna()]
        .groupby('category')
        .apply(lambda g: g.groupby('id')
        .filter(lambda x: len(x) >= 1)['id'].nunique())
        .to_frame('Count')
        .reset_index())
print(out)

Thanks for your help

CodePudding user response:

Try this:

First you groupby Category and ID and check with isna() and all() if every value in a group is nan, which will return True and False. Converting True and False to integer will change the values to 1 and 0. Then groupby again, but only the Categorys and sum the values (sum the 1's which came from True)

out = (df
       .groupby(['Category', 'ID'])['Score']
       .apply(lambda x: x.isna().all().astype(int))
       .groupby('Category')
       .sum()
       .reset_index(name='count')
      )
print(out)
  Category  count
0        A      1
1        B      2
2        C      0
  • Related