Count number of elements greater than 1 in pandas dataframe-CodePudding

Hi I hope you could help me with this. I have a dataframe df_test

import pandas as pd
import numpy as np
df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
                             [4,5,6,"female"],
                             [1,2,np.nan,"female"]],
                       columns=['a','b','c','sex'])


df_test 
         a  b  c       sex
    0  NaN  2  3.0     male
    1  4.0  5  6.0     female
    2  1.0  2  NaN     female

and for each category in the sex column I would like to count the number of values greater than 1 in each column of the dataframe (a,b,c).

df_results 
         a_count  b_count  c_count       sex
    0    NaN      1        1             male
    1    2        2        1             female

CodePudding user response：

You can use groupby and aggregate to count the values :

df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
                             [4,5,6,"female"],
                             [1,2,np.nan,"female"]],
                       columns=['a','b','c','sex'])

df_test.groupby('sex', sort=False).agg(lambda x : len(x.dropna()>1))

This gives us expected output :

        a  b  c
sex            
male    0  1  1
female  2  2  1

If at all you want those values the values to be Nan, then you can do

df_test.groupby('sex', sort=False).agg(lambda x : np.nan if len(x.dropna()) == 0 else len(x.dropna()))

          a  b  c
sex              
male    NaN  1  1
female  2.0  2  1

Since the column contains NaN values, pandas does some internal optimization to convert int to float internally. So you may have to explicitly convert the column to int.

CodePudding user response：

Check Below code (I have assumed all values will be greater than or equal to 1 or nan based upon the example data)

pd.DataFrame(np.where(df_test.values == 1, np.nan, df_test.values), columns = df_test.columns).groupby(“sex”).count().reset_index()

CodePudding user response：

please correct

columns=[['a','b','c','sex']]

and replace with

columns=['a','b','c','sex']

then

pd.concat([df_test.sex, df_test.drop(columns=["sex"]) >= 1], axis=1).groupby("sex").sum().replace(0, np.nan).rename(columns=lambda x: x   "_count").reset_index()

CodePudding user response：

Let ust try:

(df_test.drop('sex', axis=1).ge(1)           # compare the data with `1`
   .groupby(df_test['sex'],sort=False).sum() # count the number of `True` with sum
   .add_suffix('_count')                     # add the suffix
   .reset_index()                            # make `sex` a column
)

Output:

      sex  a_count  b_count  c_count
0    male        0        1        1
1  female        2        2        1