Count number of positive columns from large dataframe


I have a large DataFrame with 500 columns, of which 300 columns (col1, col2, ..., col300) appear as follows:

idx col1    col2
a   -1      4
b    2      1
c   -1     -1

I want to get the following for the 300 columns; the other 200 columns are variables I am not interested in:

idx col1    col2  numPos
a   -1      4     1
b    2      1     2
c   -1     -1     0

where for each row I want to get the number of positive values. I don't want to use the apply method as there are about 2 million rows in the DataFrame. Is there a pythonic way to do this?

CodePudding user response:

You could select the columns, use gt (which creates a boolean DataFrame that is True where a value is positive), then sum along axis=1:

df['numPos'] = df[['col1','col2']].gt(0).sum(axis=1)

Or you could select the columns with filter, like:

df['numPos'] = df.filter(like='col').gt(0).sum(axis=1)

Output:

  idx  col1  col2  numPos
0   a    -1     4       1
1   b     2     1       2
2   c    -1    -1       0
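A minimal runnable sketch of both variants above, using a tiny frame that mirrors the question's example:

```python
import pandas as pd

# Small frame reproducing the question's example data.
df = pd.DataFrame({'idx': ['a', 'b', 'c'],
                   'col1': [-1, 2, -1],
                   'col2': [4, 1, -1]})

# gt(0) yields a boolean DataFrame; summing over axis=1 counts Trues per row.
df['numPos'] = df[['col1', 'col2']].gt(0).sum(axis=1)

# Equivalent, selecting the target columns by name pattern instead of listing them:
via_filter = df.filter(like='col').gt(0).sum(axis=1)
```

Both are vectorized, so they avoid the per-row Python overhead of apply even at 2 million rows.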

CodePudding user response:

Another way is to exclude the object columns with select_dtypes, test what is greater than 0, and sum along the row axis. Note this counts positives in every numeric column, so it only matches the goal if the other 200 columns are all non-numeric:

df['numPos'] = df.select_dtypes(exclude='object').gt(0).sum(axis=1)



  idx  col1  col2  numPos
0   a    -1     4       1
1   b     2     1       2
2   c    -1    -1       0
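A sketch of the select_dtypes approach; the caveat in code comments is an assumption about the question's data, not something the question states:

```python
import pandas as pd

# Example frame: 'idx' is object dtype, so select_dtypes(exclude='object')
# drops it and keeps only the numeric columns.
df = pd.DataFrame({'idx': ['a', 'b', 'c'],
                   'col1': [-1, 2, -1],
                   'col2': [4, 1, -1]})

# Counts positives across ALL numeric columns; this assumes the 200
# uninteresting columns are non-numeric, otherwise they get counted too.
df['numPos'] = df.select_dtypes(exclude='object').gt(0).sum(axis=1)
```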

CodePudding user response:

df['numPos'] = (df[cols] > 0).sum(axis=1) where cols is a list of column names. If the 300 columns are consecutive, then in place of df[cols] you can use df.iloc[:, start_offset:start_offset + 300], where start_offset is the positional index of the first column.
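A sketch of this answer on the question's small example; cols and start_offset are placeholders for the real 300 columns:

```python
import pandas as pd

df = pd.DataFrame({'idx': ['a', 'b', 'c'],
                   'col1': [-1, 2, -1],
                   'col2': [4, 1, -1]})

# By-name selection: comparison yields booleans, sum counts them per row.
cols = ['col1', 'col2']
df['numPos'] = (df[cols] > 0).sum(axis=1)

# By-position selection, for consecutive columns (here 2 columns starting
# at position 1; for the question it would be 300 columns).
start_offset = 1
by_position = (df.iloc[:, start_offset:start_offset + 2] > 0).sum(axis=1)
```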
