efficient way to calculate between columns with conditions

Time:03-22

I have a dataframe that looks like this:

            Cnt_A  Cnt_B  Cnt_C  Cnt_D
ID_1          0      1      3      0
ID_2          1      0      0      0 
ID_3          5      2      0      8  
...

I'd like to count, for each row, the columns that are not zero and put the result into a new column, like this:

          Total_Not_Zero_Cols   Cnt_A  Cnt_B  Cnt_C  Cnt_D
ID_1               2              0      1      3      0
ID_2               1              1      0      0      0 
ID_3               3              5      2      0      8  
...

I used a loop to get the result, but it took a very long time (of course).

I can't figure out the most efficient way to calculate across columns with a condition :(

Thank you in advance

CodePudding user response:

Check whether each value is not equal to 0, then sum along the columns axis (axis=1) to get one count per row:

df['Total_Not_Zero_Cols'] = df.ne(0).sum(axis=1)
print(df)

# Output
      Cnt_A  Cnt_B  Cnt_C  Cnt_D  Total_Not_Zero_Cols
ID_1      0      1      3      0                    2
ID_2      1      0      0      0                    1
ID_3      5      2      0      8                    3
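
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt from the three sample rows in the question (the real data is assumed to be all-numeric), and `counts` is just an illustrative variable name. It also uses `DataFrame.insert` to put the new column first, which is the layout the question asked for:

```python
import pandas as pd

# Sample data copied from the question; the real frame is assumed to be numeric.
df = pd.DataFrame(
    {
        "Cnt_A": [0, 1, 5],
        "Cnt_B": [1, 0, 2],
        "Cnt_C": [3, 0, 0],
        "Cnt_D": [0, 0, 8],
    },
    index=["ID_1", "ID_2", "ID_3"],
)

# Compare every cell with 0, then count the True values in each row.
counts = df.ne(0).sum(axis=1)

# insert() modifies df in place and puts the column at position 0.
df.insert(0, "Total_Not_Zero_Cols", counts)
print(df)
```

Computing `counts` before adding the column matters: once `Total_Not_Zero_Cols` is part of the frame, a second `df.ne(0)` would count it as well.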

CodePudding user response:

Use ne to generate a DataFrame of booleans with True for non-zero values, then aggregate each row as integers using sum:

df['Total_Not_Zero_Cols'] = df.ne(0).sum(axis=1)
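
To make the two steps visible, the sketch below keeps the boolean frame in a variable before summing. It assumes the sample data from the question and that the count columns all share the `Cnt_` prefix; `mask` and `cnt_cols` are illustrative names, not part of the original answer. Restricting the comparison to the `Cnt_` columns keeps the line safe to re-run after the total column already exists:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Cnt_A": [0, 1, 5],
        "Cnt_B": [1, 0, 2],
        "Cnt_C": [3, 0, 0],
        "Cnt_D": [0, 0, 8],
    },
    index=["ID_1", "ID_2", "ID_3"],
)

# Step 1: boolean frame, True wherever the value is not 0.
mask = df.ne(0)
print(mask)

# Step 2: True counts as 1 when summed, so this gives non-zero cells per row.
# Assumption: only columns whose name contains "Cnt_" should be counted.
cnt_cols = df.filter(like="Cnt_").columns
df["Total_Not_Zero_Cols"] = df[cnt_cols].ne(0).sum(axis=1)

# Running the same line again gives the same result, because the total
# column itself is excluded from the comparison.
df["Total_Not_Zero_Cols"] = df[cnt_cols].ne(0).sum(axis=1)
print(df)
```

Note that a missing value (NaN) compares as not equal to 0, so it would be counted as non-zero here. If the real data can contain NaN, fill or drop those values first.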

CodePudding user response:

NumPy-based alternative:

import numpy as np

np.sum(df != 0, axis=1)

Output

ID_1    2
ID_2    1
ID_3    3
dtype: int64
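
A variant of the same idea, sketched on the question's sample data: convert the frame to a plain array and let `np.count_nonzero` count along each row. This is a swapped-in alternative to the `np.sum` call above, not something the answer itself proposed; `values`, `counts` and `result` are illustrative names. Whether it is actually faster depends on the size and dtypes of the real data, so it is worth timing on your own frame:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Cnt_A": [0, 1, 5],
        "Cnt_B": [1, 0, 2],
        "Cnt_C": [3, 0, 0],
        "Cnt_D": [0, 0, 8],
    },
    index=["ID_1", "ID_2", "ID_3"],
)

# Work on the underlying array, skipping the intermediate boolean DataFrame.
values = df.to_numpy()

# Count the non-zero entries in each row (axis=1).
counts = np.count_nonzero(values, axis=1)

# Re-attach the row labels so the result lines up with df.
result = pd.Series(counts, index=df.index)
print(result)
```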