Home > Software engineering >  Calculate the sum of column values per row but not include every value
Calculate the sum of column values per row but not include every value

Time:11-14

So I have this dataset which looks like this.

id something number1 number2 number3 number4 number5 number6 sum_columns
1 105 1 NaN NaN 2 3 4 4
2 300 2 1 1 33 6 2 6
3 20 1 NaN NaN NaN 5 3 3

Now I need to calculate the sum of columns values starting with 'number' but only include the values that are in range of 1-5.

So the final dataset would look like this:

id something number1 number2 number3 number4 number5 number6 sum_columns sum_values
1 105 1 NaN NaN 2 3 4 4 10
2 300 2 1 1 33 6 2 6 6
3 20 1 NaN NaN NaN 5 3 3 9

I know I can calculate the sum of columns containing number like this:

df['values_sum']=df.filter(like='number').sum(axis=1)

But how to include only values in columns that are in range of 1-5 is the problem.

CodePudding user response:

Use DataFrame.where for filter values less like 6 and replace not matched to 0:

df1 = df.filter(like='number')
df['values_sum'] = df1.where(df1.lt(6),0).sum(axis=1)

#if need values only range(1,6)
#df['values_sum'] = df1.where(df1.isin(range(1, 6)),0).sum(axis=1)
print (df)
   id  something  number1  number2  number3  number4  number5  number6  \
0   1        105        1      NaN      NaN      2.0        3        4   
1   2        300        2      1.0      1.0     33.0        6        2   
2   3         20        1      NaN      NaN      NaN        5        3   

   sum_columns  values_sum  
0            4        10.0  
1            6         6.0  
2            3         9.0  
  • Related