So I have this dataset which looks like this.
id | something | number1 | number2 | number3 | number4 | number5 | number6 | sum_columns |
---|---|---|---|---|---|---|---|---|
1 | 105 | 1 | NaN | NaN | 2 | 3 | 4 | 4 |
2 | 300 | 2 | 1 | 1 | 33 | 6 | 2 | 6 |
3 | 20 | 1 | NaN | NaN | NaN | 5 | 3 | 3 |
Now I need to calculate the sum of columns values starting with 'number' but only include the values that are in range of 1-5.
So the final dataset would look like this:
id | something | number1 | number2 | number3 | number4 | number5 | number6 | sum_columns | sum_values |
---|---|---|---|---|---|---|---|---|---|
1 | 105 | 1 | NaN | NaN | 2 | 3 | 4 | 4 | 10 |
2 | 300 | 2 | 1 | 1 | 33 | 6 | 2 | 6 | 6 |
3 | 20 | 1 | NaN | NaN | NaN | 5 | 3 | 3 | 9 |
I know I can calculate the sum of columns containing number like this:
df['values_sum']=df.filter(like='number').sum(axis=1)
But how to include only values in columns that are in range of 1-5 is the problem.
CodePudding user response:
Use DataFrame.where
for filter values less like 6
and replace not matched to 0
:
df1 = df.filter(like='number')
df['values_sum'] = df1.where(df1.lt(6),0).sum(axis=1)
#if need values only range(1,6)
#df['values_sum'] = df1.where(df1.isin(range(1, 6)),0).sum(axis=1)
print (df)
id something number1 number2 number3 number4 number5 number6 \
0 1 105 1 NaN NaN 2.0 3 4
1 2 300 2 1.0 1.0 33.0 6 2
2 3 20 1 NaN NaN NaN 5 3
sum_columns values_sum
0 4 10.0
1 6 6.0
2 3 9.0