Calculate mean of row using only certain columns in pandas

I have a pandas DataFrame containing 5 scores per row, plus the standard deviation of those 5 scores. The standard deviation was easy to calculate with df['std'] = df.std(axis=1, ddof=0). However, I also want to add the mean of the 5 scores, and I do not know how to exclude the std column from that calculation. Using df['mean'] = df.mean(axis=1) would make pandas use the 5 scores AND the std in the mean, which is not what I want.

To summarise, the current df.head() looks like this, and I would like to add a column holding the mean of the 5 scores:

     score1    score2    score3    score4    score5       std
0  0.714286  0.689076  0.718487  0.683544  0.708861  0.013956
1  0.756303  0.704641  0.746835  0.734177  0.704641  0.021338
2  0.689076  0.722689  0.710084  0.760504  0.776371  0.032220
3  0.670833  0.704167  0.732218  0.690377  0.728033  0.023035
4  0.733333  0.758333  0.753138  0.769874  0.774059  0.014358
5  0.733333  0.825000  0.786611  0.786611  0.765690  0.029978

CodePudding user response：

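Use agg on just the score columns instead of df['std'] = df.std(axis=1, ddof=0):

import numpy as np
df[['mean', 'std']] = df.filter(like='score').agg([np.mean, np.std], axis=1)

Note: I use np.std rather than df.std because ddof defaults to 0 in NumPy, whereas pandas' std defaults to ddof=1. Be aware that, depending on the pandas version, agg may map np.std to pandas' own std (and so use ddof=1), so it is worth checking the result against df.std(axis=1, ddof=0).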
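Because the handling of np.std inside agg can differ between pandas versions, a version-independent alternative is to compute each statistic directly on the filtered columns with an explicit ddof. This is only a sketch: it uses the first two rows of the question's data, and the variable name scores is my own.

```python
import numpy as np
import pandas as pd

# First two rows of the question's data (score columns only).
df = pd.DataFrame({
    "score1": [0.714286, 0.756303],
    "score2": [0.689076, 0.704641],
    "score3": [0.718487, 0.746835],
    "score4": [0.683544, 0.734177],
    "score5": [0.708861, 0.704641],
})

# Select the score columns once, before any new column is added.
scores = df.filter(like="score")

# Passing ddof explicitly avoids relying on how agg treats np.std.
df["mean"] = scores.mean(axis=1)
df["std"] = scores.std(axis=1, ddof=0)

print(df[["mean", "std"]])
```

Since scores is taken before the new columns exist, the order of the two assignments no longer matters.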

CodePudding user response：

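Use DataFrame.assign to add several new columns at once. Both values are computed from the existing columns before either new column is added, so nothing has to be excluded (this assumes df holds only the score columns at this point):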

df = df.assign(mean = df.mean(axis=1), std = df.std(axis=1, ddof=0))
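A minimal sketch of why this works, and of what to do when a std column is already present as in the question. The toy data and the names out and redo are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "score1": [1.0, 2.0],
    "score2": [4.0, 5.0],
    "score3": [7.0, 8.0],
})

# Both arguments are evaluated on the original three columns
# before assign builds the new frame.
out = df.assign(mean=df.mean(axis=1), std=df.std(axis=1, ddof=0))

# If derived columns already exist, drop them before averaging.
redo = out.assign(mean=out.drop(columns=["mean", "std"]).mean(axis=1))

print(out)
```

assign returns a new DataFrame and leaves df itself unchanged, which is why the result has to be bound to a name.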

CodePudding user response：

If you want to use specific columns only, you can select them by name or use .loc / .iloc.

df["means"] = df[['score1', 'score2', 'score3', 'score4', 'score5']].mean(axis=1)

I haven't tried this because you did not provide a minimal, reproducible example, but it should be close to what you need.
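For reference, here is a sketch of several equivalent ways to restrict the mean to the score columns, using the first row of the question's data. The names score_cols, by_list, by_loc, by_iloc and by_drop are illustrative, and the .iloc variant assumes the scores are the first five columns.

```python
import pandas as pd

df = pd.DataFrame({
    "score1": [0.714286], "score2": [0.689076], "score3": [0.718487],
    "score4": [0.683544], "score5": [0.708861], "std": [0.013956],
})

score_cols = ["score1", "score2", "score3", "score4", "score5"]

by_list = df[score_cols].mean(axis=1)               # explicit column list
by_loc = df.loc[:, "score1":"score5"].mean(axis=1)  # label slice, end inclusive
by_iloc = df.iloc[:, :5].mean(axis=1)               # first five columns by position
by_drop = df.drop(columns="std").mean(axis=1)       # everything except std

df["mean"] = by_list
print(df)
```

The explicit list is the most robust of the four if the column order might change later.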

CodePudding user response：

I would do it the following way:

import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6],'col3':[7,8,9]})
df['std'], df['mean'] = df.std(axis=1, ddof=0), df.mean(axis=1)
print(df)

Output:

   col1  col2  col3      std  mean
0     1     4     7  2.44949   4.0
1     2     5     8  2.44949   5.0
2     3     6     9  2.44949   6.0

This relies on the fact that in an assignment such as x, y = a, b, Python first evaluates everything to the right of = and only then binds the results to the corresponding targets. The same feature lets you swap two variables:

a = 1
b = 2
a, b = b, a
print(a)  # 2
print(b)  # 1

Note: I used simpler data in my example for brevity's sake.
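To make the difference concrete, the sketch below contrasts two separate statements with the single tuple assignment on the same toy data. The names data, seq and tup are my own.

```python
import pandas as pd

data = {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]}

# Two separate statements: by the time the mean is computed,
# std is already a column and is averaged in as a fourth value.
seq = pd.DataFrame(data)
seq["std"] = seq.std(axis=1, ddof=0)
seq["mean"] = seq.mean(axis=1)

# One tuple assignment: both right-hand sides are evaluated
# on the original three columns before anything is stored.
tup = pd.DataFrame(data)
tup["std"], tup["mean"] = tup.std(axis=1, ddof=0), tup.mean(axis=1)

print(seq["mean"].tolist())  # contaminated by the std column
print(tup["mean"].tolist())  # [4.0, 5.0, 6.0]
```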