I have a pandas DataFrame containing 5 scores for each row, plus the standard deviation of these 5 scores. The standard deviation was easy to calculate with df['std'] = df.std(axis=1, ddof=0).
However, I also want to add the mean of these 5 scores, and I do not know how to exclude the std column from that calculation. Using df['mean'] = df.mean(axis=1) would make pandas use the 5 scores AND the standard deviation when calculating the mean, which is obviously not what I want.
To summarise, the head of the current df looks like this, and I would like to add a column representing the mean of the 5 scores:
     score1    score2    score3    score4    score5       std
0  0.714286  0.689076  0.718487  0.683544  0.708861  0.013956
1  0.756303  0.704641  0.746835  0.734177  0.704641  0.021338
2  0.689076  0.722689  0.710084  0.760504  0.776371  0.032220
3  0.670833  0.704167  0.732218  0.690377  0.728033  0.023035
4  0.733333  0.758333  0.753138  0.769874  0.774059  0.014358
5  0.733333  0.825000  0.786611  0.786611  0.765690  0.029978
CodePudding user response:
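Use agg on the score columns only, instead of df['std'] = df.std(axis=1, ddof=0):
df[['mean', 'std']] = df.filter(like='score').agg((np.mean, np.std), axis=1)
Note: I use np.std instead of df.std because ddof is 0 by default in NumPy, whereas pandas' std defaults to ddof=1. Depending on the pandas version, agg may substitute its own std (ddof=1) for np.std, so it is worth checking the result against df.std(axis=1, ddof=0).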
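As a minimal sketch of the same idea that does not depend on how agg treats NumPy functions: select the score columns once with filter(like='score') and compute the mean from that selection. The sample values are the first two rows from the question; the variable name scores is only illustrative.

```python
import pandas as pd

# First two rows of the data shown in the question.
df = pd.DataFrame({
    "score1": [0.714286, 0.756303],
    "score2": [0.689076, 0.704641],
    "score3": [0.718487, 0.746835],
    "score4": [0.683544, 0.734177],
    "score5": [0.708861, 0.704641],
})
df["std"] = df.std(axis=1, ddof=0)  # population std, as in the question

# Keep only columns whose name contains "score", so "std" is left out.
scores = df.filter(like="score")
df["mean"] = scores.mean(axis=1)

print(df)
```

Because the selection is by name, the result stays the same however many derived columns are added to df later, as long as their names do not contain "score".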
CodePudding user response:
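Use DataFrame.assign to add several new columns in one call. Both expressions are evaluated on the original df before any column is added, so nothing has to be excluded (this assumes df does not already contain the std column):
df = df.assign(mean=df.mean(axis=1), std=df.std(axis=1, ddof=0))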
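A small sketch of why this works, using made-up two-column data rather than the scores from the question: the arguments to assign are computed before the call runs, so the mean never sees the std column.

```python
import pandas as pd

# Illustrative data only; the frame holds nothing but score columns.
df = pd.DataFrame({"score1": [1.0, 2.0], "score2": [3.0, 6.0]})

# Both Series are built from the two original columns, then attached together.
result = df.assign(mean=df.mean(axis=1), std=df.std(axis=1, ddof=0))

print(result)
```

If df already carries a std column from an earlier step, this approach would include it in the mean, so the score columns would have to be selected first.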
CodePudding user response:
If you want to use only specific columns, you can select them by name or use .iloc.
df["means"] = df[['score1', 'score2', 'score3', 'score4', 'score5']].mean(axis=1)
I haven't tried this because you did not provide a Minimal, Reproducible Example, but it should be close to what you need.
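For illustration, here is a short sketch of selecting by position and by label range, assuming the five score columns come first and are laid out as in the question. The values are rows 2 and 3 of the question's data; by_position and by_label are hypothetical names.

```python
import pandas as pd

# Rows 2 and 3 from the question, including the existing std column.
df = pd.DataFrame({
    "score1": [0.689076, 0.670833],
    "score2": [0.722689, 0.704167],
    "score3": [0.710084, 0.732218],
    "score4": [0.760504, 0.690377],
    "score5": [0.776371, 0.728033],
    "std": [0.032220, 0.023035],
})

# Positional selection: the first five columns, whatever they are called.
by_position = df.iloc[:, :5].mean(axis=1)

# Label slice: .loc includes both endpoints, so score5 is part of the range.
by_label = df.loc[:, "score1":"score5"].mean(axis=1)

df["means"] = by_label
print(df)
```

The positional form breaks silently if the column order changes, so the label-based form is usually the safer choice.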
CodePudding user response:
I would do it the following way:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6],'col3':[7,8,9]})
df['std'], df['mean'] = df.std(axis=1, ddof=0), df.mean(axis=1)
print(df)
Output:
   col1  col2  col3      std  mean
0     1     4     7  2.44949   4.0
1     2     5     8  2.44949   5.0
2     3     6     9  2.44949   6.0
This relies on the fact that, for an assignment like x, y = a, b in Python, the values to the right of = are all computed first and only then assigned to the corresponding targets. So df.mean(axis=1) is evaluated before the std column exists. The same feature lets you swap two variables:
a = 1
b = 2
a, b = b, a
print(a) # 2
print(b) # 1
Note: I used simpler data in my example for the sake of brevity.