Here is how my df kind of looks like (with many more rows, and many more columns):
Index | WTG1 | WTG2 | WTG3 |
---|---|---|---|
1.5 | 61.25 | -7.57 | 7.18 |
2 | 19.69 | 25.95 | 28.67 |
2.5 | 59.51 | 81.22 | 78.22 |
3 | 131.81 | 154.07 | 142.92 |
My objective is to get:
Index | WTG1 | WTG2 | WTG3 | 25th Percentile | 75th Percentile | Mean |
---|---|---|---|---|---|---|
1.5 | 61.25 | -7.57 | 7.18 | (25th Percentile of 61.2, -7.57, 7.18) | (75th Percentile of 61.2, -7.57, 7.18) | (Avg. of 61.2, -7.57, 7.18) |
2 | 19.6 | 25.95 | 28.67 | (25th Percentile of 19.69, 25.95, 28.67) | (75th Percentile of 19.69, 25.95, 28.67) | (AVG. of 19.69, 25.95, 28.67) |
2.5 | 59.51 | 81.22 | 78.22 | (25th Percentile of 59.51, 81.22, 78.22) | (75th Percentile of 59.51, 81.22, 78.22) | (AVG. of 59.51, 81.22, 78.22) |
3 | 131.81 | 154.07 | 142.92 | (25th Percentile of 131.81, 154.07, 142.92) | (75th Percentile of 131.81, 154.07, 142.92) | (AVG. of 131.81, 154.07, 142.92) |
I have been looking for a long time now and the best I can do it :
df['mean'] = df[['WTG1','WTG2','WTG3'].mean(axis=1)
df['25th Percentile'] = np.nanpercentile(df[['WTG1','WTG2','WTG3']],25)
df['75th Percentile'] = np.nanpercentile(df[['WTG1','WTG2','WTG3']],75)
The mean seems to work, have not been checking the values yet though.
But the percentiles are the real issues here... it seems that nanpercentile function works only on columns. It returns the same value on every line (which I guess is the respective 25th and 75th percentile value but of the whole df) for both percentiles columns, which is not what I attend to do.
I was able to find some alternatives but could not adapt them to my need, as:
perc75 = np.vectorize(lambda x: np.percentile(x, 75))
df['75th_percentile'] = perc75(df['WTG01'].values)
which work but only for one column.
or
df['25th_percentile'] = df['WTG1','WTG2','WTG3'].apply(lambda x: np.percentile(x, 25))
which does not work...
CodePudding user response:
I think you could transpose the DataFrame and apply df.describe()
import pandas as pd
df = pd.DataFrame({'WTG1': [61.25, 19.69, 59.51, 131.81],
'WTG2': [-7.57, 25.95, 81.22, 154.07],
'WTG3': [7.18, 28.67, 78.22, 142.92]
})
print(df)
print(df.T)
Output
WTG1 WTG2 WTG3
0 61.25 -7.57 7.18
1 19.69 25.95 28.67
2 59.51 81.22 78.22
3 131.81 154.07 142.92
0 1 2 3
WTG1 61.25 19.69 59.51 131.81
WTG2 -7.57 25.95 81.22 154.07
WTG3 7.18 28.67 78.22 142.92
In the question, you're trying to get the statistics for each row. After you transpose the DataFrame, you could look at columns instead, so you'd get the summary statistics of each column conveniently
print(df.T.describe())
Output
0 1 2 3
count 3.000000 3.000000 3.000000 3.000000
mean 20.286667 24.770000 72.983333 142.933333
std 36.233778 4.604824 11.764269 11.130006
min -7.570000 19.690000 59.510000 131.810000
25% -0.195000 22.820000 68.865000 137.365000
50% 7.180000 25.950000 78.220000 142.920000
75% 34.215000 27.310000 79.720000 148.495000
max 61.250000 28.670000 81.220000 154.070000