I have the following matrix in pandas:
import numpy as np
import pandas as pd
df_matrix = pd.DataFrame(np.random.random((10, 10)))
I need a vector of 10 median values, one for each blue line shown in the picture below:
The last number in the output vector is just a single value rather than a median, since that line contains only one element.
CodePudding user response:
X = np.random.random((10, 10))
fX = np.fliplr(X) # to get the "other" diagonal
np.array([np.median(np.diag(fX, k=-k)) for k in range(X.shape[0])])
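To see which elements each `np.diag(fX, k=-k)` call picks up, here is a small sketch on a 3x3 matrix. The 3x3 input and the names `diags` and `medians` are only for illustration and are not part of the answer above:

```python
import numpy as np

# Illustrative 3x3 input so the diagonals are easy to read off:
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
X = np.arange(9).reshape(3, 3)

# Flipping left-right turns anti-diagonals into ordinary diagonals.
fX = np.fliplr(X)

# k=0 is the main anti-diagonal; each further step moves toward
# the bottom-right corner, ending with a single element.
diags = [np.diag(fX, k=-k).tolist() for k in range(X.shape[0])]
medians = np.array([np.median(d) for d in diags])

print(diags)    # [[2, 4, 6], [5, 7], [8]]
print(medians)  # [4. 6. 8.]
```

The last diagonal holds one element, so its "median" is simply that element, which matches what the question asks for.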
CodePudding user response:
The diagonals are such that row_num + col_num = constant. So you can use stack, sum the row and column indices, and groupby on that sum:
(df_matrix.stack().reset_index(name='val')
.assign(diag=lambda x: x.level_0 + x.level_1) # enumerate the diagonals
.groupby('diag')['val'].median() # median by diagonal
.loc[len(df_matrix) - 1:] # main anti-diagonal and the lower-triangle diagonals
)
Output (for np.random.seed(42)):
diag
9 0.473090
10 0.330898
11 0.531382
12 0.440152
13 0.548075
14 0.325330
15 0.580145
16 0.427541
17 0.248817
18 0.107891
Name: val, dtype: float64
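As a sanity check, the two approaches in this thread can be run on the same data and compared. This is a minimal sketch, assuming a square matrix; the names `via_numpy`, `via_pandas`, and `agree` are made up for the comparison. The slice starts at label `n - 1` so that the main anti-diagonal is included, which gives the ten values (labels 9 to 18) shown in the output above:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df_matrix = pd.DataFrame(np.random.random((10, 10)))
n = len(df_matrix)

# NumPy approach: flip left-right, then walk the diagonals
# from the main one down to the bottom-left corner of the flipped array.
fX = np.fliplr(df_matrix.to_numpy())
via_numpy = np.array([np.median(np.diag(fX, k=-k)) for k in range(n)])

# pandas approach: label each cell by row + column and group on that label.
via_pandas = (
    df_matrix.stack().reset_index(name='val')
    .assign(diag=lambda x: x.level_0 + x.level_1)
    .groupby('diag')['val'].median()
    .loc[n - 1:]  # labels n-1 .. 2n-2
)

# Both should list the same medians in the same order.
agree = np.allclose(via_numpy, via_pandas.to_numpy())
print(agree)
```

The pandas version keeps the diagonal label as the index, which can be handy if you need to know which line each median came from; the NumPy version returns a plain array.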