I have a pandas DataFrame whose cells contain NumPy ndarrays:
import numpy as np, pandas as pd
x = pd.DataFrame(columns=['a', 'b'])
x.loc['t1'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t2'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t3'] = [np.random.rand(2000, 500), np.random.rand(2000)]
print(x)
#     a     b
# t1 [[0.8613174378493778, 0.5959214775442211, 0.62... [0.4603835101674928, 0.3552761341266353, 0.949...
# t2 [[0.15792328922236398, 0.4274550633264813, 0.5... [0.20059737978647396, 0.9445869962005252, 0.38...
# t3 [[0.43047697993868284, 0.7127140849172484, 0.4... [0.6868215656323862, 0.14146376237438463, 0.51...
This works: it computes the element-wise mean of the NumPy arrays in column b across the 3 rows (a mean along the vertical axis):
x.loc[:, 'b'].mean()
# [0.44926749 0.4804423 0.61566989 ... 0.4717142 0.70605732 0.55848075]
But how can I compute the mean along the other axis, i.e. one mean per row? This fails:
x.loc[:, 'b'].mean(axis=1) # or axis="b"
Expected result:
b
t1 0.46
t2 0.31
t3 0.79
CodePudding user response:
You could apply a mean function to each array in the column and store the result in a new column of x, like this:
import numpy as np, pandas as pd
x = pd.DataFrame(columns=['a', 'b'])
x.loc['t1'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t2'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t3'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x["b_mean"] = x["b"].apply(lambda y: np.mean(y))
# or just:
x["b_mean"] = x["b"].apply(np.mean)
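Printing x["b_mean"] then gives something like this (the exact values vary from run to run, since the data is random):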
t1 0.506371
t2 0.501433
t3 0.493867
Name: b_mean, dtype: float64
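As a possible alternative, here is a minimal sketch that avoids the per-row apply. It assumes every array in b has the same length (as in the question), so the arrays can be stacked into one 2D array and averaged along axis=1. The original call fails because a Series has only one axis (axis 0), so axis=1 is not valid on x.loc[:, 'b']. The names rng and stacked, the seed, the smaller array size, and the way the frame is built are choices made for this example only, not part of the question.

```python
import numpy as np
import pandas as pd

# Seeded generator, used only so the sketch is reproducible.
rng = np.random.default_rng(0)

# Same idea as column b in the question, with shorter arrays for a quick run.
# Each cell of column 'b' holds one 1D ndarray of length 20.
x = pd.DataFrame(
    {"b": [rng.random(20), rng.random(20), rng.random(20)]},
    index=["t1", "t2", "t3"],
)

# Stack the per-row arrays into a single (3, 20) array.
# This only works if all arrays in the column share the same shape.
stacked = np.stack(x["b"].tolist())

# One mean per row, written back as a new column aligned on x.index.
x["b_mean"] = stacked.mean(axis=1)

print(x["b_mean"])
```

With same-shaped arrays this gives the same numbers as the apply version, but does the reduction in a single NumPy call. If the arrays differ in length, np.stack raises an error and the apply approach is the one to use.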