I have a DataFrame like:
     a    b
0    4    7
1    3    2
2    1    9
3    3    4
4    2  NaN
I need to calculate the min, mean, std, and sum over the whole DataFrame, treating it as a single list of numbers (e.g. the minimum here is 1).
EDIT: The data may contain NaNs, or the columns may have different lengths.
df.to_numpy().mean()
produces NaN, because the array contains NaNs (the columns have different lengths).
How can I calculate the usual statistics over all of these numbers?
CodePudding user response:
A pandas solution is to reshape with DataFrame.stack and then aggregate with Series.agg:
# pandas' std defaults to ddof=1; ddof=0 matches NumPy's default
def std_ddof0(x):
    return x.std(ddof=0)

out = df.stack().agg(['mean', 'sum', std_ddof0, 'min'])
print(out)
mean          3.888889
sum          35.000000
std_ddof0     2.424158
min           1.000000
dtype: float64
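The std_ddof0 helper is there because pandas' Series.std uses ddof=1 (sample standard deviation) by default, while NumPy's std functions use ddof=0. A minimal sketch of the difference, assuming the DataFrame from the question is rebuilt as below (the construction is an assumption, not part of the original post):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's DataFrame
df = pd.DataFrame({"a": [4, 3, 1, 3, 2], "b": [7, 2, 9, 4, np.nan]})

flat = df.stack()  # one Series holding every value of the frame

sample_std = flat.std()            # pandas default: ddof=1
population_std = flat.std(ddof=0)  # same divisor as NumPy's default

print(sample_std, population_std)
```

With nine values the two results differ noticeably, so it is worth deciding which definition you actually want before comparing pandas and NumPy output.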
A NumPy solution uses np.nanmean, np.nansum, np.nanstd, and np.nanmin, which ignore NaNs:
import numpy as np

# flatten to 1-D; the nan* functions skip NaN values
totalp = df.to_numpy().reshape(-1)
out = np.nanmean(totalp), np.nansum(totalp), np.nanstd(totalp), np.nanmin(totalp)
print(out)
(3.888888888888889, 35.0, 2.4241582476968255, 1.0)
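To see why the nan* functions are needed, here is a small sketch that builds the frame from two columns of different lengths (pandas pads the shorter one with NaN) and compares the plain and NaN-aware means. The values follow the question; building the columns from pd.Series is an assumption made for illustration:

```python
import numpy as np
import pandas as pd

# Columns of different lengths; index alignment pads "b" with NaN
df = pd.DataFrame({
    "a": pd.Series([4, 3, 1, 3, 2]),
    "b": pd.Series([7, 2, 9, 4]),
})

values = df.to_numpy().reshape(-1)

plain_mean = values.mean()     # NaN propagates through the plain mean
nan_mean = np.nanmean(values)  # NaN entries are skipped

print(plain_mean, nan_mean)
```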
Another idea is to remove the missing values first:
totalp = df.to_numpy().reshape(-1)
totalp = totalp[~np.isnan(totalp)]
print (totalp)
[4. 7. 3. 2. 1. 9. 3. 4. 2.]
out = np.mean(totalp), np.sum(totalp), np.std(totalp), np.min(totalp)
print (out)
(3.888888888888889, 35.0, 2.4241582476968255, 1.0)
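One edge case with the filtering approach: if every value is NaN, the filtered array is empty and np.min raises a ValueError. A possible wrapper is sketched below. The name flat_stats is hypothetical, returning NaN for empty input is just one choice, and the sketch assumes all columns are numeric:

```python
import numpy as np
import pandas as pd

def flat_stats(df):
    """Return mean, sum, std (ddof=0) and min over all non-NaN values."""
    values = df.to_numpy(dtype=float).reshape(-1)
    values = values[~np.isnan(values)]
    if values.size == 0:
        # Nothing left after filtering: report NaN instead of raising
        return {"mean": np.nan, "sum": np.nan, "std": np.nan, "min": np.nan}
    return {
        "mean": values.mean(),
        "sum": values.sum(),
        "std": values.std(),
        "min": values.min(),
    }

# Assumed reconstruction of the question's DataFrame
df = pd.DataFrame({"a": [4, 3, 1, 3, 2], "b": [7, 2, 9, 4, np.nan]})
stats = flat_stats(df)
print(stats)
```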