I am trying to standardize some columns so that each has a mean of zero and a standard deviation (SD) of one, but I am not sure how to do that.
E.g. given the following dataframe, how do you create a new column with mean 0 and SD 1?
import pandas as pd
df = pd.DataFrame([8.2, 18, 15, 9], columns=['temp'])
Here is what I have tried with StandardScaler:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.DataFrame([[8.2,57],[18,60],[15,45],[9,30]], columns=['temp','rh'])
print(df)
scaler = StandardScaler(copy=False, with_mean=True, with_std=True)
scaler.fit(df)
print(f"Means: {scaler.mean_}")
df2 = scaler.transform(df)
print(f"Transformed Data Frame:\n{df2}")
m = np.mean(df2, axis=0)
s = np.std(df2, axis=0)
print(f"Column means:\n{m}")
print(f"Column SD:\n{s}")
But the results do not look like a mean of zero and an SD of 1 at all:
   temp  rh
0   8.2  57
1  18.0  60
2  15.0  45
3   9.0  30
Means: [12.55 48. ]
Transformed Data Frame:
[[-1.06105451  0.76200076]
 [ 1.32936715  1.01600102]
 [ 0.59760542 -0.25400025]
 [-0.86591805 -1.52400152]]
Column means:
[-2.49800181e-16  0.00000000e+00]
Column SD:
[1. 1.]
CodePudding user response:
from sklearn.preprocessing import StandardScaler
df1 = StandardScaler().fit_transform(df)
This will do the trick. Note that fit_transform returns a NumPy array, not a DataFrame.
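It may also help to note that the output posted in the question already looks standardized: -2.49800181e-16 is zero up to floating-point rounding, and the column SDs print as 1. Below is a minimal sketch, not a definitive recipe, of how one might keep the result as a DataFrame, add a single standardized column, and check the result with a tolerance instead of expecting an exact 0. The names scaled, temp_z, means_ok and sds_ok are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame([[8.2, 57], [18, 60], [15, 45], [9, 30]], columns=["temp", "rh"])

# By default fit_transform returns a NumPy array, so wrap it
# to keep the column names and the index.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df),
    columns=df.columns,
    index=df.index,
)

# Add one standardized column using pandas only.
# ddof=0 (population SD) matches StandardScaler; pandas defaults to ddof=1.
df["temp_z"] = (df["temp"] - df["temp"].mean()) / df["temp"].std(ddof=0)

# Compare with a tolerance: floating-point rounding leaves tiny residuals.
means_ok = np.allclose(scaled.mean(), 0.0)
sds_ok = np.allclose(scaled.std(ddof=0), 1.0)
print(means_ok, sds_ok)
```

If you check the SD with pandas' default (ddof=1) you will get a value slightly above 1 for a small sample like this one, which is a difference in convention rather than a scaling error.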