custom mean and var for standard_scaler

Time:01-13

How can we use a custom mean and var in StandardScaler? I need to calculate the mean and var over all data in the dataset (train set and test set) and then use these values to standardize the train set and test set (and later input data) separately. How can I do this?

I couldn't find any example of it.

CodePudding user response:

from sklearn.preprocessing import StandardScaler
import numpy as np

# Your training data
X_train = ...

# Your test data
X_test = ...

# Concatenate the training and test data
X_all = np.concatenate((X_train, X_test))

# Initialize the scaler
scaler = StandardScaler()

# Fit the scaler on the combined data set
scaler.fit(X_all)

# Transform the training data
X_train_scaled = scaler.transform(X_train)

# Transform the test data
X_test_scaled = scaler.transform(X_test)
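
To check what the scaler learns from the combined data, the fitted attributes can be compared with statistics computed by hand. This is a minimal sketch: the arrays are made-up random data standing in for the real X_train and X_test, and the names mean_all and var_all are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data, standing in for the real train and test sets
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(80, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Fit on the combined data, as in the answer above
X_all = np.concatenate((X_train, X_test))
scaler = StandardScaler().fit(X_all)

# Per-feature mean and population variance (ddof=0) computed by hand
mean_all = X_all.mean(axis=0)
var_all = X_all.var(axis=0)

# The fitted attributes should match the hand-computed statistics
means_match = np.allclose(scaler.mean_, mean_all)
vars_match = np.allclose(scaler.var_, var_all)
scales_match = np.allclose(scaler.scale_, np.sqrt(var_all))
print(means_match, vars_match, scales_match)
```

Note that scale_ is the standard deviation (the square root of var_), which is the value the scaler actually divides by.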

If you want to scale new input data instead of the training set, you could add this:

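# StandardScaler takes no mean/scale constructor arguments, so set the fitted attributes directly.
# mean_all and var_all are the per-feature mean and variance of the combined data.
scaler = StandardScaler(with_mean=True, with_std=True)
scaler.mean_ = mean_all
scaler.scale_ = np.sqrt(var_all)  # scale_ holds the standard deviation, not the variance
input_data = ...  # input data
input_data_scaled = scaler.transform(input_data)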
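
If the mean and var are already known, the same standardization can also be done by hand with NumPy, which avoids depending on the scaler object at all. The sketch below assumes per-feature statistics; the helper name standardize and the small arrays are hypothetical and only for illustration.

```python
import numpy as np

def standardize(X, mean, var):
    """Standardize X column-wise using an externally supplied mean and variance."""
    std = np.sqrt(var)
    # Treat constant features as having scale 1 to avoid dividing by zero
    std = np.where(std == 0.0, 1.0, std)
    return (np.asarray(X, dtype=float) - mean) / std

# Made-up combined data (train set and test set together)
X_all = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
mean_all = X_all.mean(axis=0)
var_all = X_all.var(axis=0)  # population variance, ddof=0

# Later input data is scaled with the same stored statistics
input_data = np.array([[2.5, 25.0]])
input_data_scaled = standardize(input_data, mean_all, var_all)
print(input_data_scaled)
```

Storing mean_all and var_all (for example with np.save) is enough to reproduce the same scaling on any later input.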

CodePudding user response:

The simplest one is the best one!

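I found that the plain StandardScaler answers my question: StandardScaler(with_mean=False, with_std=False). With both options disabled the scaler neither centers nor scales, which is the same as using mean=0 and var=1, so the data passes through unchanged. These values are fixed for the train set, test set and input data, so it's OK!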
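
A quick way to see what these two options do is to run the scaler on a small array and compare the output with the input. This is only a sketch with made-up numbers; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data with very different feature scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

# With centering and scaling both disabled, transform applies no change
identity_scaler = StandardScaler(with_mean=False, with_std=False).fit(X)
X_out = identity_scaler.transform(X)

unchanged = np.array_equal(X_out, X)
print(unchanged)
```

If the check prints True, the values are identical before and after, so no per-feature mean or variance from the data is applied.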
