I am trying to scale a dataset for training a machine learning model with Python and scikit-learn. I want the scaling to preserve sign: every raw value that is negative should stay negative after scaling, and every raw value that is positive should stay positive.
Something like this pseudocode for a single feature:
from sklearn import preprocessing
array = [[-5.0, 0.0, 1.25, 2.5]]
scaler = preprocessing.SomeScaler(feature_range=(-1,1), center=0)
scaler = scaler.fit(array)
print("Scaled:", scaler.transform(array))
#should print Scaled: [[-1.0, 0.0, 0.25, 0.5]]
#data arriving after initial scaling might look like this:
new_data = [[-0.1, 0.1, 7.0, 10.0]]  # transformed with the already-fitted scaler, no refit
print("Scaled:", scaler.transform(new_data))
#should print Scaled: [[-0.02, 0.02, 1.4, 2.0]]
I am new to machine learning, so I am hoping I just don't know the terms and functions of scikit-learn well enough yet. Does something like the SomeScaler above exist in scikit-learn, or perhaps in another Python library?
CodePudding user response:
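- Firstly, scikit-learn expects 2D input of shape (n_samples, n_features). The nested list [[-5.0, 0.0, 1.25, 2.5]] is 1 sample with 4 features, not 4 values of 1 feature, so you need to reshape your array.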
import numpy as np
# Nested list: 2D, 1 sample (row) with 4 features (columns)
print('1 sample with 4 features:', np.array([[-5.0, 0.0, 1.25, 2.5]]).shape)
# Flat list: 1D, still has to be reshaped to (4, 1) before scikit-learn accepts it
print('4 values in a 1D array:', np.array([-5.0, 0.0, 1.25, 2.5]).shape)
#[output] 1 sample with 4 features: (1, 4)
#[output] 4 values in a 1D array: (4,)
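To see why the shape matters for scaling, here is a minimal sketch that feeds the same four numbers to MaxAbsScaler once as a single row and once as a single column. The variable names (values, as_row, as_column) are made up for illustration. Because each column is scaled independently, the row layout divides every non-zero value by its own magnitude, which is probably not what you want.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

values = [-5.0, 0.0, 1.25, 2.5]

# Shape (1, 4): one sample, four features. Each column is scaled on its own,
# so every non-zero value is divided by its own absolute value.
as_row = np.array([values])
row_scaled = MaxAbsScaler().fit_transform(as_row)
print(as_row.shape, row_scaled)

# Shape (4, 1): four samples of one feature, all divided by the shared max |x| = 5.0.
as_column = np.array(values).reshape(-1, 1)
column_scaled = MaxAbsScaler().fit_transform(as_column)
print(as_column.shape, column_scaled.ravel())
```

The row version collapses to -1, 0, 1, 1, while the column version keeps the relative sizes of the values.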
- Secondly, you can scale using MaxAbsScaler:
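The scikit-learn documentation describes it like this: "Scale each feature by its maximum absolute value. This estimator scales and translates each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity." Since each value is only divided by a positive constant, negative values stay negative, positive values stay positive, and zero stays zero.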
from sklearn import preprocessing
import numpy as np
array = [-5.0, 0.0, 1.25, 2.5]
# A single feature must be a column: reshape(-1, 1) gives shape (n_samples, 1).
# (Use reshape(1, -1) instead if the data is a single sample.)
array = np.array(array).reshape(-1, 1)
print("Array shape after reshaping it to a 2D array:", array.shape)
scaler = preprocessing.MaxAbsScaler()
scaler = scaler.fit(array)  # learns the maximum absolute value per feature, here 5.0
print("Scaled:", scaler.transform(array))
#[output] Array shape after reshaping it to a 2D array: (4, 1)
#[output] Scaled: [[-1.  ]
# [ 0.  ]
# [ 0.25]
# [ 0.5 ]]
scikit-learn estimators and transformers, scalers included, expect 2D arrays of shape (n_samples, n_features), which is why we used reshape to add a second dimension.
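For the second half of the question (data arriving after the initial scaling), one possible approach is to fit the scaler once and then only call transform on later batches. The sketch below assumes that is the behaviour you want; train and new_data are illustrative names, and the later batch is taken from the question. Values larger in magnitude than the training maximum end up outside [-1, 1], but their sign is still preserved.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

train = np.array([-5.0, 0.0, 1.25, 2.5]).reshape(-1, 1)
new_data = np.array([-0.1, 0.1, 7.0, 10.0]).reshape(-1, 1)  # later batch from the question

scaler = MaxAbsScaler().fit(train)   # fit once, on the training data only
print("max_abs_:", scaler.max_abs_)  # per-feature max |x| learned during fit

train_scaled = scaler.transform(train)
new_scaled = scaler.transform(new_data)  # reuse the fitted scaler, do not refit
print("Train:", train_scaled.ravel())
print("New:  ", new_scaled.ravel())

# Dividing by a positive constant never changes the sign of a value.
assert np.array_equal(np.sign(new_scaled), np.sign(new_data))

# Same result by hand: x / max(|x_train|)
manual = new_data / np.abs(train).max(axis=0)
assert np.allclose(manual, new_scaled)
```

Calling fit again on the new batch would replace the learned maximum (5.0 here) with the new batch's maximum, so the old and new data would no longer be on the same scale.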