I have two pandas DataFrames. I used MinMaxScaler to normalize the first one in order to train a neural network, and I need to do the same for the test dataset. How can I scale the second DataFrame based on the min and max of the first one?
Because the test data should not affect training, I cannot merge the two DataFrames, scale them, and split them again.
The datasets have a lot of columns.
Example:
First DataFrame:
| | colA |
| --- |---- |
| 1 | 3 |
| 2 | 10 |
| 3 | 4 |
| 4 | 0 |
Second DataFrame:
| | colA |
| --- |--- |
| 1 | 2 |
| 2 | 5 |
Expected scaling of the second DataFrame:
| | colA |
| --- | --- |
| 1 | 0.2 |
| 2 | 0.5 |
CodePudding user response:
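import pandas as pd
from sklearn.preprocessing import MinMaxScaler
train_df = pd.DataFrame({'colA': [3, 10, 4, 0]})
test_df = pd.DataFrame({'colA': [2, 5]})
# Fit on the training data only, so the scaler stores the training min and max of every column
scaler = MinMaxScaler()
scaler.fit(train_df)
# transform() returns a NumPy array, so wrap the result to keep column names and index
train_scaled = pd.DataFrame(scaler.transform(train_df), columns=train_df.columns, index=train_df.index)
# Reuse the same fitted scaler for the test data; do not call fit() again
test_scaled = pd.DataFrame(scaler.transform(test_df), columns=test_df.columns, index=test_df.index)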
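If you want to check what the fitted scaler does to the test data, here is a minimal sketch in plain pandas that applies the same formula by hand, using only the training min and max. It assumes the default `feature_range=(0, 1)`; the names `train_min`, `train_max` and `test_scaled` are just illustrative.

```python
import pandas as pd

train_df = pd.DataFrame({"colA": [3, 10, 4, 0]})
test_df = pd.DataFrame({"colA": [2, 5]})

# Per-column statistics are taken from the training data only.
train_min = train_df.min()
train_max = train_df.max()

# (x - min) / (max - min), applied column by column to the test data.
test_scaled = (test_df - train_min) / (train_max - train_min)

print(test_scaled["colA"].tolist())  # [0.2, 0.5]
```

This works for any number of columns because pandas aligns the statistics with the DataFrame by column name. Note that test values outside the training range will land outside [0, 1], which is also what `MinMaxScaler` does by default.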