I have a dataframe df:
df = pd.DataFrame({'A': [1, 2, 5, 3], 'B': [10, 0, 3, 7], 'C': [100, 200, 50, 500]})
df
A B C
0 1 10 100
1 2 0 200
2 5 3 50
3 3 7 500
Now I use the following command to normalize the columns of df:
df[['A', 'B', 'C']] = df[['A', 'B', 'C']].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
df
A B C
0 0.00 1.0 0.111111
1 0.25 0.0 0.333333
2 1.00 0.3 0.000000
3 0.50 0.7 1.000000
I also store the min and max of each column (taken from the original df, before it is normalized) using the following commands:
min_params = dict(df[['A', 'B', 'C']].min())
max_params = dict(df[['A', 'B', 'C']].max())
I use df for the training phase. For inference, consider a new dataframe df_new like this:
df_new = pd.DataFrame({'A': [10, 15, 20], 'B': [18, 17, 15], 'C': [250, 300, 150]})
df_new
A B C
0 10 18 250
1 15 17 300
2 20 15 150
Now I want to normalize df_new with the same procedure, using min_params and max_params. What is the best and most efficient way to do this with pandas?
CodePudding user response:
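Use MinMaxScaler from scikit-learn: fit it once on the training data, then reuse the fitted scaler on new data.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
df = pd.DataFrame({'A': [1, 2, 5, 3], 'B': [10, 0, 3, 7], 'C': [100, 200, 50, 500]})
scaler = MinMaxScaler()
scaler = scaler.fit(df)  # learns the per-column min and max from the training data
scaler.transform(df)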
Result:
array([[0. , 1. , 0.11111111],
[0.25 , 0. , 0.33333333],
[1. , 0.3 , 0. ],
[0.5 , 0.7 , 1. ]])
Now use the same fitted scaler on the new data:
df_new = pd.DataFrame({'A': [10, 15, 20], 'B': [18, 17, 15], 'C': [250, 300, 150]})
scaler.transform(df_new)
Result:
array([[2.25 , 1.8 , 0.44444444],
[3.5 , 1.7 , 0.55555556],
[4.75 , 1.5 , 0.22222222]])
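Note that transform returns a NumPy array rather than a DataFrame. If you want the column names and index back, something like the sketch below should work. The names df_new_scaled, min_params and max_params are only illustrative here; data_min_ and data_max_ are the attributes where the fitted scaler keeps the training min and max.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'A': [1, 2, 5, 3], 'B': [10, 0, 3, 7], 'C': [100, 200, 50, 500]})
df_new = pd.DataFrame({'A': [10, 15, 20], 'B': [18, 17, 15], 'C': [250, 300, 150]})

# Fit on the training data only, so new data is scaled with the training min and max
scaler = MinMaxScaler().fit(df)

# transform() returns a NumPy array; rebuild a DataFrame with the original labels
df_new_scaled = pd.DataFrame(
    scaler.transform(df_new),
    columns=df_new.columns,
    index=df_new.index,
)

# The fitted per-column parameters, if you want them as dicts like in the question
min_params = dict(zip(df.columns, scaler.data_min_))
max_params = dict(zip(df.columns, scaler.data_max_))
```

Values above 1 in df_new_scaled are expected here, because the new data lies outside the range seen during training.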
CodePudding user response:
You can also apply the min and max directly if you keep them as pd.Series (not a dict):
min_params = df[['A', 'B', 'C']].min()
max_params = df[['A', 'B', 'C']].max()
Then normalize df without the lambda function:
df[['A', 'B', 'C']] = (df[['A', 'B', 'C']] - min_params) / (max_params - min_params)
A B C
0 0.00 1.0 0.111111
1 0.25 0.0 0.333333
2 1.00 0.3 0.000000
3 0.50 0.7 1.000000
and on df_new:
df_new[['A', 'B', 'C']] = (df_new[['A', 'B', 'C']] - min_params) / (max_params - min_params)
Output:
A B C
0 2.25 1.8 0.444444
1 3.50 1.7 0.555556
2 4.75 1.5 0.222222
Of course, this is exactly what MinMaxScaler does.
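If the parameters are already stored as dicts, as in the question, wrapping them in pd.Series gets you the same column alignment. A minimal sketch, assuming the dicts were computed from the un-normalized training data; normalize is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 5, 3], 'B': [10, 0, 3, 7], 'C': [100, 200, 50, 500]})
df_new = pd.DataFrame({'A': [10, 15, 20], 'B': [18, 17, 15], 'C': [250, 300, 150]})

cols = ['A', 'B', 'C']

# Parameters stored as plain dicts, taken from the training data before normalizing it
min_params = dict(df[cols].min())
max_params = dict(df[cols].max())


def normalize(frame, min_params, max_params):
    """Min-max scale `frame` using stored per-column parameters (hypothetical helper)."""
    lo = pd.Series(min_params)
    hi = pd.Series(max_params)
    # The Series index labels line up with the DataFrame column labels
    return (frame[lo.index] - lo) / (hi - lo)


df_scaled = normalize(df, min_params, max_params)
df_new_scaled = normalize(df_new, min_params, max_params)
```

A column whose training min equals its max would divide by zero here; this sketch does not handle that case.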