I have multiple variables in my data frame with negative and positive values, so I'd like to normalize/scale them to the range -1 to 1. I didn't find a working solution. Any suggestions? Thanks a lot!
I scaled other variables to 0, 1 with the sklearn MinMaxScaler, but didn't find a -1, 1 option there.
CodePudding user response:
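MinMaxScaler maps values to 0, 1 by default, but the range is configurable: pass feature_range=(-1, 1) to get values between -1 and 1.
Note that sklearn's StandardScaler does not bound values to -1, 1. It rescales each column to zero mean and unit variance, so values can fall outside that range.
Hope this helps.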
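For reference, MinMaxScaler accepts a feature_range argument, so scaling to -1, 1 can be sketched as below. The column names "a" and "b" and their values are made up for illustration, and each column is scaled independently using its own minimum and maximum.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical example data: two columns with negative and positive values
df = pd.DataFrame({
    "a": [-300.0, -50.0, 0.0, 120.0, 300.0],
    "b": [-2.0, -1.0, 0.5, 3.0, 8.0],
})

# feature_range sets the target interval (the default is (0, 1))
scaler = MinMaxScaler(feature_range=(-1, 1))

# fit_transform returns a NumPy array, so wrap it back into a dataframe
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)

print(scaled)
```

After this, every column in scaled has its minimum at -1 and its maximum at 1.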
CodePudding user response:
Here is a mathematical answer to your question:
result = (row - min_col) * (high - low) / (max_col - min_col) + low
Where:
- row = the number to be transformed in the row
- min_col = the minimum value in the column
- max_col = the maximum value in the column
- low = the minimum value of the transformed results (-1)
- high = the maximum value of the transformed results (+1)
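As a quick sanity check of the formula, here is a small sketch in plain Python. The helper name rescale and the column range of -300 to 300 are assumptions chosen only for illustration.

```python
# Min-max formula with an explicit target range [low, high]
def rescale(value, min_col, max_col, low=-1.0, high=1.0):
    return (value - min_col) * (high - low) / (max_col - min_col) + low

# Assumed column range of -300 to 300
print(rescale(-300, -300, 300))  # column minimum maps to low: -1.0
print(rescale(300, -300, 300))   # column maximum maps to high: 1.0
print(rescale(0, -300, 300))     # midpoint maps to the middle: 0.0
```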
Here is the code, applied to a data frame column:
import random
import pandas as pd
# Generate some random numbers (-300 to 300) and place into a dataframe (df)
res = [random.randint(-300,300) for i in range(100)]
df = pd.DataFrame({"data":res})
# Function to transform the rows of a column (between -1 and 1)
def transform(row, min_col, max_col, low=-1, high=1):
    # Shift to zero, stretch to the target width, then shift up to low
    result = (row - min_col) * (high - low) / (max_col - min_col) + low
    return result
# Identify the minimum and the maximum of the column in question
column_min = df['data'].min()
column_max = df['data'].max()
# Generate a new column with the transformed values
df['transformed'] = df['data'].apply(transform, min_col = column_min, max_col=column_max)
# Print the dataframe
print(df)
df OUTPUT: