Data frame normalization center = 0 solution (-1, 1)?

Time:10-29

I have multiple variables in my data frame with negative and positive values, so I'd like to normalize/scale the variables to the range (-1, 1). I didn't find a working solution. Any suggestions? Thanks a lot!

I scaled other variables to (0, 1) with the sklearn MinMaxScaler, but didn't find a (-1, 1) option there.

CodePudding user response:

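MinMaxScaler uses a mathematical formula that rescales values into a target range. The default range is (0, 1), but it accepts a feature_range argument, so feature_range=(-1, 1) gives values between -1 and 1.

sklearn's StandardScaler is a different transformation: it centers each column at mean 0 with unit variance, so the results are centered on 0 but are not bounded to (-1, 1).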

Hope this helps.
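
As a minimal sketch of the scikit-learn route, assuming MinMaxScaler's feature_range parameter and a small made-up column (the numbers below are illustrative, not the asker's data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column with negative and positive values (illustrative data only)
values = np.array([[-300.0], [-50.0], [0.0], [120.0], [300.0]])

# feature_range sets the target interval; the default is (0, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(values)

print(scaled.ravel())
```

The scaler expects a 2D input (rows by columns), so a single pandas column would be passed as df[['data']] rather than df['data'].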

CodePudding user response:

Here is a mathematical answer to your question:

result = (row - min_col) * (high - low) / (max_col - min_col) + low

Where:

  • row = the number to be transformed in the row
  • min_col = the minimum value in the column
  • max_col = the maximum value in the column
  • low = the minimum value of the transformed results (-1)
  • high = the maximum value of the transformed results (1)


Here is the code:

import random
import pandas as pd

# Generate some random numbers (-300 to 300) and place into a dataframe (df)
res = [random.randint(-300,300) for i in range(100)]
df = pd.DataFrame({"data":res})


# Function to transform the rows of a column (between -1 and 1)
def transform(row, min_col, max_col, low=-1, high=1):
    # Shift to start at 0, stretch to the width of the target range, then shift up to `low`
    result = (row - min_col) * (high - low) / (max_col - min_col) + low
    return result


# Identify the minimum and the maximum of the column in question
column_min = min(df['data'])
column_max = max(df['data'])

# Generate a new column with the transformed values
df['transformed'] = df['data'].apply(transform, min_col = column_min, max_col=column_max)

# Print the dataframe
print(df)
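
The same formula can also be applied to whole columns at once instead of row by row. The sketch below is one possible way to do that; scale_column is a hypothetical helper name, the sample data is made up, and the handling of constant columns (mapping them to the midpoint) is an assumption rather than part of the answer above.

```python
import pandas as pd


def scale_column(col, low=-1.0, high=1.0):
    """Linearly rescale a numeric Series to [low, high] (hypothetical helper)."""
    col_min, col_max = col.min(), col.max()
    if col_max == col_min:
        # Constant column: the formula would divide by zero,
        # so map every value to the midpoint of the target range (an assumption)
        return pd.Series((low + high) / 2.0, index=col.index)
    return (col - col_min) * (high - low) / (col_max - col_min) + low


# Illustrative data only
df = pd.DataFrame({"a": [-300, -50, 0, 120, 300], "b": [5, 10, 15, 20, 25]})

# DataFrame.apply calls scale_column once per column
scaled = df.apply(scale_column)
print(scaled)
```

Note that this maps the midpoint of each column's observed range to 0, not the original zero. In column "a" the two coincide only because its minimum and maximum are symmetric around 0.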

