I have a data frame and I want to normalize each value using the minimum and maximum of its row, according to this formula:
x_normalized = (x_unnormalized - x_min) / (x_max - x_min)
I've checked the scikit-learn package and could not find a function for that. Could you help me with this? Below is a sample data frame and the output I want.
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['id'] = ['a', 'b', 'c']
df['c1'] = [2, 5, 3]
df['c2'] = [0, 5, 6]
df['c3'] = [8, 7, 9]
print(df)
# here is the data frame that I want
df = pd.DataFrame()
df['id'] = ['a', 'b', 'c']
df['c1'] = [1/4, 0, 0]
df['c2'] = [0, 0, 0.5]
df['c3'] = [1, 1, 1]
df
CodePudding user response:
It looks like there is a typo in your output.
You can use simple vectorized operations:
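def norm(df):
    # minimum and maximum of each row
    MIN = df.min(axis=1)
    MAX = df.max(axis=1)
    # align on the index so each row uses its own min and max
    return df.sub(MIN, axis=0).div(MAX-MIN, axis=0)

# apply to the numeric columns only ('id' holds strings)
df2 = norm(df[['c1', 'c2', 'c3']])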
output:
     c1   c2   c3
0  0.25  0.0  1.0
1  0.00  0.0  1.0
2  0.00  0.5  1.0
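If you need to keep the id column in the result, here is a minimal sketch. It assumes id is the only non-numeric column; the helper name norm_rows and the variables value_cols and result are only illustrative, not part of the question.

```python
import pandas as pd

def norm_rows(df):
    # Row-wise min-max scaling: (x - row_min) / (row_max - row_min)
    row_min = df.min(axis=1)
    row_max = df.max(axis=1)
    return df.sub(row_min, axis=0).div(row_max - row_min, axis=0)

df = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'c1': [2, 5, 3],
    'c2': [0, 5, 6],
    'c3': [8, 7, 9],
})

# Assumption: every column except 'id' is numeric.
value_cols = df.columns.drop('id')

# Normalize the numeric part and join it back to 'id' on the shared index.
result = df[['id']].join(norm_rows(df[value_cols]))
print(result)
```

Selecting the numeric columns explicitly keeps the string column out of the min/max computation, where mixed types would otherwise cause trouble.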
axis-aware version:
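def norm(df, axis=1):
    MIN = df.min(axis=axis)
    MAX = df.max(axis=axis)
    # broadcast the result along the other axis (1-axis)
    return df.sub(MIN, axis=1-axis).div(MAX-MIN, axis=1-axis)

# column-wise normalization of the numeric columns
norm(df[['c1', 'c2', 'c3']], axis=0)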
output:
         c1        c2   c3
0  0.000000  0.000000  0.5
1  1.000000  0.833333  0.0
2  0.333333  1.000000  1.0
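One edge case to be aware of: if a row (or column) is constant, MAX-MIN is 0 and the division produces NaN for that row. Below is a minimal sketch of one way to handle it, assuming you want such rows mapped to 0. That convention, the name norm_safe, and the sample frame are my assumptions, not something from the question.

```python
import pandas as pd

def norm_safe(df, axis=1):
    # Min-max scaling along `axis`, with constant rows/columns mapped to 0.
    lo = df.min(axis=axis)
    hi = df.max(axis=axis)
    span = hi - lo
    # Where the range is 0 the numerator is also 0, so dividing by 1
    # yields 0 instead of NaN (assumed convention).
    span = span.mask(span == 0, 1)
    return df.sub(lo, axis=1 - axis).div(span, axis=1 - axis)

# Hypothetical sample: the second row is constant.
sample = pd.DataFrame({'c1': [2, 4], 'c2': [0, 4], 'c3': [8, 4]})
out = norm_safe(sample)
print(out)
```

Replacing only the zero ranges leaves genuine missing values in the data untouched, which a blanket fillna on the result would not.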