pandas: how to apply StandardScaler fit_transform to grouped data

Time:07-28

I have a DataFrame like this:

df

group data other
A 1 a
A 2 b
A 3 ad
A 4 aw
A 5 ad
B 100 ta
B 200 as
B 300 ab
B 400 ax
B 500 ad

I would like to groupby("group") and then apply StandardScaler().fit_transform() within each group.

(I tested the code below on data containing only a single group, where it works, but I run into problems when the data has more than one group:

df['data'] = pd.DataFrame(scaler.fit_transform(df.groupby('group').data.values.reshape(-1,1)))

)

Is there a way to handle multiple groups and apply the scaler within each group separately?

Edit: My desired output would be

group data other
A -1.414 a
A -0.7071 b
A 0 ad
A 0.7071 aw
A 1.414 ad
B -1.414 ta
B -0.7071 as
B 0 ab
B 0.7071 ax
B 1.414 ad

where data holds the standardized values, scaled within each group.

Answer:

Use:

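from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
def sc(row):
    # fit_transform expects 2-D input, so reshape the values to a single column
    return scaler.fit_transform(row.values.reshape(-1,1))

# apply sc to each column within each group
df.groupby('group').agg(sc)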

Note that your other column is not numeric, so you cannot apply StandardScaler to it. I tried the code above with two numeric columns.

Demonstration for two numeric columns:

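import pandas as pd
from sklearn.preprocessing import StandardScaler

d = '''group    data    other
A   1   a
A   2   b
A   3   ad
A   4   aw
A   5   ad
B   100 ta
B   200 as
B   300 ab
B   400 ax
B   500 ad'''
# split each line on any run of whitespace; the first row is the header
data = [x.split() for x in d.split('\n') if x!='']
df = pd.DataFrame(data[1:], columns = data[0])
# overwrite the text column with one derived from 'data'
# (the parsed values are still strings at this point)
df['other'] = df['data']*2

scaler = StandardScaler()
def sc(row):
    # fit_transform expects 2-D input, so reshape the values to a single column
    return scaler.fit_transform(row.values.reshape(-1,1))

df.groupby('group').agg(sc)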

Output:

    data    other
group       
A   [[-1.414213562373095], [-0.7071067811865475], ...   [[-1.4142135623730951], [-0.7071067811865476],...
B   [[-1.414213562373095], [-0.7071067811865475], ...   [[-1.4142135623730951], [-0.7071067811865476],...

Based on the comment, to get one scaled value per row in a new column:

df['new'] = df.groupby('group')['data'].apply(sc).explode().values.astype(float)
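
As a possible alternative, here is a minimal sketch using groupby(...).transform. It aligns the result to the original index, so it should not depend on the rows already being sorted by group. The helper name scale_group and the rebuilt sample frame are only for illustration and are not part of the original answer. A fresh StandardScaler is created for each group.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample frame rebuilt from the question, with 'data' as real numbers
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "data": [1, 2, 3, 4, 5, 100, 200, 300, 400, 500],
    "other": ["a", "b", "ad", "aw", "ad", "ta", "as", "ab", "ax", "ad"],
})

def scale_group(values):
    # hypothetical helper: fit a new scaler on this group's values only
    column = values.to_numpy(dtype=float).reshape(-1, 1)  # 2-D input for sklearn
    # ravel() flattens the (n, 1) result back to length n, one value per row
    return StandardScaler().fit_transform(column).ravel()

# transform returns a Series aligned to df's index, so 'other' is untouched
df["data"] = df.groupby("group")["data"].transform(scale_group)
print(df)
```

StandardScaler divides by the population standard deviation, so a pandas-only equivalent would be transform(lambda x: (x - x.mean()) / x.std(ddof=0)), if you would rather avoid the scikit-learn dependency for this step.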