pandas how to applied fit_transform on standard scaler on group by data-CodePudding

There is a dataframe like this

group	data	other
A	1	a
A	2	b
A	3	ad
A	4	aw
A	5	ad
B	100	ta
B	200	as
B	300	ab
B	400	ax
B	500	ad

I would like to groupby("group") then apply standard_scaler().fit_transform() in each group

( I test with only data with a single group in data from with this code which is working but I having problem when group data >1

df['data'] = pd.DataFrame(scaler.fit_transform(df.groupby('group').data.values.reshape(-1,1)))

)

I was wondering is there a way to solve this with multiple group and apply scaler in each group?

Edited: My Desire output would be

group	data	other
A	-1.414	a
A	-0.7071	b
A	0	ad
A	0.7071	aw
A	1.414	ad
B	-1.414	ta
B	-0.7071	as
B	0	ab
B	0.7071	ax
B	1.414	ad

where data is transform data of normalized data

CodePudding user response：

Use:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
def sc(row):
    return scaler.fit_transform(row.values.reshape(-1,1))

df.groupby('group').agg(sc)

Please, note that your other column is not numeric so you can not apply standard scaler on that. I tried the above code with two numeric columns.

Demonstration for two numeric columns:

d = '''group    data    other
A   1   a
A   2   b
A   3   ad
A   4   aw
A   5   ad
B   100 ta
B   200 as
B   300 ab
B   400 ax
B   500 ad'''
data = [x.split('   ') for x in d.split('\n') if x!='']
df = pd.DataFrame(data[1:], columns = data[0])
df['other'] = df['data']*2

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
def sc(row):
    return scaler.fit_transform(row.values.reshape(-1,1))

df.groupby('group').agg(sc)

Output:

    data    other
group       
A   [[-1.414213562373095], [-0.7071067811865475], ...   [[-1.4142135623730951], [-0.7071067811865476],...
B   [[-1.414213562373095], [-0.7071067811865475], ...   [[-1.4142135623730951], [-0.7071067811865476],...

Based on the comment:

df['new'] = df.groupby('group')['data'].apply(sc).explode().values.astype(float)