There is a dataframe like this
df
group | data | other |
---|---|---|
A | 1 | a |
A | 2 | b |
A | 3 | ad |
A | 4 | aw |
A | 5 | ad |
B | 100 | ta |
B | 200 | as |
B | 300 | ab |
B | 400 | ax |
B | 500 | ad |
I would like to groupby("group") then apply standard_scaler().fit_transform() in each group
( I test with only data with a single group in data from with this code which is working but I having problem when group data >1
df['data'] = pd.DataFrame(scaler.fit_transform(df.groupby('group').data.values.reshape(-1,1)))
)
I was wondering is there a way to solve this with multiple group and apply scaler in each group?
Edited: My Desire output would be
group | data | other |
---|---|---|
A | -1.414 | a |
A | -0.7071 | b |
A | 0 | ad |
A | 0.7071 | aw |
A | 1.414 | ad |
B | -1.414 | ta |
B | -0.7071 | as |
B | 0 | ab |
B | 0.7071 | ax |
B | 1.414 | ad |
where data is transform data of normalized data
CodePudding user response:
Use:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
def sc(row):
return scaler.fit_transform(row.values.reshape(-1,1))
df.groupby('group').agg(sc)
Please, note that your other column is not numeric so you can not apply standard scaler on that. I tried the above code with two numeric columns.
Demonstration for two numeric columns:
d = '''group data other
A 1 a
A 2 b
A 3 ad
A 4 aw
A 5 ad
B 100 ta
B 200 as
B 300 ab
B 400 ax
B 500 ad'''
data = [x.split(' ') for x in d.split('\n') if x!='']
df = pd.DataFrame(data[1:], columns = data[0])
df['other'] = df['data']*2
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
def sc(row):
return scaler.fit_transform(row.values.reshape(-1,1))
df.groupby('group').agg(sc)
Output:
data other
group
A [[-1.414213562373095], [-0.7071067811865475], ... [[-1.4142135623730951], [-0.7071067811865476],...
B [[-1.414213562373095], [-0.7071067811865475], ... [[-1.4142135623730951], [-0.7071067811865476],...
Based on the comment:
df['new'] = df.groupby('group')['data'].apply(sc).explode().values.astype(float)