import pandas as pd
from sklearn.preprocessing import scale
df = pd.DataFrame({'x':['a','a','a','a','a','a','b','b','b','b','b','b'], 'y':[1,1,1,2,2,2,1,1,1,2,2,2], 'z':[1,2,3,3,4,6,4,2,4,6,4,3]})
I'm trying to scale column z for each combination of x and y:
    x  y  z     z2
0   a  1  1  -1.22
1   a  1  2   0.00
2   a  1  3   1.22
3   a  2  3  -1.07
4   a  2  4  -0.27
5   a  2  6   1.34
6   b  1  4   0.71
7   b  1  2  -1.41
8   b  1  4   0.71
9   b  2  6   1.34
10  b  2  4  -0.27
11  b  2  3  -1.07
I tried
df['z2'] = df.groupby(['x','y'])['z'].apply(scale)
but this returns the error
TypeError: incompatible index of inserted column with frame index
Answer:
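A possible solution is to use the groupby transform method (pandas.core.groupby.SeriesGroupBy.transform), which returns a result with the same index as the original data: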
df['z3'] = df.groupby(['x', 'y'])['z'].transform(scale)
Output:
    x  y  z     z2        z3
0   a  1  1  -1.22 -1.224745
1   a  1  2   0.00  0.000000
2   a  1  3   1.22  1.224745
3   a  2  3  -1.07 -1.069045
4   a  2  4  -0.27 -0.267261
5   a  2  6   1.34  1.336306
6   b  1  4   0.71  0.707107
7   b  1  2  -1.41 -1.414214
8   b  1  4   0.71  0.707107
9   b  2  6   1.34  1.336306
10  b  2  4  -0.27 -0.267261
11  b  2  3  -1.07 -1.069045
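If you would rather not depend on scikit-learn for this step, the same per-group scaling can be sketched in plain pandas. This is only a sketch: zscore is a hypothetical helper name, and it assumes you want the population standard deviation (ddof=0), which is what scale uses by default. The data is the 12-row table shown above.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
    'y': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'z': [1, 2, 3, 3, 4, 6, 4, 2, 4, 6, 4, 3],
})

def zscore(s):
    # Hypothetical helper: centre each group and divide by the
    # population standard deviation (ddof=0), as sklearn's scale does.
    return (s - s.mean()) / s.std(ddof=0)

# transform returns one value per original row, aligned with df.index
df['z3'] = df.groupby(['x', 'y'])['z'].transform(zscore)
print(df)
```

Note that a group with a single row or with constant values has zero standard deviation, so this version would produce NaN there; handle that case separately if it can occur in your data.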
EXPLANATION OF THE ERROR YOU GOT
The instruction
df.groupby(['x','y'])['z'].apply(scale)
produces the following output:
x  y
a  1    [-1.224744871391589, 0.0, 1.224744871391589]
   2    [-1.0690449676496974, -0.2672612419124242, 1.3...
b  1    [0.7071067811865474, -1.4142135623730951, 0.70...
   2    [1.3363062095621223, -0.2672612419124242, -1.0...
Name: z, dtype: object
The result is a Series with one row per (x, y) group, indexed by a MultiIndex, and each value is an array of scaled values. df, in contrast, has a plain integer index with one row per observation. pandas cannot align these two indexes when running
df['z2'] = df.groupby(['x','y'])['z'].apply(scale)
which causes the error.
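To see the mismatch for yourself, you can compare the two indexes directly. The sketch below uses a hypothetical stand-in, scale_to_array, that returns a bare NumPy array the way scale does, so it runs without scikit-learn. The data is the 12-row table from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
    'y': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'z': [1, 2, 3, 3, 4, 6, 4, 2, 4, 6, 4, 3],
})

def scale_to_array(s):
    # Hypothetical stand-in for sklearn's scale: returns a plain
    # NumPy array, so the original row labels are lost.
    v = s.to_numpy(dtype=float)
    return (v - v.mean()) / v.std()

grouped = df.groupby(['x', 'y'])['z']

# apply: one row per group, indexed by a MultiIndex of (x, y)
per_group = grouped.apply(scale_to_array)
print(type(per_group.index).__name__, len(per_group))

# transform: one row per observation, same index as df
aligned = grouped.transform(scale_to_array)
print(type(aligned.index).__name__, len(aligned))
print(aligned.index.equals(df.index))
```

Only the transform result has the same index as df, which is why it can be assigned as a new column without further reshaping.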