Consider this code snippet:
import pandas as pd
cols = ['x1', 'x2']
df = pd.DataFrame([
['s1', 'a', 'a', 12],
['s2', 'a', 'b', 7],
['s3', 'b', 'a', 14],
['s4', 'b', 'b', 8],
['s5', 'a', 'a', 19],
['s6', 'a', 'b', 16],
['s7', 'b', 'a', 14],
['s8', 'b', 'b', 10]
], columns=['s', 'x1', 'x2', 'y'])
# Select 'y' explicitly: in pandas >= 2.0, mean() raises on the string column 's'
y_ = df.groupby(cols)[['y']].mean()
y_.rename(columns={'y': 'y_'}, inplace=True)
print(df, y_)
print(pd.merge(df, y_, on=cols))
The printed data frames are:
df:
s x1 x2 y
0 s1 a a 12
1 s2 a b 7
2 s3 b a 14
3 s4 b b 8
4 s5 a a 19
5 s6 a b 16
6 s7 b a 14
7 s8 b b 10
y_:
x1 x2 y_
a a 15.5
b 11.5
b a 14.0
b 9.0
merged:
s x1 x2 y y_
0 s1 a a 12 15.5
1 s5 a a 19 15.5
2 s2 a b 7 11.5
3 s6 a b 16 11.5
4 s3 b a 14 14.0
5 s7 b a 14 14.0
6 s4 b b 8 9.0
7 s8 b b 10 9.0
As one can see, the rows of the merged data frame are reordered: the s column no longer follows the original order.
How could I preserve the order of rows when merging two data frames? More specifically, I would expect this output:
s x1 x2 y y_
0 s1 a a 12 15.5
1 s2 a b 7 11.5
2 s3 b a 14 14.0
3 s4 b b 8 9.0
4 s5 a a 19 15.5
5 s6 a b 16 11.5
6 s7 b a 14 14.0
7 s8 b b 10 9.0
CodePudding user response:
Use a left merge on the original df. A left join keeps the key order of the left frame, so the rows stay in their original order:
output = df.merge(y_, how='left', on=cols)
The output:
s x1 x2 y y_
0 s1 a a 12 15.5
1 s2 a b 7 11.5
2 s3 b a 14 14.0
3 s4 b b 8 9.0
4 s5 a a 19 15.5
5 s6 a b 16 11.5
6 s7 b a 14 14.0
7 s8 b b 10 9.0
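As a self-contained check of the left-merge approach, here is a minimal sketch that rebuilds the example data and confirms that the row order is preserved. The `groupby(...).transform('mean')` variant is not part of the answer above; it is shown only as a possible alternative that avoids the merge, and the names `merged` and `via_transform` are illustrative.

```python
import pandas as pd

cols = ['x1', 'x2']
df = pd.DataFrame([
    ['s1', 'a', 'a', 12],
    ['s2', 'a', 'b', 7],
    ['s3', 'b', 'a', 14],
    ['s4', 'b', 'b', 8],
    ['s5', 'a', 'a', 19],
    ['s6', 'a', 'b', 16],
    ['s7', 'b', 'a', 14],
    ['s8', 'b', 'b', 10],
], columns=['s', 'x1', 'x2', 'y'])

# Group means of y, one row per (x1, x2) pair; reset_index turns the
# group keys back into ordinary columns so they can be used with on=cols.
y_ = (
    df.groupby(cols)[['y']]
      .mean()
      .rename(columns={'y': 'y_'})
      .reset_index()
)

# Left merge: the result follows the row order of df (the left frame).
merged = df.merge(y_, how='left', on=cols)

# Alternative (an assumption, not from the answer): transform broadcasts
# each group's mean back onto the original rows, so no merge is needed.
via_transform = df.assign(y_=df.groupby(cols)['y'].transform('mean'))

print(merged)
print(merged['s'].tolist() == df['s'].tolist())
```

If the group means are the only thing needed from `y_`, the `transform` variant may be simpler, since it never leaves the original index.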