pandas merge and merge_ordered do not preserve row order?

Time:07-15

Consider this code snippet:

import pandas as pd

cols = ['x1', 'x2']
df = pd.DataFrame([
    ['s1', 'a', 'a', 12],
    ['s2', 'a', 'b', 7],
    ['s3', 'b', 'a', 14],
    ['s4', 'b', 'b', 8],
    ['s5', 'a', 'a', 19],
    ['s6', 'a', 'b', 16],
    ['s7', 'b', 'a', 14],
    ['s8', 'b', 'b', 10]
], columns=['s', 'x1', 'x2', 'y'])
# Select 'y' explicitly: pandas 2.x raises a TypeError if mean() is applied to the string column 's'
y_ = df.groupby(cols)[['y']].mean()
y_ = y_.rename(columns={'y': 'y_'})
print(df, y_)
print(pd.merge(df, y_, on=cols))

The printed data frames are:

df:

    s x1 x2   y
0  s1  a  a  12
1  s2  a  b   7
2  s3  b  a  14
3  s4  b  b   8
4  s5  a  a  19
5  s6  a  b  16
6  s7  b  a  14
7  s8  b  b  10          

y_:

x1 x2  y_
a  a   15.5
   b   11.5
b  a   14.0
   b    9.0

merged:

    s x1 x2   y    y_
0  s1  a  a  12  15.5
1  s5  a  a  19  15.5
2  s2  a  b   7  11.5
3  s6  a  b  16  11.5
4  s3  b  a  14  14.0
5  s7  b  a  14  14.0
6  s4  b  b   8   9.0
7  s8  b  b  10   9.0

As you can see, the merge reordered the rows: the s column no longer follows the original order of df.

How can I preserve the row order of df when merging the two data frames? More specifically, I expect this output:

    s x1 x2   y    y_
0  s1  a  a  12  15.5
1  s2  a  b   7  11.5
2  s3  b  a  14  14.0
3  s4  b  b   8   9.0
4  s5  a  a  19  15.5
5  s6  a  b  16  11.5
6  s7  b  a  14  14.0
7  s8  b  b  10   9.0

CodePudding user response:

Use a left merge on the original df. With how='left', pandas keeps the key order of the left frame, so the rows stay in the order they have in df:

output = df.merge(y_, how='left', on=cols)

The output:

    s x1 x2   y    y_
0  s1  a  a  12  15.5
1  s2  a  b   7  11.5
2  s3  b  a  14  14.0
3  s4  b  b   8   9.0
4  s5  a  a  19  15.5
5  s6  a  b  16  11.5
6  s7  b  a  14  14.0
7  s8  b  b  10   9.0
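
For reference, here is a minimal, self-contained sketch that rebuilds the frames from the question and checks that the left merge keeps the row order of df. It also shows one possible alternative that is not part of the answer above: groupby(...).transform('mean'), which returns a result aligned to df's index, so no merge is needed at all. The names merged and alt are only illustrative.

```python
import pandas as pd

cols = ['x1', 'x2']
df = pd.DataFrame(
    [
        ['s1', 'a', 'a', 12],
        ['s2', 'a', 'b', 7],
        ['s3', 'b', 'a', 14],
        ['s4', 'b', 'b', 8],
        ['s5', 'a', 'a', 19],
        ['s6', 'a', 'b', 16],
        ['s7', 'b', 'a', 14],
        ['s8', 'b', 'b', 10],
    ],
    columns=['s', 'x1', 'x2', 'y'],
)

# Group means of 'y'; selecting the column explicitly keeps the
# string column 's' out of the aggregation.
y_ = df.groupby(cols)[['y']].mean().rename(columns={'y': 'y_'})

# Left merge: the key order of the left frame (df) is preserved.
merged = df.merge(y_, how='left', on=cols)

# Alternative (an assumption of this sketch, not the answer's method):
# transform broadcasts each group's mean back onto df's own rows.
alt = df.assign(y_=df.groupby(cols)['y'].transform('mean'))

print(merged)
print(merged['s'].tolist() == df['s'].tolist())  # row order check
```

If the only goal is to attach a per-group statistic to each row, the transform variant avoids the intermediate y_ frame; the left merge is the more general tool when the right-hand frame comes from elsewhere.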