Home > Net >  Why merging 2 data frames gives me one with triple the rows
Why merging 2 data frames gives me one with triple the rows

Time:11-20

I have df1:

       x            y no.
0  -17.7    -0.785430  y1
1  -15.0 -3820.085000  y4
2  -12.5     2.138833  y3
..  ....     ........  ..
40  15.6     5.486901  y2
41  19.2     1.980686  y3
42  19.6     9.364718  y2

and df2:

       delta y     x
0     0.053884 -17.7
1     0.085000 -15.0
2     0.143237 -12.5
..    ........  ....
40    0.113099  15.6
41    0.102245  19.2
42    0.235282  19.6

They both have 43 rows, and x column is exactly the same on both.

Somehow when I merge them on x I get a df with 123 rows:

 x            y no.   delta y
0   -17.7    -0.785430  y1  0.053884
1   -15.0 -3820.085000  y4  0.085000
2   -12.5     2.138833  y3  0.143237
3   -12.4     1.721205  y3  0.251180
4   -12.1     2.227343  y2  0.127343
..    ...          ...  ..       ...
118  12.1     1.642526  y3  0.143886
119  14.4  2576.435000  y4  0.171000
120  15.6     5.486901  y2  0.113099
121  19.2     1.980686  y3  0.102245
122  19.6     9.364718  y2  0.235282

My input: final = df1.merge(df2, on="x") x float64 y float64 no. object dtype: object

delta y float64 x float64 dtype: object

x float64 y float64 no. object dtype: object

delta y float64 x float64 dtype: object

x float64 y float64 no. object dtype: object

delta y float64 x float64 dtype: object

df1 = pd.DataFrame({'x': {0: -17.7, 1: -15.0, 2: -12.5, 3: -12.4, 4: -12.1, 5: -11.2, 6: -8.9, 7: -7.5, 8: -7.5, 9: -6.0, 10: -6.0, 11: -4.7, 12: -4.1, 13: -3.8, 14: -3.4, 15: -3.4, 16: -1.9, 17: -1.5, 18: -1.1, 19: -0.4, 20: -0.1, 21: 3.5, 22: 3.8, 23: 5.3, 24: 5.3, 25: 5.3, 26: 5.3, 27: 5.3, 28: 5.3, 29: 5.3, 30: 5.3, 31: 5.3, 32: 6.4, 33: 6.8, 34: 6.8, 35: 10.2, 36: 10.3, 37: 11.9, 38: 12.1, 39: 14.4, 40: 15.6, 41: 19.2, 42: 19.6}, 'y': {0: -0.7854295, 1: -3820.085, 2: 2.1388333, 3: 1.7212046, 4: 2.227343, 5: 0.04315967, 6: -0.9616607, 7: -1.9878536, 8: -0.52237016, 9: -283.27216, 10: -282.5332, 11: -0.4335017, 12: -1.1585577, 13: -0.008831219, 14: 848.92303, 15: -57.407845, 16: -9.010686, 17: -3.2473037, 18: 0.5536767, 19: 1.8351307, 20: 4.8347697, 21: -6.45842, 22: -1.5683812, 23: 0.9338831, 24: 0.9338831, 25: 97.65833, 26: 1.6500127, 27: 1.6500127, 28: 97.65833, 29: 97.65833, 30: 1.6500127, 31: 97.65833, 32: -3.655422, 33: 1.9058462, 34: 227.5592, 35: 857.7455, 36: -0.68584794, 37: 1.6785516, 38: 1.6425261, 39: 2576.435, 40: 5.4869013, 41: 1.9806856, 42: 9.364718}, 'no.': {0: 'y1', 1: 'y4', 2: 'y3', 3: 'y3', 4: 'y2', 5: 'y3', 6: 'y2', 7: 'y2', 8: 'y2', 9: 'y4', 10: 'y4', 11: 'y1', 12: 'y3', 13: 'y1', 14: 'y4', 15: 'y4', 16: 'y4', 17: 'y4', 18: 'y1', 19: 'y3', 20: 'y4', 21: 'y2', 22: 'y3', 23: 'y3', 24: 'y3', 25: 'y4', 26: 'y3', 27: 'y3', 28: 'y4', 29: 'y3', 30: 'y4', 31: 'y4', 32: 'y2', 33: 'y3', 34: 'y3', 35: 'y4', 36: 'y3', 37: 'y3', 38: 'y3', 39: 'y4', 40: 'y2', 41: 'y3', 42: 'y2'}})

df2 = pd.DataFrame({'delta y': {0: 0.05388353000000001, 1: 0.08500000000003638, 2: 0.14323679999999994, 3: 0.25117999999999996, 4: 0.12734299999999976, 5: 0.36285006000000003, 6: 0.13833930000000005, 7: 0.5121464, 8: 1.97762984, 9: 0.2721599999999853, 10: 0.4667999999999779, 11: 0.2692114, 12: 0.00890970000000002, 13: 0.314458351, 14: 906.34703, 15: 0.0161549999999977, 16: 0.06831400000000087, 17: 0.3723036999999998, 18: 0.2988478, 19: 0.006991300000000145, 20: 0.14423030000000026, 21: 0.04157999999999973, 22: 0.013554200000000183, 23: 0.17486560000000007, 24: 0.17486560000000007, 25: 0.03866999999999621, 26: 0.541264, 27: 0.541264, 28: 0.03866999999999621, 29: 96.5495813, 30: 96.0469873, 31: 0.03866999999999621, 32: 0.05542200000000008, 33: 0.1670513, 34: 225.82040510000002, 35: 0.38250000000005, 36: 0.59580486, 37: 0.10641100000000003, 38: 0.14388610000000002, 39: 0.17099999999982174, 40: 0.11309869999999922, 41: 0.10224489999999986, 42: 0.23528199999999977}, 'x': {0: -17.7, 1: -15.0, 2: -12.5, 3: -12.4, 4: -12.1, 5: -11.2, 6: -8.9, 7: -7.5, 8: -7.5, 9: -6.0, 10: -6.0, 11: -4.7, 12: -4.1, 13: -3.8, 14: -3.4, 15: -3.4, 16: -1.9, 17: -1.5, 18: -1.1, 19: -0.4, 20: -0.1, 21: 3.5, 22: 3.8, 23: 5.3, 24: 5.3, 25: 5.3, 26: 5.3, 27: 5.3, 28: 5.3, 29: 5.3, 30: 5.3, 31: 5.3, 32: 6.4, 33: 6.8, 34: 6.8, 35: 10.2, 36: 10.3, 37: 11.9, 38: 12.1, 39: 14.4, 40: 15.6, 41: 19.2, 42: 19.6}})

final = df1.merge(df2, on="x")

CodePudding user response:

try the following: df1.join(df2)

join is a column-wise left join

pd.merge is a column-wise inner join

pd.concat is a row-wise outer join

pd.concat: takes Iterable arguments. Thus, it cannot take DataFrames directly (use [df,df2]) Dimensions of DataFrame should match along axis

Join and pd.merge: can take DataFrame arguments

ref: Merge two dataframes by index

CodePudding user response:

Try the following syntax and I encourage you to thoroughly read the official documentation of python, the link is given at the bottom. I think you might have different x values in df1 and df2 and they are not 100% identical. This could be perhaps because of the decimals.

import pandas as pd

left = pd.DataFrame(
   {
       "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
     }
 )


right = pd.DataFrame(
     {
        "key": ["K0", "K1", "K2", "K3"],
         "C": ["C0", "C1", "C2", "C3"],
         "D": ["D0", "D1", "D2", "D3"],
     }
 )
 

result = pd.merge(left, right, on="key")

Result Image

Python Merge,Join, Concatenate Official Guide

  • Related