Home > other >  Merge two pandas dataframe with small margin of errors on each row's selected value?
Merge two pandas dataframe with small margin of errors on each row's selected value?

Time:03-17

Two dataframes' key1 are increasing at linear rate, but there might be some error.

df1

key1 key2 key 3
  0   1    2
 20   3    4
 40   5    6    <- this row
 60   7    8

df2

key1 key4 key5
  0   9    10
 20   11   12
 39   13   14    <- this row
 60   15   16

After merging df1 and df2, the result should be:

df3

key1 key2 key3 key4 key5
  0   1    2    9    10
 20   3    4    11   12
 40   5    6    13   14    <- this row
 60   7    8    15   16

The value used for merged df on the row should be a int that is rounded arithmetic mean.

I tried to create a nested for loop going through df1 and df2 and check the values on key1, if the difference on value is larger than 1, I'll just append a new row to df3 with nan in the missing part.

eg:

some row in df1:

key1 key2 key 3
100   20   21

some row in df2:

key1 key4 key5
105   22   23

What should be added to df3:

key1 key2 key3 key4 key5
100   20   21   nan  nan
105   nan  nan  22   23

This approach is very slow and my computer only finished 4% of the processing overnight.

CodePudding user response:

pd.merge_asof was intended for this:

df3 = pd.merge_asof(df1, df2, on='key1')

Output:

>>> df3
   key1  key2  key3  key4  key5
0     0     1     2     9    10
1    20     3     4    11    12
2    40     5     6    13    14
3    60     7     8    15    16
  • Related