Two dataframes' key1
are increasing at linear rate, but there might be some error.
df1
key1 key2 key 3
0 1 2
20 3 4
40 5 6 <- this row
60 7 8
df2
key1 key4 key5
0 9 10
20 11 12
39 13 14 <- this row
60 15 16
After merging df1 and df2, the result should be:
df3
key1 key2 key3 key4 key5
0 1 2 9 10
20 3 4 11 12
40 5 6 13 14 <- this row
60 7 8 15 16
The value used for merged df on the row should be a int that is rounded arithmetic mean.
I tried to create a nested for loop going through df1 and df2 and check the values on key1
, if the difference on value is larger than 1, I'll just append a new row to df3 with nan
in the missing part.
eg:
some row in df1:
key1 key2 key 3
100 20 21
some row in df2:
key1 key4 key5
105 22 23
What should be added to df3:
key1 key2 key3 key4 key5
100 20 21 nan nan
105 nan nan 22 23
This approach is very slow and my computer only finished 4% of the processing overnight.
CodePudding user response:
pd.merge_asof
was intended for this:
df3 = pd.merge_asof(df1, df2, on='key1')
Output:
>>> df3
key1 key2 key3 key4 key5
0 0 1 2 9 10
1 20 3 4 11 12
2 40 5 6 13 14
3 60 7 8 15 16