The usage of pd.concat and pd.merge_asof

Time:10-07

I once saw the usage of pd.concat and pd.merge_asof as follows:

df = pd.concat([
    pd.merge_asof(
        df1,
        df2,
        left_on = "m",
        right_on = "m",
        direction = "nearest",
        tolerance = 3
    )
 ])

I am not very clear on what it is trying to do, or why pd.concat is needed around the result of pd.merge_asof. What is tolerance = 3 used for?

CodePudding user response:

The code here is not finished. It should look something like this, with a second DataFrame (somedf) passed to pd.concat:

df = pd.concat([
    pd.merge_asof(
        df1,
        df2,
        left_on = "m",
        right_on = "m",
        direction = "nearest",
        tolerance = 3
    ),somedf])

As for the tolerance: if the difference between the keys is at most 3, the rows are matched; if it is more than 3, they are not considered a match.

CodePudding user response:

pd.merge_asof works like a left join, except that it matches on the nearest key rather than only on exactly equal keys.

Explaining your code: it performs a "left join" between df1 (left) and df2 (right) using the "m" column as the key. pd.merge_asof first looks for an exact match. If none is found, it takes the closest earlier value, because direction = "backward" is the default. Both DataFrames must be sorted by the key.
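To make the default behaviour concrete, here is a minimal sketch. The frames left and right and their values are made up for illustration; they are not the df1 and df2 from the question.

```python
import pandas as pd

# Hypothetical data; both frames are already sorted by the key "m".
left = pd.DataFrame({"m": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"m": [1, 2, 3, 6, 7], "right_val": [10, 20, 30, 60, 70]})

# Default direction="backward": for each left row, take the last right row
# whose key is less than or equal to the left key.
result = pd.merge_asof(left, right, on="m")
print(result)
```

Here m = 1 has an exact match, m = 5 falls back to the right row with m = 3, and m = 10 falls back to the right row with m = 7.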

The direction: in this case, direction is set to "nearest". So it looks for an exact match and, if none is found, looks both backward and forward and calculates the difference (distance) for each candidate. The candidate with the smaller difference wins.

What is tolerance: when looking backward and forward, the maximum difference allowed here is 3. If one candidate exceeds this limit, it cannot be joined and the other one wins. If both exceed it, the columns coming from df2 are filled with NaN.
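A small sketch of direction = "nearest" combined with tolerance = 3, using the same call as in the question. The contents of df1 and df2 below are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2, sorted by "m".
df1 = pd.DataFrame({"m": [1, 5, 10, 20]})
df2 = pd.DataFrame({"m": [2, 7, 12, 30], "right_val": [20, 70, 120, 300]})

result = pd.merge_asof(
    df1,
    df2,
    left_on="m",
    right_on="m",
    direction="nearest",
    tolerance=3,
)
print(result)
# m = 1  -> nearest key is 2  (distance 1), matched
# m = 5  -> nearest key is 7  (distance 2, closer than 2 at distance 3), matched
# m = 10 -> nearest key is 12 (distance 2, closer than 7 at distance 3), matched
# m = 20 -> nearest key is 12 (distance 8), beyond the tolerance, so NaN
```

Note that right_val becomes a float column in the result because of the NaN in the last row.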

About pd.concat: it simply concatenates somedf below this result (rows are stacked, which is the default when the axis is not specified).
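A quick sketch of what the concat step does. The frames merged and somedf here are hypothetical placeholders for the merge result and the second DataFrame.

```python
import pandas as pd

# Hypothetical stand-ins for the merge_asof result and for somedf.
merged = pd.DataFrame({"m": [1, 5], "right_val": [20.0, 70.0]})
somedf = pd.DataFrame({"m": [100], "right_val": [999.0]})

# Default axis=0: rows of somedf are appended below the rows of merged.
# The original index labels are kept, so they repeat (0, 1, 0).
stacked = pd.concat([merged, somedf])

# ignore_index=True renumbers the rows 0..n-1 instead.
stacked_reset = pd.concat([merged, somedf], ignore_index=True)

# With a single-element list, as in the snippet from the question,
# concat returns the same rows unchanged.
single = pd.concat([merged])
print(stacked)
```

So if the list really contained only the merge_asof result, the pd.concat call would add nothing; it only matters once a second frame is in the list.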
