Merging on closest value Pandas-CodePudding

I am currently working on a project where I match as well as merge two excel documents by meter indication (km). The two dataframes contain the same type of data but are slightly different. My solution to this problem was to divide each dataframe by 100 to more easily match the columns. However, as I predicted there will be occasions where this doesn't work, making this a temporary solution.

I have two dataframes:

Meter_indication      Fuel1
1180784               275
1181278               280
1181791               300
1182285               280
1182801               300
1183295               280
1183717               250

Meter_indication      Fuel2
1180785               278
1181282               282
1181800               310
1182401               282
1182824               320
1183310               215
1183727               250

As you can see these dataframes are containing the same type of information but are slightly off. I have previously used the method of merge merged_df = df1filt2.merge(df2filt, on='Meter_indication') which does only merge values that perfectly match. However, in this scenario this method is irrelevant.

My desired output is:

Meter_indication      Fuel1      Fuel2
1180784               275        278
1181278               280        282
1181791               300        310
1182285               280        282
1182801               300        320
1183295               280        215
1183717               250        250

As you can see, the dataframes have merged on "Meter_indication" and have merged by finding the closest value compared to itself.

I have looked around for others with similar problems and have tried many different suggestions such as

Desired graph output:

With the data (used for graph) sometimes looks like this:

CodePudding user response：

merge_asof() does work. matches your expected output
have not made assumptions about sorting, so did sort_values()
for transparency include right Meter_indication in output

df1 = pd.read_csv(io.StringIO("""Meter_indication      Fuel1
1180784               275
1181278               280
1181791               300
1182285               280
1182801               300
1183295               280
1183717               250"""), sep="\s ")

df2 = pd.read_csv(io.StringIO("""Meter_indication      Fuel2
1180785               278
1181282               282
1181800               310
1182401               282
1182824               320
1183310               215
1183727               250"""), sep="\s ")

pd.merge_asof(
    df1.sort_values("Meter_indication"),
    df2.sort_values("Meter_indication").assign(mi=lambda d: d["Meter_indication"]),
    on="Meter_indication",
    direction="nearest",
)

Meter_indication	Fuel1	Fuel2	mi
1180784	275	278	1180785
1181278	280	282	1181282
1181791	300	310	1181800
1182285	280	282	1182401
1182801	300	320	1182824
1183295	280	215	1183310
1183717	250	250	1183727