Home > Back-end >  Merging on closest value Pandas
Merging on closest value Pandas

Time:10-06

I am currently working on a project where I match as well as merge two excel documents by meter indication (km). The two dataframes contain the same type of data but are slightly different. My solution to this problem was to divide each dataframe by 100 to more easily match the columns. However, as I predicted there will be occasions where this doesn't work, making this a temporary solution.

I have two dataframes:

1.

Meter_indication      Fuel1
1180784               275
1181278               280
1181791               300
1182285               280
1182801               300
1183295               280
1183717               250
Meter_indication      Fuel2
1180785               278
1181282               282
1181800               310
1182401               282
1182824               320
1183310               215
1183727               250

As you can see these dataframes are containing the same type of information but are slightly off. I have previously used the method of merge merged_df = df1filt2.merge(df2filt, on='Meter_indication') which does only merge values that perfectly match. However, in this scenario this method is irrelevant.

My desired output is:

Meter_indication      Fuel1      Fuel2
1180784               275        278
1181278               280        282
1181791               300        310
1182285               280        282
1182801               300        320
1183295               280        215
1183717               250        250

As you can see, the dataframes have merged on "Meter_indication" and have merged by finding the closest value compared to itself.

I have looked around for others with similar problems and have tried many different suggestions such as GRAPH 1 (CURRENT)

Desired graph output:

GRAPH 2 (DESIRED)

With the data (used for graph) sometimes looks like this:

DATA USED FOR CURRENT GRAPH (where yellow numbers are "fuel1" and the numbers to the right of it are "Meter_indication")

CodePudding user response:

  • merge_asof() does work. matches your expected output
  • have not made assumptions about sorting, so did sort_values()
  • for transparency include right Meter_indication in output
df1 = pd.read_csv(io.StringIO("""Meter_indication      Fuel1
1180784               275
1181278               280
1181791               300
1182285               280
1182801               300
1183295               280
1183717               250"""), sep="\s ")

df2 = pd.read_csv(io.StringIO("""Meter_indication      Fuel2
1180785               278
1181282               282
1181800               310
1182401               282
1182824               320
1183310               215
1183727               250"""), sep="\s ")

pd.merge_asof(
    df1.sort_values("Meter_indication"),
    df2.sort_values("Meter_indication").assign(mi=lambda d: d["Meter_indication"]),
    on="Meter_indication",
    direction="nearest",
)

Meter_indication Fuel1 Fuel2 mi
1180784 275 278 1180785
1181278 280 282 1181282
1181791 300 310 1181800
1182285 280 282 1182401
1182801 300 320 1182824
1183295 280 215 1183310
1183717 250 250 1183727
  • Related