Can we use 'pd.merge_asof' to find closest matching latitude and longitude coordinates?

Time:12-11

I recently came across merge_asof and found it great for merging two dataframes whose times are similar but slightly different. Can we use this technique to merge two dataframes on latitude and longitude coordinates rather than times? One of my dataframes looks like this:

            Latitude          Longitude                    geometry  
0           40.457794         -86.914398                   POINT (40.45779 -86.91440)  
123         40.457794         -86.914398                   POINT (40.45779 -86.91440)  
246         40.457794         -86.914398                   POINT (40.45779 -86.91440)  
369         40.457794         -86.914398                   POINT (40.45779 -86.91440)  
492         40.457794         -86.914398                   POINT (40.45779 -86.91440) 

The other looks like this.

        Vehicle_ID      Latitude          Longitude                    geometry
0       1233            39.355            -85.220                      POINT (39.35500 -85.22000)
1       3033            40.429            -84.346                      POINT (40.42900 -84.34600)
2       2202            39.125            -84.823                      POINT (39.12500 -84.82300)
3       4011            40.892            -85.974                      POINT (40.89200 -85.97400)
4       4432            40.862            -84.371                      POINT (40.86200 -84.37100)

I'm trying to follow the documentation here.

https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.merge_asof.html
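The linked documentation says the on column must be sorted and must be numeric (datetimelike, integer, or float), so merge_asof can match on only one coordinate at a time. A minimal sketch, using a few rows copied from the tables above, that matches on Latitude alone:

```python
import pandas as pd

# A few rows copied from the vehicle dataframe in the question.
vehicles = pd.DataFrame({
    "Vehicle_ID": [1233, 3033, 2202, 4011, 4432],
    "Latitude": [39.355, 40.429, 39.125, 40.892, 40.862],
    "Longitude": [-85.220, -84.346, -84.823, -85.974, -84.371],
})
# One point from the first dataframe.
points = pd.DataFrame({"Latitude": [40.457794], "Longitude": [-86.914398]})

# Both frames must be sorted by the numeric "on" key.
# Only Latitude is compared; Longitude is carried along, not matched.
matched = pd.merge_asof(
    points.sort_values("Latitude"),
    vehicles.sort_values("Latitude"),
    on="Latitude",
    direction="nearest",
    suffixes=("_point", "_vehicle"),
)
print(matched)
```

The nearest latitude belongs to vehicle 3033, but its longitude is about 2.6 degrees away from the point's (well over 100 miles at this latitude), so a single-key asof merge is not a nearest-location match.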

I tried the following ideas.

df_final = pd.merge_asof(gdf1,gdf2[['geometry']],on='geometry',direction='nearest')

df_final = pd.merge_asof(gdf1, gdf2, on='geometry', direction='nearest')

df_final = pd.merge_asof(df_merged,df_gps['Circuit_Latitude'].sort_values('Circuit_Latitude'),on='Circuit_Latitude')

None of these work. I tried to do the merge with geopandas, but I couldn't get the library installed. BTW, this doesn't have to be very accurate: a match 3, 4, or 5 miles away is fine. I'm just trying to match up points in the same general area. Or is there a better way to do this kind of thing? Thanks.

Answer:

My guess is that the issue is the dataframe types, or some incompatibility between the libraries.

What I would do is check the types of your dataframes and see whether they are actually pandas DataFrames. If not, convert them to the type the method expects:

gpd_pd1 = pd.DataFrame(gdf1)
gpd_pd2 = pd.DataFrame(gdf2)

Then run merge_asof. The usage of the method itself looks all right to me:

pd.merge_asof(gpd_pd1, gpd_pd2, on='geometry', direction='nearest')
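Note that pandas documents the on column as sorted and numeric, so it compares one value per row and cannot weigh latitude and longitude together. If geopandas won't install, one swapped-in alternative is a brute-force great-circle (haversine) search with pandas and NumPy. This is a minimal sketch, assuming both frames have float Latitude and Longitude columns in decimal degrees; the helper nearest_by_haversine and its max_miles parameter are hypothetical names, and the full distance matrix it builds limits it to modest data sizes.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def nearest_by_haversine(left, right, max_miles=None):
    """Hypothetical helper: attach to each row of `left` the closest row of `right`.

    Both frames need float 'Latitude' and 'Longitude' columns in degrees.
    A len(left) x len(right) distance matrix is built, so this suits
    modest sizes only.
    """
    lat1 = np.radians(left["Latitude"].to_numpy())[:, None]
    lon1 = np.radians(left["Longitude"].to_numpy())[:, None]
    lat2 = np.radians(right["Latitude"].to_numpy())[None, :]
    lon2 = np.radians(right["Longitude"].to_numpy())[None, :]

    # Haversine formula, broadcast over every (left, right) pair.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    idx = dist.argmin(axis=1)  # position of the closest right row per left row
    out = left.reset_index(drop=True).join(
        right.iloc[idx].reset_index(drop=True), rsuffix="_right"
    )
    out["distance_miles"] = dist[np.arange(len(left)), idx]
    if max_miles is not None:
        # Drop matches farther away than the allowed tolerance.
        out = out[out["distance_miles"] <= max_miles]
    return out


# Sample rows copied from the question.
vehicles = pd.DataFrame({
    "Vehicle_ID": [1233, 3033, 2202, 4011, 4432],
    "Latitude": [39.355, 40.429, 39.125, 40.892, 40.862],
    "Longitude": [-85.220, -84.346, -84.823, -85.974, -84.371],
})
points = pd.DataFrame({"Latitude": [40.457794], "Longitude": [-86.914398]})

matches = nearest_by_haversine(points, vehicles)
print(matches)
```

On these sample rows the nearest vehicle is 4011, roughly 58 miles away, so calling it with max_miles=5 would drop that match. A spatial index (such as the one geopandas uses) scales better than this all-pairs matrix.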

Answer:

I don't think pd.merge_asof can handle lat & lon coordinates: it matches on a single sorted numeric key, so it cannot measure distance across both. This worked for me:

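import pandas as pd
import geopandas

df_merged = pd.read_csv('C:\\Users\\ryans\\Desktop\\df1.csv')
df_gps = pd.read_csv('C:\\Users\\ryans\\Desktop\\df2.csv')

# Latitude and Longitude must be float64 in both frames
print(df_merged.dtypes)
print(df_gps.dtypes)

# points_from_xy takes x (longitude) first, then y (latitude)
gdf_merged = geopandas.GeoDataFrame(df_merged, geometry=geopandas.points_from_xy(df_merged.Longitude, df_merged.Latitude))
gdf_gps = geopandas.GeoDataFrame(df_gps, geometry=geopandas.points_from_xy(df_gps.Longitude, df_gps.Latitude))

# For each row of gdf_merged, attach the nearest row of gdf_gps (geopandas 0.10 or later).
# With no CRS set, distances are in degrees, which is fine for a rough match.
df_final = geopandas.sjoin_nearest(gdf_merged, gdf_gps)
df_final.head()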
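The listing relies on Latitude and Longitude being float64. If read_csv infers them as object (text) because of stray values, a minimal sketch for coercing them is below; the helper coerce_coordinates is a hypothetical name, and dropping unparseable rows is an assumption about what you want.

```python
import pandas as pd


def coerce_coordinates(df, cols=("Latitude", "Longitude")):
    """Hypothetical helper: return a copy with the coordinate columns as float64.

    Values that cannot be parsed become NaN, and those rows are dropped.
    """
    out = df.copy()
    for col in cols:
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("float64")
    return out.dropna(subset=list(cols))


# Example input where one latitude is not a number.
raw = pd.DataFrame({"Latitude": ["39.355", "n/a", "40.892"],
                    "Longitude": ["-85.220", "-84.346", "-85.974"]})
clean = coerce_coordinates(raw)
print(clean.dtypes)
```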