Compare two geodataframes, use the within function to compare the geometry on both dataframe and cre

Time:08-15

I have two GeoDataFrames. The first contains a polygon column, and the second contains points (latitude and longitude). I want to check whether the coordinates I was given fall inside the polygon of the corresponding city.

GDF_1 contains polygon/multi-polygon

GDF_2 contains city points (coordinate)

When an ID appears in both gdf_1.id and gdf_2.id, I want to use the within function to check whether the coordinate in gdf_2 is inside the polygon in gdf_1. For example, the code below returns True because the coordinate is within the polygon.

poly = gdf_1[gdf_1.id == '17085']['geometry'] 
p1 = gdf_2[gdf_2.id == '17085']['geometry']
p1.within(poly, align=False)

I've been having a hard time iterating over both dataframes and comparing them to each other. Is there a way to compare the two dataframes to each other?

Desired output: (This is just an example)

id gdf_2.geometry bool
17085 POINT(19.82092 41.32791) True
4505 POINT(153.02560 -27.46738) True
18526 POINT(145.12103 -37.85048) True
5049 POINT(146.36182 -41.18134) False
4484 POINT(150.84249 -33.80261) False

CodePudding user response:

A similar question was answered some years ago. That solution matches points in a list to polygons in a series. I am copying a modified version of that solution which will work with two data frames.

In this code sample polydf is equivalent to your GDF_1 and pointdf is equivalent to your GDF_2. The "any" column in the example output is equivalent to your "bool" column.

from shapely.geometry import Point, Polygon
import geopandas

polydf = geopandas.GeoDataFrame({
    "polygons": ["A","B"],
    "geometry":[Polygon([(5, 5), (5, 13), (13, 13), (13, 5)]),
                Polygon([(10, 10), (10, 15), (15, 15), (15, 10)])]
})

pointdf = geopandas.GeoDataFrame({
    "points": ["a","b","c"],
    "geometry":[Point(3, 3), Point(8, 8), Point(11, 11)]
})

pointdf = pointdf.assign(**{row["polygons"]:pointdf.within(row["geometry"]) for index, row in polydf.iterrows()})

pointdf["any"] = pointdf.any(axis=1,bool_only=True).values

print(pointdf.to_markdown())

Output:

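|    | points   | geometry      | A     | B     | any   |
|---:|:---------|:--------------|:------|:------|:------|
|  0 | a        | POINT (3 3)   | False | False | False |
|  1 | b        | POINT (8 8)   | True  | False | True  |
|  2 | c        | POINT (11 11) | True  | True  | True  |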

This solution iterates over a table row-by-row. This practice is not optimal, so it's possible that a faster solution exists.
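If only the "any" column is needed, a spatial join can stand in for the loop over polygons. The following is a minimal sketch rather than the method from the linked answer. It assumes geopandas 0.10 or later (for the predicate argument of sjoin) and rebuilds the same toy polydf and pointdf; the name hits is made up for illustration.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Same toy data as in the answer above.
polydf = geopandas.GeoDataFrame({
    "polygons": ["A", "B"],
    "geometry": [Polygon([(5, 5), (5, 13), (13, 13), (13, 5)]),
                 Polygon([(10, 10), (10, 15), (15, 15), (15, 10)])],
})
pointdf = geopandas.GeoDataFrame({
    "points": ["a", "b", "c"],
    "geometry": [Point(3, 3), Point(8, 8), Point(11, 11)],
})

# An inner spatial join keeps one row per (point, polygon) pair where the
# point lies within the polygon; the result keeps the point's index label.
hits = geopandas.sjoin(pointdf, polydf, predicate="within", how="inner")

# A point is inside at least one polygon if its index shows up in the join.
pointdf["any"] = pointdf.index.isin(hits.index)

print(pointdf)
```

Note that this drops the per-polygon columns (A, B); it only answers whether each point falls inside at least one polygon, and it does not match rows by id.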

CodePudding user response:

The geopandas API docs for geopandas.GeoSeries.within are very good, and I recommend giving them a close read. If the two dataframes have indexes that can be aligned, passing align=True to many of the spatial operations tells geopandas to match rows on the index, much like a normal pandas join, and then perform the spatial operation (in this case, within) on each matched pair of geometries.

So I think the following should do exactly what you’re looking for:

# points (gdf_2) within polygons (gdf_1), matched on id
gdf_2.set_index("id").within(
    gdf_1.set_index("id"),
    align=True,
)

This will be significantly faster than iterating over all rows.
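To make the index-aligned approach concrete, here is a small sketch with made-up data. The ids, coordinates, and the names points, polys, and result are assumptions for illustration only. It calls within on the points, with the polygons as the argument, and assumes each id is unique in both frames.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical stand-in for gdf_1: one square polygon per id.
gdf_1 = geopandas.GeoDataFrame({
    "id": ["1", "2", "3"],
    "geometry": [Polygon([(0, 0), (0, 2), (2, 2), (2, 0)]),
                 Polygon([(10, 10), (10, 12), (12, 12), (12, 10)]),
                 Polygon([(20, 20), (20, 22), (22, 22), (22, 20)])],
})

# Hypothetical stand-in for gdf_2: one point per id, in a different row order.
gdf_2 = geopandas.GeoDataFrame({
    "id": ["2", "1", "3"],
    "geometry": [Point(11, 11), Point(5, 5), Point(21, 21)],
})

points = gdf_2.set_index("id")
polys = gdf_1.set_index("id")

# align=True pairs geometries by index label (the id), not by row position.
inside = points.geometry.within(polys.geometry, align=True)

# assign() also matches on the index, so each flag lands on the right point.
result = points.assign(bool=inside).reset_index()

print(result)
```

Because the two indexes are in a different order here, geopandas may emit a warning that the indices differ; the pairing is still done by id.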
