I have two GeoDataFrames. The first contains a polygon column, and the other contains points (latitude and longitude). I want to check whether the coordinates I was given are inside the polygon of the corresponding city. Please click on the image below to see both dataframes.
GDF_1 contains polygon/multi-polygon
GDF_2 contains city points (coordinate)
When an ID appears in both gdf_1.id and gdf_2.id, I want to use the within function to check whether the coordinate in gdf_2 is inside the polygon in gdf_1. For example, the code below would return True because the coordinate is within the polygon.
poly = gdf_1[gdf_1.id == '17085']['geometry']
p1 = gdf_2[gdf_2.id == '17085']['geometry']
p1.within(poly, align=False)
I've been having a hard time iterating over both dataframes and comparing them to each other. Is there any way to compare the two dataframes row by row, matched on ID?
Desired output: (This is just an example)
id | gdf_2.geometry | bool |
---|---|---|
17085 | POINT(19.82092 41.32791) | True |
4505 | POINT(153.02560 -27.46738) | True |
18526 | POINT(145.12103 -37.85048) | True |
5049 | POINT(146.36182 -41.18134) | False |
4484 | POINT(150.84249 -33.80261) | False |
CodePudding user response:
A similar question was answered some years ago. That solution matches points in a list to polygons in a series. I am copying a modified version of that solution which will work with two data frames.
In this code sample, polydf is equivalent to your GDF_1 and pointdf is equivalent to your GDF_2. The "any" column in the example output is equivalent to your "bool" column.
from shapely.geometry import Point, Polygon
import geopandas

# Two overlapping squares standing in for GDF_1
polydf = geopandas.GeoDataFrame({
    "polygons": ["A", "B"],
    "geometry": [Polygon([(5, 5), (5, 13), (13, 13), (13, 5)]),
                 Polygon([(10, 10), (10, 15), (15, 15), (15, 10)])]
})
# Three points standing in for GDF_2
pointdf = geopandas.GeoDataFrame({
    "points": ["a", "b", "c"],
    "geometry": [Point(3, 3), Point(8, 8), Point(11, 11)]
})
# Add one boolean column per polygon: is each point within that polygon?
pointdf = pointdf.assign(**{row["polygons"]: pointdf.within(row["geometry"]) for index, row in polydf.iterrows()})
# "any" is True when the point lies within at least one polygon
pointdf["any"] = pointdf.any(axis=1, bool_only=True).values
print(pointdf.to_markdown())  # to_markdown requires the tabulate package
Output:
| | points | geometry | A | B | any |
|---|---|---|---|---|---|
| 0 | a | POINT (3 3) | False | False | False |
| 1 | b | POINT (8 8) | True | False | True |
| 2 | c | POINT (11 11) | True | True | True |
This solution iterates over a table row-by-row. This practice is not optimal, so it's possible that a faster solution exists.
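If only the "any" column is needed, one possible way to avoid the per-polygon loop is to merge all polygons into a single geometry and call within once. The sketch below is an alternative to the code above, not the same method: it uses shapely's unary_union, and the names merged and any_inside are made up for illustration. One caveat: a point lying exactly on a border shared by two adjacent polygons would count as inside the merged shape even though it is within neither polygon on its own.

```python
from shapely.geometry import Point, Polygon
from shapely.ops import unary_union
import geopandas

# Same toy data as in the example above
polydf = geopandas.GeoDataFrame({
    "polygons": ["A", "B"],
    "geometry": [Polygon([(5, 5), (5, 13), (13, 13), (13, 5)]),
                 Polygon([(10, 10), (10, 15), (15, 15), (15, 10)])]
})
pointdf = geopandas.GeoDataFrame({
    "points": ["a", "b", "c"],
    "geometry": [Point(3, 3), Point(8, 8), Point(11, 11)]
})

# Merge every polygon into one (multi)polygon; "merged" is an illustrative name
merged = unary_union(list(polydf.geometry))

# A single vectorized call tests every point against the merged shape
any_inside = pointdf.within(merged)
pointdf["any"] = any_inside.values
print(pointdf)
```

This drops the information about which polygon contains each point, so it only fits when a single True/False per point is enough.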
CodePudding user response:
The geopandas API docs on geopandas.GeoSeries.within are really great and I recommend giving them a close read. If the two dataframes have an index which can be aligned, passing align=True to many of the spatial ops will tell geopandas to join on the index, acting essentially like a normal pandas join, while performing the spatial op (in this case, within) on each pair of geometries.
So I think the following should do exactly what you’re looking for:
gdf_2.set_index("id").within(
    gdf_1.set_index("id"),
    align=True,
)
This will be significantly faster than iterating over all rows.
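To make the index-aligned check concrete, here is a minimal, self-contained sketch. The polygons, points and IDs are invented stand-ins for gdf_1 and gdf_2, and the column name inside is an assumption that plays the role of the "bool" column in the question. It assumes the IDs are unique in each dataframe. Because the aligned result may come back in a different row order than gdf_2, the sketch maps the result back by ID rather than assigning it positionally.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical stand-in for gdf_1: one polygon per ID
gdf_1 = geopandas.GeoDataFrame(
    {"id": ["1", "2", "3"]},
    geometry=[
        Polygon([(0, 0), (0, 10), (10, 10), (10, 0)]),
        Polygon([(20, 20), (20, 30), (30, 30), (30, 20)]),
        Polygon([(40, 40), (40, 50), (50, 50), (50, 40)]),
    ],
)
# Hypothetical stand-in for gdf_2: one point per ID, in a different row order
gdf_2 = geopandas.GeoDataFrame(
    {"id": ["3", "1", "2"]},
    geometry=[Point(45, 45), Point(5, 5), Point(0, 0)],
)

# Index both geometry columns by ID so geopandas can pair them up
points = gdf_2.set_index("id").geometry
polys = gdf_1.set_index("id").geometry

# Points go on the left: "is this point within the polygon with the same ID?"
# geopandas may warn here that the two indices differ; that is expected.
inside = points.within(polys, align=True)

# Look each ID up in the result instead of relying on row order
gdf_2["inside"] = gdf_2["id"].map(inside)
print(gdf_2)
```

Note the direction of the call: the points are tested against the polygons, since a polygon is essentially never within a point.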