I have a GeoDataFrame of about 3,200 polygons and another GeoDataFrame of about 26,000 points. I want a third GeoDataFrame containing only the polygons that contain at least one point. This seems like it should be a simple sjoin, but geopandas.sjoin(polygons, points, predicate='contains') returns a GeoDataFrame with more rows than I have polygons (very nearly as many as there are input points). Examining the result shows duplicate polygons, which would explain the excess. How do I get only the polygons that contain at least one point, without duplicates?
CodePudding user response:
The how argument of sjoin is relevant here. It controls which GeoDataFrame's index and geometry are kept in the result. Here we want only the polygons, so we keep the index of the polygons GeoDataFrame: geopandas.sjoin(polygons, points, how='left', predicate='contains'). Be aware that a left join still returns one row per polygon/point match, and it also keeps polygons that contain no points (with missing values in the columns from points), so the result still has to be filtered and de-duplicated. The user guide has more detail: https://geopandas.org/en/stable/docs/user_guide/mergingdata.html#binary-predicate-joins
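To see what how changes in practice, here is a minimal sketch on made-up data (three square polygons and three points; the names, coordinates, and CRS are assumptions for illustration, not the asker's data). It assumes geopandas 0.10 or later, where the keyword is predicate, and a right-hand index without a name, so the joined column is called index_right.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy data (hypothetical): polygon "a" holds two points, "b" holds one, "c" holds none.
polygons = gpd.GeoDataFrame(
    {"UNIQUE_ID": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.2), Point(0.8, 0.8), Point(2.5, 0.5)],
    crs="EPSG:4326",
)

# Inner join: one row per (polygon, point) match, so "a" appears twice.
inner = gpd.sjoin(polygons, points, how="inner", predicate="contains")

# Left join: same matches, plus unmatched polygons with a missing index_right.
left = gpd.sjoin(polygons, points, how="left", predicate="contains")

print(len(polygons), len(inner), len(left))
print(left[["UNIQUE_ID", "index_right"]])
```

Both joins keep the polygon index and geometry, and both repeat a polygon once per point it contains, which is why the row count in the question ends up close to the number of points.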
CodePudding user response:
I found a workaround, although I feel like it's not the best solution. My polygons have a unique ID column on which I was able to remove duplicates:
geopandas.sjoin(polygons, points, predicate='contains').drop_duplicates(subset=['UNIQUE_ID'], keep='first')
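If the polygons had no unique ID column, one possible alternative is to use the index instead: sjoin keeps the left GeoDataFrame's index, so the original polygons can be filtered by the index values that appear in the join result. The sketch below uses made-up data (the names, coordinates, and CRS are assumptions for illustration) and assumes the polygon index is unique and geopandas 0.10 or later.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy data (hypothetical): polygon "a" holds two points, "b" holds one, "c" holds none.
polygons = gpd.GeoDataFrame(
    {"UNIQUE_ID": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.2), Point(0.8, 0.8), Point(2.5, 0.5)],
    crs="EPSG:4326",
)

# One row per (polygon, point) match; the index is the polygon index.
joined = gpd.sjoin(polygons, points, how="inner", predicate="contains")

# Keep each original polygon once if its index shows up in the join result.
polygons_with_points = polygons.loc[polygons.index.isin(joined.index)]

print(polygons_with_points)
```

Filtering the original polygons this way also leaves out the columns that sjoin adds from points (such as index_right), so the result has the same columns as the input.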