Pandas DataFrame to GeoDataFrame with Polygon geometry using groupby and lambda

Time:10-22

I have a pandas DataFrame like this:

name    loc_x    loc_y    grp_name
a1        1.0        2.0    set1
a2        2.0        3.0    set1
a3        3.2        4.1    set2
a4        7.9        4.2    set2

I want to generate a GeoDataFrame with one polygon per grp_name, built from loc_x and loc_y, plus a name column containing the original name values of each group joined by |. The result should look like this:

        name    geometry
set1    a1|a2   POLYGON ((1.0, 2.0)...)
set2    a3|a4   POLYGON ((3.2, 4.1)...)

The code below gives me the geometry column, but how do I also get an additional column with the name values from my base data frame concatenated?

gdf = gpd.GeoDataFrame(geometry=df.groupby('grp_name').apply(
      lambda g: Polygon(gpd.points_from_xy(g['loc_x'], g['loc_y']))))

CodePudding user response:

  • Your test data needed a modification: a polygon requires at least three points, so each group gets a third row below.
  • The rest comes down to knowing pandas. groupby().apply() passes the sub-DataFrame for each group to your function, so it is simple to construct both outputs you want (the joined name and the polygon) per group and return them as a Series.
import pandas as pd
import geopandas as gpd
import shapely.geometry
import io

df = pd.read_csv(io.StringIO("""name    loc_x    loc_y    grp_name
a1        1.0        2.0    set1
a2        2.0        3.0    set1
a2.5      3.0        4.0    set1
a3        3.2        4.1    set2
a4        7.9        4.2    set2
a4.5      8.1        4.3    set2"""), sep=r"\s+")

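# One row per group: the returned Series keys become the output columns
gpd.GeoDataFrame(
    df.groupby("grp_name").apply(
        lambda d: pd.Series(
            {
                # join the names in this group with "|"
                "name": "|".join(d["name"].tolist()),
                # build the polygon from this group's (x, y) pairs
                "geometry": shapely.geometry.Polygon(
                    d.loc[:, ["loc_x", "loc_y"]].values
                ),
            }
        )
    )
)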
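
As a possible alternative to returning a Series from a single apply(), the two columns can be built separately from the same groupby object and then combined. This is only a sketch under the same assumptions as above (at least three points per group); the variable names grouped, names, polys and gdf are made up for illustration.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# Same sample data as above, with three points per group
df = pd.DataFrame(
    {
        "name": ["a1", "a2", "a2.5", "a3", "a4", "a4.5"],
        "loc_x": [1.0, 2.0, 3.0, 3.2, 7.9, 8.1],
        "loc_y": [2.0, 3.0, 4.0, 4.1, 4.2, 4.3],
        "grp_name": ["set1", "set1", "set1", "set2", "set2", "set2"],
    }
)

grouped = df.groupby("grp_name")

# Series of "|"-joined names, indexed by grp_name
names = grouped["name"].agg("|".join)

# Series of shapely Polygons, indexed by grp_name
polys = grouped[["loc_x", "loc_y"]].apply(lambda g: Polygon(g.to_numpy()))

# Both Series share the grp_name index, so they line up row by row
gdf = gpd.GeoDataFrame({"name": names, "geometry": polys}, geometry="geometry")
print(gdf)
```

Selecting only the coordinate columns before apply() keeps the grouping column out of the lambda, which may also avoid the warning that newer pandas versions emit when apply() operates on grouping columns.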