I have a pandas DataFrame like this:
name loc_x loc_y grp_name
a1 1.0 2.0 set1
a2 2.0 3.0 set1
a3 3.2 4.1 set2
a4 7.9 4.2 set2
I want to generate a GeoDataFrame that builds one polygon per grp_name from loc_x and loc_y, and that also includes a name column holding the values from my original DataFrame concatenated with |. The result should look like this:
name geometry
set1 a1|a2 POLYGON ((1.0, 2.0)...)
set2 a3|a4 POLYGON ((3.2, 4.1)...)
The code below gives me the geometry column, but how do I also get an additional name column concatenated from my base DataFrame?
import geopandas as gpd
from shapely.geometry import Polygon
gdf = gpd.GeoDataFrame(geometry=df.groupby('grp_name').apply(
    lambda g: Polygon(gpd.points_from_xy(g['loc_x'], g['loc_y']))))
Answer:
- This required modifying your test data: a polygon needs at least three points.
- This comes down to knowing pandas: groupby().apply() passes each group's sub-DataFrame to the function, so it's simple to construct both outputs you want per group.
import io
import geopandas as gpd
import pandas as pd
import shapely.geometry

# Sample data extended to three points per group (the minimum for a polygon)
df = pd.read_csv(io.StringIO("""name loc_x loc_y grp_name
a1 1.0 2.0 set1
a2 2.0 3.0 set1
a2.5 3.0 4.0 set1
a3 3.2 4.1 set2
a4 7.9 4.2 set2
a4.5 8.1 4.3 set2"""), sep=r"\s+")

# For each group, return a Series holding the joined names and the polygon
gpd.GeoDataFrame(
    df.groupby("grp_name").apply(
        lambda d: pd.Series(
            {
                "name": "|".join(d["name"].tolist()),
                "geometry": shapely.geometry.Polygon(
                    d.loc[:, ["loc_x", "loc_y"]].values
                ),
            }
        )
    )
)
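As a quick sanity check before building polygons, the two per-group pieces can be inspected with pandas alone. The sketch below is not part of the answer's code: it uses the question's original four-row sample, and the variable names (grouped, names, sizes, too_small) are only illustrative. It shows the name concatenation on its own and flags the groups that have fewer than three points.

```python
import io

import pandas as pd

# The question's original sample data: only two points per group.
df = pd.read_csv(io.StringIO("""name loc_x loc_y grp_name
a1 1.0 2.0 set1
a2 2.0 3.0 set1
a3 3.2 4.1 set2
a4 7.9 4.2 set2"""), sep=r"\s+")

grouped = df.groupby("grp_name")

# Concatenate the names within each group using "|".
names = grouped["name"].agg("|".join)

# Count the points per group; a group with fewer than three
# points cannot form a polygon.
sizes = grouped.size()
too_small = sizes[sizes < 3].index.tolist()

print(names.to_dict())  # {'set1': 'a1|a2', 'set2': 'a3|a4'}
print(too_small)        # ['set1', 'set2']
```

With the original data both groups are flagged, which is why the answer adds a third point to each group.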