Home > Software engineering >  How to calculate geographic centroids by group?
How to calculate geographic centroids by group?

Time:05-18

I need to calculate the geographic centroid of different polygons grouped by category. Let me give an example. Let's say we have this sf dataset

library("sf")
nc <- st_read(system.file("/shape/nc.shp", package="sf"))

Let's say we group the different polygons into categories.

df <- data.frame(id=rep(1:10,rep=10))
nc <- cbind(nc,df)

If we plot it, we can see that different polygons belong to different categories. Most importantly, these polygons are not always contiguous.

plot(nc["id"])

I would like to calculate the geographic centroid for each category. One of the options I consider was to dissolve the different polygons first and then compute the centroid. However, since polygons are non-contiguous, this does not work.

p <- nc %>% group_by(id) %>%
  summarise(geometry = st_union(geometry)) %>%
  ungroup()

Also, if I try the code below, it gives me the centroid of each polygon and not by group

p <- nc %>% group_by(id) %>%
    summarise(geometry = st_centroid(geometry),.groups = "keep")

Any idea on how to obtain the centroid by group?

CodePudding user response:

It looks like it works if you group_by, st_union in the summarise, and then find the centroid.

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(tidyverse)

nc <- st_read(system.file("/shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source 
#>   `/home/x/R/x86_64-pc-linux-gnu-library/4.0/sf/shape/nc.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27

df <- data.frame(id=rep(1:10,rep=10))
nc <- cbind(nc,df)


# new sf data.frame, get centroids by group
nc_grouped <- nc %>%
                group_by(id) %>%
                summarise(st_union(geometry)) %>%
                st_centroid() 
#> Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
#> geometries of x

# plot everything
ggplot()   
  geom_sf(data = nc, aes(fill = id, alpha = .2))  
  geom_sf(data = nc_grouped, aes(color = id), size = 3)   
  scale_fill_viridis_c()  
  scale_color_viridis_c()

Hard to see, but they're all there.

# plotting only id ==1, looks about right.
ggplot()   
  geom_sf(data = nc[nc$id == 1,])   
  geom_sf(data = nc_grouped[nc_grouped$id == 1,])

Checking only the 1st group, looks about right.



nc %>% group_by(id) %>%
  summarise(geometry = st_centroid(geometry),.groups = "keep")
#> Simple feature collection with 100 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
#> Geodetic CRS:  NAD27
#> # A tibble: 100 × 2
#> # Groups:   id [10]
#>       id             geometry
#>    <int>          <POINT [°]>
#>  1     1  (-81.49823 36.4314)
#>  2     1 (-79.33478 36.39346)
#>  3     1 (-76.61642 36.14886)
#>  4     1 (-77.98691 35.96228)
#>  5     1 (-81.17403 35.91575)
#>  6     1 (-77.37784 35.58906)
#>  7     1 (-81.91783 35.39871)
#>  8     1 (-80.24928 35.31388)
#>  9     1 (-84.05986 35.13111)
#> 10     1 (-77.10388 35.13065)
#> # … with 90 more rows

Created on 2022-05-17 by the reprex package (v2.0.1)

  • Related