Count number of points in sfc_multipoint object-CodePudding

I would like to filter sfc_multipoint data by the size of the sfc_multipoints, i.e. the number of points/coordinate pairs contained in each row. Suppose I want to keep only rows with no more than 3 points from the following data set, based on mapview::breweries:

library(mapview)
library(sf)
library(tidyverse)

pts <- breweries %>% 
  group_by(zipcode) %>% 
  summarise()

In order to do this, I need to know the number of points within each sfc_multipoint. In theory I could export the coordinates via as("spatial") and count rows, but that is not only ugly, but also not really feasible for data sets of any realistic size. Any suggestions?

CodePudding user response：

Please find below another possible approach, slightly different of that of @Jindra Lacko.

Reprex

Code

library(mapview)
library(sf)
library(dplyr)

# computes the number of points (i.e. XY pairs = length(x)/2) for each feature and 
# keeps only the features where the number of points < 3
selection <- which(sapply(st_geometry(pts), function(x) length(x)/2) < 3)

# filter rows of pts with the 'selection'
pts[selection,]

Output

#> Simple feature collection with 65 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension:     XY
#> Bounding box:  xmin: 9.462785 ymin: 48.90074 xmax: 11.93539 ymax: 50.44162
#> Geodetic CRS:  WGS 84
#> # A tibble: 65 x 2
#>    zipcode                                              geometry
#>    <chr>                                          <GEOMETRY [°]>
#>  1 90562                               POINT (11.15795 49.53495)
#>  2 90614                                POINT (10.85194 49.4208)
#>  3 90763                               POINT (10.99625 49.44171)
#>  4 91054   MULTIPOINT ((11.00901 49.59511), (11.00505 49.60255))
#>  5 91097   MULTIPOINT ((10.76099 49.59044), (10.76954 49.59015))
#>  6 91126                                POINT (10.9281 49.27472)
#>  7 91207                               POINT (11.22997 49.55471)
#>  8 91217                               POINT (11.42834 49.50683)
#>  9 91220                                POINT (11.36851 49.5618)
#> 10 91227                               POINT (11.30872 49.45008)
#> # ... with 55 more rows

^{Created on 2021-12-29 by the reprex package (v2.0.1)}

CodePudding user response：

You could try dplyr::count() or summarize(n = n()) to read the number of rows within a given zipcode, but the breweries dataset seems to have some duplicates so this might be misleading.

breweries %>% 
  count(zipcode)

#----------
  zipcode n                       geometry
1   90562 1      POINT (11.15795 49.53495)
2   90614 1       POINT (10.85194 49.4208)
3   90763 1      POINT (10.99625 49.44171)
4   91054 3 MULTIPOINT ((11.00901 49.59...
5   91097 2 MULTIPOINT ((10.76099 49.59...

Or only unique points (note the change in 91054)

breweries %>%
  distinct(zipcode, geometry) %>%
  count(zipcode)

#-----
  zipcode n                       geometry
1   90562 1      POINT (11.15795 49.53495)
2   90614 1       POINT (10.85194 49.4208)
3   90763 1      POINT (10.99625 49.44171)
4   91054 2 MULTIPOINT ((11.00901 49.59...
5   91097 2 MULTIPOINT ((10.76099 49.59...

You could also try mapview::npts()

breweries %>%
  group_by(zipcode) %>%
  summarize() %>%
  rowwise() %>%
  mutate(n = npts(geometry))

#----
   zipcode                                              geometry     n
 * <chr>                                          <GEOMETRY [°]> <dbl>
 1 90562                               POINT (11.15795 49.53495)     1
 2 90614                                POINT (10.85194 49.4208)     1
 3 90763                               POINT (10.99625 49.44171)     1
 4 91054   MULTIPOINT ((11.00901 49.59511), (11.00505 49.60255))     2
 5 91097   MULTIPOINT ((10.76099 49.59044), (10.76954 49.59015))     2

CodePudding user response：

You could build upon sf::st_coordinates() - which for multipoints has an interesting side effect of producing a matrix of 1) coordinates (duh...) and 2) L1 column, identifying which feature of the original multipoint the coordinate refers to.

So consider this piece of code, building upon the original pts example (cast to multipoints from original geometry collection).

library(mapview)
library(sf)
library(tidyverse)

pts <- breweries %>% 
  group_by(zipcode) %>% 
  summarise() %>% 
  st_cast("MULTIPOINT")

# get count of coordinates per zipcode
pts$breweries <- st_coordinates(pts) %>% 
  as.data.frame() %>% 
  group_by(L1) %>% # id of the row from original multipoint object
  tally() %>% 
  pull(n)

# zipcodes with less than 3 breweries
pts %>% 
  filter(breweries < 3)

#Simple feature collection with 65 features and 2 fields
#Geometry type: MULTIPOINT
#Dimension:     XY
#Bounding box:  xmin: 9.462785 ymin: 48.90074 xmax: 11.93539 ymax: 50.44162
#Geodetic CRS:  WGS 84
# A tibble: 65 × 3
#   zipcode                                   geometry breweries
# * <chr>                             <MULTIPOINT [°]>     <int>
# 1 90562                        ((11.15795 49.53495))         1
# 2 90614                         ((10.85194 49.4208))         1
# 3 90763                        ((10.99625 49.44171))         1
# 4 91054   ((11.00901 49.59511), (11.00505 49.60255))         2
# 5 91097   ((10.76099 49.59044), (10.76954 49.59015))         2
# 6 91126                         ((10.9281 49.27472))         1
# 7 91207                        ((11.22997 49.55471))         1
# 8 91217                        ((11.42834 49.50683))         1
# 9 91220                         ((11.36851 49.5618))         1
#10 91227                        ((11.30872 49.45008))         1
## … with 55 more rows