I would like to filter sfc_multipoint
data by the size of the sfc_multipoint
s, i.e. the number of points/coordinate pairs contained in each row. Suppose I want to keep only rows with no more than 3 points from the following data set, based on mapview::breweries
:
library(mapview)
library(sf)
library(tidyverse)
pts <- breweries %>%
group_by(zipcode) %>%
summarise()
In order to do this, I need to know the number of points within each sfc_multipoint
. In theory I could export the coordinates via as("spatial")
and count rows, but that is not only ugly, but also not really feasible for data sets of any realistic size. Any suggestions?
CodePudding user response:
Please find below another possible approach, slightly different of that of @Jindra Lacko.
Reprex
- Code
library(mapview)
library(sf)
library(dplyr)
# computes the number of points (i.e. XY pairs = length(x)/2) for each feature and
# keeps only the features where the number of points < 3
selection <- which(sapply(st_geometry(pts), function(x) length(x)/2) < 3)
# filter rows of pts with the 'selection'
pts[selection,]
- Output
#> Simple feature collection with 65 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 9.462785 ymin: 48.90074 xmax: 11.93539 ymax: 50.44162
#> Geodetic CRS: WGS 84
#> # A tibble: 65 x 2
#> zipcode geometry
#> <chr> <GEOMETRY [°]>
#> 1 90562 POINT (11.15795 49.53495)
#> 2 90614 POINT (10.85194 49.4208)
#> 3 90763 POINT (10.99625 49.44171)
#> 4 91054 MULTIPOINT ((11.00901 49.59511), (11.00505 49.60255))
#> 5 91097 MULTIPOINT ((10.76099 49.59044), (10.76954 49.59015))
#> 6 91126 POINT (10.9281 49.27472)
#> 7 91207 POINT (11.22997 49.55471)
#> 8 91217 POINT (11.42834 49.50683)
#> 9 91220 POINT (11.36851 49.5618)
#> 10 91227 POINT (11.30872 49.45008)
#> # ... with 55 more rows
Created on 2021-12-29 by the reprex package (v2.0.1)
CodePudding user response:
You could try dplyr::count()
or summarize(n = n())
to read the number of rows within a given zipcode
, but the breweries
dataset seems to have some duplicates so this might be misleading.
breweries %>%
count(zipcode)
#----------
zipcode n geometry
1 90562 1 POINT (11.15795 49.53495)
2 90614 1 POINT (10.85194 49.4208)
3 90763 1 POINT (10.99625 49.44171)
4 91054 3 MULTIPOINT ((11.00901 49.59...
5 91097 2 MULTIPOINT ((10.76099 49.59...
Or only unique points (note the change in 91054
)
breweries %>%
distinct(zipcode, geometry) %>%
count(zipcode)
#-----
zipcode n geometry
1 90562 1 POINT (11.15795 49.53495)
2 90614 1 POINT (10.85194 49.4208)
3 90763 1 POINT (10.99625 49.44171)
4 91054 2 MULTIPOINT ((11.00901 49.59...
5 91097 2 MULTIPOINT ((10.76099 49.59...
You could also try mapview::npts()
breweries %>%
group_by(zipcode) %>%
summarize() %>%
rowwise() %>%
mutate(n = npts(geometry))
#----
zipcode geometry n
* <chr> <GEOMETRY [°]> <dbl>
1 90562 POINT (11.15795 49.53495) 1
2 90614 POINT (10.85194 49.4208) 1
3 90763 POINT (10.99625 49.44171) 1
4 91054 MULTIPOINT ((11.00901 49.59511), (11.00505 49.60255)) 2
5 91097 MULTIPOINT ((10.76099 49.59044), (10.76954 49.59015)) 2
CodePudding user response:
You could build upon sf::st_coordinates()
- which for multipoints has an interesting side effect of producing a matrix of 1) coordinates (duh...) and 2) L1 column, identifying which feature of the original multipoint the coordinate refers to.
So consider this piece of code, building upon the original pts example (cast to multipoints from original geometry collection).
library(mapview)
library(sf)
library(tidyverse)
pts <- breweries %>%
group_by(zipcode) %>%
summarise() %>%
st_cast("MULTIPOINT")
# get count of coordinates per zipcode
pts$breweries <- st_coordinates(pts) %>%
as.data.frame() %>%
group_by(L1) %>% # id of the row from original multipoint object
tally() %>%
pull(n)
# zipcodes with less than 3 breweries
pts %>%
filter(breweries < 3)
#Simple feature collection with 65 features and 2 fields
#Geometry type: MULTIPOINT
#Dimension: XY
#Bounding box: xmin: 9.462785 ymin: 48.90074 xmax: 11.93539 ymax: 50.44162
#Geodetic CRS: WGS 84
# A tibble: 65 × 3
# zipcode geometry breweries
# * <chr> <MULTIPOINT [°]> <int>
# 1 90562 ((11.15795 49.53495)) 1
# 2 90614 ((10.85194 49.4208)) 1
# 3 90763 ((10.99625 49.44171)) 1
# 4 91054 ((11.00901 49.59511), (11.00505 49.60255)) 2
# 5 91097 ((10.76099 49.59044), (10.76954 49.59015)) 2
# 6 91126 ((10.9281 49.27472)) 1
# 7 91207 ((11.22997 49.55471)) 1
# 8 91217 ((11.42834 49.50683)) 1
# 9 91220 ((11.36851 49.5618)) 1
#10 91227 ((11.30872 49.45008)) 1
## … with 55 more rows