I have a question concerning the rasterization of polygons by maximum overlap, i.e. assigning each raster cell the value of the polygon that has the highest area overlap with it.
The real-world task is to rasterize polygons of soil IDs in R in order to produce relatively low-resolution maps of soil properties as model inputs.
The problem is that the rasterize() function of the terra package (and, similarly, st_rasterize() from stars) assigns the cell value from the polygon that contains the cell midpoint. If a raster cell contains multiple polygons, I would rather select the value of the polygon (soil ID) that covers the largest area of the raster cell.
Here is a small self-contained example that visualizes my problem, using terra.
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
r <- rast(v, ncols = 3, nrow = 3)
rcc <- vect(xyFromCell(r, cell = 1:ncell(r)))
x <- rasterize(v, r, field = "NAME_2")
plot(x)
lines(r, col = "light gray")
lines(v)
points(rcc)
Mostly, the polygons that contain the cell center also seem to have the highest area share. However, in some cases (top row, 3rd cell), this is not the case. The problem appears to get worse the larger the cells are relative to the polygons. I could therefore start with a high-resolution raster and then resample to the desired (lower) resolution using an aggregation function (e.g. the mode). But maybe someone has a more efficient idea?
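For reference, the resampling workaround described above could be sketched roughly as follows. This is only a minimal sketch under assumptions that are not part of the original example: the refinement factor `fact = 20` is arbitrary, and the numeric field "ID_2" of the lux data is used instead of "NAME_2" so that the modal aggregation works on plain numbers.

```r
library(terra)

f <- system.file("ex/lux.shp", package = "terra")
v <- vect(f)

# Assumed refinement factor: each target cell is split into fact x fact subcells
fact <- 20

# Fine grid with the same extent as the 3 x 3 target grid
r_fine <- rast(v, ncols = 3 * fact, nrows = 3 * fact)

# Midpoint-based rasterization on the fine grid, using a numeric ID field
x_fine <- rasterize(v, r_fine, field = "ID_2")

# Collapse back to 3 x 3, keeping the most frequent ID in each block
x_mode <- aggregate(x_fine, fact = fact, fun = "modal", na.rm = TRUE)
```

With na.rm = TRUE, subcells outside all polygons are ignored, so a target cell that is mostly empty still receives the ID of whatever polygon touches it. The result only approximates the area share; a larger `fact` gets closer at the cost of memory and time.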
Thank you for your help!
CodePudding user response:
Please find one possible solution using the terra and sf libraries.
The idea is to convert the SpatRaster r into a SpatVector and then into an sf object, in order to take advantage of the sf::st_join() function with the largest = TRUE argument. The rest of the code then simply converts the sf object back into a SpatVector and then into a SpatRaster using the terra::rasterize() function.
So, please find below a reprex that details the procedure.
Reprex
- Code
library(terra)
library(sf)
# Your data
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
r <- rast(v, ncols = 3, nrow = 3)
rcc <- vect(xyFromCell(r, cell = 1:ncell(r)))
# Convert the 'SpatRaster' 'r' into a 'SpatVector (i.e. 'r_poly')
r_poly <- terra::as.polygons(r)
# Convert 'r_poly' into a 'sf' object (i.e. 'r_poly_sf')
r_poly_sf <- sf::st_as_sf(r_poly)
# Convert 'v' into a 'sf' object (i.e. 'v_sf')
v_sf <- sf::st_as_sf(v)
# Left join r_poly_sf with v_sf based on the largest overlap
results_sf <- sf::st_join(r_poly_sf, v_sf, largest = TRUE)
# Convert 'results_sf' into a SpatVector (i.e. 'results_vect')
results_vect <- terra::vect(results_sf)
# Rasterize 'results_vect' to get a 'SpatRaster' (i.e. 'results')
results <- terra::rasterize(results_vect, r, field = "NAME_2")
Output
NB: please note that the cell in the upper right corner is NA because no polygon in v overlaps that cell of r (if needed, you can still set the value for cells that do not overlap by using the background= argument of the terra::rasterize() function).
results
#> class : SpatRaster
#> dimensions : 3, 3, 1 (nrow, ncol, nlyr)
#> resolution : 0.2613707, 0.2446047 (x, y)
#> extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
#> source : memory
#> name : NAME_2
#> min value : Capellen
#> max value : Remich
terra::values(results, dataframe=TRUE)
#> NAME_2
#> 1 Clervaux
#> 2 Clervaux
#> 3 <NA>
#> 4 Redange
#> 5 Mersch
#> 6 Echternach
#> 7 Capellen
#> 8 Luxembourg
#> 9 Remich
- Visualization
plot(results)
lines(r, col = "light gray")
lines(v)
points(rcc)
Probably less efficient (especially if you have many IDs):
z <- lapply(1:nrow(v), \(i) rasterize(v[i,], r, cover=TRUE))
z <- which.max(rast(z))
But you could replace rasterize with exactextractr::coverage_fraction if you want very high precision.
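A minimal sketch of that substitution, assuming a version of exactextractr that accepts a SpatRaster together with an sf object (the names `cf`, `s` and `lookup` are only illustrative). Unlike rasterize(..., cover=TRUE), coverage_fraction returns 0 rather than NA for cells a polygon does not touch, so cells without any overlap are masked explicitly here:

```r
library(terra)
library(sf)
library(exactextractr)

f <- system.file("ex/lux.shp", package = "terra")
v <- vect(f)
r <- rast(v, ncols = 3, nrows = 3)

# One coverage-fraction raster per polygon; crop = FALSE keeps the full grid of r
cf <- coverage_fraction(r, st_as_sf(v), crop = FALSE)
s <- rast(cf)

# Index of the polygon with the largest fraction; NA where nothing overlaps
z <- ifel(max(s) > 0, which.max(s), NA)

# Look up the polygon names for each cell (NA index gives NA name)
lookup <- data.frame(cell = 1:ncell(z), NAME_2 = v$NAME_2[values(z)[, 1]])
```

As with the lapply version, `z` holds the row index of the winning polygon in `v`, not the attribute itself, which is why the lookup step is needed.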
Even less efficient, I suppose:
values(r) <- 1:ncell(r)
# get weights
e <- extract(r, v, weights=TRUE)
e <- as.matrix(e)
head(e)
# ID lyr.1 weight
#[1,] 1 1 0.38
#[2,] 1 2 0.49
#[3,] 2 2 0.06
#[4,] 2 4 0.05
#[5,] 2 5 0.52
#[6,] 2 6 0.06
# find cell with max weight
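x <- sapply(unique(e[,2]), function(i) {
  # drop = FALSE keeps 'd' a matrix even when only one polygon touches the cell
  d <- e[e[,2] == i, , drop = FALSE]
  # return (cell number, polygon ID) for the row with the largest weight
  d[which.max(d[,3]), 2:1]
})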
# assign ID to cells
r[x[1,]] <- x[2,]