Home > database >  Spatial join two data frames by nearest feature and date in R
Spatial join two data frames by nearest feature and date in R

Time:04-22

This question builds off of this and this thread. Briefly, the code in these threads attempt to spatially join two spatial data frames according to the nearest features. The added layer of complexity that I am interested in is, in addition to join by nearest feature, to also join by date. However, i'm struggling to get the code working.

Below are the data frames and code that I tried (but failed):

df1 <- structure(list(lat1 = c(4417391, 4517826, 4435680, 4509372, 4449390, 
4449390), long1 = c(5557780, 5358439, 5328731, 5323168, 5519670, 
5519670), daydate = structure(c(16085, 16085, 16087, 16087, 16088, 
16088), class = "Date")), row.names = c(NA, -6L), class = "data.frame")

df2 <- structure(list(lat2 = c(4394822, 4488830, 4417257, 4517995, 4435679
), long2 = c(5293795, 5418630, 5557927, 5358272, 5328084), daydate = structure(c(16085, 
16085, 16087, 16087, 16088), class = "Date"), temp = c(6L, 26L, 
13L, 30L, 8L)), row.names = c(NA, -5L), class = "data.frame")

df1
     lat1   long1    daydate
1 4417391 5557780 2014-01-15
2 4517826 5358439 2014-01-15
3 4435680 5328731 2014-01-17
4 4509372 5323168 2014-01-17
5 4449390 5519670 2014-01-18
6 4449390 5519670 2014-01-18

df2
     lat2   long2    daydate temp
1 4394822 5293795 2014-01-15    6
2 4488830 5418630 2014-01-15   26
3 4417257 5557927 2014-01-17   13
4 4517995 5358272 2014-01-17   30
5 4435679 5328084 2014-01-18    8

# Make df & df1 sf objects, and keep the coordinates as columns just in case.
df1 <- df1 %>% st_as_sf(coords = c("long1", "lat1"), remove = FALSE) %>%
  st_set_crs(2193)
df2 <- df2 %>% st_as_sf(coords = c("long2", "lat2"), remove = FALSE) %>%
  st_set_crs(2193)

# Join df with df1, based on the nearest feature:
df_near <- st_join(df1, df1, join = st_nearest_feature) %>%
  group_by(daydate)

Error in `st_as_sf()`:
! Must group by variables found in `.data`.
x Column `daydate` is not found.

The returned error makes 100% sense as the code is written as sequential steps, but i don't know how to tell R to simultaneously consider both steps. The main objective is to get the temp values from df2 to the correct rows of df1.

Extra information in my actual data: In my actual df1 there may be duplicates for both coordinate pairs (lat1 and long1) as well as daydate. My actual df2 has duplicate coordinate pairs and daydate, but the combination of coordinate pairs and daydate is always unique i.e. every row of df2 is unique.

CodePudding user response:

Is this approximately what you are looking for? I used {base} for this since it is IMO easier to handle when coordinating parts of two different tables... but I think lapply(split(df1, 1:nrow(df1), ...) would be approximately the same as applying st_join() dplyr::rowwise()?

For each unique record / row in df1 calculate the nearest feature in the subset of df2 that has the same daydate:

library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(dplyr, warn = FALSE)

df1 <- data.frame(
  lat1 = c(4417391, 4517826, 4435680, 4509372, 4449390, 4449390),
  long1 = c(5557780, 5358439, 5328731, 5323168, 5519670, 5519670),
  daydate = structure(c(16085, 16085, 16087, 16087, 16088, 16088), class = "Date")
)

df2 <- data.frame(
  lat2 = c(4394822, 4488830, 4417257, 4517995, 4435679),
  long2 = c(5293795, 5418630, 5557927, 5358272, 5328084),
  daydate = structure(c(16085, 16085, 16087, 16087, 16088), class = "Date"),
  temp = c(6L, 26L, 13L, 30L, 8L)
)

df1 <- df1 %>%
  st_as_sf(coords = c("long1", "lat1"), remove = FALSE) %>%
  st_set_crs(2193)

df2 <- df2 %>% 
  st_as_sf(coords = c("long2", "lat2"), remove = FALSE) %>%
  st_set_crs(2193)

res <- do.call('rbind', lapply(split(df1, 1:nrow(df1)), function(x) {
  st_join(x, df2[df2$daydate == unique(x$daydate),], join = st_nearest_feature)
}))

res
#> Simple feature collection with 6 features and 7 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 5323168 ymin: 4417391 xmax: 5557780 ymax: 4517826
#> Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
#>      lat1   long1  daydate.x    lat2   long2  daydate.y temp
#> 1 4417391 5557780 2014-01-15 4488830 5418630 2014-01-15   26
#> 2 4517826 5358439 2014-01-15 4488830 5418630 2014-01-15   26
#> 3 4435680 5328731 2014-01-17 4517995 5358272 2014-01-17   30
#> 4 4509372 5323168 2014-01-17 4517995 5358272 2014-01-17   30
#> 5 4449390 5519670 2014-01-18 4435679 5328084 2014-01-18    8
#> 6 4449390 5519670 2014-01-18 4435679 5328084 2014-01-18    8
#>                  geometry
#> 1 POINT (5557780 4417391)
#> 2 POINT (5358439 4517826)
#> 3 POINT (5328731 4435680)
#> 4 POINT (5323168 4509372)
#> 5 POINT (5519670 4449390)
#> 6 POINT (5519670 4449390)

plot(st_geometry(res))
plot(df2 %>% 
       st_as_sf(coords = c("long2", "lat2"), remove = FALSE) %>%
       st_set_crs(2193) %>% 
       st_geometry(), add = T, pch = "*")

EDIT: here is the same thing using {data.table}

df1 <- data.table(df1)
.nearest_samedate <- function(x) {
  st_join(st_as_sf(x), df2[df2$daydate == unique(x$daydate),], join = st_nearest_feature)
}

res <- df1[, .nearest_samedate(.SD), by = list(1:nrow(df1))]

CodePudding user response:

You're doing everything (almost) right. df_near doesn't have a column named daydate to group by. Since both df_1 and df_2 both have a column named daydate, the output has two columns for daydate named daydate.x and daydate.y. One from the left-hand side (df1), the other from the right-hand side(df2).

Using group_by(daydate.x) should work, buy you might want to check that the daydate columns are the same (or at least what you are expecting) between the dataframes.

library(sf)
library(tidyverse)

df1 <- df1 %>% st_as_sf(coords = c("long1", "lat1"), remove = FALSE) %>%
  st_set_crs(2193)
df2 <- df2 %>% st_as_sf(coords = c("long2", "lat2"), remove = FALSE) %>%
  st_set_crs(2193)

# Join df with df1, based on the nearest feature:
df_near <- st_join(df1, df2, join = st_nearest_feature) %>%
  group_by(daydate.x)

df_near
#> Simple feature collection with 6 features and 7 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 5323168 ymin: 4417391 xmax: 5557780 ymax: 4517826
#> Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
#> # A tibble: 6 × 8
#> # Groups:   daydate.x [3]
#>      lat1   long1 daydate.x     lat2   long2 daydate.y   temp
#>     <dbl>   <dbl> <date>       <dbl>   <dbl> <date>     <int>
#> 1 4417391 5557780 2014-01-15 4417257 5557927 2014-01-17    13
#> 2 4517826 5358439 2014-01-15 4517995 5358272 2014-01-17    30
#> 3 4435680 5328731 2014-01-17 4435679 5328084 2014-01-18     8
#> 4 4509372 5323168 2014-01-17 4517995 5358272 2014-01-17    30
#> 5 4449390 5519670 2014-01-18 4417257 5557927 2014-01-17    13
#> 6 4449390 5519670 2014-01-18 4417257 5557927 2014-01-17    13
#> # … with 1 more variable: geometry <POINT [m]>

Created on 2022-04-21 by the reprex package (v2.0.1)

  •  Tags:  
  • r sf
  • Related