I have two data frames in R, and I need to calculate values based on a subset of the columns in each of them. This is geolocated data, so I've been using geosphere for some of the calculations. These are the two data frames: dat1 contains a list of lat/long coordinates for random points on the surface of the Earth, and dat2 is a set of objects on Earth, each with a lat/long location and a size and speed associated with it.
library(geosphere)
set.seed(1)

dat1 <- data.frame( long = runif( n = 10, min = -180, max = 180 ),
                    lat  = runif( n = 10, min = -90,  max = 90 ) )

dat2 <- data.frame( long  = runif( n = 10, min = -180, max = 180 ),
                    lat   = runif( n = 10, min = -90,  max = 90 ),
                    size  = runif( n = 10, min = 0,    max = 1500 ),
                    speed = rnorm( n = 10, mean = 100, sd = 30 ) )
I need to calculate the distance from each point in dat1 to all of the objects in dat2, while also keeping the size and speed data. I've been doing this by creating a list of the dat1 locations:
list.dat1 <- split( dat1, 1:nrow( dat1 ) )
and then filling it in with a double for loop (bad form in R, I know), which works okay:
for( i in 1:length( list.dat1 ) ){
  for( j in 1:nrow( dat2 ) ) {
    ## 2 x 2 matrix of the two locations: row 1 = dat2 object, row 2 = dat1 point; columns are long, lat
    two.points = matrix( c( dat2[ j, 'long' ], dat2[ j, 'lat' ],
                            dat1[ i, 'long' ], dat1[ i, 'lat' ] ),
                         nrow = 2, ncol = 2, byrow = T )
    ## now add the data from the objects
    list.dat1[[ i ]][ j, 3 ] = dat2[ j, 'long' ]                   # add object long
    list.dat1[[ i ]][ j, 4 ] = dat2[ j, 'lat' ]                    # add object lat
    ## calculate distance
    list.dat1[[ i ]][ j, 5 ] = distGeo( two.points )[ 1 ] / 1000   # distance in kilometers
    list.dat1[[ i ]][ j, 6 ] = dat2[ j, 'size' ]                   # add size of object
    list.dat1[[ i ]][ j, 7 ] = dat2[ j, 'speed' ]                  # add speed of object
  }
}
Then I just rename the columns:
for( i in 1:length( list.dat1 ) ) {
  colnames( list.dat1[[ i ]] ) = c( "point.long", "point.lat",
                                    "object.long", "object.lat",
                                    "distance.to.object", "object.size",
                                    "object.speed" )
}
This is horribly inefficient, as dat1 has 1,000 rows and dat2 has anywhere from 100 to 100,000 rows.
I was thinking I could make list.dat1 a list of data frames, each with nrow(dat2) rows, but I'm not sure how to accomplish that. Could I then simply cbind the dat2 data into each element of list.dat1 using lapply, and finally do the distGeo() calculation on each row of each list element? I'm still learning how to use R lists and the apply() family of functions efficiently, so any help on making this more efficient would be greatly appreciated!
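For what it's worth, something along these lines is roughly what I'm imagining, though I'm not sure it's correct or idiomatic (just a sketch):
list.dat1 <- lapply( seq_len( nrow( dat1 ) ), function( i ) {
  ## cbind recycles the single dat1 point across all rows of dat2
  out <- cbind( point.long = dat1$long[ i ],
                point.lat  = dat1$lat[ i ],
                dat2 )
  ## distGeo() is vectorised, so it handles all of dat2 in one call (result in meters)
  out$distance.to.object <- distGeo( cbind( out$point.long, out$point.lat ),
                                     cbind( out$long, out$lat ) ) / 1000   # kilometers
  ## long/lat/size/speed come from dat2 and could be renamed to object.* afterwards
  out
} )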
CodePudding user response:
You can create a new data frame containing all combinations of the points in dat1 and dat2 with tidyr::crossing() and then call distGeo() on the resulting data frame.
library(dplyr)
library(tidyr)

dat1 %>%
  # rename dat1's long/lat to distinguish them from dat2's columns
  rename(dat1_long = long, dat1_lat = lat) %>%
  # create all combinations of dat1 points and dat2 objects
  tidyr::crossing(dat2) %>%
  # compute the distance (in meters) between each pair of points
  mutate(dist = distGeo(p1 = cbind(dat1_long, dat1_lat),
                        p2 = cbind(long, lat)))
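If you still want one data frame per dat1 point (like your list.dat1), one option is to add a row id before crossing and split on it afterwards. A minimal sketch, where d1_id and dist_km are just names introduced here:
crossed <- dat1 %>%
  mutate(d1_id = row_number()) %>%               # helper id, one per dat1 point
  rename(dat1_long = long, dat1_lat = lat) %>%
  tidyr::crossing(dat2) %>%
  mutate(dist_km = distGeo(p1 = cbind(dat1_long, dat1_lat),
                           p2 = cbind(long, lat)) / 1000)   # kilometers, as in your loop
list.dat1 <- split(crossed, crossed$d1_id)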
CodePudding user response:
Here is a data.table approach which might help with your large datasets:
library(data.table)
library(magrittr)   # for the %>% pipe used below

setDT(dat1)[, `:=`(d1_id = .I, id = 1)]   # add a row id and a constant join key
setDT(dat2)[, `:=`(d2_id = .I, id = 1)]

dat1[dat2, on = .(id), allow.cartesian = TRUE] %>%   # cross join: every dat1 row paired with every dat2 row
  .[, dist := distGeo(cbind(long, lat),              # long/lat are dat1's coordinates
                      cbind(i.long, i.lat))] %>%     # i.long/i.lat are dat2's; distGeo() returns meters
  .[, .(d1_id, long, lat, d2_id, size, speed, dist)]
Output (first six rows):
   d1_id       long         lat d2_id      size    speed     dist
1:     1  -84.41688 -52.9245765     1 1231.4194 98.31614 11610004
2:     2  -46.03540 -58.2197845     1 1231.4194 98.31614 12903703
3:     3   26.22721  33.6641124     1 1231.4194 98.31614 13868897
4:     4  146.95480 -20.8613307     1 1231.4194 98.31614  2208091
5:     5 -107.39450  48.5714556     1 1231.4194 98.31614 10727196
6:     6  143.42029  -0.4141364     1 1231.4194 98.31614  1487321
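Two small follow-ups: dist is in meters, so divide it by 1000 if you want kilometers as in your loop, and the id = 1 column exists only to force the cartesian join, so you may want to drop it from dat1 and dat2 afterwards:
## remove the helper join key that was added by reference above
dat1[, id := NULL]
dat2[, id := NULL]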