Interpret the result using distm and apply function in R

Time:06-30

I would like some help interpreting the result below. Specifically, what do the values in min_distance mean? Are they the minimum distances between properties?

database<-structure(list(Latitude = c(-24.781624, -24.775017, -24.769196, 
-24.761741, -24.752019, -24.748008, -24.737312, -24.744718, -24.751996, 
-24.724589), Longitude = c(-49.937369, 
-49.950576, -49.927608, -49.92762, -49.920608, -49.927707, -49.922095, 
-49.915438, -49.910843, -49.899478)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

library(geosphere)  # provides distm()

d<-distm(database[,2:1])
diag(d)<-1000000
min_distance<-as.matrix(apply(d,MARGIN=2,FUN=min))

> min_distance
           [,1]
 [1,] 1522.9967
 [2,] 1522.9967
 [3,]  825.7868
 [4,]  825.7868
 [5,]  844.4219
 [6,]  844.4219
 [7,] 1061.3607
 [8,]  930.5737
 [9,]  930.5737
[10,] 2687.3265

CodePudding user response:

If you print each intermediate object, you will get a better sense of what you are looking at.


database<-structure(list(Latitude = c(-24.781624, -24.775017, -24.769196, 
-24.761741, -24.752019, -24.748008, -24.737312, -24.744718, -24.751996, 
-24.724589), Longitude = c(-49.937369, 
-49.950576, -49.927608, -49.92762, -49.920608, -49.927707, -49.922095, 
-49.915438, -49.910843, -49.899478)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))
database

# A tibble: 10 x 2   
#    Latitude Longitude
#       <dbl>     <dbl>
#  1    -24.8     -49.9
#  2    -24.8     -50.0
#  3    -24.8     -49.9
#  4    -24.8     -49.9
#  5    -24.8     -49.9
#  6    -24.7     -49.9
#  7    -24.7     -49.9
#  8    -24.7     -49.9
#  9    -24.8     -49.9
# 10    -24.7     -49.9

This is a table of 10 locations, with their respective latitudes and longitudes.

d<-geosphere::distm(database[,2:1]) 
diag(d)<-1000000
d
#              [,1]        [,2]         [,3]         [,4]         [,5]         [,6]        [,7]         [,8]         [,9]       [,10]
#  [1,] 1000000.000    1522.997    1693.9978    2413.0564    3691.5733    3849.7209    5145.795    4651.0675    4238.9045    7389.402
#  [2,]    1522.997 1000000.000    2410.7105    2748.2810    3959.3951    3781.6594    5073.723    4888.2849    4759.4651    7610.377
#  [3,]    1693.998    2410.711 1000000.0000     825.7868    2030.1461    2347.0014    3575.519    2977.7558    2550.5461    5701.861
#  [4,]    2413.056    2748.281     825.7868 1000000.0000    1289.4746    1521.2196    2763.090    2252.5415    2011.1860    5003.997
#  [5,]    3691.573    3959.395    2030.1461    1289.4746 1000000.0000     844.4219    1636.011     963.0860     987.7503    3714.976
#  [6,]    3849.721    3781.659    2347.0014    1521.2196     844.4219 1000000.0000    1313.776    1293.4863    1762.1203    3858.080
#  [7,]    5145.795    5073.723    3575.5193    2763.0905    1636.0110    1313.7763 1000000.000    1061.3607    1985.2383    2687.327
#  [8,]    4651.067    4888.285    2977.7558    2252.5415     963.0860    1293.4863    1061.361 1000000.0000     930.5737    2752.885
#  [9,]    4238.905    4759.465    2550.5461    2011.1860     987.7503    1762.1203    1985.238     930.5737 1000000.0000    3246.261
# [10,]    7389.402    7610.377    5701.8609    5003.9971    3714.9761    3858.0802    2687.327    2752.8847    3246.2607 1000000.000

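This is the matrix of pairwise distances, in meters, between every location and every other location: entry [i, j] is the distance from location i to location j. Note that database[,2:1] puts Longitude before Latitude, which is the column order distm expects. The true diagonal is 0 (each location is 0 m from itself), so you have overwritten it with a large number, presumably so that min in the next step does not simply return 0 for every column.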
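
To see why the diagonal has to be masked, here is a minimal base-R sketch. It assumes a 4 x 4 matrix, d4 (a name introduced only for this illustration), holding the rounded distances between locations 1 to 4 copied from the matrix printed above, with the diagonal restored to 0. It also shows NA with na.rm=TRUE as one possible alternative to the 1000000 sentinel; that is a swapped-in technique, not what the original code does.

```r
# Distances in meters between locations 1-4, copied (rounded) from the
# printed matrix; the diagonal is 0 because each location is 0 m from itself.
d4 <- matrix(c(
     0.000, 1522.997, 1693.998, 2413.056,
  1522.997,    0.000, 2410.711, 2748.281,
  1693.998, 2410.711,    0.000,  825.787,
  2413.056, 2748.281,  825.787,    0.000
), nrow = 4, byrow = TRUE)

# Without masking, every column minimum is just the self-distance 0.
naive_min <- apply(d4, MARGIN = 2, FUN = min)

# Masking with NA (an alternative to a large sentinel value) and dropping
# the NAs inside min() gives the distance to the nearest other location.
diag(d4) <- NA
nearest_min <- apply(d4, MARGIN = 2, FUN = min, na.rm = TRUE)
nearest_min
```

For these four locations the result matches the first four entries of min_distance, because each one's nearest neighbor is also among locations 1 to 4.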

min_distance<-as.matrix(apply(d,MARGIN=2,FUN=min))
# > min_distance
#            [,1]
#  [1,] 1522.9967
#  [2,] 1522.9967
#  [3,]  825.7868
#  [4,]  825.7868
#  [5,]  844.4219
#  [6,]  844.4219
#  [7,] 1061.3607
#  [8,]  930.5737
#  [9,]  930.5737
# [10,] 2687.3265

This is the key line. apply with MARGIN=2 calls the function once per column of the matrix (MARGIN=1 would go row by row). Here the choice does not matter: a distance matrix is symmetric, so MARGIN=1 gives the same output.
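
A small sketch of what MARGIN does, using two made-up matrices (m and s are hypothetical values chosen only for illustration, not taken from the data above). On a non-symmetric matrix the two margins give different results; on a symmetric one they agree.

```r
# Hypothetical 2 x 3 matrix: rows and columns give different minima.
m <- matrix(c(1, 5, 3,
              4, 2, 6), nrow = 2, byrow = TRUE)
row_min <- apply(m, MARGIN = 1, FUN = min)  # one value per row
col_min <- apply(m, MARGIN = 2, FUN = min)  # one value per column

# Hypothetical symmetric matrix with a large value on the diagonal,
# mimicking a masked distance matrix.
s <- matrix(c(100,   7,   9,
                7, 100,   4,
                9,   4, 100), nrow = 3, byrow = TRUE)
s_is_symmetric <- isSymmetric(s)
same_result <- identical(apply(s, MARGIN = 1, FUN = min),
                         apply(s, MARGIN = 2, FUN = min))
same_result
```

The same check, identical(apply(d, 1, min), apply(d, 2, min)), should hold for the distance matrix d as long as it is exactly symmetric.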

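You are asking R for the lowest value in every column, so yes: each element of min_distance is the distance in meters from that location to its nearest neighbor. In column 1 this is the distance between locations 1 and 2, and column 2 holds the same value because location 2's nearest neighbor is location 1. Likewise, rows 3 and 4 are the distance between locations 3 and 4, rows 5 and 6 between 5 and 6, and rows 8 and 9 between 8 and 9. The values need not pair up: location 7 is closest to location 8 (1061.36 m) and location 10 is closest to location 7 (2687.33 m), but neither is the nearest neighbor of the other. Whether this is useful depends on the question you are trying to answer.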
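
If you also want to know which location is the nearest one, not only how far away it is, which.min can be applied column by column in the same way. This is a sketch on a 4 x 4 excerpt (d4, a name used only here) of the masked matrix for locations 1 to 4, with values rounded from the output above; the data frame and its column names are likewise assumptions made for illustration.

```r
# Masked distances in meters between locations 1-4 (rounded excerpt);
# the diagonal carries the same 1e6 sentinel as in the original code.
d4 <- matrix(c(
       1e6, 1522.997, 1693.998, 2413.056,
  1522.997,      1e6, 2410.711, 2748.281,
  1693.998, 2410.711,      1e6,  825.787,
  2413.056, 2748.281,  825.787,      1e6
), nrow = 4, byrow = TRUE)

# which.min returns the row index of the smallest value in each column,
# i.e. the index of the nearest other location.
nearest <- data.frame(
  location         = seq_len(ncol(d4)),
  nearest_neighbor = apply(d4, MARGIN = 2, FUN = which.min),
  distance_m       = apply(d4, MARGIN = 2, FUN = min)
)
nearest
```

Running the same two apply calls on the full matrix d would give the nearest neighbor for all 10 locations.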
