Finding distance between a row and the row two above it in R

I would like to efficiently compute the Euclidean distance between every row in a matrix and the row two above it in R.

My attempts at finding a dplyr rowwise solution with lag(., n = 2) have failed, and I'm sure there's a better solution than this for loop.

Thoughts are much appreciated!

library(rdist)
library(tidyverse)

input_labs <- structure(list(sodium = c(140, 152.6, 138, 152.4, 140, 152.6, 
141, 152.7, 141, 152.7), chloride = c(103, 148.9, 104, 149, 102, 
148.8, 103, 148.9, 104, 149), potassium_plas = c(3.4, 0.34, 4.1, 
0.41, 3.7, 0.37, 4, 0.4, 3.7, 0.37), co2_totl = c(31, 3.1, 22, 
2.2, 23, 2.3, 27, 2.7, 20, 2), bun = c(11, 1.1, 5, 0.5, 8, 0.8, 
21, 2.1, 10, 1), creatinine = c(0.84, 0.084, 0.53, 0.053, 0.69, 
0.069, 1.04, 0.104, 1.86, 0.186), calcium = c(9.3, 0.93, 9.8, 
0.98, 9.4, 0.94, 9.4, 0.94, 9.1, 0.91), glucose = c(102, 10.2, 
99, 9.9, 115, 11.5, 94, 9.4, 122, 12.2), anion_gap = c(6, 0.599999999999989, 
12, 1.20000000000001, 15, 1.50000000000001, 11, 1.09999999999998, 
17, 1.69999999999999)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

# Preallocate the result; rep() has no `n` argument, so pass the length positionally
dist_prior <- rep(NA_real_, nrow(input_labs))

for(i in 3:nrow(input_labs)){
  dist_prior[i] <- cdist(input_labs[i,], input_labs[i-2,])
}

CodePudding user response：

We could loop over the row indices from 3 to n() with map_dbl, apply cdist to each row and the row two above it, and prepend two NAs so the result has the same length as the data

library(dplyr)
library(rdist)
library(purrr)
input_labs %>%
   mutate(dist_prior = c(NA_real_, NA_real_,
    map_dbl(3:n(), ~ cdist(cur_data()[.x,], cur_data()[.x-2, ]))))

-output

# A tibble: 10 × 10
   sodium chloride potassium_plas co2_totl   bun creatinine calcium glucose anion_gap dist_prior
    <dbl>    <dbl>          <dbl>    <dbl> <dbl>      <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
 1   140      103            3.4      31    11        0.84     9.3    102       6          NA   
 2   153.     149.           0.34      3.1   1.1      0.084    0.93    10.2     0.600      NA   
 3   138      104            4.1      22     5        0.53     9.8     99      12          13.0 
 4   152.     149            0.41      2.2   0.5      0.053    0.98     9.9     1.20        1.30
 5   140      102            3.7      23     8        0.69     9.4    115      15          16.8 
 6   153.     149.           0.37      2.3   0.8      0.069    0.94    11.5     1.50        1.68
 7   141      103            4        27    21        1.04     9.4     94      11          25.4 
 8   153.     149.           0.4       2.7   2.1      0.104    0.94     9.4     1.10        2.54
 9   141      104            3.7      20    10        1.86     9.1    122      17          31.5 
10   153.     149            0.37      2     1        0.186    0.91    12.2     1.70        3.15

Or we may split both the original data and its lagged copy (n = 2) by row with asplit, then use map2_dbl to loop over the two lists in parallel and apply cdist to each pair of rows

input_labs$dist_prior <- map2_dbl(
         asplit(lag(input_labs, n = 2), 1),
          asplit(input_labs, 1), 
         ~ cdist(as.data.frame.list(.x), as.data.frame.list(.y))[,1])
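
Since the question asked for a dplyr solution built on lag(), here is a minimal sketch of a column-wise variant, assuming every column is numeric and the Euclidean distance is wanted. It lags each column by two rows inside across(), squares the differences, and sums them per row, so no row-by-row loop is needed. The names labs and labs_dist are only illustrative, and the data are the first four rows of two columns from the question.

```r
library(dplyr)

# First four rows of two columns from the question, kept small for illustration
labs <- tibble(
  sodium   = c(140, 152.6, 138, 152.4),
  chloride = c(103, 148.9, 104, 149)
)

labs_dist <- labs %>%
  mutate(dist_prior = sqrt(rowSums(
    # squared difference between each value and the value two rows above it
    across(everything(), ~ (.x - lag(.x, n = 2))^2)
  )))

labs_dist
```

The first two rows have no row two above them, so lag() yields NA there and the distance is NA as well.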

CodePudding user response：

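In base R you can use diff and rowSums. On a matrix, diff with a lag of 2 subtracts from each row the row two above it, so the square root of the row sums of the squared differences is the Euclidean distance: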

c(NA, NA, sqrt(rowSums(diff(as.matrix(input_labs), 2)^2)))

[1]        NA        NA 12.955157  1.295516 16.832873  1.683287 25.381342  2.538134 31.493688  3.149369

You can then cbind the result to the original data frame.
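
As a rough sketch, the same idea can be wrapped in a small function so the lag is not hard-coded. The name lagged_row_dist and its argument k are hypothetical, not part of any package, and the example data are just the first four rows of two columns from the question.

```r
# Hypothetical helper: Euclidean distance between each row and the row k above it
lagged_row_dist <- function(x, k = 2) {
  m <- as.matrix(x)
  stopifnot(is.numeric(m), k >= 1, k < nrow(m))
  # diff() on a matrix subtracts row i - k from row i, column by column
  d <- diff(m, lag = k)
  # the first k rows have no counterpart, so pad with NA
  c(rep(NA_real_, k), sqrt(rowSums(d^2)))
}

# Small illustrative data frame (assumed all-numeric)
labs <- data.frame(
  sodium   = c(140, 152.6, 138, 152.4),
  chloride = c(103, 148.9, 104, 149)
)

# Attach the distances as a new column
labs$dist_prior <- lagged_row_dist(labs, k = 2)
labs
```

Assigning with `$<-` here does the same job as cbind and keeps the result a data frame.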