Finding distance between a row and the row two above it in R

Time:07-05

I would like to efficiently compute distances between every row in a matrix and the row two rows above it in R...

My attempts at finding a dplyr rowwise solution with lag(., n = 2) have failed, and I'm sure there's a better solution than this for loop.

Thoughts are much appreciated!

library(rdist)
library(tidyverse)

input_labs <- structure(list(sodium = c(140, 152.6, 138, 152.4, 140, 152.6, 
141, 152.7, 141, 152.7), chloride = c(103, 148.9, 104, 149, 102, 
148.8, 103, 148.9, 104, 149), potassium_plas = c(3.4, 0.34, 4.1, 
0.41, 3.7, 0.37, 4, 0.4, 3.7, 0.37), co2_totl = c(31, 3.1, 22, 
2.2, 23, 2.3, 27, 2.7, 20, 2), bun = c(11, 1.1, 5, 0.5, 8, 0.8, 
21, 2.1, 10, 1), creatinine = c(0.84, 0.084, 0.53, 0.053, 0.69, 
0.069, 1.04, 0.104, 1.86, 0.186), calcium = c(9.3, 0.93, 9.8, 
0.98, 9.4, 0.94, 9.4, 0.94, 9.1, 0.91), glucose = c(102, 10.2, 
99, 9.9, 115, 11.5, 94, 9.4, 122, 12.2), anion_gap = c(6, 0.599999999999989, 
12, 1.20000000000001, 15, 1.50000000000001, 11, 1.09999999999998, 
17, 1.69999999999999)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

dist_prior <- rep(NA_real_, nrow(input_labs))

for(i in 3:nrow(input_labs)){
  dist_prior[i] <- cdist(input_labs[i,], input_labs[i-2,])
}

Answer:

We could loop over the row indices with map_dbl, apply cdist to each row and the row two above it, and prepend two NAs so the result has the same length as the data.

library(dplyr)
library(rdist)
library(purrr)
input_labs %>%
   mutate(dist_prior = c(NA_real_, NA_real_,
    map_dbl(3:n(), ~ cdist(cur_data()[.x,], cur_data()[.x-2, ]))))

-output

# A tibble: 10 × 10
   sodium chloride potassium_plas co2_totl   bun creatinine calcium glucose anion_gap dist_prior
    <dbl>    <dbl>          <dbl>    <dbl> <dbl>      <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
 1   140      103            3.4      31    11        0.84     9.3    102       6          NA   
 2   153.     149.           0.34      3.1   1.1      0.084    0.93    10.2     0.600      NA   
 3   138      104            4.1      22     5        0.53     9.8     99      12          13.0 
 4   152.     149            0.41      2.2   0.5      0.053    0.98     9.9     1.20        1.30
 5   140      102            3.7      23     8        0.69     9.4    115      15          16.8 
 6   153.     149.           0.37      2.3   0.8      0.069    0.94    11.5     1.50        1.68
 7   141      103            4        27    21        1.04     9.4     94      11          25.4 
 8   153.     149.           0.4       2.7   2.1      0.104    0.94     9.4     1.10        2.54
 9   141      104            3.7      20    10        1.86     9.1    122      17          31.5 
10   153.     149            0.37      2     1        0.186    0.91    12.2     1.70        3.15
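As a sanity check on the first non-missing value, the distance between rows 3 and 1 can be worked out by hand in base R. This is a minimal sketch that assumes cdist is using its default Euclidean metric, which is consistent with the values shown above; row1, row3 and d31 are names introduced here for illustration only.

```r
# Rows 1 and 3 of input_labs, copied from the data in the question
row1 <- c(140, 103, 3.4, 31, 11, 0.84, 9.3, 102, 6)
row3 <- c(138, 104, 4.1, 22, 5, 0.53, 9.8, 99, 12)

# Euclidean distance: square root of the sum of squared differences
d31 <- sqrt(sum((row3 - row1)^2))
d31
# approximately 12.955, in line with dist_prior for row 3
```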

Alternatively, split both the lagged data and the original data by row, then use map2_dbl to loop over the paired rows and apply cdist to each pair.

input_labs$dist_prior <- map2_dbl(
         asplit(lag(input_labs, n = 2), 1),
          asplit(input_labs, 1), 
         ~ cdist(as.data.frame.list(.x), as.data.frame.list(.y))[,1])

Answer:

In base R, you can use diff (with a lag of 2) and rowSums, as shown below:

c(NA, NA, sqrt(rowSums(diff(as.matrix(input_labs), 2)^2)))

[1]        NA        NA 12.955157  1.295516 16.832873  1.683287 25.381342  2.538134 31.493688  3.149369

You can cbind the results to the original dataframe.
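The diff and rowSums idea can be wrapped in a small function and bound to the data, roughly as follows. This is only a sketch: lagged_row_dist, toy and result are hypothetical names, and a toy data frame is used so the numbers are easy to verify by eye. It assumes all columns are numeric and that the data has more than k rows.

```r
# Hypothetical helper: Euclidean distance between each row and the row k above it.
# Assumes every column is numeric and nrow(df) > k.
lagged_row_dist <- function(df, k = 2) {
  m <- as.matrix(df)
  # diff(m, lag = k) gives row i minus row i - k for every i > k
  d <- sqrt(rowSums(diff(m, lag = k)^2))
  # Pad with k leading NAs so the result lines up with the rows of df
  c(rep(NA_real_, k), d)
}

# Toy data: rows 1 and 3 differ by (3, 4), so their distance is 5
toy <- data.frame(x = c(0, 1, 3, 1, 0), y = c(0, 1, 4, 1, 0))
result <- cbind(toy, dist_prior = lagged_row_dist(toy))
result$dist_prior
# NA NA 5 0 5
```

With the question's data, the equivalent call would be cbind(input_labs, dist_prior = lagged_row_dist(input_labs)).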
