I would like to efficiently compute distances between every row in a matrix and the row two rows above it in R...
My attempts at finding a dplyr rowwise solution with lag(., n = 2) have failed, and I'm sure there's a better solution than this for loop.
Thoughts are much appreciated!
library(rdist)
library(tidyverse)
input_labs <- structure(list(sodium = c(140, 152.6, 138, 152.4, 140, 152.6,
141, 152.7, 141, 152.7), chloride = c(103, 148.9, 104, 149, 102,
148.8, 103, 148.9, 104, 149), potassium_plas = c(3.4, 0.34, 4.1,
0.41, 3.7, 0.37, 4, 0.4, 3.7, 0.37), co2_totl = c(31, 3.1, 22,
2.2, 23, 2.3, 27, 2.7, 20, 2), bun = c(11, 1.1, 5, 0.5, 8, 0.8,
21, 2.1, 10, 1), creatinine = c(0.84, 0.084, 0.53, 0.053, 0.69,
0.069, 1.04, 0.104, 1.86, 0.186), calcium = c(9.3, 0.93, 9.8,
0.98, 9.4, 0.94, 9.4, 0.94, 9.1, 0.91), glucose = c(102, 10.2,
99, 9.9, 115, 11.5, 94, 9.4, 122, 12.2), anion_gap = c(6, 0.599999999999989,
12, 1.20000000000001, 15, 1.50000000000001, 11, 1.09999999999998,
17, 1.69999999999999)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
# Pre-allocate the result; the first two rows have no row two above them
dist_prior <- rep(NA_real_, nrow(input_labs))
for (i in 3:nrow(input_labs)) {
  dist_prior[i] <- cdist(input_labs[i, ], input_labs[i - 2, ])
}
CodePudding user response:
We could loop over the row indices with map_dbl and apply the function, then prepend two NAs so the result has the correct length.
library(dplyr)
library(rdist)
library(purrr)
input_labs %>%
  mutate(dist_prior = c(NA_real_, NA_real_,
    map_dbl(3:n(), ~ cdist(cur_data()[.x, ], cur_data()[.x - 2, ]))))
-output
# A tibble: 10 × 10
   sodium chloride potassium_plas co2_totl   bun creatinine calcium glucose anion_gap dist_prior
    <dbl>    <dbl>          <dbl>    <dbl> <dbl>      <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
 1    140      103            3.4       31    11       0.84     9.3     102         6         NA
 2   153.     149.           0.34      3.1   1.1      0.084    0.93    10.2     0.600         NA
 3    138      104            4.1       22     5       0.53     9.8      99        12       13.0
 4   152.      149           0.41      2.2   0.5      0.053    0.98     9.9      1.20       1.30
 5    140      102            3.7       23     8       0.69     9.4     115        15       16.8
 6   153.     149.           0.37      2.3   0.8      0.069    0.94    11.5      1.50       1.68
 7    141      103              4       27    21       1.04     9.4      94        11       25.4
 8   153.     149.            0.4      2.7   2.1      0.104    0.94     9.4      1.10       2.54
 9    141      104            3.7       20    10       1.86     9.1     122        17       31.5
10   153.      149           0.37        2     1      0.186    0.91    12.2      1.70       3.15
Or we may split both the original data and the lagged data by row, then use map2_dbl to loop over the two lists and apply cdist to each pair.
input_labs$dist_prior <- map2_dbl(
  asplit(lag(input_labs, n = 2), 1),
  asplit(input_labs, 1),
  ~ cdist(as.data.frame.list(.x), as.data.frame.list(.y))[, 1])
CodePudding user response:
In base R, you can use diff with a lag of 2 to get the row-wise differences, then rowSums to turn them into Euclidean distances, as shown below:
c(NA, NA, sqrt(rowSums(diff(as.matrix(input_labs), 2)^2)))
[1] NA NA 12.955157 1.295516 16.832873 1.683287 25.381342 2.538134 31.493688 3.149369
You can cbind the results to the original dataframe.
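For illustration, here is a minimal sketch of the diff / rowSums approach followed by the cbind step. The data frame labs and its values are made up for this example (they are not the question's data), and labs_with_dist is just a name chosen here. The lag is passed by name to make clear that it is the second argument of diff.

```r
# Hypothetical stand-in data, small enough to check by hand
labs <- data.frame(a = c(1, 2, 4, 6), b = c(0, 0, 3, 0))

# diff() on a matrix works down the rows: with lag = 2, row i minus row i - 2
row_diffs <- diff(as.matrix(labs), lag = 2)

# Euclidean distance per row pair, padded with two NAs for the first two rows
dist_prior <- c(NA, NA, sqrt(rowSums(row_diffs^2)))

# Attach the distances to the original data frame as a new column
labs_with_dist <- cbind(labs, dist_prior = dist_prior)
print(labs_with_dist)
```

Because diff operates on the whole matrix at once, this avoids calling a distance function once per row, which is usually what makes it faster than the loop-based versions.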