Iterative function to subtract all possible permutations in R

I have a data frame...

example <- data.frame(obs_val= c(20,15,3,7,5), patient = c("pt1","pt2","pt3","pt4","pt5"))

... where every row or "patient" is a unique observation.

My goal is to generate a data frame that subtracts each patient's observed value (obs_val) from every other patient's obs_val. The pairs should exclude self-matches, so that, e.g., pt1 never has its own obs_val subtracted from itself. Ideally, the final data frame should look something like the following:

             pt1-pt2    pt1-pt3    pt1-pt4    pt1-pt5    pt2-pt3    pt2-pt4    ...

obs_val_diff    5          17         13         15         12         8       ...

Any suggestions on solving this problem, or reformatting the final data frame, are greatly appreciated. Thank you!

CodePudding user response:

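You can cross-join the data frame with itself (here via a constant id column), drop the rows where a patient has been matched to itself, and keep the differences. This returns both orderings of every pair, so each difference appears once with each sign: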

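library(data.table)
library(magrittr)

setDT(example)  # converts example to a data.table by reference
# A constant id column turns the self-join into a full cross join;
# note that := adds the id column to example itself
example[, id := 1][example, on = .(id), allow.cartesian = TRUE] %>%
  .[patient != i.patient] %>%  # drop self-pairs
  .[, .(p1 = i.patient, p2 = patient, p1_minus_p2 = i.obs_val - obs_val)]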

Output:

     p1  p2 p1_minus_p2
 1: pt1 pt2           5
 2: pt1 pt3          17
 3: pt1 pt4          13
 4: pt1 pt5          15
 5: pt2 pt1          -5
 6: pt2 pt3          12
 7: pt2 pt4           8
 8: pt2 pt5          10
 9: pt3 pt1         -17
10: pt3 pt2         -12
11: pt3 pt4          -4
12: pt3 pt5          -2
13: pt4 pt1         -13
14: pt4 pt2          -8
15: pt4 pt3           4
16: pt4 pt5           2
17: pt5 pt1         -15
18: pt5 pt2         -10
19: pt5 pt3           2
20: pt5 pt4          -2
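
For comparison, the same "pair everything, then drop self-pairs" idea can be sketched in base R with outer(). This is only an illustrative alternative, not the method used above; the names diff_mat and long are made up for the example, and the row order differs from the data.table output.

```r
example <- data.frame(obs_val = c(20, 15, 3, 7, 5),
                      patient = c("pt1", "pt2", "pt3", "pt4", "pt5"))

# Row i, column j holds obs_val[i] - obs_val[j]
diff_mat <- outer(example$obs_val, example$obs_val, "-")
dimnames(diff_mat) <- list(example$patient, example$patient)

# Flatten the matrix to long form (one row per ordered pair)
long <- as.data.frame(as.table(diff_mat), stringsAsFactors = FALSE)
names(long) <- c("p1", "p2", "p1_minus_p2")

# Drop the diagonal, i.e. a patient paired with itself
long <- long[long$p1 != long$p2, ]
```

With five patients this leaves 20 rows, one per ordered pair, which is the same set of differences as the join produces.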

CodePudding user response:

Another option is to use combn() to generate every unordered pair of patients, compute the difference for each pair, and pivot the result into a single wide row:

library(tidyverse)

data.frame(t(combn(example$patient, 2))) |>
  mutate(obs_val_diff = map2_dbl(X1, X2, ~example[example$patient ==.x, "obs_val"] -
                                   example[example$patient ==.y, "obs_val"])) |>
  unite(test, X1, X2, sep = "-") |>
  pivot_wider(names_from = test, values_from = obs_val_diff)
#> # A tibble: 1 x 10
#>   `pt1-pt2` `pt1-pt3` `pt1-pt4` pt1-pt~1 pt2-p~2 pt2-p~3 pt2-p~4 pt3-p~5 pt3-p~6
#>       <dbl>     <dbl>     <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1         5        17        13       15      12       8      10      -4      -2
#> # ... with 1 more variable: `pt4-pt5` <dbl>, and abbreviated variable names
#> #   1: `pt1-pt5`, 2: `pt2-pt3`, 3: `pt2-pt4`, 4: `pt2-pt5`, 5: `pt3-pt4`,
#> #   6: `pt3-pt5`

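Or in base R (R >= 4.1 is needed for the \(x) lambda syntax). Because combn() returns each pair in the same order as the rows of example, -diff() gives the first patient's value minus the second's:
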
apply(t(combn(example$patient, 2)), 1, 
      \(x) -diff(example[example$patient %in% x, "obs_val"])) |>
  (\(v) matrix(v, ncol = length(v)))() |>
  as.data.frame() |>
  `colnames<-`(apply(t(combn(example$patient, 2)), 1, 
                     \(x) paste(x, collapse = "-")))
#>   pt1-pt2 pt1-pt3 pt1-pt4 pt1-pt5 pt2-pt3 pt2-pt4 pt2-pt5 pt3-pt4 pt3-pt5
#> 1       5      17      13      15      12       8      10      -4      -2
#>   pt4-pt5
#> 1       2
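
A shorter variant is possible because combn() accepts a function to apply to each combination. The following is a minimal sketch under the assumption that obs_val and patient are in matching row order; the names diffs, labels and wide, and the row name obs_val_diff (taken from the layout in the question), are chosen for the example.

```r
example <- data.frame(obs_val = c(20, 15, 3, 7, 5),
                      patient = c("pt1", "pt2", "pt3", "pt4", "pt5"))

# First value minus second value for every unordered pair
diffs <- combn(example$obs_val, 2, FUN = function(v) v[1] - v[2])

# Matching "ptA-ptB" labels, generated in the same pair order
labels <- combn(example$patient, 2, FUN = paste, collapse = "-")

# One-row data frame; check.names = FALSE keeps the hyphenated column names
wide <- data.frame(t(setNames(diffs, labels)),
                   check.names = FALSE,
                   row.names = "obs_val_diff")
```

Both combn() calls enumerate the pairs in the same order, so the labels line up with the differences without any lookup by patient name.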