I have a data frame...
example <- data.frame(obs_val= c(20,15,3,7,5), patient = c("pt1","pt2","pt3","pt4","pt5"))
... where every row or "patient" is a unique observation.
My goal is to generate a data frame containing the difference between each patient's observed value (obs_val) and every other patient's obs_val. The subtraction should cover every pairing of two different patients, so that, e.g., pt1 never has its own obs_val subtracted from itself. Ideally, the final data frame should look something like the following:
             pt1-pt2 pt1-pt3 pt1-pt4 pt1-pt5 pt2-pt3 pt2-pt4 ...
obs_val_diff       5      17      13      15      12       8 ...
Any suggestions on solving this problem, or reformatting the final data frame, are greatly appreciated. Thank you!
CodePudding user response:
You can join the data frame to itself, remove the rows where a patient has been matched to itself, and keep the differences:
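library(data.table)
library(magrittr)
setDT(example)  # convert to a data.table by reference
# Add a constant key (this also adds `id` to `example`), then self-join on it
# so that every row is paired with every row
example[, id := 1][example, on = .(id), allow.cartesian = TRUE] %>%
  .[patient != i.patient] %>%  # drop the self-pairs
  .[, .(p1 = i.patient, p2 = patient, p1_minus_p2 = i.obs_val - obs_val)]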
Output:
     p1  p2 p1_minus_p2
 1: pt1 pt2           5
 2: pt1 pt3          17
 3: pt1 pt4          13
 4: pt1 pt5          15
 5: pt2 pt1          -5
 6: pt2 pt3          12
 7: pt2 pt4           8
 8: pt2 pt5          10
 9: pt3 pt1         -17
10: pt3 pt2         -12
11: pt3 pt4          -4
12: pt3 pt5          -2
13: pt4 pt1         -13
14: pt4 pt2          -8
15: pt4 pt3           4
16: pt4 pt5           2
17: pt5 pt1         -15
18: pt5 pt2         -10
19: pt5 pt3           2
20: pt5 pt4          -2
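If you would rather avoid extra packages, the same "every patient against every other patient" idea can be sketched in base R with outer(). This is not the code above, just a minimal alternative under the assumption that a matrix (or a long table derived from it) is an acceptable shape; the names d and long are made up for illustration.

```r
example <- data.frame(obs_val = c(20, 15, 3, 7, 5),
                      patient = c("pt1", "pt2", "pt3", "pt4", "pt5"))

# d[i, j] holds obs_val of patient i minus obs_val of patient j
d <- outer(example$obs_val, example$obs_val, "-")
dimnames(d) <- list(p1 = example$patient, p2 = example$patient)

# Long format: one row per ordered pair, with the self-pairs dropped
long <- as.data.frame(as.table(d), responseName = "p1_minus_p2",
                      stringsAsFactors = FALSE)
long <- long[long$p1 != long$p2, ]
```

The diagonal of d is the "patient minus itself" case, which is why it is filtered out of the long table. For n patients this leaves n * (n - 1) rows, i.e. 20 rows here.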
CodePudding user response:
Another option is to use combn to generate every unordered pair of patients and then compute the difference for each pair.
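library(tidyverse)
# One row per unordered pair of patients (columns X1 and X2)
data.frame(t(combn(example$patient, 2))) |>
  # Look up both obs_val entries and subtract the second from the first
  mutate(obs_val_diff = map2_dbl(X1, X2, ~ example[example$patient == .x, "obs_val"] -
                                           example[example$patient == .y, "obs_val"])) |>
  unite(test, X1, X2, sep = "-") |>  # build the "pt1-pt2" labels
  pivot_wider(names_from = test, values_from = obs_val_diff)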
#> # A tibble: 1 x 10
#> `pt1-pt2` `pt1-pt3` `pt1-pt4` pt1-pt~1 pt2-p~2 pt2-p~3 pt2-p~4 pt3-p~5 pt3-p~6
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5 17 13 15 12 8 10 -4 -2
#> # ... with 1 more variable: `pt4-pt5` <dbl>, and abbreviated variable names
#> # 1: `pt1-pt5`, 2: `pt2-pt3`, 3: `pt2-pt4`, 4: `pt2-pt5`, 5: `pt3-pt4`,
#> # 6: `pt3-pt5`
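Or, in base R:
# -diff() gives first minus second; this relies on combn() returning each pair
# in the same order as the rows of `example`
apply(t(combn(example$patient, 2)), 1,
      \(x) -diff(example[example$patient %in% x, "obs_val"])) |>
  (\(v) matrix(v, ncol = length(v)))() |>  # reshape into a single row
  as.data.frame() |>
  `colnames<-`(apply(t(combn(example$patient, 2)), 1,
                     \(x) paste(x, collapse = "-")))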
#> pt1-pt2 pt1-pt3 pt1-pt4 pt1-pt5 pt2-pt3 pt2-pt4 pt2-pt5 pt3-pt4 pt3-pt5
#> 1 5 17 13 15 12 8 10 -4 -2
#> pt4-pt5
#> 1 2
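As a possibly shorter variant of the base R approach, combn() also accepts a FUN argument that is applied to each pair, so the differences and the labels can each be built in one call. This is only a sketch of that idea; the names diffs, labels and result are chosen here for illustration.

```r
example <- data.frame(obs_val = c(20, 15, 3, 7, 5),
                      patient = c("pt1", "pt2", "pt3", "pt4", "pt5"))

# For each unordered pair: first value minus second value
diffs <- combn(example$obs_val, 2, FUN = function(v) v[1] - v[2])
# Matching "ptA-ptB" labels, generated in the same pair order
labels <- combn(example$patient, 2, FUN = paste, collapse = "-")

# One-row data frame with the pair labels as column names
result <- as.data.frame(
  matrix(diffs, nrow = 1, dimnames = list("obs_val_diff", labels))
)
```

Because both calls iterate over the pairs in the same order, the labels line up with the differences without any lookup by patient name.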