I am fairly new to programming, so forgive me if I give too little information. I have a data frame (df) that looks something like this:
Diagnosis | Value | Brainregion |
---|---|---|
NC | 2 | region_a |
NC | 3 | region_b |
BD | 4 | region_a |
BD | 5 | region_b |
I would like to perform a permutation test between same brain regions of different diagnoses (to clarify: mean value of region_a in BD vs mean value of region_a in NC, mean value of region_b in BD vs mean value of region_b in NC and so on).
I would like code that does this in one step for every region.
I tried adapting the method described in "Multiple groups tests via permutation", but I can't seem to make it work as intended.
Can someone please help me?
P.S. I have another version of the same data frame that looks like this, in case it is more useful:
Diagnosis | Region_a | Region_b |
---|---|---|
NC | 2 | 3 |
BD | 4 | 5 |
CodePudding user response:
With combn:
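library(dplyr)
# For each brain region, build every pair of diagnoses and a p-value per pair
df %>%
  group_by(Brainregion) %>%
  summarise(Diagnosis = combn(Diagnosis, 2, paste0, collapse = '-'),
            # t.test(x) receives the two values of a pair as one vector,
            # so this is a one-sample t-test on that pair
            p.value = combn(Value, 2, function(x) t.test(x)$p.value))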
# A tibble: 2 × 3
  Brainregion Diagnosis p.value
  <chr>       <chr>       <dbl>
1 region_a    NC-BD       0.205
2 region_b    NC-BD       0.156
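This expects the long layout shown at the top of the question. If you only have the wide version from the P.S., it could be reshaped first. A minimal base R sketch, assuming the column names Region_a and Region_b from the question; the object names wide and long are just placeholders:

```r
# Wide layout from the P.S. (values copied from the question)
wide <- data.frame(
  Diagnosis = c("NC", "BD"),
  Region_a  = c(2, 4),
  Region_b  = c(3, 5)
)

# stack() puts the region columns on top of each other:
# `values` holds the numbers, `ind` the name of the source column
region_cols <- c("Region_a", "Region_b")
stacked <- stack(wide[region_cols])

long <- data.frame(
  Diagnosis   = rep(wide$Diagnosis, times = length(region_cols)),
  Value       = stacked$values,
  Brainregion = tolower(as.character(stacked$ind))
)
long
```

The result has one row per diagnosis and region, with the same columns (Diagnosis, Value, Brainregion) as the first table in the question.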
CodePudding user response:
Using by to split the data into regions and performing the t.tests.
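# Split dat by region, test within each piece, then row-bind the results
by(dat, dat$region, \(x) {
  tt <- with(x, t.test(value ~ diagnosis))  # Welch two-sample t-test, value by diagnosis
  data.frame(region=el(as.character(x$region)), tt[c('statistic', 'p.value')],
             hypothesis=toString(unique(x$diagnosis)))
}) |> do.call(what=rbind)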
#   region statistic    p.value hypothesis
# a      a  1.628979 0.23082956     NC, BD
# b      b -2.813154 0.05840455     NC, BD
# c      c  1.030808 0.36206117     NC, BD
Data:
dat <- structure(list(diagnosis = structure(c(1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), levels = c("NC",
"BD"), class = "factor"), region = structure(c(1L, 1L, 2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), levels = c("a",
"b", "c"), class = "factor"), value = c(0.914806043496355, 0.937075413297862,
0.286139534786344, 0.830447626067325, 0.641745518893003, 0.519095949130133,
0.736588314641267, 0.13466659723781, 0.656992290401831, 0.705064784036949,
0.45774177624844, 0.719112251652405, 0.934672247152776, 0.255428824340925,
0.462292822543532, 0.940014522755519, 0.978226428385824, 0.117487361654639
)), out.attrs = list(dim = structure(2:3, names = c("diagnosis",
"region")), dimnames = list(diagnosis = c("diagnosis=NC", "diagnosis=BD"
), region = c("region=a", "region=b", "region=c"))), row.names = c(NA,
-18L), class = "data.frame")
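Note that the tests above are t-tests rather than permutation tests. If a permutation test on the difference in means per region is what is wanted, one possible sketch in base R follows. The helper perm_p, the toy data frame, and the choice of 2000 permutations are assumptions made for illustration, not part of the question or of any package API:

```r
set.seed(1)  # make the random permutations reproducible

# Toy data in the same layout as dat (values are made up for illustration):
# region "a" has overlapping groups, region "b" has clearly separated groups
toy <- data.frame(
  diagnosis = rep(rep(c("NC", "BD"), each = 6), times = 2),
  region    = rep(c("a", "b"), each = 12),
  value     = c(1, 3, 5, 7, 9, 11,  2, 4, 6, 8, 10, 12,
                1:6,                11:16)
)

# Hypothetical helper: two-sided permutation p-value for a difference in means
perm_p <- function(value, group, n_perm = 2000) {
  group <- as.character(group)
  lv <- unique(group)
  stopifnot(length(lv) == 2)  # exactly two diagnoses expected
  mean_diff <- function(g) mean(value[g == lv[1]]) - mean(value[g == lv[2]])
  observed <- mean_diff(group)
  # Shuffle the group labels and recompute the statistic each time
  permuted <- replicate(n_perm, mean_diff(sample(group)))
  # Add 1 to numerator and denominator so the p-value is never exactly 0
  (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
}

# Apply the helper within each region and collect one row per region
res <- do.call(rbind, lapply(split(toy, toy$region), function(x) {
  data.frame(region = x$region[1], p.value = perm_p(x$value, x$diagnosis))
}))
res
```

To try it on the data above, replace toy with dat. The p-values are Monte Carlo estimates, so they vary slightly with the seed and with n_perm; with only three observations per group, as in dat, there are very few distinct permutations, so the smallest achievable p-value is fairly large.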