Multiple permutations - R-CodePudding

I am fairly new to programming, so forgive me if I give too little information. I have a data frame (df) that looks something like this:

Diagnosis	Value	Brainregion
NC	2	region_a
NC	3	region_b
BD	4	region_a
BD	5	region_b

I would like to perform a permutation test comparing the same brain region across diagnoses (to clarify: mean value of region_a in BD vs. mean value of region_a in NC, mean value of region_b in BD vs. mean value of region_b in NC, and so on).

I would like code that does this in one step for every region.

I tried adapting the method described in "Multiple groups tests via permutation", but I can't seem to make it work as intended.

Can someone please help me?

P.S. I also have a wide version of the same data frame, in case it is more useful:

Diagnosis	Region_a	Region_b
NC	2	3
BD	4	5

CodePudding user response：

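With combn (this works for the example, where each region has exactly one value per diagnosis; note that it runs t.test, not a permutation test):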

library(dplyr)
df %>% 
  group_by(Brainregion) %>% 
  summarise(Diagnosis = combn(Diagnosis, 2, paste0, collapse = '-'), 
            p.value = combn(Value, 2, function(x) t.test(x)$p.value))

# A tibble: 2 × 3
  Brainregion Diagnosis p.value
  <chr>       <chr>       <dbl>
1 region_a    NC-BD       0.205
2 region_b    NC-BD       0.156

CodePudding user response：

Use by to split the data by region and run a t.test within each region:

by(dat, dat$region, \(x) {
  tt <- with(x, t.test(value ~ diagnosis))
  data.frame(region=el(as.character(x$region)), tt[c('statistic', 'p.value')],
             hypothesis=toString(unique(x$diagnosis)))
}) |> do.call(what=rbind)
#   region statistic    p.value hypothesis
# a      a  1.628979 0.23082956     NC, BD
# b      b -2.813154 0.05840455     NC, BD
# c      c  1.030808 0.36206117     NC, BD

Data:

dat <- structure(list(diagnosis = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), levels = c("NC", 
"BD"), class = "factor"), region = structure(c(1L, 1L, 2L, 2L, 
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), levels = c("a", 
"b", "c"), class = "factor"), value = c(0.914806043496355, 0.937075413297862, 
0.286139534786344, 0.830447626067325, 0.641745518893003, 0.519095949130133, 
0.736588314641267, 0.13466659723781, 0.656992290401831, 0.705064784036949, 
0.45774177624844, 0.719112251652405, 0.934672247152776, 0.255428824340925, 
0.462292822543532, 0.940014522755519, 0.978226428385824, 0.117487361654639
)), out.attrs = list(dim = structure(2:3, names = c("diagnosis", 
"region")), dimnames = list(diagnosis = c("diagnosis=NC", "diagnosis=BD"
), region = c("region=a", "region=b", "region=c"))), row.names = c(NA, 
-18L), class = "data.frame")
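
Neither snippet above actually permutes anything, so here is a minimal sketch of a per-region permutation test on the difference in group means. The helper perm_test, its n_perm argument, and the choice of a two-sided test on the mean difference are assumptions made for illustration, not something taken from the question. The data are the same values as dat above, rebuilt compactly so the block runs on its own.

```r
# Same values as `dat` above, rebuilt compactly so the example is self-contained.
dat <- data.frame(
  diagnosis = rep(c("NC", "BD"), times = 9),
  region    = rep(rep(c("a", "b", "c"), each = 2), times = 3),
  value     = c(0.914806043496355, 0.937075413297862, 0.286139534786344,
                0.830447626067325, 0.641745518893003, 0.519095949130133,
                0.736588314641267, 0.13466659723781,  0.656992290401831,
                0.705064784036949, 0.45774177624844,  0.719112251652405,
                0.934672247152776, 0.255428824340925, 0.462292822543532,
                0.940014522755519, 0.978226428385824, 0.117487361654639)
)

# Hypothetical helper: two-sided Monte Carlo permutation test on the
# difference in group means (first factor level minus second; with
# character input the levels are alphabetical, so here BD minus NC).
perm_test <- function(value, group, n_perm = 9999) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  diff_means <- function(g) {
    m <- tapply(value, g, mean)
    unname(m[1] - m[2])
  }
  observed <- diff_means(group)
  # Shuffle the labels, keep the values fixed, recompute the statistic.
  permuted <- replicate(n_perm, diff_means(sample(group)))
  # Count the observed statistic itself so the p-value is never exactly 0.
  p <- (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
  c(diff = observed, p.value = p)
}

set.seed(1)  # only to make the random permutations reproducible

# One test per region, collected into a single data frame.
res <- do.call(rbind, lapply(split(dat, dat$region), function(d) {
  out <- perm_test(d$value, d$diagnosis)
  data.frame(region = d$region[1],
             diff = out[["diff"]],
             p.value = out[["p.value"]])
}))
res
```

With only three observations per diagnosis in each region there are just 20 distinct ways to relabel the values, so the p-values here cannot go much below 0.1. For groups this small, enumerating all relabellings with combn instead of sampling them would give an exact test.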