How to apply t-test between ranges of columns in R

Time:01-16

I have a large dataset that looks like the one below. Is there a clever way to apply a t-test to each row (i.e., each gene) to compare the counts between humans and mice?

In each row, I want to compare (human_A, human_B, human_C) vs (mouse_A, mouse_B).

human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)

mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)

genes <- paste0("gene_", 1:20)

df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)
head(df)
#>    genes   human_A   human_B  human_C  mouse_A   mouse_B
#> 1 gene_1  8.482934 10.396456 21.88825 6.070031  6.136563
#> 2 gene_2  9.836256 13.170547 23.04314 4.247680 11.781652
#> 3 gene_3  9.280803 11.184282 19.64985 6.010297  6.430591
#> 4 gene_4  9.069052  8.884374 19.95509 4.633871 11.233594
#> 5 gene_5  8.059434 10.314406 20.45426 4.519976  6.357627
#> 6 gene_6 11.433998 13.497519 20.28876 4.904321  9.599483

Created on 2023-01-15 with reprex v2.0.2

Any help or advice is appreciated.

Answer:

Here is a slightly different approach. I like the answer from @andre-wildberg, but I thought this one might be a useful alternative.

library(tidyr)
library(dplyr)
library(broom)
library(purrr)

gene_t_test <- function(data) {
  t.test(value ~ species, data = data, na.action = na.pass) %>% 
    broom::tidy()
}

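# Reshape to long format (one row per gene x sample), splitting each column
# name into species ("human"/"mouse") and sample id ("A", "B", ...), then
# nest the samples per gene and run gene_t_test() on each nested tibble
t_results <- df %>% 
  tidyr::pivot_longer(
    cols = -genes, 
    names_to = c('species', 'id'), 
    names_sep = "_", 
    values_to = 'value') %>% 
  dplyr::group_by(genes) %>% 
  tidyr::nest() %>% 
  dplyr::mutate(test = purrr::map(data, gene_t_test)) %>% 
  tidyr::unnest(test)

t_results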

# A tibble: 20 × 12
# Groups:   genes [20]
   genes   data             estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method                  alternative
   <chr>   <list>              <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>      
 1 gene_1  <tibble [5 × 3]>     5.69      14.6      8.89     0.983   0.412      2.48   -15.2       26.6 Welch Two Sample t-test two.sided  
 2 gene_2  <tibble [5 × 3]>     4.26      12.6      8.37     0.978   0.418      2.34   -12.1       20.6 Welch Two Sample t-test two.sided  
 3 gene_3  <tibble [5 × 3]>     6.54      12.9      6.34     1.48    0.242      2.80    -8.10      21.2 Welch Two Sample t-test two.sided  
 4 gene_4  <tibble [5 × 3]>     6.89      13.3      6.43     1.23    0.310      2.87   -11.4       25.2 Welch Two Sample t-test two.sided  
 5 gene_5  <tibble [5 × 3]>     6.13      13.0      6.87     1.92    0.163      2.64    -4.86      17.1 Welch Two Sample t-test two.sided  
 6 gene_6  <tibble [5 × 3]>     7.43      13.8      6.42     2.12    0.126      2.94    -3.85      18.7 Welch Two Sample t-test two.sided  
 7 gene_7  <tibble [5 × 3]>     7.11      13.8      6.71     1.65    0.212      2.60    -7.91      22.1 Welch Two Sample t-test two.sided  
 8 gene_8  <tibble [5 × 3]>     5.88      12.7      6.80     1.51    0.229      2.99    -6.55      18.3 Welch Two Sample t-test two.sided  
 9 gene_9  <tibble [5 × 3]>     5.95      13.8      7.81     1.31    0.288      2.75    -9.23      21.1 Welch Two Sample t-test two.sided  
10 gene_10 <tibble [5 × 3]>     4.22      14.1      9.87     0.790   0.530      1.58   -25.6       34.1 Welch Two Sample t-test two.sided  
11 gene_11 <tibble [5 × 3]>     5.26      11.5      6.23     1.76    0.214      2.12    -6.94      17.5 Welch Two Sample t-test two.sided  
12 gene_12 <tibble [5 × 3]>     4.63      13.4      8.79     1.02    0.383      3.00    -9.83      19.1 Welch Two Sample t-test two.sided  
13 gene_13 <tibble [5 × 3]>     9.57      15.5      5.91     1.85    0.163      2.98    -7.00      26.1 Welch Two Sample t-test two.sided  
14 gene_14 <tibble [5 × 3]>     3.86      13.2      9.37     1.32    0.303      2.28    -7.33      15.1 Welch Two Sample t-test two.sided  
15 gene_15 <tibble [5 × 3]>     7.29      13.4      6.06     1.82    0.168      2.95    -5.61      20.2 Welch Two Sample t-test two.sided  
16 gene_16 <tibble [5 × 3]>     5.81      12.6      6.80     1.23    0.313      2.76    -9.98      21.6 Welch Two Sample t-test two.sided  
17 gene_17 <tibble [5 × 3]>     6.19      13.5      7.35     1.09    0.367      2.58   -13.6       26.0 Welch Two Sample t-test two.sided  
18 gene_18 <tibble [5 × 3]>     5.27      13.2      7.94     0.993   0.398      2.81   -12.3       22.8 Welch Two Sample t-test two.sided  
19 gene_19 <tibble [5 × 3]>     5.51      12.8      7.28     1.51    0.237      2.75    -6.75      17.8 Welch Two Sample t-test two.sided  
20 gene_20 <tibble [5 × 3]>     4.30      12.1      7.76     1.69    0.195      2.87    -4.03      12.6 Welch Two Sample t-test two.sided 
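All of these answers rely on the default behaviour of t.test(), which runs a Welch (unequal-variance) two-sample test, as the method column above shows. As a sanity check of what that test computes, here is a minimal sketch that reproduces the Welch statistic, degrees of freedom and two-sided p-value by hand for a single gene. It assumes the example data from the question regenerated with set.seed(42); the variable names (x, y, se2x, dof, ...) are illustrative only.

```r
# Hand-computed Welch two-sample t-test for one gene, checked against t.test().
# Assumes the question's example data, regenerated with set.seed(42).
set.seed(42)
human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_", 1:20)
df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)

# Values of the first gene, split by species
x <- unlist(df[1, c("human_A", "human_B", "human_C")])
y <- unlist(df[1, c("mouse_A", "mouse_B")])

nx <- length(x)
ny <- length(y)
se2x <- var(x) / nx  # squared standard error of the human mean
se2y <- var(y) / ny  # squared standard error of the mouse mean

t_stat <- (mean(x) - mean(y)) / sqrt(se2x + se2y)
# Welch-Satterthwaite approximation of the degrees of freedom
dof <- (se2x + se2y)^2 / (se2x^2 / (nx - 1) + se2y^2 / (ny - 1))
p_value <- 2 * pt(-abs(t_stat), dof)  # two-sided p-value

ref <- t.test(x, y)  # Welch is the default (var.equal = FALSE)
print(c(by_hand = p_value, t.test = ref$p.value))
```

The fractional degrees of freedom in the parameter column (e.g. 2.48, 2.34) come from this Welch-Satterthwaite formula rather than from nx + ny - 2.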

Answer:

An approach with dplyr

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(T = list(t.test(c_across(starts_with("human")),
    c_across(starts_with("mouse"))))) %>% 
  ungroup()
# A tibble: 20 × 7
   genes   human_A human_B human_C mouse_A mouse_B T
   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <list>
 1 gene_1    11.4     9.39    20.6    4.63   13.0  <htest>
 2 gene_2     9.44    6.44    18.9    5.19   10.5  <htest>
 3 gene_3    10.4     9.66    22.3    5.58   10.2  <htest>
 4 gene_4    10.6    12.4     17.8    6.40    9.76 <htest>
 5 gene_5    10.4    13.8     15.9    4.27    7.61 <htest>
 6 gene_6     9.89    9.14    21.3    6.30   11.2  <htest>
 7 gene_7    11.5     9.49    17.6    5.34    9.57 <htest>
 8 gene_8     9.91    6.47    24.3    6.04    9.63 <htest>
 9 gene_9    12.0    10.9     18.7    5.92   11.9  <htest>
10 gene_10    9.94    8.72    22.0    5.72   11.6  <htest>
11 gene_11   11.3    10.9     21.0    3.96   12.8  <htest>
12 gene_12   12.3    11.4     17.6    4.91    9.05 <htest>
13 gene_13    8.61   12.1     24.7    5.62   11.3  <htest>
14 gene_14    9.72    8.78    21.9    4.05   12.8  <htest>
15 gene_15    9.87   11.0     20.3    4.46    7.78 <htest>
16 gene_16   10.6     6.57    20.8    5.58    8.28 <htest>
17 gene_17    9.72    8.43    22.0    5.77    7.74 <htest>
18 gene_18    7.34    8.30    20.3    5.46    7.08 <htest>
19 gene_19    7.56    5.17    11.0    4.11   10.2  <htest>
20 gene_20   11.3    10.1     20.9    3.90   11.3  <htest>
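Keeping the full htest objects, as in the list column T, lets you pull out several components later. The following is a base R sketch of that extraction step, assuming the question's data regenerated with set.seed(42); the list `tests` stands in for the T column, so after assigning the rowwise() result to an object you could apply the same vapply() calls to its T column.

```r
# Extract several fields from a list of htest objects (one per gene).
set.seed(42)
human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_", 1:20)
df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)

human_cols <- grep("^human", names(df), value = TRUE)
mouse_cols <- grep("^mouse", names(df), value = TRUE)

# One htest object per gene, analogous to the list column T above
tests <- lapply(seq_len(nrow(df)), function(i)
  t.test(unlist(df[i, human_cols]), unlist(df[i, mouse_cols])))

# vapply() guarantees exactly one numeric value per test
summary_df <- data.frame(
  genes     = df$genes,
  statistic = vapply(tests, function(h) unname(h$statistic), numeric(1)),
  df        = vapply(tests, function(h) unname(h$parameter), numeric(1)),
  p.value   = vapply(tests, function(h) h$p.value, numeric(1))
)
head(summary_df)
```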

Pulling out the p-value

df %>% 
  rowwise() %>% 
  mutate(T = t.test(c_across(starts_with("human")), 
    c_across(starts_with("mouse")))$p.value) %>% 
  ungroup()
# A tibble: 20 × 7
   genes   human_A human_B human_C mouse_A mouse_B      T
   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 gene_1    11.4     9.39    20.6    4.63   13.0  0.447
 2 gene_2     9.44    6.44    18.9    5.19   10.5  0.476
 3 gene_3    10.4     9.66    22.3    5.58   10.2  0.280
 4 gene_4    10.6    12.4     17.8    6.40    9.76 0.136
 5 gene_5    10.4    13.8     15.9    4.27    7.61 0.0601
 6 gene_6     9.89    9.14    21.3    6.30   11.2  0.388
 7 gene_7    11.5     9.49    17.6    5.34    9.57 0.196
 8 gene_8     9.91    6.47    24.3    6.04    9.63 0.409
 9 gene_9    12.0    10.9     18.7    5.92   11.9  0.310
10 gene_10    9.94    8.72    22.0    5.72   11.6  0.416
11 gene_11   11.3    10.9     21.0    3.96   12.8  0.384
12 gene_12   12.3    11.4     17.6    4.91    9.05 0.111
13 gene_13    8.61   12.1     24.7    5.62   11.3  0.326
14 gene_14    9.72    8.78    21.9    4.05   12.8  0.475
15 gene_15    9.87   11.0     20.3    4.46    7.78 0.138
16 gene_16   10.6     6.57    20.8    5.58    8.28 0.308
17 gene_17    9.72    8.43    22.0    5.77    7.74 0.263
18 gene_18    7.34    8.30    20.3    5.46    7.08 0.303
19 gene_19    7.56    5.17    11.0    4.11   10.2  0.847
20 gene_20   11.3    10.1     20.9    3.90   11.3  0.304

Data

set.seed(42)
human_A = rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)

mouse_A = rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)

genes <- paste0("gene_",rep(1:20))

df <- data.frame(genes,human_A,human_B,human_C,mouse_A,mouse_B)

Answer:

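We could also do it this way:

  1. Reshape the data into long format with pivot_longer
  2. Split it into a list of per-gene data frames with group_split (ordered by factor level, which is why the output runs gene_1, gene_10, gene_11, ...)
  3. Iterate over the list with map_dfr, applying t.test to each element
  4. Use tidy() from the broom package to turn each test into a one-row summary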
library(dplyr)
library(tidyr)
library(broom)
library(tibble)
library(purrr)

df %>% 
  pivot_longer(cols = -genes, names_to = c("group", ".value"),
               names_pattern = "^(human|mouse)(.*)") %>% 
  pivot_longer(-c(genes, group)) %>% 
  mutate(genes = factor(genes)) %>% 
  select(-name) %>% 
  group_split(genes) %>% 
  map_dfr(.f = function(df) {
    t.test(value ~ group, data = df) %>% 
      tidy() %>% 
      add_column(genes = unique(df$genes), .before = 1)
  })
   genes   estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method                  alternative
   <fct>      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>      
 1 gene_1      2.70      11.8      9.09     0.595   0.595      2.92   -12.0       17.3 Welch Two Sample t-test two.sided  
 2 gene_10     5.04      12.2      7.16     1.73    0.216      2.17    -6.60      16.7 Welch Two Sample t-test two.sided  
 3 gene_11     7.93      15.7      7.81     1.47    0.270      2.15   -13.7       29.6 Welch Two Sample t-test two.sided  
 4 gene_12     5.97      14.2      8.20     1.04    0.376      2.99   -12.4       24.3 Welch Two Sample t-test two.sided  
 5 gene_13     3.33      11.5      8.19     0.834   0.486      2.16   -12.7       19.3 Welch Two Sample t-test two.sided  
 6 gene_14     5.58      13.1      7.57     1.40    0.281      2.32    -9.52      20.7 Welch Two Sample t-test two.sided  
 7 gene_15     5.94      15.8      9.85     0.863   0.452      2.97   -16.1       27.9 Welch Two Sample t-test two.sided  
 8 gene_16     5.40      13.2      7.78     1.23    0.308      2.98    -8.67      19.5 Welch Two Sample t-test two.sided  
 9 gene_17     5.81      14.2      8.44     0.902   0.435      2.93   -15.0       26.6 Welch Two Sample t-test two.sided  
10 gene_18     7.00      13.6      6.61     1.36    0.273      2.82   -10.0       24.0 Welch Two Sample t-test two.sided  
11 gene_19     4.18      13.8      9.59     0.791   0.487      3.00   -12.6       21.0 Welch Two Sample t-test two.sided  
12 gene_2      6.66      13.2      6.56     2.08    0.134      2.86    -3.83      17.1 Welch Two Sample t-test two.sided  
13 gene_20     6.35      12.5      6.16     1.81    0.211      2.00    -8.71      21.4 Welch Two Sample t-test two.sided  
14 gene_3      6.03      13.7      7.69     1.47    0.246      2.72    -7.77      19.8 Welch Two Sample t-test two.sided  
15 gene_4      6.83      14.0      7.14     1.52    0.246      2.43    -9.56      23.2 Welch Two Sample t-test two.sided  
16 gene_5     10.3       16.5      6.16     1.73    0.183      3.00    -8.72      29.3 Welch Two Sample t-test two.sided  
17 gene_6      1.37      10.4      8.98     0.328   0.792      1.15   -37.6       40.3 Welch Two Sample t-test two.sided  
18 gene_7      5.40      13.9      8.50     0.956   0.411      2.95   -12.7       23.5 Welch Two Sample t-test two.sided  
19 gene_8      6.67      12.8      6.13     1.74    0.181      2.96    -5.60      18.9 Welch Two Sample t-test two.sided  
20 gene_9      6.95      14.3      7.39     1.69    0.191      2.98    -6.20      20.1 Welch Two Sample t-test two.sided  
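The same four steps (reshape, split per gene, test each piece, combine) can be sketched in base R without the tidyverse. This is an illustrative alternative, not the answer's code; it assumes the question's data regenerated with set.seed(42) and that every column other than genes is named `<species>_<id>`. Passing the original gene order as factor levels keeps gene_1, gene_2, ... in input order instead of alphabetical order.

```r
# Base R version of: pivot longer -> split per gene -> t.test -> bind rows.
set.seed(42)
human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_", 1:20)
df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)

# 1. Long format: unlist() stacks the value columns one after another
value_cols <- setdiff(names(df), "genes")
long <- data.frame(
  genes = rep(df$genes, times = length(value_cols)),
  group = rep(sub("_.*$", "", value_cols), each = nrow(df)),  # "human"/"mouse"
  value = unlist(df[value_cols], use.names = FALSE)
)

# 2. One data frame per gene, kept in the original gene order
per_gene <- split(long, factor(long$genes, levels = df$genes))

# 3. + 4. Welch t-test per gene, collected into a single data frame
results <- do.call(rbind, lapply(names(per_gene), function(g) {
  tt <- t.test(value ~ group, data = per_gene[[g]])
  data.frame(genes     = g,
             estimate1 = unname(tt$estimate[1]),  # mean in group human
             estimate2 = unname(tt$estimate[2]),  # mean in group mouse
             statistic = unname(tt$statistic),
             p.value   = tt$p.value)
}))
results
```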

Answer:

Using base R

# Store one htest object per gene (row) in a list column
df$ttest <- apply(df[-1], 1, function(x) 
   t.test(x[grep('human', names(x))], x[grep('mouse', names(x))]))
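For a genuinely large dataset, calling t.test() once per row can be slow. The sketch below swaps in a different technique: a vectorised Welch test that computes every gene's statistic at once with rowMeans() and rowSums(). It assumes no missing values and at least two samples per group. It also differs from t.test() in one edge case: a row that is constant in both groups gives NaN here, whereas t.test() stops with an error.

```r
# Vectorised Welch t-test across all rows (genes) at once, in base R.
set.seed(42)
human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_", 1:20)
df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)

x <- as.matrix(df[grep("^human", names(df))])  # genes x human samples
y <- as.matrix(df[grep("^mouse", names(df))])  # genes x mouse samples

# Sample variance of each row; rowMeans() is recycled down each column
row_var <- function(m) rowSums((m - rowMeans(m))^2) / (ncol(m) - 1)

se2x <- row_var(x) / ncol(x)
se2y <- row_var(y) / ncol(y)
t_stat <- (rowMeans(x) - rowMeans(y)) / sqrt(se2x + se2y)
# Welch-Satterthwaite degrees of freedom, one value per gene
dof <- (se2x + se2y)^2 /
  (se2x^2 / (ncol(x) - 1) + se2y^2 / (ncol(y) - 1))
p_value <- 2 * pt(-abs(t_stat), dof)

welch <- data.frame(genes     = df$genes,
                    statistic = unname(t_stat),
                    df        = unname(dof),
                    p.value   = unname(p_value))
head(welch)
```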