I have a large dataset that looks like the one below. I was wondering if there is a clever way to apply a t-test to each row (i.e., each gene) and compare the counts between humans and mice.
For each row, I want to compare (human_A, human_B, human_C) vs (mouse_A, mouse_B).
human_A = rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A = rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_",rep(1:20))
df <- data.frame(genes,human_A,human_B,human_C,mouse_A,mouse_B)
head(df)
#> genes human_A human_B human_C mouse_A mouse_B
#> 1 gene_1 8.482934 10.396456 21.88825 6.070031 6.136563
#> 2 gene_2 9.836256 13.170547 23.04314 4.247680 11.781652
#> 3 gene_3 9.280803 11.184282 19.64985 6.010297 6.430591
#> 4 gene_4 9.069052 8.884374 19.95509 4.633871 11.233594
#> 5 gene_5 8.059434 10.314406 20.45426 4.519976 6.357627
#> 6 gene_6 11.433998 13.497519 20.28876 4.904321 9.599483
Created on 2023-01-15 with reprex v2.0.2
Any help and advice are appreciated.
CodePudding user response:
Here is a slightly different approach. I like the answer from @andre-wildberg; I just thought this approach might be a useful alternative.
library(tidyr)
library(dplyr)
library(broom)
library(purrr)
gene_t_test <- function(data) {
  t.test(value ~ species, data = data, na.action = na.pass) %>%
    broom::tidy()
}
df_long <- df %>%
  tidyr::pivot_longer(
    cols = -genes,
    names_to = c('species', 'id'),
    names_sep = "_",
    values_to = 'value') %>%
  dplyr::group_by(genes) %>%
  tidyr::nest() %>%
  dplyr::mutate(test = purrr::map(data, gene_t_test)) %>%
  tidyr::unnest(test)
# A tibble: 20 × 12
# Groups: genes [20]
genes data estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
<chr> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 gene_1 <tibble [5 × 3]> 5.69 14.6 8.89 0.983 0.412 2.48 -15.2 26.6 Welch Two Sample t-test two.sided
2 gene_2 <tibble [5 × 3]> 4.26 12.6 8.37 0.978 0.418 2.34 -12.1 20.6 Welch Two Sample t-test two.sided
3 gene_3 <tibble [5 × 3]> 6.54 12.9 6.34 1.48 0.242 2.80 -8.10 21.2 Welch Two Sample t-test two.sided
4 gene_4 <tibble [5 × 3]> 6.89 13.3 6.43 1.23 0.310 2.87 -11.4 25.2 Welch Two Sample t-test two.sided
5 gene_5 <tibble [5 × 3]> 6.13 13.0 6.87 1.92 0.163 2.64 -4.86 17.1 Welch Two Sample t-test two.sided
6 gene_6 <tibble [5 × 3]> 7.43 13.8 6.42 2.12 0.126 2.94 -3.85 18.7 Welch Two Sample t-test two.sided
7 gene_7 <tibble [5 × 3]> 7.11 13.8 6.71 1.65 0.212 2.60 -7.91 22.1 Welch Two Sample t-test two.sided
8 gene_8 <tibble [5 × 3]> 5.88 12.7 6.80 1.51 0.229 2.99 -6.55 18.3 Welch Two Sample t-test two.sided
9 gene_9 <tibble [5 × 3]> 5.95 13.8 7.81 1.31 0.288 2.75 -9.23 21.1 Welch Two Sample t-test two.sided
10 gene_10 <tibble [5 × 3]> 4.22 14.1 9.87 0.790 0.530 1.58 -25.6 34.1 Welch Two Sample t-test two.sided
11 gene_11 <tibble [5 × 3]> 5.26 11.5 6.23 1.76 0.214 2.12 -6.94 17.5 Welch Two Sample t-test two.sided
12 gene_12 <tibble [5 × 3]> 4.63 13.4 8.79 1.02 0.383 3.00 -9.83 19.1 Welch Two Sample t-test two.sided
13 gene_13 <tibble [5 × 3]> 9.57 15.5 5.91 1.85 0.163 2.98 -7.00 26.1 Welch Two Sample t-test two.sided
14 gene_14 <tibble [5 × 3]> 3.86 13.2 9.37 1.32 0.303 2.28 -7.33 15.1 Welch Two Sample t-test two.sided
15 gene_15 <tibble [5 × 3]> 7.29 13.4 6.06 1.82 0.168 2.95 -5.61 20.2 Welch Two Sample t-test two.sided
16 gene_16 <tibble [5 × 3]> 5.81 12.6 6.80 1.23 0.313 2.76 -9.98 21.6 Welch Two Sample t-test two.sided
17 gene_17 <tibble [5 × 3]> 6.19 13.5 7.35 1.09 0.367 2.58 -13.6 26.0 Welch Two Sample t-test two.sided
18 gene_18 <tibble [5 × 3]> 5.27 13.2 7.94 0.993 0.398 2.81 -12.3 22.8 Welch Two Sample t-test two.sided
19 gene_19 <tibble [5 × 3]> 5.51 12.8 7.28 1.51 0.237 2.75 -6.75 17.8 Welch Two Sample t-test two.sided
20 gene_20 <tibble [5 × 3]> 4.30 12.1 7.76 1.69 0.195 2.87 -4.03 12.6 Welch Two Sample t-test two.sided
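To see what one row of the table above corresponds to, the test for a single gene can be reproduced in base R. This is only a sketch: it assumes the data are rebuilt with set.seed(42) (the seed comes from the Data block of another answer, not from this one), so the numbers will not match the unseeded output shown above. The names row, human, mouse and one_gene are made up for illustration.

```r
# Rebuild the example data (seed is an assumption borrowed from another answer)
set.seed(42)
human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_", 1:20)
df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)

# Pick one gene and split its values into the two groups
row <- df[df$genes == "gene_1", ]
human <- unlist(row[, c("human_A", "human_B", "human_C")])
mouse <- unlist(row[, c("mouse_A", "mouse_B")])

# t.test() uses the Welch test by default (var.equal = FALSE)
res <- t.test(human, mouse)

# Collect the pieces that show up as columns in the tidy() output
one_gene <- data.frame(
  genes     = "gene_1",
  estimate  = unname(res$estimate[1] - res$estimate[2]),  # difference in means
  estimate1 = unname(res$estimate[1]),                    # mean of the human values
  estimate2 = unname(res$estimate[2]),                    # mean of the mouse values
  statistic = unname(res$statistic),                      # t statistic
  p.value   = res$p.value,
  parameter = unname(res$parameter)                       # Welch degrees of freedom
)
print(one_gene)
```

Here estimate1 and estimate2 are the two group means, estimate is their difference, and parameter is the Welch degrees of freedom, which is why that column is generally not an integer.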
CodePudding user response:
An approach with dplyr
library(dplyr)
df %>%
  rowwise() %>%
  mutate(T = list(t.test(c_across(starts_with("human")),
                         c_across(starts_with("mouse"))))) %>%
  ungroup()
# A tibble: 20 × 7
genes human_A human_B human_C mouse_A mouse_B T
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
1 gene_1 11.4 9.39 20.6 4.63 13.0 <htest>
2 gene_2 9.44 6.44 18.9 5.19 10.5 <htest>
3 gene_3 10.4 9.66 22.3 5.58 10.2 <htest>
4 gene_4 10.6 12.4 17.8 6.40 9.76 <htest>
5 gene_5 10.4 13.8 15.9 4.27 7.61 <htest>
6 gene_6 9.89 9.14 21.3 6.30 11.2 <htest>
7 gene_7 11.5 9.49 17.6 5.34 9.57 <htest>
8 gene_8 9.91 6.47 24.3 6.04 9.63 <htest>
9 gene_9 12.0 10.9 18.7 5.92 11.9 <htest>
10 gene_10 9.94 8.72 22.0 5.72 11.6 <htest>
11 gene_11 11.3 10.9 21.0 3.96 12.8 <htest>
12 gene_12 12.3 11.4 17.6 4.91 9.05 <htest>
13 gene_13 8.61 12.1 24.7 5.62 11.3 <htest>
14 gene_14 9.72 8.78 21.9 4.05 12.8 <htest>
15 gene_15 9.87 11.0 20.3 4.46 7.78 <htest>
16 gene_16 10.6 6.57 20.8 5.58 8.28 <htest>
17 gene_17 9.72 8.43 22.0 5.77 7.74 <htest>
18 gene_18 7.34 8.30 20.3 5.46 7.08 <htest>
19 gene_19 7.56 5.17 11.0 4.11 10.2 <htest>
20 gene_20 11.3 10.1 20.9 3.90 11.3 <htest>
Pulling out the p-value
df %>%
  rowwise() %>%
  mutate(T = t.test(c_across(starts_with("human")),
                    c_across(starts_with("mouse")))$p.value) %>%
  ungroup()
# A tibble: 20 × 7
genes human_A human_B human_C mouse_A mouse_B T
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 gene_1 11.4 9.39 20.6 4.63 13.0 0.447
2 gene_2 9.44 6.44 18.9 5.19 10.5 0.476
3 gene_3 10.4 9.66 22.3 5.58 10.2 0.280
4 gene_4 10.6 12.4 17.8 6.40 9.76 0.136
5 gene_5 10.4 13.8 15.9 4.27 7.61 0.0601
6 gene_6 9.89 9.14 21.3 6.30 11.2 0.388
7 gene_7 11.5 9.49 17.6 5.34 9.57 0.196
8 gene_8 9.91 6.47 24.3 6.04 9.63 0.409
9 gene_9 12.0 10.9 18.7 5.92 11.9 0.310
10 gene_10 9.94 8.72 22.0 5.72 11.6 0.416
11 gene_11 11.3 10.9 21.0 3.96 12.8 0.384
12 gene_12 12.3 11.4 17.6 4.91 9.05 0.111
13 gene_13 8.61 12.1 24.7 5.62 11.3 0.326
14 gene_14 9.72 8.78 21.9 4.05 12.8 0.475
15 gene_15 9.87 11.0 20.3 4.46 7.78 0.138
16 gene_16 10.6 6.57 20.8 5.58 8.28 0.308
17 gene_17 9.72 8.43 22.0 5.77 7.74 0.263
18 gene_18 7.34 8.30 20.3 5.46 7.08 0.303
19 gene_19 7.56 5.17 11.0 4.11 10.2 0.847
20 gene_20 11.3 10.1 20.9 3.90 11.3 0.304
Data
set.seed(42)
human_A = rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A = rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_",rep(1:20))
df <- data.frame(genes,human_A,human_B,human_C,mouse_A,mouse_B)
CodePudding user response:
We could also do it this way:
- Bring the data into the right shape with pivot_longer
- Create a list of per-gene data frames with group_split
- Iterate over each list element with map_dfr, applying t.test
- Use tidy() from the broom package to get nice output
library(dplyr)
library(tidyr)
library(broom)
library(tibble)
library(purrr)
df %>%
  pivot_longer(cols = -genes, names_to = c("group", ".value"),
               names_pattern = "^(human|mouse)(.*)") %>%
  pivot_longer(-c(genes, group)) %>%
  mutate(genes = factor(genes)) %>%
  select(-name) %>%
  group_split(genes) %>%
  map_dfr(.f = function(df) {
    t.test(value ~ group, data = df) %>%
      tidy() %>%
      add_column(genes = unique(df$genes), .before = 1)
  })
genes estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 gene_1 2.70 11.8 9.09 0.595 0.595 2.92 -12.0 17.3 Welch Two Sample t-test two.sided
2 gene_10 5.04 12.2 7.16 1.73 0.216 2.17 -6.60 16.7 Welch Two Sample t-test two.sided
3 gene_11 7.93 15.7 7.81 1.47 0.270 2.15 -13.7 29.6 Welch Two Sample t-test two.sided
4 gene_12 5.97 14.2 8.20 1.04 0.376 2.99 -12.4 24.3 Welch Two Sample t-test two.sided
5 gene_13 3.33 11.5 8.19 0.834 0.486 2.16 -12.7 19.3 Welch Two Sample t-test two.sided
6 gene_14 5.58 13.1 7.57 1.40 0.281 2.32 -9.52 20.7 Welch Two Sample t-test two.sided
7 gene_15 5.94 15.8 9.85 0.863 0.452 2.97 -16.1 27.9 Welch Two Sample t-test two.sided
8 gene_16 5.40 13.2 7.78 1.23 0.308 2.98 -8.67 19.5 Welch Two Sample t-test two.sided
9 gene_17 5.81 14.2 8.44 0.902 0.435 2.93 -15.0 26.6 Welch Two Sample t-test two.sided
10 gene_18 7.00 13.6 6.61 1.36 0.273 2.82 -10.0 24.0 Welch Two Sample t-test two.sided
11 gene_19 4.18 13.8 9.59 0.791 0.487 3.00 -12.6 21.0 Welch Two Sample t-test two.sided
12 gene_2 6.66 13.2 6.56 2.08 0.134 2.86 -3.83 17.1 Welch Two Sample t-test two.sided
13 gene_20 6.35 12.5 6.16 1.81 0.211 2.00 -8.71 21.4 Welch Two Sample t-test two.sided
14 gene_3 6.03 13.7 7.69 1.47 0.246 2.72 -7.77 19.8 Welch Two Sample t-test two.sided
15 gene_4 6.83 14.0 7.14 1.52 0.246 2.43 -9.56 23.2 Welch Two Sample t-test two.sided
16 gene_5 10.3 16.5 6.16 1.73 0.183 3.00 -8.72 29.3 Welch Two Sample t-test two.sided
17 gene_6 1.37 10.4 8.98 0.328 0.792 1.15 -37.6 40.3 Welch Two Sample t-test two.sided
18 gene_7 5.40 13.9 8.50 0.956 0.411 2.95 -12.7 23.5 Welch Two Sample t-test two.sided
19 gene_8 6.67 12.8 6.13 1.74 0.181 2.96 -5.60 18.9 Welch Two Sample t-test two.sided
20 gene_9 6.95 14.3 7.39 1.69 0.191 2.98 -6.20 20.1 Welch Two Sample t-test two.sided
CodePudding user response:
Using base R
df$ttest <- apply(df[-1], 1, function(x)
  t.test(x[grep('human', names(x))], x[grep('mouse', names(x))]))
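The call above stores a whole htest object per row. If only the p-value is needed, a minimal sketch along the same lines might look like this. It assumes the data frame is rebuilt from the seeded Data block in the dplyr answer, and the names counts and p_value are illustrative, not part of the original answer.

```r
# Rebuild the example data (same seeded Data block as in the dplyr answer)
set.seed(42)
human_A <- rnorm(20, 10, 1)
human_B <- rnorm(20, 10, 2)
human_C <- rnorm(20, 20, 3)
mouse_A <- rnorm(20, 5, 1)
mouse_B <- rnorm(20, 10, 2)
genes <- paste0("gene_", 1:20)
df <- data.frame(genes, human_A, human_B, human_C, mouse_A, mouse_B)

# Keep only the numeric count columns so apply() works on a numeric matrix
counts <- df[grep("^(human|mouse)_", names(df))]

# One Welch t-test per row; keep just the p-value (a plain numeric vector)
df$p_value <- apply(counts, 1, function(x)
  t.test(x[grep("human", names(x))], x[grep("mouse", names(x))])$p.value)

head(df[, c("genes", "p_value")])
```

Selecting the count columns by name, rather than with df[-1], keeps the sketch working even if other non-numeric columns have been added to df in the meantime.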