I am trying to run a t-test comparing control and treatment groups in a long-format table.
Part of the table looks like this. Groups with a T suffix are the treated ones, groups without it are the controls, and each group has three replicates:
Cell_line | Gene | Group | Values |
---|---|---|---|
A | a | 1 | 1 |
A | a | 1 | 2 |
A | a | 1 | 3 |
A | a | 1_T | 1 |
A | a | 1_T | 2 |
A | a | 1_T | 3 |
A | a | 2 | 1 |
A | a | 2 | 2 |
A | a | 2 | 3 |
A | a | 2_T | 1 |
A | a | 2_T | 2 |
A | a | 2_T | 3 |
A | a | 3 | 1 |
A | a | 3 | 2 |
A | a | 3 | 3 |
A | a | 3_T | 1 |
A | a | 3_T | 2 |
A | a | 3_T | 3 |
I want to compare each treatment only with its respective control, i.e. 1 vs 1_T, 2 vs 2_T, 3 vs 3_T and so on. My end goal is to generate a column of p-values from the t-tests comparing each treatment with its control.
I've tried the code below, along with some other variants, but none of it works. Should I change the table format? Any suggestions or help would be much appreciated!
dataframe <- dataframe %>% group_by(Cell_line, Gene, Group) %>%
mutate(t.test(Values ~ Group))
dataframe_1 <- dataframe %>% group_by(Cell_line, Gene, Group) %>%
select_if(is.numeric) %>%
map_df(t.test(Values, Group, paired = T))
Answer:
You should separate the Group column into 2 columns: one holds the ID and the other indicates whether the row belongs to the treatment (T) or control (C) group.
library(dplyr)
library(tidyr)
df2 <- df %>%
separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
mutate(Group = replace_na(Group, "C"))
# > df2
# Cell_line Gene ID Group Values
# 1 A a 1 C 19.00937
# 2 A a 1 C 19.24884
# 3 A a 1 C 17.69836
# 4 A a 1 T 25.38643
# 5 A a 1 T 23.04596
# 6 A a 1 T 24.25100
# ...
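Before testing, it may be worth confirming that the split produced the expected layout. The following is a minimal sketch, not part of the original answer: it rebuilds the data from the Data section at the end of this answer and counts replicates per ID and arm. The name replicate_counts is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as the "Data" section of this answer, rebuilt compactly
df <- data.frame(
  Cell_line = "A",
  Gene = "a",
  Group = rep(c("1", "1_T", "2", "2_T", "3", "3_T"), each = 3),
  Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
             25.3864281704297, 23.0459637706291, 24.2509958128999,
             18.6843799736362, 20.7674389968636, 18.833524600653,
             23.2825845151011, 26.1647404821767, 25.5699355732609,
             20.820013126065, 20.2674129364223, 21.3344018769664,
             22.4175652694876, 22.2066293870532, 28.7974230636024),
  stringsAsFactors = FALSE
)

# Split "1_T" into ID = "1", Group = "T"; plain "1" becomes ID = "1", Group = "C"
df2 <- df %>%
  separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
  mutate(Group = replace_na(Group, "C"))

# One row per ID, with the number of control (C) and treatment (T) replicates
replicate_counts <- df2 %>%
  count(Cell_line, Gene, ID, Group) %>%
  pivot_wider(names_from = Group, values_from = n, values_fill = 0)

print(replicate_counts)
```

If any ID shows unequal C and T counts, a paired test is not applicable to that ID as the data stand.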
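Then perform a two-sample or paired t-test for each ID. The example below uses paired = TRUE, which matches control and treatment replicates by their row order within each ID: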
df2 %>%
group_by(Cell_line, Gene, ID) %>%
group_map(~ t.test(Values ~ Group, .x, paired = TRUE))
Output
[[1]]
Paired t-test
data: Values by Group
t = -6.2599, df = 2, p-value = 0.02458
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-9.407919 -1.743297
sample estimates:
mean difference
-5.575608
[[2]]
Paired t-test
data: Values by Group
t = -8.9412, df = 2, p-value = 0.01228
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-8.261189 -2.893422
sample estimates:
mean difference
-5.577306
[[3]]
Paired t-test
data: Values by Group
t = -1.929, df = 2, p-value = 0.1935
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-11.844963 4.511769
sample estimates:
mean difference
-3.666597
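A caveat on the call above: as far as I know, recent R releases (4.4.0 and later) reject paired = TRUE in the formula interface of t.test(). The sketch below is one way around that, assuming the same data; it passes the control and treatment values as two separate vectors instead. The helper names paired_test and welch_test are hypothetical. The second helper drops paired = TRUE to run a Welch two-sample test, which may be the more appropriate choice if the triplicates are not actually matched one to one.

```r
library(dplyr)
library(tidyr)

# Same values as the "Data" section of this answer, rebuilt compactly
df <- data.frame(
  Cell_line = "A",
  Gene = "a",
  Group = rep(c("1", "1_T", "2", "2_T", "3", "3_T"), each = 3),
  Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
             25.3864281704297, 23.0459637706291, 24.2509958128999,
             18.6843799736362, 20.7674389968636, 18.833524600653,
             23.2825845151011, 26.1647404821767, 25.5699355732609,
             20.820013126065, 20.2674129364223, 21.3344018769664,
             22.4175652694876, 22.2066293870532, 28.7974230636024),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
  mutate(Group = replace_na(Group, "C"))

# Hypothetical helper: paired test, control as x and treatment as y,
# replicates matched by row order within the group
paired_test <- function(d) {
  t.test(d$Values[d$Group == "C"], d$Values[d$Group == "T"], paired = TRUE)
}

# Hypothetical helper: Welch two-sample test (no pairing assumed)
welch_test <- function(d) {
  t.test(d$Values[d$Group == "C"], d$Values[d$Group == "T"])
}

grouped <- df2 %>% group_by(Cell_line, Gene, ID)

tests_paired <- grouped %>% group_map(~ paired_test(.x))
tests_welch  <- grouped %>% group_map(~ welch_test(.x))

# Pull the p-values out of the list of htest objects
p_paired <- sapply(tests_paired, function(res) res$p.value)
p_welch  <- sapply(tests_welch, function(res) res$p.value)
```

With control as the first argument, the sign of the mean difference matches the formula version (control minus treatment).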
Update
If you want to summarise each group with the p-value of each t-test, try summarise():
df2 %>%
group_by(Cell_line, Gene, ID) %>%
summarise(p.value = t.test(Values ~ Group, paired = TRUE)$p.value) %>%
ungroup()
# # A tibble: 3 × 4
# Cell_line Gene ID p.value
# <chr> <chr> <chr> <dbl>
# 1 A a 1 0.0246
# 2 A a 2 0.0123
# 3 A a 3 0.194
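Since the question asks for a p-value column on the long table itself, a grouped mutate() is another option. This is a sketch under the same assumptions as above (data from the Data section, control and treatment passed as separate vectors so it does not depend on the formula interface accepting paired = TRUE). The name df_with_p is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as the "Data" section of this answer, rebuilt compactly
df <- data.frame(
  Cell_line = "A",
  Gene = "a",
  Group = rep(c("1", "1_T", "2", "2_T", "3", "3_T"), each = 3),
  Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
             25.3864281704297, 23.0459637706291, 24.2509958128999,
             18.6843799736362, 20.7674389968636, 18.833524600653,
             23.2825845151011, 26.1647404821767, 25.5699355732609,
             20.820013126065, 20.2674129364223, 21.3344018769664,
             22.4175652694876, 22.2066293870532, 28.7974230636024),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
  mutate(Group = replace_na(Group, "C"))

# mutate() keeps all 18 rows; the single p-value per ID is recycled
# across the six rows (3 control + 3 treatment) of that ID
df_with_p <- df2 %>%
  group_by(Cell_line, Gene, ID) %>%
  mutate(
    p.value = t.test(Values[Group == "C"], Values[Group == "T"],
                     paired = TRUE)$p.value
  ) %>%
  ungroup()
```

Every row of a given ID then carries the same p-value, so the table can be filtered or plotted without a separate join.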
Data
df <- structure(list(Cell_line = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), Gene = c("a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a"), Group = c("1", "1", "1", "1_T", "1_T", "1_T",
"2", "2", "2", "2_T", "2_T", "2_T", "3", "3", "3", "3_T", "3_T",
"3_T"), Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
25.3864281704297, 23.0459637706291, 24.2509958128999, 18.6843799736362,
20.7674389968636, 18.833524600653, 23.2825845151011, 26.1647404821767,
25.5699355732609, 20.820013126065, 20.2674129364223, 21.3344018769664,
22.4175652694876, 22.2066293870532, 28.7974230636024)), row.names = c(NA,
-18L), class = "data.frame")