Perform t-tests by groups

Time:07-14

I am trying to run t-tests comparing control and treatment groups in a long-format table.

Part of the table looks like this. Groups with a _T suffix are the treated ones, groups without it are the controls, and each group is measured in triplicate:

Cell_line Gene Group Values
A a 1 1
A a 1 2
A a 1 3
A a 1_T 1
A a 1_T 2
A a 1_T 3
A a 2 1
A a 2 2
A a 2 3
A a 2_T 1
A a 2_T 2
A a 2_T 3
A a 3 1
A a 3 2
A a 3 3
A a 3_T 1
A a 3_T 2
A a 3_T 3

I want to compare each treatment with its respective control only, i.e. 1 vs 1_T, 2 vs 2_T, 3 vs 3_T and so on. My end goal is to generate a column of p-values from the t-tests comparing each treatment with its control.

I've tried the code below and several other variants, but none of them work. Should I change the table format? Any suggestions or help would be much appreciated!

dataframe <- dataframe %>% group_by(Cell_line, Gene, Group) %>%
 mutate(t.test(Values ~ Group))

dataframe_1 <- dataframe %>% group_by(Cell_line, Gene, Group) %>%
 select_if(is.numeric) %>%
 map_df(t.test(Values, Group, paired = T))

CodePudding user response:

You should separate the Group column into two columns: one holding the ID and the other indicating whether the row belongs to the treatment (T) or control (C) group.

library(dplyr)
library(tidyr)

df2 <- df %>%
  separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
  mutate(Group = replace_na(Group, "C"))

# > df2
#    Cell_line Gene ID Group   Values
# 1          A    a  1     C 19.00937
# 2          A    a  1     C 19.24884
# 3          A    a  1     C 17.69836
# 4          A    a  1     T 25.38643
# 5          A    a  1     T 23.04596
# 6          A    a  1     T 24.25100
# ...

Then perform a paired t-test for each ID (drop paired = TRUE to get a two-sample test instead):

df2 %>%
  group_by(Cell_line, Gene, ID) %>%
  group_map(~ t.test(Values ~ Group, .x, paired = TRUE))
Output
[[1]]
        Paired t-test

data:  Values by Group
t = -6.2599, df = 2, p-value = 0.02458
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -9.407919 -1.743297
sample estimates:
mean difference
      -5.575608

[[2]]
        Paired t-test

data:  Values by Group
t = -8.9412, df = 2, p-value = 0.01228
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -8.261189 -2.893422
sample estimates:
mean difference
      -5.577306

[[3]]
        Paired t-test

data:  Values by Group
t = -1.929, df = 2, p-value = 0.1935
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -11.844963   4.511769
sample estimates:
mean difference
      -3.666597
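
If the replicates are not matched one-to-one (replicate 1 of the control has no particular relation to replicate 1 of the treatment), an unpaired test may be the better fit. Here is a minimal sketch of that variant. The small data frame is made up for illustration and is not the question's data, and the names welch and p_values are my own. Leaving out paired gives R's default Welch two-sample t-test:

```r
library(dplyr)
library(tidyr)

# Made-up example data in the same shape as the question's table
df <- data.frame(
  Cell_line = "A",
  Gene      = "a",
  Group     = rep(c("1", "1_T", "2", "2_T"), each = 3),
  Values    = c(19.0, 19.2, 17.7, 25.4, 23.0, 24.3,
                18.7, 20.8, 18.8, 23.3, 26.2, 25.6)
)

# Split "1_T" into ID = "1", Group = "T"; controls have no suffix, so fill with "C"
df2 <- df %>%
  separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
  mutate(Group = replace_na(Group, "C"))

# One Welch two-sample t-test per Cell_line / Gene / ID combination
welch <- df2 %>%
  group_by(Cell_line, Gene, ID) %>%
  group_map(~ t.test(Values ~ Group, data = .x))

# Pull the p-values out of the list of htest objects
p_values <- sapply(welch, function(res) res$p.value)
```

Pass var.equal = TRUE to t.test() if you would rather use the classic Student's t-test with pooled variance.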

Update

If you want to summarise each group with the p-value of its t-test, use summarise():

df2 %>%
  group_by(Cell_line, Gene, ID) %>%
  summarise(p.value = t.test(Values ~ Group, paired = TRUE)$p.value) %>%
  ungroup()

# # A tibble: 3 × 4
#   Cell_line Gene  ID    p.value
#   <chr>     <chr> <chr>   <dbl>
# 1 A         a     1      0.0246
# 2 A         a     2      0.0123
# 3 A         a     3      0.194
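
Since the goal in the question is a p-value column on the table itself, the same idea also works with mutate(), which keeps every row and repeats the p-value within each ID. This is only a sketch under a few assumptions: the data frame is made up for illustration, with_p is a name I picked, and I pass the two groups to t.test() as separate x and y vectors rather than through a formula. As far as I know, recent R releases (4.4.0 onward, I believe) reject paired = TRUE in the formula interface, while the x/y form still accepts it:

```r
library(dplyr)
library(tidyr)

# Made-up example data in the same shape as the question's table
df <- data.frame(
  Cell_line = "A",
  Gene      = "a",
  Group     = rep(c("1", "1_T", "2", "2_T"), each = 3),
  Values    = c(19.0, 19.2, 17.7, 25.4, 23.0, 24.3,
                18.7, 20.8, 18.8, 23.3, 26.2, 25.6)
)

df2 <- df %>%
  separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
  mutate(Group = replace_na(Group, "C"))

# mutate() keeps all rows; the single p-value is recycled across each group
with_p <- df2 %>%
  group_by(Cell_line, Gene, ID) %>%
  mutate(p.value = t.test(Values[Group == "T"],
                          Values[Group == "C"],
                          paired = TRUE)$p.value) %>%
  ungroup()
```

Note that a paired test relies on the row order within each group to match replicate i of the treatment with replicate i of the control, so it only makes sense if the replicates really are paired that way.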

Data
df <- structure(list(Cell_line = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), Gene = c("a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a"), Group = c("1", "1", "1", "1_T", "1_T", "1_T",
"2", "2", "2", "2_T", "2_T", "2_T", "3", "3", "3", "3_T", "3_T",
"3_T"), Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
25.3864281704297, 23.0459637706291, 24.2509958128999, 18.6843799736362,
20.7674389968636, 18.833524600653, 23.2825845151011, 26.1647404821767,
25.5699355732609, 20.820013126065, 20.2674129364223, 21.3344018769664,
22.4175652694876, 22.2066293870532, 28.7974230636024)), row.names = c(NA, 
-18L), class = "data.frame")