S1 S2 S3 S4
Cohort 1 2 1 1
G1 23 44 67 13
G2 11 78 88 30
G3 45 46 56 66
G4 67 77 22 45
This is a demo dataset that I am using where S1, S2... are samples, cohort is the cohort variable which is 1 or 2, and G1, G2... are genes. The values are the expression values.
I want to find mean expression in cohort 1 and cohort 2.
I tried using an if statement like if(data$cohort == 1), but it gives me an error: the condition has length > 1
Is there an easy way to work this out?
CodePudding user response:
Data frames are built around columns, not rows. I would first tidy the data into a long column-based format:
library(tidyr)
library(dplyr)
library(tibble)
df = t(data) |>
  as.data.frame() |>
  rownames_to_column(var = "sample") |>
  pivot_longer(cols = starts_with("G"), names_to = "gene", values_to = "expression")
df
# # A tibble: 16 × 4
# sample Cohort gene expression
# <chr> <int> <chr> <int>
# 1 S1 1 G1 23
# 2 S1 1 G2 11
# 3 S1 1 G3 45
# 4 S1 1 G4 67
# 5 S2 2 G1 44
# 6 S2 2 G2 78
# 7 S2 2 G3 46
# 8 S2 2 G4 77
# 9 S3 1 G1 67
# 10 S3 1 G2 88
# ...
Now that we have a clear grouping column and a value column, we can use any method from the FAQ on calculating mean by group. Here's the dplyr method:
df |>
  group_by(Cohort) |>
  summarize(mean_ex = mean(expression))
# # A tibble: 2 × 2
# Cohort mean_ex
# <int> <dbl>
# 1 1 44.4
# 2 2 61.2
(And you could use group_by(Cohort, gene) if you want the mean grouped by both of those. It wasn't clear from your question what your desired output is.)
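As a rough sketch of that two-column grouping, assuming the same sample data and long format as above (the names by_gene and mean_ex are only illustrative, and the native pipe needs R 4.1 or later):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question: samples in columns, Cohort and genes in rows
data <- read.table(text = ' S1 S2 S3 S4
Cohort 1 2 1 1
G1 23 44 67 13
G2 11 78 88 30
G3 45 46 56 66
G4 67 77 22 45', header = TRUE)

# Transpose so samples become rows, then reshape to one row per sample/gene pair
df <- t(data) |>
  as.data.frame() |>
  rownames_to_column(var = "sample") |>
  pivot_longer(cols = starts_with("G"), names_to = "gene", values_to = "expression")

# One mean per (Cohort, gene) combination; .groups = "drop" returns an ungrouped tibble
by_gene <- df |>
  group_by(Cohort, gene) |>
  summarize(mean_ex = mean(expression), .groups = "drop")

by_gene
```

This should give one row per cohort and gene, so eight rows for this data.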
Using this sample data:
data = read.table(text = ' S1 S2 S3 S4
Cohort 1 2 1 1
G1 23 44 67 13
G2 11 78 88 30
G3 45 46 56 66
G4 67 77 22 45', header = TRUE)
CodePudding user response:
Transpose your data, then group by Cohort and summarize all gene columns with dplyr::across():
library(dplyr)
data %>%
t() %>%
as.data.frame() %>%
group_by(Cohort) %>%
summarize(across(G1:G4, mean))
# A tibble: 2 × 5
Cohort G1 G2 G3 G4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34.3 43 55.7 44.7
2 2 44 78 46 77
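For comparison, the same per-gene means can be sketched in base R, assuming the data object from the question (the helper names cohort, genes and means are hypothetical). A comparison such as cohort == 1 returns one TRUE/FALSE per sample, which is why if() complains about a condition of length > 1. Splitting the sample names by cohort avoids if() altogether:

```r
# Sample data from the question: samples in columns, Cohort and genes in rows
data <- read.table(text = ' S1 S2 S3 S4
Cohort 1 2 1 1
G1 23 44 67 13
G2 11 78 88 30
G3 45 46 56 66
G4 67 77 22 45', header = TRUE)

# Named vector of cohort labels, one per sample (names are S1..S4)
cohort <- unlist(data["Cohort", ])

# Keep only the gene rows
genes <- data[rownames(data) != "Cohort", ]

# For each cohort, take the row means over that cohort's sample columns
means <- sapply(
  split(names(cohort), cohort),
  function(s) rowMeans(genes[, s, drop = FALSE])
)

means
```

Here means should be a matrix with one row per gene and one column per cohort.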
CodePudding user response:
This is another possibility:
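library(dplyr)
library(tidyr)
library(purrr)
# Nest the values per cohort, then take the mean of each nested tibble
df %>% pivot_longer(-Cohort) %>%
  nest(data = -Cohort) %>%
  mutate(mean = map(data, ~mean(.$value))) %>%
  unnest(mean)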
#> # A tibble: 2 × 3
#> Cohort data mean
#> <int> <list> <dbl>
#> 1 1 <tibble [12 × 2]> 44.4
#> 2 2 <tibble [4 × 2]> 61.2
Data:
df <- read.table(text = "
S1 S2 S3 S4
Cohort 1 2 1 1
G1 23 44 67 13
G2 11 78 88 30
G3 45 46 56 66
G4 67 77 22 45", header = TRUE) %>%
t() %>%
as.data.frame()