I have been trying to figure out a more efficient way to summarize rowMeans for various samples belonging to the same group. I couldn't find a good post to what I would think would be a common problem.
I'd like to have something more automated using a metadata file but can't figure it out. Any guidance would be greatly appreciated!
file
looks like this:
Target_Name | Target_position | Sample_1 | Sample_2 | Sample_3 | Sample_4 |
---|---|---|---|---|---|
Target-1 | 1 | 100 | 105 | 54 | 55 |
Target-1 | 2 | 120 | 125 | 57 | 58 |
metadata
looks like this:
group | sample |
---|---|
Control | Sample_1 |
Control | Sample_2 |
Test | Sample_3 |
Test | Sample_4 |
My Expected output looks like this:
Target_Name | Target_position | Sample_1 | Sample_2 | Sample_3 | Sample_4 | Control_Mean | Test_Mean |
---|---|---|---|---|---|---|---|
Target-1 | 1 | 100 | 105 | 54 | 55 | 102.5 | 54.5 |
Target-1 | 2 | 120 | 125 | 57 | 58 | 122.5 | 57.5 |
I am able to generate the expected output using
file$Control_Mean <- rowMeans(file[ , c(3,4)], na.rm=FALSE)
file$Test_Mean <- rowMeans(file[ , c(5,6)], na.rm=FALSE)
But I was wondering if there is a more automated approach, as the group names in my metadata file will be different all of the time and not just called Control or Test. Any suggestions? Thanks!
CodePudding user response:
Here is a solution using dplyr
and tidyr
Packaged
library(dplyr)
library(tidyr)
Input Data
# Data
file <- tibble::tribble(
~Target_Name, ~Target_position, ~Sample_1, ~Sample_2, ~Sample_3, ~Sample_4,
"Target-1", 1L, 100L, 105L, 54L, 55L,
"Target-1", 2L, 120L, 125L, 57L, 58L
)
metadata <- tibble::tribble(
~group, ~sample,
"Control", "Sample_1",
"Control", "Sample_2",
"Test", "Sample_3",
"Test", "Sample_4"
)
Solution
file %>%
pivot_longer(Sample_1:Sample_4, names_to = "sample", values_to = "value") %>%
left_join(metadata) %>%
group_by(Target_position, group) %>%
summarise(mean = mean(value)) %>%
ungroup() %>%
pivot_wider(id_cols = Target_position, names_from = "group", values_from = "mean",
names_prefix = "Mean_") %>%
left_join(file) %>%
select(Target_Name, Target_position, Sample_1:Sample_4, starts_with("Mean"))
Output
## A tibble: 2 × 8
# Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Mean_Control Mean_Test
# <chr> <int> <int> <int> <int> <int> <dbl> <dbl>
#1 Target-1 1 100 105 54 55 102.5 54.5
#
#2 Target-1 2 120 125 57 58 122.5 57.5
Created on 2022-12-08 with reprex v2.0.2
CodePudding user response:
Using RuamPimentel's very handy data,
file %>%
mutate(purrr::map_dfc(split(metadata$sample, metadata$group),
function(z) rowMeans(cur_data()[z])))
# # A tibble: 2 x 8
# Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Control Test
# <chr> <int> <int> <int> <int> <int> <dbl> <dbl>
# 1 Target-1 1 100 105 54 55 102. 54.5
# 2 Target-1 2 120 125 57 58 122. 57.5