Obtain row means for columns in same group using metadata file-CodePudding

I have been trying to figure out a more efficient way to summarize rowMeans for various samples belonging to the same group. I couldn't find a good post to what I would think would be a common problem.

I'd like to have something more automated using a metadata file but can't figure it out. Any guidance would be greatly appreciated!

file looks like this:

Target_Name	Target_position	Sample_1	Sample_2	Sample_3	Sample_4
Target-1	1	100	105	54	55
Target-1	2	120	125	57	58

metadata looks like this:

group	sample
Control	Sample_1
Control	Sample_2
Test	Sample_3
Test	Sample_4

My Expected output looks like this:

Target_Name	Target_position	Sample_1	Sample_2	Sample_3	Sample_4	Control_Mean	Test_Mean
Target-1	1	100	105	54	55	102.5	54.5
Target-1	2	120	125	57	58	122.5	57.5

I am able to generate the expected output using

file$Control_Mean <- rowMeans(file[ , c(3,4)], na.rm=FALSE)
file$Test_Mean <- rowMeans(file[ , c(5,6)], na.rm=FALSE)

But I was wondering if there is a more automated approach, as the group names in my metadata file will be different all of the time and not just called Control or Test. Any suggestions? Thanks!

CodePudding user response：

Here is a solution using dplyr and tidyr

Packaged

library(dplyr)
library(tidyr)

Input Data

# Data
file <- tibble::tribble(
  ~Target_Name, ~Target_position, ~Sample_1, ~Sample_2, ~Sample_3, ~Sample_4,
    "Target-1",               1L,      100L,      105L,       54L,       55L,
    "Target-1",               2L,      120L,      125L,       57L,       58L
  )


metadata <- tibble::tribble(
     ~group,    ~sample,
  "Control", "Sample_1",
  "Control", "Sample_2",
     "Test", "Sample_3",
     "Test", "Sample_4"
  )

Solution

file %>% 
  pivot_longer(Sample_1:Sample_4, names_to = "sample", values_to = "value") %>% 
  left_join(metadata) %>% 
  group_by(Target_position, group) %>% 
  summarise(mean = mean(value)) %>% 
  ungroup() %>% 
  pivot_wider(id_cols = Target_position, names_from = "group", values_from = "mean",
              names_prefix = "Mean_") %>% 
  left_join(file) %>% 
  select(Target_Name, Target_position, Sample_1:Sample_4, starts_with("Mean"))

Output

## A tibble: 2 × 8
#  Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Mean_Control Mean_Test
#  <chr>                 <int>    <int>    <int>    <int>    <int>        <dbl>     <dbl>
#1 Target-1                  1      100      105       54       55         102.5     54.5
#
#2 Target-1                  2      120      125       57       58         122.5     57.5

^{Created on 2022-12-08 with reprex v2.0.2}

CodePudding user response：

Using RuamPimentel's very handy data,

file %>%
  mutate(purrr::map_dfc(split(metadata$sample, metadata$group),
                        function(z) rowMeans(cur_data()[z])))
# # A tibble: 2 x 8
#   Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Control  Test
#   <chr>                 <int>    <int>    <int>    <int>    <int>   <dbl> <dbl>
# 1 Target-1                  1      100      105       54       55    102.  54.5
# 2 Target-1                  2      120      125       57       58    122.  57.5