Home > OS >  Obtain row means for columns in same group using metadata file
Obtain row means for columns in same group using metadata file

Time:12-09

I have been trying to figure out a more efficient way to summarize rowMeans for various samples belonging to the same group. I couldn't find a good post to what I would think would be a common problem.

I'd like to have something more automated using a metadata file but can't figure it out. Any guidance would be greatly appreciated!

file looks like this:

Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4
Target-1 1 100 105 54 55
Target-1 2 120 125 57 58

metadata looks like this:

group sample
Control Sample_1
Control Sample_2
Test Sample_3
Test Sample_4

My Expected output looks like this:

Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Control_Mean Test_Mean
Target-1 1 100 105 54 55 102.5 54.5
Target-1 2 120 125 57 58 122.5 57.5

I am able to generate the expected output using

file$Control_Mean <- rowMeans(file[ , c(3,4)], na.rm=FALSE)
file$Test_Mean <- rowMeans(file[ , c(5,6)], na.rm=FALSE)

But I was wondering if there is a more automated approach, as the group names in my metadata file will be different all of the time and not just called Control or Test. Any suggestions? Thanks!

CodePudding user response:

Here is a solution using dplyr and tidyr

Packaged

library(dplyr)
library(tidyr)

Input Data

# Data
file <- tibble::tribble(
  ~Target_Name, ~Target_position, ~Sample_1, ~Sample_2, ~Sample_3, ~Sample_4,
    "Target-1",               1L,      100L,      105L,       54L,       55L,
    "Target-1",               2L,      120L,      125L,       57L,       58L
  )


metadata <- tibble::tribble(
     ~group,    ~sample,
  "Control", "Sample_1",
  "Control", "Sample_2",
     "Test", "Sample_3",
     "Test", "Sample_4"
  )

Solution

file %>% 
  pivot_longer(Sample_1:Sample_4, names_to = "sample", values_to = "value") %>% 
  left_join(metadata) %>% 
  group_by(Target_position, group) %>% 
  summarise(mean = mean(value)) %>% 
  ungroup() %>% 
  pivot_wider(id_cols = Target_position, names_from = "group", values_from = "mean",
              names_prefix = "Mean_") %>% 
  left_join(file) %>% 
  select(Target_Name, Target_position, Sample_1:Sample_4, starts_with("Mean"))

Output

## A tibble: 2 × 8
#  Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Mean_Control Mean_Test
#  <chr>                 <int>    <int>    <int>    <int>    <int>        <dbl>     <dbl>
#1 Target-1                  1      100      105       54       55         102.5     54.5
#
#2 Target-1                  2      120      125       57       58         122.5     57.5

Created on 2022-12-08 with reprex v2.0.2

CodePudding user response:

Using RuamPimentel's very handy data,

file %>%
  mutate(purrr::map_dfc(split(metadata$sample, metadata$group),
                        function(z) rowMeans(cur_data()[z])))
# # A tibble: 2 x 8
#   Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Control  Test
#   <chr>                 <int>    <int>    <int>    <int>    <int>   <dbl> <dbl>
# 1 Target-1                  1      100      105       54       55    102.  54.5
# 2 Target-1                  2      120      125       57       58    122.  57.5
  • Related