Using Dplyr to calculate percent by group for every column without specifying the name?

Time:05-07

This is similar to this. However, what I'm interested in is calculating the percentage for every column. For example, with the code below I can calculate the percentage for column S1 by explicitly naming it, but I want a way to do it for all columns without specifying them.

input <- 'Gene  Exon    S1  S2  S3
G1  E1  56  52  95
G1  E2  25  52  5
G1  E3  32  66  22
G2  E1  55  11  33
G2  E2  46  12  44'

library(dplyr)
df <- read.table(text = input, header = TRUE)
df$Exon <- NULL
df %>% group_by(Gene) %>% summarise(per = S1 / sum(S1))

The above gives the percentage for S1. However, when I tried using a period in place of the column name, it caused an error.

df %>% group_by(Gene) %>% summarise ( per = . / sum (.) ) 

thanks in advance.

CodePudding user response:

You can use across for this:

library(dplyr)
df %>%
  group_by(Gene) %>%
  summarize(across(matches("^S[0-9]+"), ~ . / sum(.)), .groups = "drop")
# # A tibble: 5 x 4
#   Gene     S1    S2     S3
#   <chr> <dbl> <dbl>  <dbl>
# 1 G1    0.496 0.306 0.779 
# 2 G1    0.221 0.306 0.0410
# 3 G1    0.283 0.388 0.180 
# 4 G2    0.545 0.478 0.429 
# 5 G2    0.455 0.522 0.571 
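
If you would rather not rely on the column names following a pattern at all, one possible variation (a sketch, assuming dplyr 1.0.0 or later and that every non-grouping column left in df is numeric) is to select with where(is.numeric) and use mutate(), which keeps one row per input row. The names pct and check are just placeholders for this example.

```r
library(dplyr)

input <- 'Gene  Exon    S1  S2  S3
G1  E1  56  52  95
G1  E2  25  52  5
G1  E3  32  66  22
G2  E1  55  11  33
G2  E2  46  12  44'

df <- read.table(text = input, header = TRUE)
df$Exon <- NULL

# across() leaves grouping columns alone, so where(is.numeric)
# selects S1, S2 and S3 here without naming them.
pct <- df %>%
  group_by(Gene) %>%
  mutate(across(where(is.numeric), ~ .x / sum(.x))) %>%
  ungroup()

# Sanity check: within each gene, the shares in every column should add up to 1.
check <- pct %>%
  group_by(Gene) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(pct)
print(check)
```

The mutate() form should give the same per-row shares as the summarize() call above; the only difference is that it does not depend on summarize() returning several rows per group.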