This is similar to another question. However, what I'm interested in is calculating the percentage for every column. For example, with the code below I can calculate the percentage for column S1 by explicitly listing it, but I want a way to do it for all columns without naming each one.
input <- 'Gene Exon S1 S2 S3
G1 E1 56 52 95
G1 E2 25 52 5
G1 E3 32 66 22
G2 E1 55 11 33
G2 E2 46 12 44'
df = read.table ( text=input, header=T)
df$Exon = NULL
df %>% group_by(Gene) %>% summarise ( per = S1 / sum (S1) )
The above calculates the percentage for S1; however, when I tried using a period in place of the column name, it caused an error.
df %>% group_by(Gene) %>% summarise ( per = . / sum (.) )
Thanks in advance.
CodePudding user response:
You can use across() for this:
library(dplyr)
df %>%
  group_by(Gene) %>%
  # "^S[0-9]+" matches the sample columns S1, S2, S3, ...
  summarize(across(matches("^S[0-9]+"), ~ . / sum(.)), .groups = "drop")
# # A tibble: 5 x 4
#   Gene     S1    S2     S3
#   <chr> <dbl> <dbl>  <dbl>
# 1 G1    0.496 0.306 0.779
# 2 G1    0.221 0.306 0.0410
# 3 G1    0.283 0.388 0.180
# 4 G2    0.545 0.478 0.429
# 5 G2    0.455 0.522 0.571
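If you'd rather not rely on the column names at all, here is a minimal sketch of a variant, assuming every numeric column should be converted and that you have dplyr 1.0.0 or later. It selects columns with where(is.numeric) and uses mutate() instead of summarize(), since each group keeps all of its rows (recent dplyr versions warn when summarize() returns more than one row per group; reframe() is the suggested replacement there). The name res is only for illustration.

```r
library(dplyr)

# Same toy data as in the question
input <- 'Gene Exon S1 S2 S3
G1 E1 56 52 95
G1 E2 25 52 5
G1 E3 32 66 22
G2 E1 55 11 33
G2 E2 46 12 44'
df <- read.table(text = input, header = TRUE)
df$Exon <- NULL

# Divide every numeric column by its per-gene total.
# The grouping column (Gene) is left untouched by across().
res <- df %>%
  group_by(Gene) %>%
  mutate(across(where(is.numeric), ~ .x / sum(.x))) %>%
  ungroup()

res
```

Within each gene the values of every column should now sum to 1, which is a quick way to sanity-check the result.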