Calculate Ratio Matrix using R

I was wondering if there is a simple method to calculate a ratio matrix for each element in a data frame. Example -

gene sample1 sample2 sample3 sample4 .....
aa     2       2       3      2
aa     1       5       2      1
aa     4       1       2      3
bb     1       2       1      2
bb     2       1       1      2

I want the ratio of each element in sample1 to sample4 to its column total within the same gene, i.e. each value divided by the sum of that column over all rows sharing the same gene. The calculation would be like this -

gene sample1 sample2 sample3 sample4 .....
aa     2/7     2/8     3/7      2/6
aa     1/7     5/8     2/7      1/6
aa     4/7     1/8     2/7      3/6
bb     1/3     2/3     1/2      2/4
bb     2/3     1/3     1/2      2/4

The result would be like this -

gene  sample1  sample2  sample3  sample4 .....
aa     .28       .25       .42      .33
aa     .14       .62       .28      .16
aa     .57       .12       .28      .5
bb     .33       .66       .5       .5
bb     .66       .33       .5       .5

What I have tried in a loop is this -

tf <- dd %>%
        group_by(symbol) %>%
        summarise_if(is.numeric, mean)

This collapses each group to a single summary row, so it does not calculate a value for each element and does not keep the dimensions of the initial data frame (here, dd). Any suggestions would be most appreciated.

CodePudding user response：

You can do:

library(dplyr)

dat %>%
  group_by(gene) %>%
  mutate(across(everything(), proportions)) %>% 
  ungroup()

# A tibble: 5 x 5
  gene  sample1 sample2 sample3 sample4
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 aa      0.286   0.25    0.429   0.333
2 aa      0.143   0.625   0.286   0.167
3 aa      0.571   0.125   0.286   0.5  
4 bb      0.333   0.667   0.5     0.5  
5 bb      0.667   0.333   0.5     0.5

If you have missing values that you'd like to ignore, use:

dat %>%
  group_by(gene) %>%
  mutate(across(everything(), ~ .x / sum(.x, na.rm = TRUE))) %>%
  ungroup()
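
To see why the `na.rm = TRUE` variant matters, here is a minimal base R sketch on a made-up vector (the values are hypothetical, not taken from the question). It compares `proportions()` with the explicit division when one value in a group is missing:

```r
# Hypothetical toy vector (not from the question) with one missing value
x <- c(1, NA, 3)

# proportions() divides by sum(x); a single NA makes that sum NA,
# so every share in the group becomes NA
p_default <- proportions(x)

# Dividing by the NA-free sum keeps the shares of the non-missing values;
# the missing entry itself stays NA
p_ignore <- x / sum(x, na.rm = TRUE)

print(p_default)
print(p_ignore)
```

In other words, with `proportions` one missing value wipes out the whole group for that column, while the second form only leaves the missing cell as NA.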

Data:

dat <- structure(list(gene = c("aa", "aa", "aa", "bb", "bb"), sample1 = c(2, 
1, 4, 1, 2), sample2 = c(2, 5, 1, 2, 1), sample3 = c(3, 2, 2, 
1, 1), sample4 = c(2, 1, 3, 2, 2)), class = "data.frame", row.names = c(NA, 
-5L))

CodePudding user response：

Here is an option with data.table

> library(data.table)

> setDT(dat)[, lapply(.SD, proportions), by = gene]
   gene   sample1   sample2   sample3   sample4
1:   aa 0.2857143 0.2500000 0.4285714 0.3333333
2:   aa 0.1428571 0.6250000 0.2857143 0.1666667
3:   aa 0.5714286 0.1250000 0.2857143 0.5000000
4:   bb 0.3333333 0.6666667 0.5000000 0.5000000
5:   bb 0.6666667 0.3333333 0.5000000 0.5000000
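
If neither dplyr nor data.table is available, the same per-group division can probably be done in base R with `ave()`. This is only a sketch of an alternative, not something proposed in the answers above; it assumes that every column other than `gene` is numeric, and the names `num_cols`, `res` and `check` are made up for illustration.

```r
# Same toy data as in the question
dat <- data.frame(
  gene    = c("aa", "aa", "aa", "bb", "bb"),
  sample1 = c(2, 1, 4, 1, 2),
  sample2 = c(2, 5, 1, 2, 1),
  sample3 = c(3, 2, 2, 1, 1),
  sample4 = c(2, 1, 3, 2, 2)
)

# Assumption: every column except `gene` is numeric
num_cols <- setdiff(names(dat), "gene")

res <- dat
# ave() applies FUN within each level of dat$gene and returns a vector
# of the original length, so the data frame keeps its dimensions
res[num_cols] <- lapply(dat[num_cols], function(x) {
  ave(x, dat$gene, FUN = function(v) v / sum(v, na.rm = TRUE))
})

# Sanity check: within each gene, every sample column should sum to 1
check <- rowsum(res[num_cols], group = res$gene)

print(res)
print(check)
```

The `rowsum()` call at the end is just a quick check that the shares in each sample column add up to 1 within each gene.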