How to divide the values of multiple columns by the values of a single column of a dataset in R

I have a dataset similar to the following:

gene_name	gene_length	value1	value2	value3
NameA	1070	100	300	600
NameB	110	200	600	1200

My goal is to create new columns that hold the result of dividing each of the columns value1, value2, value3, ..., value-n by the gene_length column.

Something like this:

gene_name	gene_length	value1	value2	value3	value1_result	value2_result	value3_result
NameA	1070	100	300	600	0.0934	0.2803	0.5607
NameB	110	200	600	1200	1.8181	5.4545	10.9090

With only a few columns and rows I could write a separate mutate call for each column, but my dataset has more than 50 thousand rows and 21 columns.

How can this be done more efficiently with the tidyverse?

I have read that I could use the mutate function together with the across function, but I have not been able to get them to work together.

desired_df <- df %>% 
  mutate(across(.cols = 3:21, # 21 because of the 21 columns i have in my dataframe
                # I need to specify a function to perform the division in the columns i want 
                # but i dont know how
                .names = '{col}_value')) # names of new columns

CodePudding user response:

Use across() to loop over the 'value' columns, supply a lambda function (~) that divides each column (.x) by gene_length, and set .names so that the results are written to new columns with the suffix _result.

library(dplyr)
desired_df <- df %>% 
  mutate(across(.cols = starts_with("value"), ~ .x/gene_length,
                .names = '{.col}_result'))

-output

> desired_df
  gene_name gene_length value1 value2 value3 value1_result value2_result value3_result
1     NameA        1070    100    300    600    0.09345794     0.2803738     0.5607477
2     NameB         110    200    600   1200    1.81818182     5.4545455    10.9090909
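
If the real columns do not share a common "value" prefix, one possible variation is to select everything except the identifier columns. The following is a minimal sketch under that assumption; the name desired_df2 is only illustrative, and the sample data is repeated so the snippet runs on its own.

```r
library(dplyr)

# Sample data, same values as in the "data" section of this answer
df <- data.frame(
  gene_name   = c("NameA", "NameB"),
  gene_length = c(1070L, 110L),
  value1      = c(100L, 200L),
  value2      = c(300L, 600L),
  value3      = c(600L, 1200L)
)

# Select every column except the two identifier columns, instead of
# relying on a name prefix or on hard-coded positions such as 3:21
desired_df2 <- df %>%
  mutate(across(.cols = !c(gene_name, gene_length),
                .fns = ~ .x / gene_length,
                .names = "{.col}_result"))
```

With this selection the original columns are kept and one _result column is added for each remaining column, so the code does not need to change when more value columns are added.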

Or using data.table

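library(data.table)
# names of the columns to divide (every column whose name contains "value")
nm1 <- grep("value", names(df), value = TRUE)
# convert df to a data.table by reference and add the *_result columns in place
setDT(df)[, paste0(nm1, "_result") := lapply(.SD, \(x) x / gene_length),
          .SDcols = nm1]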
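
Note that setDT() converts df to a data.table by reference, so the original object itself is changed. If the original data.frame should stay untouched, a possible variation is to work on a copy made with as.data.table(). This is a sketch only: the name dt is illustrative, the pattern is anchored with ^ as an assumption that only names starting with "value" should match, and function(x) is used in place of \(x) so that it also runs on R versions older than 4.1.

```r
library(data.table)

# Sample data, same values as in the "data" section of this answer
df <- data.frame(
  gene_name   = c("NameA", "NameB"),
  gene_length = c(1070L, 110L),
  value1      = c(100L, 200L),
  value2      = c(300L, 600L),
  value3      = c(600L, 1200L)
)

# as.data.table() returns a copy, so df remains a plain data.frame
dt <- as.data.table(df)

# Columns whose names start with "value"
nm1 <- grep("^value", names(dt), value = TRUE)

# Add the *_result columns to the copy only
dt[, paste0(nm1, "_result") := lapply(.SD, function(x) x / gene_length),
   .SDcols = nm1]
```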

data

df <- structure(list(gene_name = c("NameA", "NameB"), gene_length = c(1070L, 
110L), value1 = c(100L, 200L), value2 = c(300L, 600L), value3 = c(600L, 
1200L)), class = "data.frame", row.names = c(NA, -2L))