How to divide the values of multiple columns by the values of a single column of a dataset in R

Time:12-05

I have a dataset similar to the following:

gene_name gene_length value1 value2 value3
NameA 1070 100 300 600
NameB 110 200 600 1200

My goal is to create new columns containing the result of dividing each value column (value1, value2, value3, ..., valueN) by the gene_length column.

Something like this:

gene_name gene_length value1 value2 value3 value1_result value2_result value3_result
NameA 1070 100 300 600 0.0934 0.2803 0.5607
NameB 110 200 600 1200 1.8181 5.4545 10.9090

With only a few columns and rows I could write several mutate() calls by hand, but my dataset has more than 50,000 rows and 21 columns.

How can I do this more efficiently with the tidyverse?

I have read that I could combine mutate() with across(), but I have not been able to get them to work together.

desired_df <- df %>% 
  mutate(across(.cols = 3:21, # 3:21 because my data frame has 21 columns
                # I need to specify a function here that performs the division,
                # but I don't know how
                .names = '{col}_value')) # names of the new columns

Answer:

Use across() to loop over the value columns, pass a lambda (formula) function (~) that divides each column (.x) by gene_length, and set .names so that the new columns get a _result suffix:

library(dplyr)
desired_df <- df %>% 
  mutate(across(.cols = starts_with("value"), ~ .x/gene_length,
                .names = '{.col}_result')) 

Output:

> desired_df
  gene_name gene_length value1 value2 value3 value1_result value2_result value3_result
1     NameA        1070    100    300    600    0.09345794     0.2803738     0.5607477
2     NameB         110    200    600   1200    1.81818182     5.4545455    10.9090909
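The question's own attempt selected the value columns by position (.cols = 3:21). A minimal sketch of that positional variant, assuming the value columns run from column 3 to the last column of the data frame; it uses an ordinary function(x) instead of the ~ formula (\(x) is equivalent shorthand in R >= 4.1):

```r
library(dplyr)

# Same two-row example data as in the question
df <- data.frame(
  gene_name   = c("NameA", "NameB"),
  gene_length = c(1070L, 110L),
  value1      = c(100L, 200L),
  value2      = c(300L, 600L),
  value3      = c(600L, 1200L)
)

# Assumption: every column from the 3rd to the last is a value column.
# last_col() avoids hard-coding 21, so the same code works for any width.
desired_df <- df %>%
  mutate(across(.cols  = 3:last_col(),
                .fns   = function(x) x / gene_length,  # divide each column by gene_length
                .names = "{.col}_result"))             # e.g. value1 -> value1_result

print(desired_df)
```

Selecting by name (starts_with("value")) is usually safer than selecting by position, because it keeps working if columns are reordered or added.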

Or, using data.table:

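library(data.table)
# Names of the columns to divide (every column whose name contains "value")
nm1 <- grep("value", names(df), value = TRUE)
# setDT() converts df to a data.table by reference; := then adds the
# new *_result columns in place. The \(x) lambda syntax needs R >= 4.1.
setDT(df)[, paste0(nm1, "_result") := lapply(.SD, \(x) x / gene_length),
          .SDcols = nm1]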
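For comparison, a package-free base R sketch (an alternative technique, not part of the original answer). It relies on the fact that dividing a data frame by a vector whose length equals nrow(df) recycles the vector down each column. The name value_cols is a hypothetical helper, and the regular expression assumes the columns are named value1, value2, ... exactly:

```r
# Same two-row example data as in the question
df <- data.frame(
  gene_name   = c("NameA", "NameB"),
  gene_length = c(1070L, 110L),
  value1      = c(100L, 200L),
  value2      = c(300L, 600L),
  value3      = c(600L, 1200L)
)

# Hypothetical helper: names of the columns to divide.
# "^value[0-9]+$" is stricter than grep("value", ...) and will not
# match the *_result columns if the code is run a second time.
value_cols <- grep("^value[0-9]+$", names(df), value = TRUE)

# Row i of every selected column is divided by gene_length[i];
# the result is assigned to new columns with a _result suffix.
df[paste0(value_cols, "_result")] <- df[value_cols] / df$gene_length

print(df)
```

The division is vectorized over whole columns, so it scales to tens of thousands of rows without an explicit loop.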

Data:

df <- structure(list(gene_name = c("NameA", "NameB"), gene_length = c(1070L, 
110L), value1 = c(100L, 200L), value2 = c(300L, 600L), value3 = c(600L, 
1200L)), class = "data.frame", row.names = c(NA, -2L))