Let's say I've got a dataframe with multiple columns, some of which I want to transform. The column names define what transformation needs to be used.
library(tidyverse)
set.seed(42)
df <- data.frame(A = 1:100, B = runif(n = 100, 0, 1), log10 = runif(n = 100, 10, 100), log2 = runif(n = 100, 10, 100), log1p = runif(n = 100, 10, 100), sqrt = runif(n = 100, 10, 100))
trans <- list()
trans$log10 <- log10
trans$log2 <- log2
trans$log1p <- log1p
trans$sqrt <- sqrt
Ideally, I would like to use an across() call in which the column names are matched up with the trans function names and the transformations are performed on the fly.
The desired output is the following:
df_trans <- df %>%
  dplyr::mutate(log10 = trans$log10(log10),
                log2 = trans$log2(log2),
                log1p = trans$log1p(log1p),
                sqrt = trans$sqrt(sqrt))
df_trans
However, I don't want to specify each transformation manually. The representative example has only 4, but this number could vary and be significantly higher, making manual specification cumbersome and error-prone.
I have managed to match up the column names with the functions by turning the trans list into a data frame and left-joining, but I am then unable to call the function in the trans_function column.
trans_df <- enframe(trans, value = "trans_function")
df %>%
pivot_longer(cols = everything()) %>%
left_join(trans_df) %>%
dplyr::mutate(value = trans_function(value))
Error: Problem with mutate() column value.
i value = trans_function(value).
x could not find function "trans_function"
I think I either need to find a way of calling the functions from the list columns or another way of matching up the function names with the column names. All ideas are welcome.
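For reference, the call above fails because trans_function(value) makes R look for a function named trans_function, and the list column of that name is not one. A minimal sketch of how the list-column idea could be made to work, assuming purrr is available: each stored function is called on its own value with map2_dbl(), and rows without a matching function (NULL after the left join) are passed through unchanged. The names df_small, trans_small and long are hypothetical stand-ins, and the values are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(purrr)

# Hypothetical toy data: column names match the names in trans_small
df_small <- data.frame(A = 1:3, log10 = c(10, 100, 1000), sqrt = c(4, 9, 16))
trans_small <- list(log10 = log10, sqrt = sqrt)

# One row per function, stored in a list column
trans_df <- enframe(trans_small, value = "trans_function")

long <- df_small %>%
  pivot_longer(cols = everything()) %>%
  left_join(trans_df, by = "name") %>%
  # Call each row's function on that row's value;
  # unmatched columns have NULL here and are left as they are
  mutate(value = map2_dbl(trans_function, value,
                          function(f, v) if (is.null(f)) v else f(v)))

long
```

This calls one function per cell, so it is likely to be much slower than a column-wise approach on large data.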
CodePudding user response:
We can use cur_column() in across() to get the column name and use it to subset trans.
library(dplyr)
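df %>%
  # all_of() selects the columns named in trans; cur_column() gives the current column's name
  mutate(across(all_of(names(trans)), ~ trans[[cur_column()]](.x))) %>%
  head()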
# A B log10 log2 log1p sqrt
#1 1 0.9148060 1.821920 6.486402 3.998918 3.470303
#2 2 0.9370754 1.470472 5.821200 3.932046 7.496103
#3 3 0.2861395 1.469690 6.437524 2.799395 8.171007
#4 4 0.8304476 1.653261 5.639570 3.700698 6.905755
#5 5 0.6417455 1.976905 4.597484 4.500461 9.441077
#6 6 0.5190959 1.985133 5.638341 4.551289 4.440590
Comparing it with the output of df_trans:
head(df_trans)
# A B log10 log2 log1p sqrt
#1 1 0.9148060 1.821920 6.486402 3.998918 3.470303
#2 2 0.9370754 1.470472 5.821200 3.932046 7.496103
#3 3 0.2861395 1.469690 6.437524 2.799395 8.171007
#4 4 0.8304476 1.653261 5.639570 3.700698 6.905755
#5 5 0.6417455 1.976905 4.597484 4.500461 9.441077
#6 6 0.5190959 1.985133 5.638341 4.551289 4.440590
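The same idea can be checked on a tiny data frame where the expected values are easy to verify by hand. This is only a sketch: df_small, trans_small and out are hypothetical names, and the values are made up for illustration. Wrapping the names in all_of() makes the selection explicit and raises an error if trans contains a name that is not a column; any_of() could be used instead to skip such names silently.

```r
library(dplyr)

# Hypothetical toy data: column names match the names in trans_small
df_small <- data.frame(A = 1:3, log10 = c(10, 100, 1000), sqrt = c(4, 9, 16))
trans_small <- list(log10 = log10, sqrt = sqrt)

out <- df_small %>%
  # For each selected column, look up the function stored under that column's name
  mutate(across(all_of(names(trans_small)),
                ~ trans_small[[cur_column()]](.x)))

out
```

Column A is not named in trans_small, so it is left untouched.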
CodePudding user response:
One way can be to use lapply:
library(tidyverse)
set.seed(42)
df <- data.frame(A = 1:100, B = runif(n = 100, 0, 1), log10 = runif(n = 100, 10, 100), log2 = runif(n = 100, 10, 100), log1p = runif(n = 100, 10, 100), sqrt = runif(n = 100, 10, 100))
trans <- list()
trans$log10 <- log10
trans$log2 <- log2
trans$log1p <- log1p
trans$sqrt <- sqrt
df_trans <- setNames(lapply(names(df), function(x) {
  # Apply the matching function if one exists, otherwise keep the column as is
  if (x %in% names(trans)) trans[[x]](df[[x]]) else df[[x]]
}), names(df)) %>%
  bind_cols() %>%
  as.data.frame()
head(df_trans)
which gives:
A B log10 log2 log1p sqrt
1 1 0.1365052 1.739051 6.301896 4.530600 4.318942
2 2 0.1771364 1.549601 5.793220 4.521715 3.649834
3 3 0.5195605 1.902438 4.819125 3.343266 6.788565
4 4 0.8111208 1.572253 6.219991 4.075945 3.322401
5 5 0.1153620 1.751276 6.306097 4.060292 7.817301
6 6 0.8934218 1.724403 6.201123 3.235938 9.749128
The original dataframe being:
head(df)
A B log10 log2 log1p sqrt
1 1 0.1365052 54.83409 78.89684 91.81428 18.65326
2 2 0.1771364 35.44878 55.45401 90.99323 13.32129
3 3 0.5195605 79.88006 28.22936 27.31143 46.08461
4 4 0.8111208 37.34675 74.54249 57.90612 11.03835
5 5 0.1153620 56.39961 79.12693 56.99123 61.11019
6 6 0.8934218 53.01557 73.57393 24.43022 95.04549
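A more compact base R variant of the same idea is sketched below, assuming only that the names in the function list correspond to column names. Map() pairs each function with its column, and the result is assigned back to just those columns, so no if/else branch is needed. The names df_small, trans_small, cols and df_out are hypothetical, and the values are made up for illustration.

```r
# Hypothetical toy data: column names match the names in trans_small
df_small <- data.frame(A = 1:3, log10 = c(10, 100, 1000), sqrt = c(4, 9, 16))
trans_small <- list(log10 = log10, sqrt = sqrt)

# Only transform columns that have a function of the same name
cols <- intersect(names(trans_small), names(df_small))

df_out <- df_small
# Pair each function with its column and overwrite those columns in place
df_out[cols] <- Map(function(f, col) f(col), trans_small[cols], df_small[cols])

df_out
```

Using intersect() means functions without a matching column are ignored rather than causing an error.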