I am trying to mutate certain columns of a tibble, where the specific function to use is named in another tibble. The setup code is below, after which I explain the issue.
library(tidyverse)
library(lubridate)
transforms <- list(
  "floor_date" = function(x) {
    floor_date(dmy(x), "month")
  },
  "integer" = function(x) {
    as.integer(gsub("[^[:digit:]]", "", x))
  }
)
data_meta <- tibble(
  datafield = letters[1:3],
  transform_to = c("floor_date", "", "integer")
)
# A tibble: 3 x 2
#   datafield transform_to
#   <chr>     <chr>
# 1 a         "floor_date"
# 2 b         ""
# 3 c         "integer"
data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)
# A tibble: 3 x 3
#   a          b     c
#   <chr>      <chr> <chr>
# 1 09/09/2021 lorem 99 bottles
# 2 19/09/2021 ipsum 98 bottles
# 3 06/10/2021 dolor 97 bottles
The data_meta tibble contains the desired transformation function (if any) for each column of the data tibble. These transformation functions are in a named list, transforms. In order to focus on only those columns that need a transform, I define needs_transform:
needs_transform <- data_meta %>%
  filter(nchar(transform_to) > 0)
# A tibble: 2 x 2
#   datafield transform_to
#   <chr>     <chr>
# 1 a         floor_date
# 2 c         integer
I now want to use mutate(across(...)) to apply the transformations. I find that the following gives the correct function, based on the column name:
transforms[[(needs_transform %>% filter(datafield == "a") %>% select(transform_to))[[1,1]]]]
# function(x) {
#   floor_date(dmy(x), "month")
# }
So I try the below using the cur_column() function to filter correctly:
clean_data <- data %>%
  mutate(across(
    needs_transform$datafield,
    ~ transforms[[(needs_transform %>%
      filter(datafield == cur_column()) %>% select(transform_to))[[1,1]]]]
  ))
# Error in `mutate()`:
# ! Problem while computing `..1 = across(...)`.
# Caused by error in `across()`:
# ! Problem while computing column `a`.
Unfortunately this does not work, and I am not sure why, even after inspecting the traceback (I can provide it, but it was not helpful).
My second attempt was to wrap the logic in a function (note that the x argument does nothing, but is required for the function to be used in across):
get_transform <- function(x) {
  t <- (needs_transform %>%
    filter(datafield == cur_column()) %>%
    select(transform_to))[[1,1]]
  transforms[[t]]
}
clean_data <- data %>%
  mutate(across(
    needs_transform$datafield,
    get_transform
  ))
# Error in `mutate()`:
# ! Problem while computing `..1 = across(needs_transform$datafield, get_transform)`.
# Caused by error in `across()`:
# ! Problem while computing column `a`.
This gives almost exactly the same error message. I have looked through several threads on here and nothing quite matches what I am trying to do. Could anyone help me get this to work? Or, if this is not a good way to do it, is there a better way?
User response:
One option to achieve your desired result may look like so:
library(tidyverse)
library(lubridate)
trans <- function(x, y) {
  fn_name <- data_meta %>%
    filter(datafield == y) %>%
    pull(transform_to)
  transforms[[fn_name]](x)
}
data %>%
  mutate(across(needs_transform$datafield, ~ trans(.x, cur_column())))
#> # A tibble: 3 × 3
#>   a          b     c
#>   <date>     <chr> <int>
#> 1 2021-09-01 lorem    99
#> 2 2021-09-01 ipsum    98
#> 3 2021-10-01 dolor    97
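As for why the two attempts in the question error out: as far as I can tell, both lambdas stop at the lookup, so they hand mutate() the transformation function itself instead of the values it produces. The sketch below repeats the setup so it runs on its own, and only adds the call on .x. The base R subsetting in place of filter() and select(), the all_of() wrapper, and the names looked_up and is_fn are my own choices for illustration, not something from the question.

```r
library(dplyr)
library(lubridate)

# Setup repeated from the question so the sketch runs on its own
transforms <- list(
  floor_date = function(x) floor_date(dmy(x), "month"),
  integer = function(x) as.integer(gsub("[^[:digit:]]", "", x))
)
needs_transform <- tibble(
  datafield = c("a", "c"),
  transform_to = c("floor_date", "integer")
)
data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)

# The lookup on its own yields a function object, not a column of values
looked_up <- transforms[["floor_date"]]
is_fn <- is.function(looked_up)

# Same idea as the question, but the looked-up function is called on .x
# (illustrative variant: base R subsetting instead of filter()/select())
clean_data <- data %>%
  mutate(across(
    all_of(needs_transform$datafield),
    ~ {
      fn_name <- needs_transform$transform_to[
        needs_transform$datafield == cur_column()
      ]
      transforms[[fn_name]](.x)
    }
  ))
```

all_of() is there because recent tidyselect versions tend to warn when a bare external character vector is used for selection; it should not change which columns are picked.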
User response:
You could also try:
data %>%
  mutate(across(needs_transform$datafield,
    ~ transforms[[with(data_meta,
      transform_to[datafield == cur_column()])]](.x)))
#>   a          b     c
#>   <date>     <chr> <int>
#> 1 2021-09-01 lorem    99
#> 2 2021-09-01 ipsum    98
#> 3 2021-10-01 dolor    97
Or even:
data %>%
  mutate(across(needs_transform$datafield,
    ~ .x %>% {transforms %>%
      getElement(data_meta %>%
        filter(datafield == cur_column()) %>%
        pull(transform_to))}()))
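If the nested lookups above feel hard to read, one further variation (a sketch of my own, not from the answers above) is to build a named character vector once and index it by cur_column(). The name lookup is hypothetical, and the setup is repeated so the block runs on its own.

```r
library(dplyr)
library(lubridate)

# Setup repeated from the question so the sketch runs on its own
transforms <- list(
  floor_date = function(x) floor_date(dmy(x), "month"),
  integer = function(x) as.integer(gsub("[^[:digit:]]", "", x))
)
data_meta <- tibble(
  datafield = letters[1:3],
  transform_to = c("floor_date", "", "integer")
)
data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)

# Named vector: column name -> transform name, dropping empty entries
# (`lookup` is a hypothetical helper name)
lookup <- setNames(data_meta$transform_to, data_meta$datafield)
lookup <- lookup[nzchar(lookup)]

# Pick the transform for the current column and call it on the column values
clean_data <- data %>%
  mutate(across(
    all_of(names(lookup)),
    ~ transforms[[lookup[[cur_column()]]]](.x)
  ))
```

This keeps the metadata lookup out of the lambda, so the needs_transform tibble is no longer needed for this step.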