Problem using across in mutate in dplyr, where function to apply depends on another tibble

Time:04-19

I am trying to mutate certain columns of a tibble, where the specific function to use is named in another tibble. The setup code is below, after which I explain the issue.

library(tidyverse)
library(lubridate)

transforms <- list(
  "floor_date" = function(x) {
    floor_date(dmy(x), "month")
  },
  "integer" = function(x) {
    as.integer(gsub("[^[:digit:]]", "", x))
  }
)

data_meta <- tibble(
  datafield = letters[1:3], 
  transform_to = c("floor_date", "", "integer")
)
# A tibble: 3 x 2
#   datafield transform_to
#   <chr>     <chr>       
# 1 a         "floor_date"
# 2 b         ""          
# 3 c         "integer"   

data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)
# A tibble: 3 x 3
#   a          b     c         
#   <chr>      <chr> <chr>
# 1 09/09/2021 lorem 99 bottles
# 2 19/09/2021 ipsum 98 bottles
# 3 06/10/2021 dolor 97 bottles

The data_meta tibble contains the desired transformation function (if any) for each column of the data tibble. These transformation functions are in a named list, transforms. In order to focus on only those columns that need a transform, I define needs_transform:

needs_transform <- data_meta %>%
  filter(nchar(transform_to) > 0)
# A tibble: 2 x 2
#   datafield transform_to
#   <chr>     <chr>       
# 1 a         floor_date  
# 2 c         integer

I now want to use mutate(across(...)) to apply the transformations. I find that the following gives the correct function, based on the column name:

transforms[[(needs_transform %>% filter(datafield == "a") %>% select(transform_to))[[1,1]]]]
# function(x) {
#   floor_date(dmy(x), "month")
# }

So I try the below using the cur_column() function to filter correctly:

clean_data <- data %>%
  mutate(across(
    needs_transform$datafield,
    ~ transforms[[(needs_transform %>%
                     filter(datafield == cur_column()) %>% select(transform_to))[[1,1]]]]
  ))
# Error in `mutate()`:
#   ! Problem while computing `..1 = across(...)`.
# Caused by error in `across()`:
#   ! Problem while computing column `a`.

Unfortunately this does not work, and I am not sure why, even after inspecting the traceback (I can provide it, but it was not helpful).

My second attempt was to wrap the logic in a function (note that the x argument is unused, but across() requires the function to accept it):

get_transform <- function(x) {
  t <- (needs_transform %>%
          filter(datafield == cur_column()) %>%
          select(transform_to))[[1,1]]
  
  transforms[[t]]
}

clean_data <- data %>%
  mutate(across(
    needs_transform$datafield,
    get_transform
  ))
# Error in `mutate()`:
#   ! Problem while computing `..1 = across(needs_transform$datafield, get_transform)`.
# Caused by error in `across()`:
#   ! Problem while computing column `a`.

This gives almost exactly the same error message. I have looked through several threads here and nothing quite matches what I am trying to do. Could anyone help me get this to work? Or, if this is not a good approach, is there a better way?

CodePudding user response:

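Both attempts look up the right function but return the function itself instead of calling it on the column, so mutate() does not get a vector back. One option is to apply the looked-up function to .x: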

library(tidyverse)
library(lubridate)

trans <- function(x, y) {
  fn_name <- data_meta %>%
    filter(datafield == y) %>%
    pull(transform_to)
  transforms[[fn_name]](x)
}

data %>%
  mutate(across(needs_transform$datafield, ~ trans(.x, cur_column())))
#> # A tibble: 3 × 3
#>   a          b         c
#>   <date>     <chr> <int>
#> 1 2021-09-01 lorem    99
#> 2 2021-09-01 ipsum    98
#> 3 2021-10-01 dolor    97
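
The difference between returning the function and calling it can be checked in isolation. The following is a minimal base-R sketch, assuming only the "integer" transform from the question; the names f and x are illustrative and not part of the original code.

```r
# Same "integer" transform as in the question.
transforms <- list(
  "integer" = function(x) {
    as.integer(gsub("[^[:digit:]]", "", x))
  }
)

# Illustrative input, standing in for column c.
x <- c("99 bottles", "98 bottles")

# This is what the lambdas in the question evaluate to: a function object.
f <- transforms[["integer"]]
is.function(f)
#> [1] TRUE

# Calling the function on the column values gives the vector mutate() needs.
f(x)
#> [1] 99 98
```

Inside across(), the same idea means ending the lambda with (.x), as the trans() helper above does.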

CodePudding user response:

You could also try:

data %>%
  mutate(across(
    needs_transform$datafield,
    ~ transforms[[with(data_meta, transform_to[datafield == cur_column()])]](.x)
  ))

  a          b         c
  <date>     <chr> <int>
1 2021-09-01 lorem    99
2 2021-09-01 ipsum    98
3 2021-10-01 dolor    97

Or even:

data %>%
   mutate( across(needs_transform$datafield, 
           ~.x %>% {transforms %>%
             getElement(data_meta %>%
             filter(datafield == cur_column())%>%
             pull(transform_to))}()))
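
As a possible alternative to filtering data_meta inside every call, the lookup can be precomputed once as a named character vector. This is a sketch under assumptions rather than a recommended method: the names needs and lookup are hypothetical, and it assumes each datafield appears only once in data_meta.

```r
library(dplyr)
library(lubridate)

# Setup copied from the question.
transforms <- list(
  "floor_date" = function(x) floor_date(dmy(x), "month"),
  "integer"    = function(x) as.integer(gsub("[^[:digit:]]", "", x))
)

data_meta <- tibble(
  datafield    = letters[1:3],
  transform_to = c("floor_date", "", "integer")
)

data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)

# Hypothetical helper objects: keep only rows with a transform, then build
# a named vector mapping column name -> transform name.
needs  <- filter(data_meta, nchar(transform_to) > 0)
lookup <- setNames(needs$transform_to, needs$datafield)

# Look up the transform for the current column and call it on .x.
clean_data <- data %>%
  mutate(across(
    all_of(names(lookup)),
    ~ transforms[[lookup[[cur_column()]]]](.x)
  ))
```

With this layout the per-column lookup is a plain name-based index, and all_of() makes the column selection explicit.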