dplyr mutate and purrr map: use data masking to select columns for map

In a dplyr mutate context, I would like to use the value of one column to select the column that purrr::map applies a function to.

Let's take a test data frame:

test <- data.frame(a = c(1,2), b = c(3,4), selector = c("a","b"))

I want to apply the following function:

calc <- function(col) {
  res <- col ^ 2
  return(res)
}

I am trying something like this:

test_2 <- test %>% mutate(quad = map(.data[[selector]], ~ calc(.x)))

My expected result would be:

  a b selector quad
1 1 3        a    1
2 2 4        b   16

but I get

Error in local_error_context(dots = dots, .index = i, mask = mask) : 
  promise already under evaluation: recursive default argument reference or earlier problems?

I know .data[[var]] is meant to be used only in the special context of programming with functions, but even when I wrap this in a function or similar I cannot get it to work. Trying tidy-selection instead gives an error saying that selection helpers can only be used in special dplyr verbs, not in functions like purrr::map.

The question "how to use dynamic variable in purrr map within dplyr" pointed me to get() and anonymous functions, but that did not work in this context either.

CodePudding user response：

Here's one way:

test %>% 
  mutate(quad = map(seq_along(selector), ~ calc(test[[selector[.x]]])[.x]))

#   a b selector quad
# 1 1 3        a    1
# 2 2 4        b   16

Instead of indexing test directly, you can also use cur_data(), which accounts for grouping (in dplyr 1.1.0 and later, cur_data() is deprecated in favor of pick()):

test %>% 
  mutate(quad = map(seq_along(selector), ~ calc(cur_data()[[selector[.x]]])[.x]))

Or, with diag:

test %>% 
  mutate(quad = diag(as.matrix(calc(cur_data()[selector]))))

#   a b selector quad
# 1 1 3        a    1
# 2 2 4        b   16
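
The diag version squares every selected column for every row and then keeps only the diagonal. As a rough sketch of the same lookup that picks one cell per row directly, base R matrix indexing could be used instead. The vector cols is an assumption here: it lists the columns the selector is allowed to name.

```r
test <- data.frame(a = c(1, 2), b = c(3, 4), selector = c("a", "b"))

calc <- function(col) col^2

# Assumption: the selector only ever names these columns.
cols <- c("a", "b")
m <- as.matrix(test[cols])

# One (row, column) index pair per row of the data frame.
idx <- cbind(seq_len(nrow(test)), match(test$selector, cols))

# Indexing a matrix with a two-column matrix returns one value per pair.
test$quad <- calc(m[idx])
```

This stays vectorized and does not depend on purrr or on the grouping context, at the cost of converting the candidate columns to a matrix first.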

CodePudding user response：

You could also change the function to return a single number and use purrr:

calc <- function(col, id) {test[[col]][[id]]^2}

test %>% 
    mutate(
        quad = purrr::map2_dbl(selector, row_number(), calc)
    )
  a b selector quad
1 1 3        a    1
2 2 4        b   16

CodePudding user response：

Not quite what you asked for, but an alternative might be to restructure the data so that the calculation is easier:

test %>% 
   pivot_longer(
       cols = c(a, b)
   ) %>% 
   filter(name == selector) %>% 
   mutate(quad = value^2)

# A tibble: 2 × 4
  selector name  value  quad
  <chr>    <chr> <dbl> <dbl>
1 a        a         1     1
2 b        b         4    16

You can join the results back onto the original data using an id column.
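
A minimal sketch of that join, assuming a row-number column named id (the name is only illustrative) is added before reshaping:

```r
library(dplyr)
library(tidyr)

test <- data.frame(a = c(1, 2), b = c(3, 4), selector = c("a", "b"))

# Add a row id so the long-format result can be matched back to the original rows.
test_id <- test %>% mutate(id = row_number())

quad <- test_id %>%
  pivot_longer(cols = c(a, b)) %>%
  filter(name == selector) %>%
  mutate(quad = value^2) %>%
  select(id, quad)

# Join on the id and drop it again to get the original layout plus quad.
result <- test_id %>%
  left_join(quad, by = "id") %>%
  select(-id)
```

Here result should have the columns a, b, selector and quad, in the original row order.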

CodePudding user response：

You can use rowwise() and get() the selector variable:

library(dplyr)

test %>%
  rowwise() %>%
  mutate(quad = calc(get(selector))) %>%
  ungroup()

# A tibble: 2 × 4
      a     b selector  quad
  <dbl> <dbl> <chr>    <dbl>
1     1     3 a            1
2     2     4 b           16

Or, if the selector values repeat, group_by() will be more efficient:

test <- data.frame(a = c(1,2,5), b = c(3,4,6), selector = c("a","b","a"))

test %>%
  group_by(selector) %>%
  mutate(quad = calc(get(selector[1]))) %>%
  ungroup()

# A tibble: 3 × 4
      a     b selector  quad
  <dbl> <dbl> <chr>    <dbl>
1     1     3 a            1
2     2     4 b           16
3     5     6 a           25