dplyr mutate and purrr map: use data masking to select columns for map

Time:01-27

In a dplyr mutate context, I would like to use the value of another column to select which column purrr::map applies a function to.

Let's take a test data frame:

test <- data.frame(a = c(1,2), b = c(3,4), selector = c("a","b"))

I want to apply the following function:

calc <- function(col) {
  res <- col ^ 2
  return(res)
}

I am trying something like this:

test_2 <- test %>% mutate(quad = map(.data[[selector]], ~ calc(.x)))

My expected result would be:

  a b selector quad
1 1 3        a    1
2 2 4        b   16

but I get

Error in local_error_context(dots = dots, .index = i, mask = mask) : 
  promise already under evaluation: recursive default argument reference or earlier problems?

I know .data[[var]] is supposed to be used only in the special context of programming with functions, but even if I wrap this in functions or similar, I cannot get it to work. Trying to use tidy-selection gives an error saying that selection helpers can only be used in special dplyr verbs, not in functions like purrr::map.

The question "how to use dynamic variable in purrr map within dplyr" pointed me to get() and anonymous functions, but that did not work in this context either.

CodePudding user response:

Here's one way:

test %>% 
  mutate(quad = map(seq_along(selector), ~ calc(test[[selector[.x]]])[.x]))

#   a b selector quad
# 1 1 3        a    1
# 2 2 4        b   16

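Instead of referring to test directly, you can also use cur_data(), which accounts for grouping (from dplyr 1.1.0 on, cur_data() is deprecated in favour of pick()):

test %>% 
  mutate(quad = map(seq_along(selector), ~ calc(cur_data()[[selector[.x]]])[.x]))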

Or, with diag:

test %>% 
  mutate(quad = diag(as.matrix(calc(cur_data()[selector]))))

#  a b selector quad
#1 1 3        a    1
#2 2 4        b   16
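
For comparison, the same lookup can be sketched in base R with matrix indexing, which picks one cell per row rather than squaring every selected column and then taking the diagonal. This is an alternative technique, not part of the answer above, and it assumes the candidate columns are known in advance; the names cols, idx, and picked are illustrative only.

```r
test <- data.frame(a = c(1, 2), b = c(3, 4), selector = c("a", "b"))
calc <- function(col) col^2

cols <- c("a", "b")                       # candidate columns (assumed known up front)
idx <- cbind(seq_len(nrow(test)),         # row index for each row
             match(test$selector, cols))  # column index chosen by selector
picked <- as.matrix(test[cols])[idx]      # one value per row
test$quad <- calc(picked)
```

Unlike the other examples here, this adds quad to test in place instead of returning a new data frame.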

CodePudding user response:

You could also change the function to return a single number and use purrr:

calc <- function(col, id) {test[[col]][[id]]^2}

test %>% 
    mutate(
        quad = purrr::map2_dbl(selector, row_number(), calc)
    )
  a b selector quad
1 1 3        a    1
2 2 4        b   16

CodePudding user response:

Not quite what you asked for but an alternative might be to restructure the data so that the calculation is easier:

test %>% 
   pivot_longer(
       cols = c(a, b)
   ) %>% 
   filter(name == selector) %>% 
   mutate(quad = value**2)

# A tibble: 2 × 4
  selector name  value  quad
  <chr>    <chr> <dbl> <dbl>
1 a        a         1     1
2 b        b         4    16

You can join the results back onto the original data using an id column.
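
A minimal sketch of that join, assuming a helper column named id (hypothetical, not part of the original data) is added with row_number() before reshaping:

```r
library(dplyr)
library(tidyr)

test <- data.frame(a = c(1, 2), b = c(3, 4), selector = c("a", "b"))

# `id` is a hypothetical helper column, used only to match rows up again
test_id <- test %>%
  mutate(id = row_number())

quad <- test_id %>%
  pivot_longer(cols = c(a, b)) %>%
  filter(name == selector) %>%   # keep the value from the selected column
  mutate(quad = value^2) %>%
  select(id, quad)

result <- test_id %>%
  left_join(quad, by = "id") %>%
  select(-id)
```

Here result should have the original columns a, b and selector plus the new quad column.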

CodePudding user response:

You can use rowwise() and get() the selector variable:

library(dplyr)

test %>%
  rowwise() %>%
  mutate(quad = calc(get(selector))) %>%
  ungroup()

# A tibble: 2 × 4
      a     b selector  quad
  <dbl> <dbl> <chr>    <dbl>
1     1     3 a            1
2     2     4 b           16

Or if the selector repeats, group_by() will be more efficient:

test <- data.frame(a = c(1,2,5), b = c(3,4,6), selector = c("a","b","a"))

test %>%
  group_by(selector) %>%
  mutate(quad = calc(get(selector[1]))) %>%
  ungroup()

# A tibble: 3 × 4
      a     b selector  quad
  <dbl> <dbl> <chr>    <dbl>
1     1     3 a            1
2     2     4 b           16
3     5     6 a           25