Iteration in R using tidyverse

Time:04-23

I am trying to avoid a for loop and use the tidyverse for iteration instead. Specifically, I have a vector of values, and for each value I want to compare it against a single variable in a data frame and create a new, prefixed indicator variable. I've tried dplyr::across, but it fails when the vector has length > 1.

Sample code:

library(tidyverse)
library(glue)

data <- data.frame(id = 1:10, 
                   y = letters[1:10], 
                   z = LETTERS[1:10])
letter_list <- letters[1:10]

var_naming <- function(dat, list){
  dat %>%
    mutate(!!glue("hx_{list}") := ifelse(y == {list}, 1, 0))
}

Code I've tried:

**the correct dimensions of the data frame should be 13 variables and 10 observations**

# data_b outputs the correct number of observations but has 40 variables
data_b <- map(letter_list, 
             ~var_naming(data, .x)) %>%
  as.data.frame()

# data_c gives me the correct number of variables but has 100 observations
data_c <- map_df(letter_list,
                 ~var_naming(data, .x))

# error message from data_d when using dplyr::across:
>> Error in `mutate()`:
>> ! Problem while computing `..1 =
  >> across(...)`.
>> Caused by error in `across()`:
>> ! All unnamed arguments must be length 1
>> Run `rlang::last_error()` to see where the error occurred.

data_d <- data %>%
  mutate(
    across(
      .cols  = y, 
      .fns   = ~ifelse(y == {letter_list}, 1, 0),
      .names = glue("hx_{letter_list}")
  ))

Desired output:

id y     z      hx_a  hx_b  hx_c  hx_d  hx_e  hx_f  hx_g  hx_h  hx_i  hx_j
1  a     A         1     0     0     0     0     0     0     0     0     0
2  b     B         0     1     0     0     0     0     0     0     0     0
3  c     C         0     0     1     0     0     0     0     0     0     0
4  d     D         0     0     0     1     0     0     0     0     0     0
5  e     E         0     0     0     0     1     0     0     0     0     0
6  f     F         0     0     0     0     0     1     0     0     0     0
7  g     G         0     0     0     0     0     0     1     0     0     0
8  h     H         0     0     0     0     0     0     0     1     0     0
9  i     I         0     0     0     0     0     0     0     0     1     0
10 j     J         0     0     0     0     0     0     0     0     0     1

CodePudding user response:

You were close with the mutate call, but what you ultimately want is a list of functions (one for each letter in letter_list) to pass to .fns. Since they are anonymous functions, name them after the elements of letter_list so that across can use those names when naming the new columns.

myFxs <- map(letter_list, ~function(y) ifelse(y == .x, 1, 0)) %>% 
  setNames(letter_list)

.names had a problem with the glue character vector (for me anyway), most likely because across expects a single glue template there rather than one name per function, which would match the "must be length 1" error. Since the functions are named for the letters they match against, you can use the {.fn} placeholder instead and pass a single template to across.

data %>%
  mutate(
    across(
      .cols  = y, 
      .fns   = myFxs,
      .names = "hx_{.fn}"
    )
  )
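To illustrate how the single template is expanded, here is a minimal sketch on a small, made-up data frame (df and fns are hypothetical names, not from the question). across() fills {.fn} with the name of each function in the list, so a named list of two functions yields two new columns.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
df <- data.frame(y = c("a", "b", "a"))

# Named list of functions; the names are what {.fn} expands to
fns <- list(
  a = function(v) as.integer(v == "a"),
  b = function(v) as.integer(v == "b")
)

out <- df %>%
  mutate(across(.cols = y, .fns = fns, .names = "hx_{.fn}"))

names(out)
```

The original column y is kept, and one hx_ column is added per function in the list.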

CodePudding user response:

The code can be modified as follows:

  1. Remove the {} around list on the right-hand side of :=
  2. Use transmute instead of mutate, since mutate returns the whole data frame by default and each call should return only the new column.
  3. Once map has returned the column-bound (_dfc) result, attach it to the original data with bind_cols

library(dplyr)
library(purrr)
var_naming <- function(dat, list){
  dat %>%
    transmute(!!glue::glue('hx_{list}') := ifelse(y == list, 1, 0))
}

NOTE: list is a base R function that constructs a list. It is better to give function arguments names that do not shadow reserved words or existing function names.

-testing

map_dfc(letter_list, var_naming, dat = data) %>% 
   bind_cols(data, .)

-output

   id y z hx_a hx_b hx_c hx_d hx_e hx_f hx_g hx_h hx_i hx_j
1   1 a A    1    0    0    0    0    0    0    0    0    0
2   2 b B    0    1    0    0    0    0    0    0    0    0
3   3 c C    0    0    1    0    0    0    0    0    0    0
4   4 d D    0    0    0    1    0    0    0    0    0    0
5   5 e E    0    0    0    0    1    0    0    0    0    0
6   6 f F    0    0    0    0    0    1    0    0    0    0
7   7 g G    0    0    0    0    0    0    1    0    0    0
8   8 h H    0    0    0    0    0    0    0    1    0    0
9   9 i I    0    0    0    0    0    0    0    0    1    0
10 10 j J    0    0    0    0    0    0    0    0    0    1
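As a side note, map_dfc is marked as superseded in purrr 1.0.0 and later. A minimal sketch of the same approach using map followed by list_cbind is shown below; it assumes purrr >= 1.0.0 is installed, and make_indicator and value are illustrative names chosen to avoid shadowing list.

```r
library(dplyr)
library(purrr)

data <- data.frame(id = 1:10,
                   y = letters[1:10],
                   z = LETTERS[1:10])
letter_list <- letters[1:10]

# Returns a one-column data frame: 1 where y equals `value`, else 0
make_indicator <- function(dat, value) {
  dat %>%
    transmute("hx_{value}" := ifelse(y == value, 1, 0))
}

result <- letter_list %>%
  map(function(value) make_indicator(data, value)) %>%
  list_cbind() %>%        # assumption: purrr >= 1.0.0
  bind_cols(data, .)      # original columns first, indicators after

dim(result)
```

The result should have the same 10 rows and 13 columns as the map_dfc version above.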

CodePudding user response:

Another way to get the same result:

data %>%
  cbind(model.matrix(~y + 0, .)) %>%
  rename_with(~str_replace(., 'y\\B', 'hx_'))

   id y z hx_a hx_b hx_c hx_d hx_e hx_f hx_g hx_h hx_i hx_j
1   1 a A    1    0    0    0    0    0    0    0    0    0
2   2 b B    0    1    0    0    0    0    0    0    0    0
3   3 c C    0    0    1    0    0    0    0    0    0    0
4   4 d D    0    0    0    1    0    0    0    0    0    0
5   5 e E    0    0    0    0    1    0    0    0    0    0
6   6 f F    0    0    0    0    0    1    0    0    0    0
7   7 g G    0    0    0    0    0    0    1    0    0    0
8   8 h H    0    0    0    0    0    0    0    1    0    0
9   9 i I    0    0    0    0    0    0    0    0    1    0
10 10 j J    0    0    0    0    0    0    0    0    0    1
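For readers unfamiliar with model.matrix, here is a small sketch of what the formula does on a made-up three-row data frame (df and mm are hypothetical names). The + 0 drops the intercept, so every level of y gets its own indicator column, named by pasting the variable name and the level together; the rename step then swaps the leading y for hx_.

```r
# Hypothetical toy data, only for illustration
df <- data.frame(y = c("a", "b", "c"))

# One indicator column per level of y, no intercept column
mm <- model.matrix(~ y + 0, df)

colnames(mm)

# Base R equivalent of the rename step: replace the leading "y" with "hx_"
new_names <- sub("^y", "hx_", colnames(mm))
new_names
```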

If you only want columns for the values in letter_list:

data %>%
  mutate(y = factor(y, letter_list)) %>%
  cbind(model.matrix(~y + 0, .) %>%
    as_tibble() %>%
    select(paste0('y', letter_list)) %>%
    rename_with(~str_replace(., 'y', 'hx_')))