I'm trying to create columns of dichotomous variables based on presence (or absence) of selected continuous variables.
Example:
library(tidyverse)
df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))
out <- tibble(z = c(0, 0),
              a_1 = c(.1, NA),
              a_2 = c(NA, .1),
              a_1_d = c(1, 0),
              a_2_d = c(0, 1))
I can do this on an ad hoc basis using mutate:
out <- df %>%
  mutate(a_1_d = if_else(is.na(a_1), 0, 1)) %>%
  mutate(a_2_d = if_else(is.na(a_2), 0, 1))
But my real use case involves a lot of variables, so I'd like to use purrr and dplyr::select. I've tried a bunch of approaches, such as:
out <- df %>%
  select(starts_with("a_")) %>%
  map(.x, .f = mutate({{.x}}_d =
                        if_else(is.na(.x), 0, 1)))
But I think I'm missing something fundamental about some combination of name assignment within map and passing variables to map. What is the most efficient way to get from df to out using a purrr function and dplyr::select?
CodePudding user response:
How do you feel about mutate() with across()? That seems like a good tool for this sort of problem.
You can choose which columns to work "across" with tidy select functions, just like in select(). You then give the function to apply to each selected column. You'll see I used as.numeric() to convert the logical output of "not NA" (!is.na) to 0/1, but you could absolutely use if_else() here as well. I wrote the function as a purrr-style lambda (i.e., with ~).
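To add a suffix to the new columns, I pass a named list to .fns. When .fns is a named list, across() names each new column as the original column name, an underscore, and the list name, which gives a_1_d and a_2_d here.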
mutate(df, across(.cols = starts_with("a"),
                  .fns = list(d = ~as.numeric(!is.na(.x)))))
#> # A tibble: 2 x 5
#>       z   a_1   a_2 a_1_d a_2_d
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     0   0.1  NA       1     0
#> 2     0  NA     0.1     0     1
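As a variation, here is a minimal sketch of the same idea using if_else() and the .names argument of across() instead of a named list. The "{.col}_d" glue pattern and the narrower starts_with("a_") selector are my choices for illustration, not something the question requires.

```r
library(dplyr)

df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))

out <- df %>%
  mutate(across(
    .cols  = starts_with("a_"),          # only the a_* columns
    .fns   = ~ if_else(is.na(.x), 0, 1), # 0 when missing, 1 otherwise
    .names = "{.col}_d"                  # append "_d" to each source column name
  ))
```

Spelling out .names can be handy if you want a different naming scheme (a prefix, say) without wrapping the function in a list.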
Created on 2021-11-03 by the reprex package (v2.0.0)
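If you specifically want purrr plus dplyr::select, something like the following sketch should also work, though I think across() is the more direct route. The idea is to build the indicator columns separately and then bind them back on; the intermediate name flags is just a placeholder I made up.

```r
library(dplyr)
library(purrr)

df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))

flags <- df %>%
  select(starts_with("a_")) %>%
  map(~ as.numeric(!is.na(.x))) %>%     # named list: one 0/1 vector per column
  as_tibble() %>%
  rename_with(~ paste0(.x, "_d"))       # a_1 -> a_1_d, a_2 -> a_2_d

out <- bind_cols(df, flags)
```

The reason the attempt in the question fails is that map() iterates over the column vectors themselves, so there is no data frame for mutate() to act on and no column name available inside the lambda. Renaming after the fact sidesteps that.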