I'm looking for a more eloquent way to write R code for a kind of case that I've encountered more than once. Here is an example of the data and some code that accomplishes the result I want:
library(tidyverse)
df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)
df |>
  mutate(target_area =
           primary_county %in% specific_counties | secondary_county %in% specific_counties)
The result is:
# A tibble: 5 × 4
     id primary_county secondary_county target_area
  <int>          <int>            <int> <lgl>
1     1            101              201 TRUE
2     2            102              202 TRUE
3     3            103              203 TRUE
4     4            104              204 FALSE
5     5            105              205 TRUE
I want to know if there is a way to get the same result with code that stays succinct and eloquent when I am dealing with more columns of the "..._county" variety. Specifically, in my code above, the expression %in% specific_counties must be repeated, joined with an |, for each extra column I want to handle. Is there a way to avoid repeating so much code?
CodePudding user response:
This generalizes a little beyond what you have, though I'm not sure how "eloquent" I'd call it:
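df %>%
  mutate(
    # note: cur_data() is deprecated since dplyr 1.1.0;
    # pick(matches("_county")) replaces select(cur_data(), ...)
    target_area = rowSums(
      sapply(select(cur_data(), matches("_county")),
             `%in%`, specific_counties)) > 0
  )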
# # A tibble: 5 x 4
#      id primary_county secondary_county target_area
#   <int>          <int>            <int> <lgl>
# 1     1            101              201 TRUE
# 2     2            102              202 TRUE
# 3     3            103              203 TRUE
# 4     4            104              204 FALSE
# 5     5            105              205 TRUE
Or you can list the columns explicitly, replacing the select(.., matches(..)) with list(primary_county, secondary_county). Add as many columns to the list(..) as you want.
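To see why the rowSums() version works, it can help to look at the intermediate matrix on its own. The following is a minimal base-R sketch, assuming the same data as in the question rebuilt as a plain data.frame; county_cols and hits are names introduced here for illustration only.

```r
# Same data as in the question, as a plain data.frame (assumption: no tidyverse needed)
df <- data.frame(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

# Illustrative name: every column whose name ends in "_county"
county_cols <- grep("_county$", names(df), value = TRUE)

# One logical column per county column: TRUE where the value is in the set.
# With several rows, sapply() simplifies the result to a logical matrix.
hits <- sapply(df[county_cols], `%in%`, specific_counties)

# A row is in the target area if at least one of its county columns matched
df$target_area <- rowSums(hits) > 0
df
```

Adding another "..._county" column to df requires no change to this code, since county_cols is derived from the column names.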
CodePudding user response:
I would use across() to select the columns and pmap_lgl() inside mutate() to create the desired column. The key is c(...), which collects each row's values into a single vector, so that any(c(...) %in% specific_counties) tests the whole row at once:
library(dplyr)
library(purrr)
df %>%
  mutate(target_area = pmap_lgl(across(ends_with('county')),
                                ~any(c(...) %in% specific_counties)))
# A tibble: 5 × 4
     id primary_county secondary_county target_area
  <int>          <int>            <int> <lgl>
1     1            101              201 TRUE
2     2            102              202 TRUE
3     3            103              203 TRUE
4     4            104              204 FALSE
5     5            105              205 TRUE
Using dplyr::select() instead of list() may be more generalizable to other use cases.
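Another option, not mentioned in the answers above, might be dplyr's if_any(), which applies a predicate to each selected column and combines the results with a row-wise "or". This is a minimal sketch, assuming dplyr 1.0.4 or later (where if_any() was introduced) and the same data as in the question; result is a name introduced here for illustration.

```r
library(dplyr)

# Same data as in the question
df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

# if_any() evaluates the predicate on every selected column and
# returns TRUE for a row when at least one column satisfies it
result <- df %>%
  mutate(target_area = if_any(ends_with("_county"), ~ .x %in% specific_counties))

result
```

The ends_with("_county") selection can be swapped for any other tidyselect expression, for example c(primary_county, secondary_county), if the columns should be listed explicitly.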