Succinctly write logical expression in R when using multiple variables in a df

Time:09-29

I'm looking for a more eloquent way to write R code for a kind of case that I've encountered more than once. Here is an example of the data and some code that accomplishes the result I want:

library(tidyverse)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)

specific_counties <- c(101, 103, 202, 205)

df |> 
  mutate(target_area = 
           primary_county %in% specific_counties | secondary_county %in% specific_counties)

The result is:

    # A tibble: 5 × 4
         id primary_county secondary_county target_area
      <int>          <int>            <int> <lgl>      
    1     1            101              201 TRUE       
    2     2            102              202 TRUE       
    3     3            103              203 TRUE       
    4     4            104              204 FALSE      
    5     5            105              205 TRUE  
     

I want to know if there is a way to get the same result using code that would be more succinct and eloquent if I were dealing with more columns of the "..._county" variety. Specifically, in my code above, the expression %in% specific_counties must be repeated with an | for each extra column I want to handle. Is there a way to not have to repeat so many lines of code?

CodePudding user response:

This is a little more flexible than what you have, though I'm not sure how "eloquent" I'd call it:

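df %>%
  mutate(
    # Note: cur_data() is deprecated as of dplyr 1.1.0;
    # pick(matches("_county")) can stand in for select(cur_data(), matches("_county")).
    target_area = rowSums(
      sapply(select(cur_data(), matches("_county")),
             `%in%`, specific_counties)) > 0
  )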
# # A tibble: 5 x 4
#      id primary_county secondary_county target_area
#   <int>          <int>            <int> <lgl>      
# 1     1            101              201 TRUE       
# 2     2            102              202 TRUE       
# 3     3            103              203 TRUE       
# 4     4            104              204 FALSE      
# 5     5            105              205 TRUE       

Or you can list the columns explicitly, replacing the select(.., matches(..)) with list(primary_county, secondary_county).

Add as many columns to the list(..) as you want.
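
For reference, the explicit-list variant described above might look like the sketch below. It reuses the question's df and specific_counties; the name result_explicit is only for illustration.

```r
library(dplyr)

# Example data from the question
df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

result_explicit <- df %>%
  mutate(
    # sapply() returns a logical matrix with one column per listed variable;
    # a row sum above zero means at least one column matched.
    target_area = rowSums(
      sapply(list(primary_county, secondary_county),
             `%in%`, specific_counties)) > 0
  )
```

Each extra column is then one more name inside list(..), rather than another full %in% expression.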

CodePudding user response:

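I would use across() to select the columns and pmap_lgl() inside mutate() to create the desired column. The key is that c(...) collects all of the selected values for one row into a single vector, so any(c(...) %in% specific_counties) is TRUE when at least one of them matches.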

library(dplyr)
library(purrr)

df %>%
    mutate(target_area = pmap_lgl(across(ends_with('county')),
                                  ~any(c(...) %in% specific_counties)))

# A tibble: 5 × 4
     id primary_county secondary_county target_area
  <int>          <int>            <int> <lgl>      
1     1            101              201 TRUE       
2     2            102              202 TRUE       
3     3            103              203 TRUE       
4     4            104              204 FALSE      
5     5            105              205 TRUE   
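
To see what the lambda does for a single row, here is a small base R sketch. The vectors row_2 and row_4 are hand-built stand-ins for what c(...) would hold on rows 2 and 4 of the example data.

```r
specific_counties <- c(101, 103, 202, 205)

# Stand-ins for c(...) on rows 2 and 4 (built by hand for illustration)
row_2 <- c(primary_county = 102, secondary_county = 202)
row_4 <- c(primary_county = 104, secondary_county = 204)

row_2_hit <- any(row_2 %in% specific_counties)  # TRUE: 202 is in the set
row_4_hit <- any(row_4 %in% specific_counties)  # FALSE: neither value is in the set
```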

Using dplyr::select() instead of list() may be more generalizable to other use cases.
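
Another option that neither answer mentions, offered only as a sketch: dplyr's if_any() applies one predicate across a column selection and combines the results with "or". This assumes a dplyr release recent enough to include if_any() and to allow it inside mutate(); result_if_any is an illustrative name.

```r
library(dplyr)

# Example data from the question
df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

result_if_any <- df %>%
  # TRUE for a row when any column ending in "_county" is in specific_counties
  mutate(target_area = if_any(ends_with("_county"), ~ .x %in% specific_counties))
```

The column selection is written once, so additional "..._county" columns are picked up without changing the expression.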
