Home > Software engineering >  scalable case_when case combining any two of situation
scalable case_when case combining any two of situation

Time:12-02

I am looking for a scalable version of the following.

Given:

library(tidyverse)
df <- data.frame(a = c(1, 2, 3, 4, 5, 3, 1),
                 b = c(6, 7, 8, 9, 10, 8, 9),
                 c = c(15, 12, 9, 2, 13, 9, 10))
df
#   a  b  c
# 1 1  6 15
# 2 2  7 12
# 3 3  8  9
# 4 4  9  2
# 5 5 10 13
# 6 3  8  9
# 7 1  9 10

I want to create an indicator variable if any two combinations of the following criteria are valid:

a > 3, b > 8, c > 10

One option to do this manually is to use case_when:

df %>% 
  mutate(indicator_variable = case_when(
    a > 3 & b > 8 ~ 1,
    a > 3 & c > 10 ~ 1,
    b > 8 & c > 10 ~ 1,
    TRUE ~ 0
  ))

#   a  b  c indicator_variable
# 1 1  6 15                  0
# 2 2  7 12                  0
# 3 3  8  9                  0
# 4 4  9  2                  1
# 5 5 10 13                  1
# 6 3  8  9                  0
# 7 1  9 10                  0

This works but obviously gets more tricky the more variables considered (choose(4, 2)), change criteria to any two of four variables:

a > 3, b > 8, c > 10, d >= 5

df2 <- data.frame(a = c(1, 2, 3, 4, 5, 3, 1),
                 b = c(6, 7, 8, 9, 10, 8, 9),
                 c = c(15, 12, 9, 2, 13, 9, 10),
                 d = c(1, 2, 3, 4, 5, 6, 5))
df2
#   a  b  c d
# 1 1  6 15 1
# 2 2  7 12 2
# 3 3  8  9 3
# 4 4  9  2 4
# 5 5 10 13 5
# 6 3  8  9 6
# 7 1  9 10 5

df2 %>% 
  mutate(indicator_variable = case_when(
    a > 3 & b > 8 ~ 1,
    a > 3 & c > 10 ~ 1,
    a > 3 & d >= 5 ~ 1,
    b > 8 & c > 10 ~ 1,
    b > 8 & d >= 5 ~ 1,
    c > 10 & d >= 5 ~ 1,
    TRUE ~ 0
  ))
#   a  b  c d indicator_variable
# 1 1  6 15 1                  0
# 2 2  7 12 2                  0
# 3 3  8  9 3                  0
# 4 4  9  2 4                  1
# 5 5 10 13 5                  1
# 6 3  8  9 6                  0
# 7 1  9 10 5                  1

etc.

Best approaches? Doesn't have to involve case_when, maybe combn could be used?

thanks

CodePudding user response:

No need to use case_when here, just make use of the fact that TRUE converts to 1 in arithmetic operations and FALSE to 0.

library(dplyr)
df <- tibble(a = c(1, 2, 3, 4, 5, 3, 1),
             b = c(6, 7, 8, 9, 10, 8, 9),
             c = c(15, 12, 9, 2, 13, 9, 10))
df %>%
  mutate(indicator = as.numeric(((a > 3)   (b > 8)   (c > 10)) >= 2))
#> # A tibble: 7 × 4
#>       a     b     c indicator
#>   <dbl> <dbl> <dbl>     <dbl>
#> 1     1     6    15         0
#> 2     2     7    12         0
#> 3     3     8     9         0
#> 4     4     9     2         1
#> 5     5    10    13         1
#> 6     3     8     9         0
#> 7     1     9    10         0

CodePudding user response:

Create a function that takes a input for operator, value and dataset

f1 <- function(dat, val, op) op(dat, val)

Then, create two lists` with values and operator (if the operators are same, we don't need it though)

lst1 <- list(a = 3, b = 8, c = 10, d = 5)
lst2 <- list(a = `>`, b = `>`, c = `>`, d = `>=`)

Now, we do the combn with Reduce (using the second data 'df2')

df2$indicator_variable <-  (Reduce(`|`, combn(lst1, 2, function(x) 
     Reduce(`&`, Map(f1, dat = df2[names(x)], val = lst1[names(x)], 
      op = lst2[names(x)])), simplify = FALSE)))

-output

> df2
  a  b  c d indicator_variable
1 1  6 15 1                  0
2 2  7 12 2                  0
3 3  8  9 3                  0
4 4  9  2 4                  1
5 5 10 13 5                  1
6 3  8  9 6                  0
7 1  9 10 5                  1

Or if we want to update the first data

lst1 <- list(a = 3, b = 8, c = 10)
lst2 <- list(a = `>`, b = `>`, c = `>`)
df$indicator_variable <-  (Reduce(`|`, combn(lst1, 2, function(x) 
     Reduce(`&`, Map(f1, dat = df[names(x)], val = lst1[names(x)], 
      op = lst2[names(x)])), simplify = FALSE)))
> df
  a  b  c indicator_variable
1 1  6 15                  0
2 2  7 12                  0
3 3  8  9                  0
4 4  9  2                  1
5 5 10 13                  1
6 3  8  9                  0
7 1  9 10                  0
  • Related