Home > OS >  Conditionally Joining One of Two Tables
Conditionally Joining One of Two Tables

Time:03-03

I am trying to join one of two tables to a third table based on a logical evaluation. In this highly simplified example, I would like result to end up with the values from df2 since in this case option == 2.

Example Data:

df1 <- tibble(index = 1:5)
df2 <- tibble(index = 1:5, value = 1:5)
df3 <- tibble(index = 1:5, value = 10*1:5)

option <- 2

What I've tried:

# Method 1
result <- df1 %>%
  left_join(if_else(option == 2, df2, df3), by = "index")

# Error: Can't use NA as column index with `[` at position 1.

# Method 2
result <- df1 %>%
  if (option == 2){
    left_join(df2, by = "index")
  } else {
    left_join(df3, by = "index")
  }

# Error in if (.) option == 2 else { : 
# argument is not interpretable as logical

# Method 3
result <- if_else(option == 2, 
              df1 %>% left_join(df2, by = "index"),
              df1 %>% left_join(df3, by = "index")
          )

# Error: Can't use NA as column index with `[` at position 1.

Is there a way to do something like this?

CodePudding user response:

You can't pipe directory into an if statement. You could pipe in a block and then maggrittr defines the . variable for the data that was passed in. So you could so

result <- df1 %>% {
  if (option == 2){
    left_join(., df2, by = "index")
  } else {
    left_join(., df3, by = "index")
  }
}

But even better it probably to keep your other data.frames in a list and then index into the list with your option value

others <- list( df2, df3)
result <- df1 %>% left_join(others[[option]], by = "index")

You can also use names with this method

others <- list(aa=df2, bb=df3)
option <- "aa"
result <- df1 %>% left_join(others[[aa]], by = "index")

You can't use if_else because that's a vectorized function, not a control flow function. It's not meant to return objects that are a different size that the input condition.

CodePudding user response:

Another option below. I agree having a named list and choosing a name rather than numeric option is probably better though.

library(dplyr, warn.conflicts = F)

df1 <- tibble(index = 1:5)
df2 <- tibble(index = 1:5, value = 1:5)
df3 <- tibble(index = 1:5, value = 10*1:5)

option <- 2

df1 %>%
  left_join(if (option == 2) df2 else df3, by = "index")
#> # A tibble: 5 × 2
#>   index value
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     4     4
#> 5     5     5

Created on 2022-03-02 by the reprex package (v2.0.1)

  • Related