Efficient implementation of value selection based on a column

R> data.frame(x1=1:3, x2=11:13, y=c('a', 'a;b', 'b'))
  x1 x2   y
1  1 11   a
2  2 12 a;b
3  3 13   b

I have a data.frame like the one above. For each row, if y contains a, then x1 is added to the result, and if y contains b, then x2 is added to the result.

For this specific example, the result should be data.frame(i=c(1,2,2,3), v=c(1, 2, 12, 13)), where i is the row index in the input. The order of the input must be preserved. It is trivial to do this element by element, but I am wondering whether there is a more efficient implementation (e.g., one based on vectorized operations). Does anybody know of one?

EDIT: A method based on *apply might be:

f=data.frame(x1=1:3, x2=11:13, y=c('a', 'a;b', 'b'))
n=nrow(f)
do.call(
  rbind
  , lapply(seq_len(n), function(i) {
    do.call(
      rbind
      , lapply(strsplit(f$y[[i]], ';')[[1]], function(x) {
        if(x=='a') {
          data.frame(i=i, v=f$x1[[i]])
        } else if(x=='b') {
          data.frame(i=i, v=f$x2[[i]])
        } else {
          NULL
        }
      })
    )
  })
)

CodePudding user response：

This will give you the values (the v column) of the desired output:

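library(dplyr)
library(tidyr)

# The data from the question
df <- data.frame(x1=1:3, x2=11:13, y=c('a', 'a;b', 'b'))

values <- df %>% 
  separate_rows(y, sep = ";") %>%                  # one row per key in y
  mutate(new_col = ifelse(y=="a", x1, x2)) %>%     # "a" picks x1, anything else picks x2
  pull(new_col)

dput(values)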

output:

c(1L, 2L, 12L, 13L)
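
The vector above carries only the values, while the question also asks for the row index i. A minimal sketch of the same pipeline that keeps the index, assuming the data from the question is stored in f and that keys other than a and b should be dropped (as the *apply version in the question does); the names res, i and v are just for illustration:

```r
library(dplyr)
library(tidyr)

f <- data.frame(x1 = 1:3, x2 = 11:13, y = c('a', 'a;b', 'b'))

res <- f %>%
  mutate(i = row_number()) %>%            # remember the original row index
  separate_rows(y, sep = ";") %>%         # one row per key in y, input order kept
  filter(y %in% c("a", "b")) %>%          # assumption: unknown keys are dropped
  mutate(v = ifelse(y == "a", x1, x2)) %>%
  select(i, v)

print(res)
```

Creating i before separate_rows() matters, because after the split the row positions no longer match the input.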

CodePudding user response：

I don't know how efficient this is for your particular case, but here is what I propose:

library(dplyr) # mutate() and case_when()
library(tidyr)
dat <- tibble( # First create the data
  x1 = 1:3, x2 = 11:13, y = c('a', 'a;b', 'b'))

dat %>% 
  add_row(x1 = 23, x2 = -2, y = "bla") %>% # Add a row for testing purposes
  separate_rows(y, sep = ";") %>% # separate rows with ";"
  mutate(
    result = 
      case_when( # Output either x1 or x2 based on the value in "y"
        y == "a" ~ x1,
        y == "b" ~ x2))
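
If you would rather stay in base R, the same idea can be vectorized without building one data.frame per element. This is only a sketch under a few assumptions: the data is in f as in the question, keys are separated by ";", and keys other than a and b are silently dropped. The helper names keys, k, j, keep and m are made up for this example.

```r
f <- data.frame(x1 = 1:3, x2 = 11:13, y = c('a', 'a;b', 'b'))

keys <- strsplit(as.character(f$y), ";", fixed = TRUE)  # list of keys per row
i    <- rep(seq_len(nrow(f)), lengths(keys))            # row index repeated once per key
k    <- unlist(keys)                                    # flattened keys, in input order

j    <- match(k, c("a", "b"))   # column to read: 1 for x1, 2 for x2, NA for unknown keys
keep <- !is.na(j)               # assumption: unknown keys are dropped

# Matrix indexing: a two-column index picks element (row, column) for each key
m   <- as.matrix(f[c("x1", "x2")])
res <- data.frame(i = i[keep], v = m[cbind(i[keep], j[keep])])

print(res)
```

strsplit(), rep() and the matrix lookup each run once over the whole column, so the work per row is done inside vectorized calls rather than in an R-level loop. Whether that is actually faster on your data is something you would have to benchmark.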