R> data.frame(x1=1:3, x2=11:13, y=c('a', 'a;b', 'b'))
  x1 x2   y
1  1 11   a
2  2 12 a;b
3  3 13   b
I have a data.frame in the format shown above. If y contains a, then x1 should be added to the result, and if y contains b, then x2 should be added to the result.
For this specific example, the result should be data.frame(i=c(1,2,2,3), v=c(1, 2, 12, 13)), where i is the row index. The order must be the same as in the input. It is trivial to do this with element-by-element operations, but I am wondering whether there is a more efficient implementation (e.g., one based on vector operations). Does anybody have a more efficient solution to this problem?
EDIT: A method based on *apply might be:
f = data.frame(x1=1:3, x2=11:13, y=c('a', 'a;b', 'b'))
n = nrow(f)
do.call(rbind, lapply(seq_len(n), function(i) {
  do.call(rbind, lapply(strsplit(f$y[[i]], ';')[[1]], function(x) {
    if (x == 'a') {
      data.frame(i=i, v=f$x1[[i]])
    } else if (x == 'b') {
      data.frame(i=i, v=f$x2[[i]])
    } else {
      NULL
    }
  }))
}))
CodePudding user response:
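This gives the values of the desired output (the v column):
library(dplyr)
library(tidyr)

# df is the data frame from the question
vector <- df %>%
  separate_rows(y, sep = ";") %>%
  mutate(new_col = ifelse(y == "a", x1, x2)) %>%
  pull(new_col)
dput(vector)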
output:
c(1L, 2L, 12L, 13L)
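To also recover the row index i that the question asks for, one option is to record the row number before splitting. This is only a minimal sketch: it assumes df is the data frame from the question and that dplyr and tidyr are installed. The column names i and v follow the question, and the filter step is an added assumption that tokens other than a or b should be dropped.

```r
library(dplyr)
library(tidyr)

# Assumed input: the data frame from the question
df <- data.frame(x1 = 1:3, x2 = 11:13, y = c("a", "a;b", "b"))

res <- df %>%
  mutate(i = row_number()) %>%            # remember the original row index
  separate_rows(y, sep = ";") %>%         # one row per token in y
  filter(y %in% c("a", "b")) %>%          # assumption: ignore other tokens
  transmute(i, v = ifelse(y == "a", x1, x2))

res
```

Because separate_rows keeps the input order, the rows of res follow the order of the original data.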
CodePudding user response:
I don't know how efficient this is for your particular case, but here is what I propose:
library(dplyr)  # mutate(), case_when(), tibble(), add_row()
library(tidyr)  # separate_rows()

dat <- tibble(  # First create the data
  x1 = 1:3, x2 = 11:13, y = c('a', 'a;b', 'b'))

dat %>%
  add_row(x1 = 23, x2 = -2, y = "bla") %>%  # Add a row for testing purposes
  separate_rows(y, sep = ";") %>%           # Separate rows on ";"
  mutate(
    result = case_when(  # Output either x1 or x2 based on the value in "y"
      y == "a" ~ x1,
      y == "b" ~ x2))    # Any other value of y (e.g. "bla") gives NA
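Since the question asks for something based on vector operations, here is a possible base R version with no per-row loop. Treat it as a sketch under assumptions rather than a benchmarked solution: the names tokens, tok, col, keep, and m are made up for illustration, and tokens other than a or b are dropped, as in the *apply code from the question.

```r
f <- data.frame(x1 = 1:3, x2 = 11:13, y = c("a", "a;b", "b"),
                stringsAsFactors = FALSE)

tokens <- strsplit(f$y, ";", fixed = TRUE)          # tokens for every row at once
i      <- rep(seq_len(nrow(f)), lengths(tokens))    # row index of each token
tok    <- unlist(tokens)

# Map each token to a column position: a -> 1 (x1), b -> 2 (x2), other -> NA
col  <- match(tok, c("a", "b"))
keep <- !is.na(col)

# Matrix indexing picks one (row, column) cell per kept token
m      <- as.matrix(f[c("x1", "x2")])
result <- data.frame(i = i[keep], v = m[cbind(i[keep], col[keep])])

result
```

The idea is that strsplit, rep, match, and matrix indexing each work on whole vectors, so the order of the input is preserved without building one small data frame per element.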