Extract character elements from vectors

I have a set of character vectors:

a <- "bmi + ch | study"
b <- "bmi * ch | study"
c <- "bmi * ch - 1 | study"
d <- "bmi * ch + 0 | study"
e <- "bmi:ch + 0 | study"

From each of these strings, I want to extract the two names "bmi" and "ch"; i.e., the desired output is c("bmi", "ch").

The strings above are just examples; the elements to be extracted can be any names, not only bmi and ch. I'm looking for a general solution that doesn't hard-code them.

I have tried unlist(stringr::str_extract_all(a, "bmi|ch")). It gives the desired output, but only because I manually define the pattern "bmi|ch", so it's not a general solution.
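For comparison, here is a sketch (not from the original thread) of how that regex attempt could be made general: drop everything from the `|` onward, then match any name-like token instead of the literal alternatives `bmi|ch`. It uses base R `regmatches()` and `gregexpr()` rather than stringr, and it assumes that variable names start with a letter and contain only letters, digits, `.` or `_`, so numeric terms such as `1` or `0` are skipped.

```r
# Assumption: variable names start with a letter and contain only letters, digits, "." or "_"
name_pattern <- "[A-Za-z][A-Za-z0-9._]*"

v <- c("bmi + ch | study", "bmi * ch | study", "bmi * ch - 1 | study",
       "bmi * ch + 0 | study", "bmi:ch + 0 | study")

# Drop the grouping part ("| study"), then pull out every name-like token
lhs <- sub("\\|.*", "", v)
res <- regmatches(lhs, gregexpr(name_pattern, lhs))
res
```

Unlike a parse-based approach, this pattern would also return function names, e.g. `log` in `log(weight)`.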

Answer:

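Assume the vector v defined in the Note at the end. For each element, remove everything from the | onward with sub(), parse what remains as an R expression, and list the variable names it contains with all.vars(); lapply applies this to every element. If the number of variables is always the same, you could alternatively use sapply, which gives a matrix.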

# drop the "| study" part, parse the rest as an R expression and list its variable names
lapply(sub("\\|.*", "", v), function(x) all.vars(parse(text = x)))

giving:

[[1]]
[1] "bmi" "ch" 

[[2]]
[1] "bmi" "ch" 

[[3]]
[1] "bmi" "ch" 

[[4]]
[1] "bmi" "ch" 

[[5]]
[1] "bmi" "ch"
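A small sketch of why the grouping part is stripped first and why the approach is not tied to `bmi` and `ch` (the names `weight`, `age`, `sex` and `site` below are made up for illustration). `|` has lower precedence than `+`, `*` and `:`, so without the `sub()` call the grouping variable is returned as well; `all.vars()` returns only variable names, not function names such as `log`.

```r
# Without stripping "| ...", the grouping variable is included as well
with_group <- all.vars(parse(text = "bmi + ch | study"))
with_group   # "bmi" "ch" "study"

# Hypothetical formula with other names: all.vars() skips function names like log()
f <- sub("\\|.*", "", "log(weight) + age:sex - 1 | site")
other <- all.vars(parse(text = f))
other        # "weight" "age" "sex"
```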

Note

a <- "bmi + ch | study"
b <- "bmi * ch | study"
c <- "bmi * ch - 1 | study"
d <- "bmi * ch + 0 | study"
e <- "bmi:ch + 0 | study"
v <- c(a, b, c, d, e)
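A sketch of the sapply() alternative mentioned above, assuming every string contains the same number of variables; each column of the result corresponds to one input string. `USE.NAMES = FALSE` is added here only to keep the full strings from becoming column names, and `unique(c(m))` collapses the matrix to the single vector asked for in the question.

```r
v <- c("bmi + ch | study", "bmi * ch | study", "bmi * ch - 1 | study",
       "bmi * ch + 0 | study", "bmi:ch + 0 | study")

# Every element yields two names, so sapply() simplifies the result to a 2 x 5 matrix
m <- sapply(sub("\\|.*", "", v), function(x) all.vars(parse(text = x)),
            USE.NAMES = FALSE)
m

# Collapse to the distinct names across all strings
unique(c(m))   # "bmi" "ch"
```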

Answer:

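This approach is more complicated and less efficient, but I'll leave it here in case someone finds it interesting. It cuts each string back to the text before the grouping part (ending at the last lowercase letter that is followed by a non-word character), removes the whitespace, and splits what remains on the formula operators.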

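vecs <- list(a, b, c, d, e)
# keep the text up to the last lowercase letter that precedes the "| study" part
split_me <- lapply(vecs, function(x)
  gsub("([a-z].*[a-z])(\\W.*)", "\\1", x, perl = TRUE))
# remove all whitespace, then split on the operators +, * and :
lapply(split_me, function(x)
  unlist(strsplit(gsub("\\s", "", x), "[+*:]")))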

Result

[[1]]
[1] "bmi" "ch" 

[[2]]
[1] "bmi" "ch" 

[[3]]
[1] "bmi" "ch" 

[[4]]
[1] "bmi" "ch" 

[[5]]
[1] "bmi" "ch"
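To make the regex approach easier to follow, here is a trace of each step on one of the example strings. Because step 1 relies on lowercase letters, a name that is not all lowercase letters at its end would be handled differently; for instance, "bmi + x1 | study" is cut back to "bmi" at step 1.

```r
x <- "bmi * ch - 1 | study"

# 1. Greedy match: keep everything from the first lowercase letter up to the last
#    lowercase letter that is followed by a non-word character; drop the "\\W.*" tail
step1 <- gsub("([a-z].*[a-z])(\\W.*)", "\\1", x, perl = TRUE)
step1  # "bmi * ch"

# 2. Remove all whitespace
step2 <- gsub("\\s", "", step1)
step2  # "bmi*ch"

# 3. Split on the operators +, * and :
step3 <- unlist(strsplit(step2, "[+*:]"))
step3  # "bmi" "ch"
```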

Data

a <- "bmi + ch | study"
b <- "bmi * ch | study"
c <- "bmi * ch - 1 | study"
d <- "bmi * ch + 0 | study"
e <- "bmi:ch + 0 | study"
vecs <- list(a, b, c, d, e)