stringr::str_extract all elements of a list R

Time:04-20

I'm trying to get all the "banana word" occurrences in a given object, but str_extract returns only the first occurrence. My code:

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

library(stringr)

banana_terms <- all_terms %>% 
  str_extract("banana.+") %>% 
  word(1, 2)


banana_terms
Out: [1] "banana word2" NA             "banana ice"  

What I wanted:

Out: [1] "banana word2" "banana split" "banana ice" 

CodePudding user response:

Use str_extract_all with \\w+ to get "banana" together with the word that follows it.

all_terms %>% 
  str_extract_all("banana.\\w+") %>% 
  unlist()

# [1] "banana word2" "banana split" "banana ice"

Without unlist, you get a list:

str_extract_all(all_terms, "banana.\\w+")

[[1]]
[1] "banana word2" "banana split"

[[2]]
character(0)

[[3]]
[1] "banana ice"
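
To make the difference between the two functions concrete, here is a minimal sketch that runs both on the question's data. The names first_only, every_match and match_counts are only illustrative and do not come from the original posts.

```r
library(stringr)

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

# str_extract() returns one value per input string: the first match, or NA.
first_only <- str_extract(all_terms, "banana.\\w+")

# str_extract_all() returns every match, as a list with one entry per input string.
every_match <- str_extract_all(all_terms, "banana.\\w+")

# lengths() counts how many matches were found in each input string.
match_counts <- lengths(every_match)
```

Here first_only holds "banana word2", NA and "banana ice", while match_counts is 2, 0, 1. Keeping the list form can be handy when you need to know which input string each match came from.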

CodePudding user response:

If you want to use str_extract, you need to make sure each "banana word" is a separate element of a vector.

str_split splits each string at every space that is followed by "banana" (the pattern " (?=banana)"), turning the pieces into individual elements. Then apply the regex provided by @Maël (banana.\\w+) with str_extract.

Finally, remove the NA values from the vector.

library(stringr)

all_banana <- str_extract(str_split(all_terms, " (?=banana)", simplify = TRUE), "banana.\\w+")
all_banana <- all_banana[!is.na(all_banana)]

all_banana
[1] "banana word2" "banana ice"   "banana split"
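
Note that the result above lists "banana ice" before "banana split": simplify = TRUE makes str_split return a matrix, and the matrix is read column by column. If the original order matters, one possible variant (a sketch, not part of the original answer) is to drop simplify and flatten the resulting list with unlist instead. The names pieces and ordered_banana are illustrative.

```r
library(stringr)

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

# Split before each "banana"; unlist() keeps the pieces in input order.
pieces <- unlist(str_split(all_terms, " (?=banana)"))

# str_extract() now sees at most one "banana word" per piece.
ordered_banana <- str_extract(pieces, "banana.\\w+")

# Drop the pieces that contained no match.
ordered_banana <- ordered_banana[!is.na(ordered_banana)]
```

With the data from the question, ordered_banana is "banana word2", "banana split", "banana ice".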

CodePudding user response:

In base R, we can use regmatches with gregexpr:

unlist(regmatches(all_terms, gregexpr("banana\\s+\\S+", all_terms)))
[1] "banana word2" "banana split" "banana ice"  
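
The base R pattern uses \\S+ (any run of non-whitespace) where the stringr answers use \\w+ (word characters only). On the question's data both give the same result, but they can differ when punctuation follows the word. A small sketch, using a made-up string (sample_text is hypothetical, not from the question) and perl = TRUE so that both patterns are read as Perl-style regular expressions:

```r
# Hypothetical input with punctuation right after the words of interest.
sample_text <- "banana split, then banana bread."

# \\S+ takes everything up to the next whitespace, punctuation included.
with_nonspace <- regmatches(
  sample_text,
  gregexpr("banana\\s+\\S+", sample_text, perl = TRUE)
)[[1]]

# \\w+ stops at the first character that is not a letter, digit or underscore.
with_word <- regmatches(
  sample_text,
  gregexpr("banana\\s+\\w+", sample_text, perl = TRUE)
)[[1]]
```

Here with_nonspace is "banana split," and "banana bread." while with_word is "banana split" and "banana bread", so \\w+ is probably the safer choice if the text may contain punctuation.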