Let's say I have a vector as follows:
patient_condition <- c("Pre_P1","Post_P1","Enriched_Post_P1","Post_P1_2","Pre_P2","Post_P2", "P3_Pre")
to_match <- c("P1","P2","P3")
I want to create another vector that contains, for each element of patient_condition, the value from to_match that occurs in it as a substring. Expected output:
[1] "P1" "P1" "P1" "P1" "P2" "P2" "P3"
Any help is appreciated. Thank you!
CodePudding user response:
We can use
stringr::str_extract(patient_condition, "P[0-9]+")
#[1] "P1" "P1" "P1" "P1" "P2" "P2" "P3"
Misc Replies
In my case, this answer works, but my question is really about extracting substrings from a vector given some values to match. This answer won't work if I want to extract words instead (e.g. Pre, Post, Enriched):
to_match <- c("Pre", "Post", "Enriched")
In that case, we can use
## R-level loop through `to_match`
tmp <- t(sapply(to_match, stringr::str_extract, string = patient_condition))
tmp[!is.na(tmp)]
## "Enriched_Post_P1" matches both "Post" and "Enriched", so 8 values are returned
#[1] "Pre" "Post" "Post" "Enriched" "Post" "Pre" "Post" "Pre"
or
## convert multiple matches to REGEX "or" operation `|`
stringr::str_extract(patient_condition, paste0(to_match, collapse = "|"))
#[1] "Pre" "Post" "Enriched" "Post" "Pre" "Post" "Pre"
ThomasIsCoding's answer using gregexpr + regmatches is also a good alternative.
Note that this does exact substring matching.
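If stringr is not available, the same alternation idea can be sketched in base R. The helper name extract_first_match below is hypothetical (it does not come from any package). It returns one value per input element and NA where nothing matches.

```r
patient_condition <- c("Pre_P1", "Post_P1", "Enriched_Post_P1", "Post_P1_2",
                       "Pre_P2", "Post_P2", "P3_Pre")
to_match <- c("Pre", "Post", "Enriched")

# Hypothetical helper: leftmost match of any element of `patterns` in each string.
# Patterns are treated as regular expressions, so metacharacters would need escaping.
extract_first_match <- function(x, patterns) {
  regex <- paste0(patterns, collapse = "|")
  hits <- regexpr(regex, x)               # -1 where there is no match
  matched <- as.vector(hits) != -1L
  out <- rep(NA_character_, length(x))
  out[matched] <- regmatches(x, hits)     # regmatches() returns matched elements only
  out
}

extract_first_match(patient_condition, to_match)
#[1] "Pre" "Post" "Enriched" "Post" "Pre" "Post" "Pre"
extract_first_match(c("Pre_P1", "Control"), to_match)
#[1] "Pre" NA
```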
CodePudding user response:
You could grep, then rep according to the lengths.
Map(rep, to_match, lengths(sapply(to_match, grep, patient_condition)), USE.NAMES=FALSE) |> unlist()
# [1] "P1" "P1" "P1" "P1" "P2" "P2" "P3"
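Note that the grep/rep approach returns values grouped in the order of to_match. That coincides with the input order here only because patient_condition happens to be sorted by patient. A minimal sketch that keeps the input order instead, assuming each element contains at most one value of interest (otherwise the first hit in to_match order is taken):

```r
patient_condition <- c("P3_Pre", "Pre_P1", "Post_P2")   # deliberately not sorted by patient
to_match <- c("P1", "P2", "P3")

# One column per pattern, one row per input element; fixed = TRUE matches literally
hits <- vapply(to_match,
               function(p) grepl(p, patient_condition, fixed = TRUE),
               logical(length(patient_condition)))
hits <- matrix(hits, nrow = length(patient_condition))  # keep a matrix even for length-1 input

# Index of the first matching pattern per row (NA when nothing matches)
first_hit <- apply(hits, 1, function(row) which(row)[1])
result <- to_match[first_hit]
result
#[1] "P3" "P1" "P2"
```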
CodePudding user response:
A base R option using regmatches to extract the desired patterns:
> regmatches(patient_condition, gregexpr(paste0(to_match, collapse = "|"), patient_condition))
[[1]]
[1] "P1"
[[2]]
[1] "P1"
[[3]]
[1] "P1"
[[4]]
[1] "P1"
[[5]]
[1] "P2"
[[6]]
[1] "P2"
[[7]]
[1] "P3"
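Because gregexpr finds every match, the result is a list that can hold more than one value per element. The sketch below shows a few ways to flatten it. The variable names (conditions, all_matches, first_only, joined) are only illustrative.

```r
patient_condition <- c("Pre_P1", "Post_P1", "Enriched_Post_P1", "Post_P1_2",
                       "Pre_P2", "Post_P2", "P3_Pre")

# With the patient IDs there is exactly one match per element, so unlist() is enough
to_match <- c("P1", "P2", "P3")
flat <- unlist(regmatches(patient_condition,
                          gregexpr(paste0(to_match, collapse = "|"), patient_condition)))
flat
#[1] "P1" "P1" "P1" "P1" "P2" "P2" "P3"

# With the condition words, "Enriched_Post_P1" yields two matches
conditions <- c("Pre", "Post", "Enriched")
all_matches <- regmatches(patient_condition,
                          gregexpr(paste0(conditions, collapse = "|"), patient_condition))
lengths(all_matches)
#[1] 1 1 2 1 1 1 1

# Keep only the first match per element (NA if there is none)
first_only <- vapply(all_matches,
                     function(m) if (length(m)) m[1] else NA_character_,
                     character(1))

# Or keep all matches, joined into one string per element
joined <- vapply(all_matches, paste, character(1), collapse = "+")
joined
#[1] "Pre" "Post" "Enriched+Post" "Post" "Pre" "Post" "Pre"
```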
CodePudding user response:
More generally, you can use match with a lookup table:
lookup <- c("Pre_P1","Post_P1","Enriched_Post_P1","Post_P1_2","Pre_P2","Post_P2", "P3_Pre")
to_match <- c("P1","P1","P1", "P1", "P2", "P2","P3")
patient_condition <- c("P3_Pre", "Post_P1", "Enriched_Post_P1")
result <- to_match[match(patient_condition, lookup)]
result
#[1] "P3" "P1" "P1"
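The same table-lookup idea can also be written with a named vector, which may be easier to read. This is only a sketch: lookup_table is an illustrative name, and "Unknown_P9" is a made-up value included to show that unmatched inputs come back as NA.

```r
# Names are the full condition strings, values are the patient IDs
lookup_table <- c(Pre_P1 = "P1", Post_P1 = "P1", Enriched_Post_P1 = "P1",
                  Post_P1_2 = "P1", Pre_P2 = "P2", Post_P2 = "P2", P3_Pre = "P3")

patient_condition <- c("P3_Pre", "Post_P1", "Unknown_P9")

# Indexing by name; unname() drops the names from the result
result <- unname(lookup_table[patient_condition])
result
#[1] "P3" "P1" NA
```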