I have the following vector.
column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
"205Pb", "206Pb", "207Pb", "238U",
"206Pb.1", "238U.1")
Notice that some of the values are duplicates of other values with ".1" appended. I want to extract each of these suffixed strings together with the string it duplicates, so that only the following are returned.
#[1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
Assume the index positions are unknown, so the values cannot simply be indexed out with column_names[c(2,5,7,9,10,11)]. How can I use pattern matching to extract these values?
CodePudding user response:
There is likely a more elegant solution, but in base R you could try a combination of grep/gsub and paste:
idx <- grep(paste(gsub("\\.1", "", column_names[grep("\\.1", column_names)]), collapse = "|"), column_names)
# [1] 2 5 7 9 10 11
column_names[idx]
# [1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
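One caveat with the alternation pattern: it matches substrings, so a name such as "17Li" would also be picked up by the "7Li" branch. A minimal sketch of an exact-match variant, assuming the suffix is always ".1" at the end of the name (the helper names suffixed, stems, and result are illustrative, not from the question):

```r
column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
                  "205Pb", "206Pb", "207Pb", "238U",
                  "206Pb.1", "238U.1")

# Names carrying the ".1" suffix; "$" anchors the match to the end of the string
suffixed <- grep("\\.1$", column_names, value = TRUE)

# Their stems, with the suffix removed
stems <- sub("\\.1$", "", suffixed)

# Keep the suffixed names plus any name exactly equal to one of the stems
result <- column_names[column_names %in% c(stems, suffixed)]
print(result)
```

Because %in% compares whole strings, no regex metacharacters in the names need escaping.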
CodePudding user response:
Using gsub() and duplicated() to find values with repeated stems:
column_stems <- gsub("\\.1", "", column_names)
dup_idx <- duplicated(column_stems) | duplicated(column_stems, fromLast = TRUE)
column_names[dup_idx]
# "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
To also find instances ending with .2, .3, etc., use "\\.\\d+" instead of "\\.1" in gsub().
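The stem-and-duplicated approach carries over to any numeric suffix. A minimal sketch, assuming a hypothetical vector that contains both ".1" and ".2" suffixes (this input is made up for illustration) and a pattern anchored with "$" so that only a trailing suffix is stripped:

```r
# Hypothetical input, not from the original question
column_names <- c("6Li", "7Li", "10B", "7Li.1", "7Li.2", "238U", "238U.1")

# Strip a trailing ".<digits>" suffix from each name
column_stems <- sub("\\.\\d+$", "", column_names)

# Flag every element whose stem occurs more than once, in either direction
dup_idx <- duplicated(column_stems) | duplicated(column_stems, fromLast = TRUE)

result <- column_names[dup_idx]
print(result)
```

Here sub() is enough instead of gsub(), since the anchored pattern can match at most once per string.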
CodePudding user response:
You could use stringr:
library(stringr)
idx <- str_extract(column_names, ".*(?=\\.1)")
column_names[str_detect(column_names, paste(idx[!is.na(idx)], collapse = "|"))]
This returns
#> [1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"