I have the following example vector:
isotopes <- c("6Li", "7Li", "7LiH", "10B", "11B", "11BH")
I want to remove the strings "7LiH" and "11BH" from the vector. These values have two capital letters, so I am trying to figure out how to use grep to remove those values or just index out the other strings in the vector. How can I do this?
CodePudding user response:
You can simply grep for elements that contain 2 or more capital letters and invert the match:
grep('[A-Z].*[A-Z]', isotopes, value=TRUE, invert=TRUE)
The regex matches any string that contains an uppercase letter, followed by zero or more arbitrary characters, followed by another uppercase letter. The two uppercase letters do not have to be at the beginning or end of the string.
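To see which elements the pattern flags before dropping them, the same regex can be run through grepl. This is only a small sketch; the variable names has_two_caps and kept are illustrative and not part of the original answer.

```r
isotopes <- c("6Li", "7Li", "7LiH", "10B", "11B", "11BH")

# TRUE where the string contains at least two uppercase letters
has_two_caps <- grepl("[A-Z].*[A-Z]", isotopes)

# Logical indexing keeps only the elements that did not match
kept <- isotopes[!has_two_caps]
print(kept)
```

This should give the same result as the grep call with invert=TRUE, but the intermediate logical vector makes it easy to inspect which strings were matched.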
CodePudding user response:
I made a slight modification to the pattern so that it only allows digits or lowercase letters between the two uppercase letters:
isotopes[!grepl('[A-Z]([0-9a-z]+)?[A-Z]', isotopes)]
CodePudding user response:
Another option is to count the number of uppercase letters using stringr, then keep only the strings that have fewer than 2 uppercase letters.
library(stringr)
isotopes[str_count(isotopes, "[A-Z]") < 2]
# "6Li" "7Li" "10B" "11B"
Or with stringi:
library(stringi)
isotopes[stri_count(isotopes, regex="[A-Z]") < 2]
Or with base R:
isotopes[lengths(gregexpr("[A-Z]", isotopes)) < 2]
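One caveat with the base R version: gregexpr returns a single -1 when a string has no match, so lengths() reports 1 both for strings with no uppercase letter and for strings with exactly one. That does not matter for the `< 2` test here, but if an exact count is needed, a possible variant is to pass the result through regmatches first. The name n_caps below is just illustrative.

```r
isotopes <- c("6Li", "7Li", "7LiH", "10B", "11B", "11BH")

# regmatches() returns character(0) for strings with no match,
# so lengths() gives the true number of uppercase letters, including zero
n_caps <- lengths(regmatches(isotopes, gregexpr("[A-Z]", isotopes)))

# Keep the strings with fewer than two uppercase letters
kept <- isotopes[n_caps < 2]
print(kept)
```

For this vector both versions keep the same four strings; they only differ in the count reported for a string that has no uppercase letters at all.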