Remove strings that contain two or more capital letters from a vector in R

Time:11-23

I have the following example vector.

isotopes <- c("6Li", "7Li", "7LiH", "10B", "11B", "11BH")

I want to remove the strings "7LiH" and "11BH" from the vector. These values have two capital letters and so I am trying to figure out how to use grep to remove those values or just index out the other strings in the vector. How can I do this?

CodePudding user response:

You can simply grep for elements that contain 2 or more capital letters and invert the match:

grep('[A-Z].*[A-Z]', isotopes, value=TRUE, invert=TRUE)

The regex matches any string that contains an uppercase letter, followed by zero or more arbitrary characters, followed by another uppercase letter. The uppercase letters do not have to be at the beginning or end of the string.
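To see what the pattern selects before the match is inverted, it can help to look at the logical vector that grepl returns. This is only a small sketch in base R; the names has_two_caps and kept are made up for illustration.

```r
isotopes <- c("6Li", "7Li", "7LiH", "10B", "11B", "11BH")

# TRUE where the string contains an uppercase letter, any characters,
# and then another uppercase letter
has_two_caps <- grepl("[A-Z].*[A-Z]", isotopes)

# Keep the elements that do NOT match; this should give the same result
# as grep(..., value = TRUE, invert = TRUE)
kept <- isotopes[!has_two_caps]
print(kept)
# "6Li" "7Li" "10B" "11B"
```

Indexing with the negated logical vector and calling grep with invert=TRUE are two ways of writing the same filter.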

CodePudding user response:

I made a slight modification to the pattern so that it only allows digits or lowercase letters between the two uppercase letters:

isotopes[!grepl('[A-Z]([0-9a-z]+)?[A-Z]', isotopes)]
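The practical difference from the '.*' version only shows up when something other than a digit or lowercase letter sits between the two capitals. Here is a rough sketch, assuming the restricted pattern is written with a + quantifier on the character class and the digit range 0-9; the string "7Li-H" is a made-up example, not part of the question's data.

```r
x <- c("7LiH", "7Li-H")  # the second string is hypothetical

# Permissive pattern: any characters may sit between the two uppercase letters
loose <- grepl("[A-Z].*[A-Z]", x)

# Restricted pattern: only digits or lowercase letters may sit between them
strict <- grepl("[A-Z]([0-9a-z]+)?[A-Z]", x)

print(loose)
# TRUE TRUE
print(strict)
# TRUE FALSE
```

For the isotopes vector in the question both patterns remove the same two elements, so the choice only matters if the data can contain other separators.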

CodePudding user response:

Another option is to count the uppercase letters in each string using stringr, then keep only the strings that have fewer than two.

library(stringr)

isotopes[str_count(isotopes, "[A-Z]") < 2]

# "6Li" "7Li" "10B" "11B"

Or with stringi:

library(stringi)

isotopes[stri_count(isotopes, regex="[A-Z]") < 2]

Or with base R:

isotopes[lengths(gregexpr("[A-Z]", isotopes)) < 2]
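One caveat with the base R version: gregexpr returns -1 for a string with no match, so lengths() reports 1 both for zero and for one uppercase letter. That is harmless for the < 2 threshold used here, but it is not a true count. If an exact count is needed, a possible sketch uses regmatches (n_caps and kept are illustrative names):

```r
isotopes <- c("6Li", "7Li", "7LiH", "10B", "11B", "11BH")

# regmatches() extracts the matched substrings; a string without any
# match yields character(0), so lengths() gives a real count
n_caps <- lengths(regmatches(isotopes, gregexpr("[A-Z]", isotopes)))

kept <- isotopes[n_caps < 2]
print(kept)
# "6Li" "7Li" "10B" "11B"
```

With this version the threshold can be changed (for example to n_caps == 0) without the -1 placeholder skewing the result.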