Select a string ending at the first instance of character in Regular Expressions

Time:10-14

Say I have the following string:

pos/S881.LMG1810.QE009562.mzML

And wish to select the beginning from that string:

pos/S881.

I can use the following regular expression, which matches from the start of the string (^), then any character (.), any number of times (*), ending with a literal dot (\.):

^.*\.

However, this terminates at the last dot in the string and thus gives me:

pos/S881.LMG1810.QE009562.

How do I terminate the selection at the first dot?

CodePudding user response:

We can use a regex lookbehind ((?<=\\.)) to match the characters that follow the first . and remove them with trimws:

trimws(str1, whitespace = "(?<=\\.).*")
[1] "pos/S881."

Or extract the characters from the start (^) of the string that are not a . ([^.]+), followed by a dot (a metacharacter, thus escaped):

library(stringr)
str_extract(str1, "^[^.]+\\.")
[1] "pos/S881."

data

str1 <- "pos/S881.LMG1810.QE009562.mzML"
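
If adding stringr as a dependency is not desirable, the same negated-character-class pattern should also work in base R. A minimal sketch, assuming the same input string; first_chunk is just an illustrative variable name:

```r
str1 <- "pos/S881.LMG1810.QE009562.mzML"

# regexpr() locates the first match; regmatches() extracts it.
# [^.]+ cannot cross a dot, so the match stops at the first one.
first_chunk <- regmatches(str1, regexpr("^[^.]+\\.", str1))
first_chunk
# [1] "pos/S881."
```

One difference to keep in mind: on a vector input, regmatches() drops elements that have no match, whereas str_extract() returns NA for them.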

CodePudding user response:

Alternatively just use sub():

s <- 'pos/S881.LMG1810.QE009562.mzML'
sub("\\..*", ".", s)
# [1] "pos/S881."
  • \\..* - Match the first literal dot followed by any characters (zero or more) up to the end of the string; sub() replaces that match with a single dot.
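
Because sub() is vectorised, the same call can be applied to a whole vector of file names. A small sketch, where the files vector is made-up sample data rather than anything from the question:

```r
# Hypothetical sample inputs, including one without any dot
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S882.mzML", "no_dot_here")

# Replace everything from the first dot onward with a single dot
trimmed <- sub("\\..*", ".", files)
trimmed
```

The first two elements become "pos/S881." and "neg/S882.", while "no_dot_here" is returned unchanged because the pattern finds nothing to replace.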

CodePudding user response:

I am accepting @akrun's answer for the quick response, but I also found that adding the "?" modifier makes "*" non-greedy, so my original expression works with that one change:

stringr::str_extract("pos/S881.LMG1810.QE009562.mzML", "^.*?\\.")
[1] "pos/S881."
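
To see the two behaviours side by side, here is a small base R sketch (perl = TRUE is passed so that the lazy quantifier is handled by the PCRE engine; the variable names are illustrative):

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# Greedy: .* takes as much as possible, then backtracks to the LAST dot
greedy <- regmatches(s, regexpr("^.*\\.", s, perl = TRUE))

# Lazy: .*? takes as little as possible, so it stops at the FIRST dot
lazy <- regmatches(s, regexpr("^.*?\\.", s, perl = TRUE))

greedy
# [1] "pos/S881.LMG1810.QE009562."
lazy
# [1] "pos/S881."
```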

CodePudding user response:

We could also use strsplit and indexing to extract the desired part of the string. Note that the separator is dropped, so the result lacks the trailing dot:

x <- "pos/S881.LMG1810.QE009562.mzML"
strsplit(x, "\\.")[[1]][1]
[1] "pos/S881"
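
If the output should end in the dot, as the question asks, it can be pasted back on after splitting. A sketch that uses fixed = TRUE so the dot is treated literally and needs no escaping; parts and prefix are illustrative names:

```r
x <- "pos/S881.LMG1810.QE009562.mzML"

# Split on literal dots; [[1]] picks the pieces for the first (only) string
parts <- strsplit(x, ".", fixed = TRUE)[[1]]

# Re-attach the separator that strsplit() discarded
prefix <- paste0(parts[1], ".")
prefix
# [1] "pos/S881."
```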