Say I have the following string:
pos/S881.LMG1810.QE009562.mzML
I want to extract the beginning of that string:
pos/S881.
I can use the following regex to match the start of the string (^), then any character (.) any number of times (*), ending with a decimal point (\.):
^.*\.
However, this stops at the last decimal point in the string and thus gives me:
pos/S881.LMG1810.QE009562.
How do I terminate the selection at the first decimal point?
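A minimal base-R sketch (no packages assumed) that reproduces the behaviour described above: regexpr() locates the match and regmatches() extracts it. Because .* is greedy, it first consumes the whole string and then backtracks only as far as the last dot.

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# Greedy: .* grabs everything, then gives characters back one at a
# time until \\. can match, which happens at the LAST dot.
greedy <- regmatches(s, regexpr("^.*\\.", s))
greedy
# [1] "pos/S881.LMG1810.QE009562."
```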
Answer:
We can use a regex lookbehind ((?<=\\.)) to match everything after the first . and remove it with trimws():
trimws(str1, whitespace = "(?<=\\.).*")
[1] "pos/S881."
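As far as I understand, trimws() with a custom whitespace pattern essentially calls sub() with perl = TRUE, so roughly the same result can be had by calling sub() directly with the lookbehind. A sketch, with a second file name made up for illustration:

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S882.LMG1811.mzML")

# (?<=\\.) matches a position that is preceded by a dot. The first such
# position is right after the first dot; .* then consumes the rest of
# the string, which sub() replaces with "".
# perl = TRUE is required because lookbehind is a PCRE feature.
prefix <- sub("(?<=\\.).*", "", files, perl = TRUE)
prefix
# [1] "pos/S881." "neg/S882."
```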
Or extract the characters at the start (^) of the string that are not a . ([^.]+), followed by a dot (a metacharacter, hence escaped):
library(stringr)
str_extract(str1, "^[^.]+\\.")
[1] "pos/S881."
data
str1 <- "pos/S881.LMG1810.QE009562.mzML"
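If you would rather avoid the stringr dependency, base R's regexpr() plus regmatches() applies the same negated-class pattern. A sketch with hypothetical file names; note that regmatches() silently drops elements that have no match, so "README" disappears from the result:

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "README", "neg/S882.mzML")

# [^.]+ matches one or more characters that are NOT a dot, so the
# match cannot run past the first "." in each string.
m <- regexpr("^[^.]+\\.", files)
prefix <- regmatches(files, m)
prefix
# [1] "pos/S881." "neg/S882."
```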
Answer:
Alternatively, just use sub():
s <- 'pos/S881.LMG1810.QE009562.mzML'
sub("\\..*", ".", s)
# [1] "pos/S881."
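\\..* matches a literal dot followed by zero or more characters. sub() replaces only the first (leftmost) match, which starts at the first dot and runs to the end of the string, so everything from the first dot onwards becomes a single ".".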
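sub() is vectorised, so the same call works on a whole vector of file names. A short sketch (file names are hypothetical) that also shows a name without any dot is returned unchanged, since there is nothing to replace:

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S882.LMG1811.mzML", "README")

# Only the first match is replaced: it begins at the first dot and
# .* extends it to the end of the string.
prefix <- sub("\\..*", ".", files)
prefix
# [1] "pos/S881." "neg/S882." "README"
```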
Answer:
I'm accepting @akrun's answer for the quick response, but I found that adding the ? modifier makes * non-greedy in my original expression:
stringr::str_extract("pos/S881.LMG1810.QE009562.mzML", "^.*?\\.")
[1] "pos/S881."
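The lazy quantifier also works without stringr. A base-R sketch: the non-greedy group captures everything up to and including the first dot, and the replacement keeps only that group. perl = TRUE is used here to be safe with the *? syntax:

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# (.*?\\.) : lazy .*? stops at the FIRST dot; the group captures it.
# .*       : the remainder of the string, which is discarded.
prefix <- sub("^(.*?\\.).*", "\\1", s, perl = TRUE)
prefix
# [1] "pos/S881."
```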
Answer:
We could use strsplit(): split the string on the dots and take the first piece by indexing. Note that the split consumes the dot itself, so the result has no trailing dot:
x <- "pos/S881.LMG1810.QE009562.mzML"
strsplit(x, "\\.")[[1]][1]
[1] "pos/S881"
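strsplit() returns a list, so for several file names you need to pick the first piece of each element and, if wanted, add the dot back. A sketch using fixed = TRUE (a literal split, so no regex escaping is needed); the file names are hypothetical:

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S882.LMG1811.mzML")

# Split each name on the literal "." character.
parts <- strsplit(files, ".", fixed = TRUE)

# Take the first piece of every element, then restore the trailing dot.
first <- vapply(parts, `[`, character(1), 1)
prefix <- paste0(first, ".")
prefix
# [1] "pos/S881." "neg/S882."
```

Be aware that a name containing no dot would come back with a dot appended, which the sub() approach avoids.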