R regex for matching where string does not start with

Time:08-24

I need a regex for str_remove() that removes everything from " - " onward (the separator included), but only in strings that do not start with a number. For example, it should turn

"d_19 - blah" into "d_19"

but leave "1 - blah" unaffected.

Answer:

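Try this one. With perl=TRUE, \K discards the non-digit prefix matched so far, so only the " - ..." tail is replaced.

x <- c("d_19 - blah", "1 - blah")
gsub('^\\D.*\\K\\s\\-.*', '', x, perl=TRUE)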
# [1] "d_19"     "1 - blah"
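The \K patterns above put a greedy .* before the separator. That works for the two example strings, but if a string can contain more than one " - " (an assumption; the question does not say), the greedy .* backtracks from the end and cuts at the last separator. A minimal sketch comparing it with a lazy .*? under the same perl=TRUE and \K approach; the extra test strings are hypothetical:

```r
# Hypothetical inputs: one string with two " - " separators, one with none
x <- c("d_19 - blah - more", "1 - blah", "no separator")

# Greedy .* backtracks from the end, so it cuts at the LAST " - "
greedy <- sub("^\\D.*\\K - .*", "", x, perl = TRUE)

# Lazy .*? grows one character at a time, so it cuts at the FIRST " - "
lazy <- sub("^\\D.*?\\K - .*", "", x, perl = TRUE)

greedy
# [1] "d_19 - blah"  "1 - blah"     "no separator"
lazy
# [1] "d_19"         "1 - blah"     "no separator"
```

Strings starting with a digit, and strings with no separator, are left unchanged by both. Which quantifier is right depends on whether "after and including " - "" means the first or the last separator.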

Answer:

In base R:

string <- c("d_19 - blah", "1 - blah")
sub("^(\\D.*) -.*", "\\1", string)
[1] "d_19"     "1 - blah"  

Using a Perl-compatible regex (perl=TRUE) in base R:

sub("^\\D.*\\K -.*", "", string, perl=TRUE)
[1] "d_19"     "1 - blah"

Using stringr::str_replace():

library(stringr)
str_replace(string, "^(\\D.*) -.*", "\\1")
[1] "d_19"     "1 - blah"

Answer:

Sometimes I find the code easier to read when the regex stays simple, so I simplified the task by excluding the untouchable elements first:

x <- c("d_19 - blah", "1 - blah")

# which elements do NOT start with a number (these are the ones to modify)
x_num <- !grepl("^[0-9]", x)

# remove the first " - " and everything after it from those elements
x[x_num] <- stringr::str_remove(x[x_num], " - .*")

x
#> [1] "d_19"     "1 - blah"
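The question asked specifically for str_remove(). A minimal sketch of the same exclude-first idea in a single expression, assuming the stringr package is installed: str_detect() flags the elements that start with a digit, and ifelse() keeps those unchanged while taking the str_remove() result for the rest. The third test string is hypothetical, added to show that only the first " - " and what follows it are dropped:

```r
library(stringr)

x <- c("d_19 - blah", "1 - blah", "a - b - c")

# Keep digit-led elements as-is; for the others, drop the first " - " and everything after it
res <- ifelse(str_detect(x, "^[0-9]"), x, str_remove(x, " - .*"))
res
```

Here res is "d_19", "1 - blah" and "a". Note that str_remove() is evaluated on every element and its result is simply discarded for the digit-led ones, which is harmless for short vectors.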