Replace only the last occurring pattern of a string

Time:01-05

I need to delete the part of a string that comes after the last underscore, _, including the underscore itself. For example, the initial string is "surname_name_job" and I need to turn it into "surname_name".

I have tried gsub("_.*", "", "string"), but this deletes everything from the first occurrence of _ onward, and I could not find the right syntax to delete from the last underscore instead. I also tried the solution in Remove all characters in a string after the last ocurrence of a pattern in R, but it keeps the last _, which I do not want.
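A quick check of why the first attempt falls short: in "_.*" the .* runs to the end of the string, so the match begins at the first underscore and everything after "surname" is removed. This sketch simply uses the example string from the question in place of the literal "string".

```r
x <- "surname_name_job"
# "_.*" matches from the FIRST underscore through the end of the string,
# so only the text before that underscore survives
first_try <- gsub("_.*", "", x)
print(first_try)  # "surname", not the desired "surname_name"
```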

Answer:

We can use sub as follows:

x <- "surname_name_job"
output <- sub("_[^_]*$", "", x)
output

[1] "surname_name"

The pattern works by matching:

  • _ an underscore
  • [^_]* followed by zero or more non-underscore characters
  • $ end of the string (implying the underscore we targeted was the last one)
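Because sub() is vectorized, the same pattern can be applied to a whole character vector. The sketch below uses made-up example strings (not from the question) to show that a string without an underscore is returned unchanged, and uses regmatches() with regexpr() to display exactly which piece was removed from each string that matched.

```r
# Example inputs chosen for illustration only
x <- c("surname_name_job", "first_last", "nounderscore", "trailing_")
# The $ anchor means only the final "_" (plus any non-underscores after it) can match
trimmed <- sub("_[^_]*$", "", x)
print(trimmed)  # "surname_name" "first" "nounderscore" "trailing"

# regmatches() keeps only the strings where regexpr() found a match
removed <- regmatches(x, regexpr("_[^_]*$", x))
print(removed)  # "_job" "_last" "_"
```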

Answer:

If you have a fixed separator (in this case an underscore), you can use stringr::str_split() to split the string and drop the part you don't want.

# set the text
text <- "surname_name_job"
# split on the underscore; simplify = TRUE returns a character matrix (one row per input string)
splits <- stringr::str_split(text, "_", simplify = TRUE)
# drop the last piece (a negative index on the matrix returns a plain vector;
# this only works as intended for a single input string)
splits <- splits[-length(splits)]
# re-join the remaining pieces with the same separator
newText <- paste(splits, collapse = "_")
newText
# [1] "surname_name"
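The split-and-rejoin idea can also be written in base R so that it works on several strings at once. This is a minimal sketch; drop_last_field() is a hypothetical helper name, and it uses strsplit() with fixed = TRUE instead of stringr. Note that strsplit() drops a trailing empty field, so strings that end in the separator can give different results from the regex answers.

```r
# Hypothetical helper: remove the last sep-delimited field of each string
drop_last_field <- function(x, sep = "_") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    # Leave strings with fewer than two fields unchanged, as sub() would
    if (length(p) < 2) return(paste(p, collapse = sep))
    # Re-join every field except the last one
    paste(p[-length(p)], collapse = sep)
  }, character(1))
}

result <- drop_last_field(c("surname_name_job", "a_b_c_d", "nounderscore"))
print(result)  # "surname_name" "a_b_c" "nounderscore"
```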

Answer:

You can also use

sub('^(.*)_.*', '\\1', "surname_name_job")


Details:

  • ^ - start of string
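  • (.*) - Group 1: any zero or more characters, as many as possible. Because .* is greedy, the engine backtracks only as far as needed to match the following _, which is therefore the last underscore. \\1 in the replacement refers back to this group's value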
  • _ - an underscore
  • .* - the rest of the string.
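The greediness of (.*) is what makes this work. The sketch below contrasts it with a lazy (.*?) group, which stops at the first underscore instead; perl = TRUE is used for the lazy version so the pattern is handled by PCRE.

```r
x <- "surname_name_job"
# Greedy group: grabs as much as possible, so the "_" it must precede is the LAST one
greedy <- sub("^(.*)_.*", "\\1", x)
# Lazy group: grabs as little as possible, so it stops at the FIRST "_"
lazy <- sub("^(.*?)_.*", "\\1", x, perl = TRUE)
print(greedy)  # "surname_name"
print(lazy)    # "surname"
```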