I need to delete the part of a text string that occurs after the last underscore, _, including the underscore itself. To clarify with an example: the initial string is "surname_name_job" and I need to change it into "surname_name".
I have tried gsub("_.*", "", "string"), but this, of course, deletes everything after the first occurrence of _, and I was not able to find the right syntax for deleting from the last underscore. I also tried the solution in "Remove all characters in a string after the last ocurrence of a pattern in R", but it keeps the last _, which I do not want to be included.
CodePudding user response:
We can use sub as follows:
x <- "surname_name_job"
output <- sub("_[^_]*$", "", x)
output
[1] "surname_name"
The regex pattern works by matching:
_ : an underscore
[^_]* : followed by zero or more non-underscore characters
$ : the end of the string (implying the underscore we matched was the last one)
The match is then replaced with an empty string.
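As a quick check of how this pattern behaves on a whole vector, here is a small sketch. The input strings are made up for illustration (they are not from the question) and are chosen to cover a string with no underscore, a trailing underscore, and a leading underscore.

```r
# Hypothetical inputs, chosen only to exercise edge cases
x <- c("surname_name_job", "no underscore", "a_b_", "_leading")

# sub() is vectorised, so the same pattern is applied to every element
result <- sub("_[^_]*$", "", x)
print(result)
```

Strings without any underscore are left untouched, because the pattern does not match and sub then returns the input unchanged.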
CodePudding user response:
If you have a fixed separator (in this case, an underscore), you can use stringr::str_split to split the string and drop the part you don't want.
# set the text
text <- "surname_name_job"
# split on the underscore separator
splits <- stringr::str_split(text, "_", simplify = TRUE)
# drop the last piece (with simplify = TRUE, splits is a one-row character matrix)
splits <- splits[-length(splits)]
# join the remaining pieces back together with the paste function
newText <- paste(splits, collapse = "_")
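If you would rather not depend on stringr, the same split-and-rejoin idea can probably be sketched in base R. Note that this swaps in strsplit for str_split, and that the helper name drop_last_piece is made up for this example; it is not part of any package.

```r
# Hypothetical helper: split on a fixed separator, drop the last piece, rejoin.
drop_last_piece <- function(x, sep = "_") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  pieces <- strsplit(x, sep, fixed = TRUE)
  # head(p, -1) keeps everything except the last piece of each string
  vapply(pieces, function(p) paste(head(p, -1), collapse = sep), character(1))
}

trimmed <- drop_last_piece(c("surname_name_job", "a_b", "single"))
print(trimmed)
```

One thing to watch with the splitting approach: a string that contains no separator at all ends up as an empty string, whereas the sub-based answers leave such a string unchanged.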
CodePudding user response:
You can also use
sub('^(.*)_.*', '\\1', "surname_name_job")
See the online R demo and the regex demo.
Details:
^ - start of string
(.*) - Group 1 (\1 refers to this group value from the replacement pattern): any zero or more chars, as many as possible
_ - an underscore
.* - the rest of the string.
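To see the greedy capture group in action, here is a small sketch that runs this pattern next to the "_[^_]*$" pattern from the earlier answer. The sample strings are made up for illustration, and the two variable names are just for this example.

```r
# Hypothetical inputs, not taken from the question
x <- c("surname_name_job", "a_b_c_d", "plain")

# Greedy (.*) grabs as much as possible, so the "_" matched after it is the last one
via_group <- sub("^(.*)_.*", "\\1", x)

# Anchoring at the end of the string instead
via_anchor <- sub("_[^_]*$", "", x)

print(via_group)
print(identical(via_group, via_anchor))
```

On these inputs both patterns should give the same result, so choosing between them is mostly a matter of which one you find easier to read.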