I have the following data that looks like this:
my_data = c("red A1B 5L2 101", "blue A1C 5L8 10872", "Green A1D 5L5 100003" )
Working from the right-hand side of each string, I want to remove the trailing number as well as the whitespace before it.
The final result would look something like this:
[1] "red A1B 5L2" "blue A1C 5L8" "Green A1D 5L5"
I know that there is a regex pattern that appears in each string in the following format: '(([A-Z] ?[0-9]){3})|.', '\\1'
Thus, I want to identify the position where this regex pattern finishes and the position where the string finishes - then I could delete everything between these two positions and obtain the desired result.
I found this link which shows how to remove all characters in a string appearing to the left or to the right of a certain pattern (https://datascience.stackexchange.com/questions/8922/removing-strings-after-a-certain-character-in-a-given-text). I tried to apply the logic provided here to my example:
gsub("(([A-Z] ?[0-9]){3})|.', '\\1.*","",my_data)
But this is producing the opposite result!
[1] "red 101" "blue 10872" "Green 100003"
Can someone please show me how to resolve this problem?
CodePudding user response:
We can use sub() here:
my_data <- c("red A1B 5L2 101", "blue A1C 5L8 10872", "Green A1D 5L5 100003" )
output <- sub("\\s+\\d+$", "", my_data)
output
[1] "red A1B 5L2" "blue A1C 5L8" "Green A1D 5L5"
The regex pattern used here is \s+\d+$, which matches one or more whitespace characters followed by one or more digits at the end of the string. Because only one match per string is needed, sub() is enough and gsub() is not required.
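If you would rather follow the idea from the question (find where the known code pattern ends and drop everything after it), a rough sketch with regexpr() and substr() might look like the following. It assumes the code pattern is ([A-Z] ?[0-9]){3}, as in the question, and that every string contains it; the names code_pattern, end_pos and output2 are just illustrative.

```r
my_data <- c("red A1B 5L2 101", "blue A1C 5L8 10872", "Green A1D 5L5 100003")

# Assumed code pattern from the question: three letter/digit pairs,
# each with an optional space between the letter and the digit.
code_pattern <- "([A-Z] ?[0-9]){3}"

# regexpr() returns the start position of the first match in each string;
# the length of each match is stored in the "match.length" attribute.
m <- regexpr(code_pattern, my_data, perl = TRUE)
end_pos <- as.integer(m) + attr(m, "match.length") - 1L

# Keep everything up to the end of the match and drop the rest.
# Note: regexpr() returns -1 for strings without a match, which this
# sketch does not handle.
output2 <- substr(my_data, 1, end_pos)
output2
```

For this input it should give the same three strings as the sub() call above. The sub() version is shorter; this one only makes sense if the part to keep is better defined than the part to remove.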