Extracting a segment from a string with multiple identical symbols

Time:12-27

I’m trying to extract “A35-9B004” from “A35-9B004-65g3h” using the function sub in R, but I keep failing. I’ve tried regular expressions but can’t figure out how to handle the two “-” characters in the string; I can only extract the first or the last segment.

Thank you!

x<-"A35-9B004-65g3h"
sub(".*-", "",x)
[1] "65g3h"
sub("*-.*", "", x)
[1] "A35"

Answer:

We can match the final - followed by one or more characters that are not a - ([^-]+) up to the end of the string ($), and replace that match with an empty string (""). Because [^-]+ cannot cross another -, only the last segment is removed.

sub("-[^-]+$", "", x)
[1] "A35-9B004"
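
To see why the anchored pattern succeeds where the greedy attempt in the question did not, here is a minimal sketch. The vector `ids` is hypothetical; only its first element comes from the question, and the other two are made up to show the behaviour on different inputs.

```r
# Hypothetical sample inputs; only the first one appears in the question.
ids <- c("A35-9B004-65g3h", "B12-7C110-x9", "no_hyphen")

# "-[^-]+$" matches the last hyphen and everything after it:
# [^-]+ cannot cross another hyphen, and $ forces the match to end the string.
trimmed <- sub("-[^-]+$", "", ids)
print(trimmed)

# By contrast, ".*-" is greedy: .* consumes everything up to the LAST hyphen,
# so only the final segment is left after the replacement.
greedy <- sub(".*-", "", ids)
print(greedy)
```

Strings without a hyphen are returned unchanged, since sub leaves its input as-is when the pattern does not match.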

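Or use trimws, whose whitespace argument accepts a regular expression. By default trimws applies the pattern at both the start and the end of the string; here it only matches at the end.

trimws(x, whitespace = "-[^-]+")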
[1] "A35-9B004"
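
If the goal is really "keep the first two hyphen-separated fields" rather than "drop the last field", a capture group or a plain split may state that more directly. This is only a sketch under the assumption that fields are separated by single hyphens; the variable names `first_two`, `parts`, and `rejoined` are illustrative and not part of the original answer.

```r
x <- "A35-9B004-65g3h"

# Capture the first two hyphen-separated fields and discard the rest.
first_two <- sub("^([^-]+-[^-]+).*$", "\\1", x)
print(first_two)

# Same idea without a regex: split on the literal hyphen and rejoin two fields.
parts <- strsplit(x, "-", fixed = TRUE)[[1]]
rejoined <- paste(head(parts, 2), collapse = "-")
print(rejoined)
```

The two approaches differ once a string has four or more fields: these keep exactly two, whereas the sub and trimws calls above remove only the last one.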
Tags: r