Replacing substring containing a non-uniform pattern involving a special character

Time:06-27

I have a dataset in which one of the variables has a non-uniform format, and I need to write R code that removes the part of each string that follows a specific pattern.

There are existing questions on pattern replacement, such as Extract a string between patterns/delimiters in R, Replace patterns separated by delimiter in R, and Remove part of a string, but they don't cover the situation in my data.

This is what the variable (c) looks like; below are the options I tried, along with their results.

library(stringr) # needed for str_replace_all() and fixed() below
c <-  c("1998/123; 2001","181;2002/12","212")
c1 <- gsub("[0-9]/[0-9]", "", c) # returns 19923;2001, 181;2002, 212
c2 <- gsub("[0-9]/*", "", c) # returns "; ", ";", ""
c3 <- gsub("[0-9][0-9][0-9][0-9]/", "", c) # returns 123;2001, 181;12, 212 
c4 <- gsub("*[0-9]/[0-9]*", "", c) # returns 199, 200, 212 
c5 <- gsub(" */* ", "", c) # no change
c6 <- str_replace_all(c,"/","") # returns 1998123, 200212, 212
c7 <- grep(fixed("/"), c, invert=TRUE, value = TRUE) # returns 212

a) There can be 3-8 digits after the forward slash. But there can only be 4 digits before the forward slash.

b) Each sub-string is separated by a semicolon delimiter.

c) I want to replace the substrings that contain a forward slash with an empty string. So, my result should be c(";2001", "181;", "212").

Kindly let me know where I am making the mistake. Any suggestions are very welcome. Thanks.

CodePudding user response:

As the numbers before and after the forward slash have multiple digits, you could add + (1 or more) or * (0 or more) after each digit class in your first approach to remove all of them:

c <-  c("1998/123; 2001","181;2002/12","212")

gsub("\\d+\\/\\d+", "", c)
#> [1] "; 2001" "181;"   "212"
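
If you want the pattern to follow rule (a) from the question more closely, one possible variant is sketched below. It assumes exactly four digits before the slash and uses \\d+ after it, because a literal {3,8} would not match the "2002/12" in the sample data. The names x, strict and no_space are only for illustration, and the second gsub() is an optional extra step for the case where the space after the semicolon should go as well.

```r
x <- c("1998/123; 2001", "181;2002/12", "212")

# Exactly four digits, a slash, then one or more digits.
# \\b stops the match from starting in the middle of a longer number.
strict <- gsub("\\b\\d{4}/\\d+", "", x, perl = TRUE)
strict
#> [1] "; 2001" "181;"   "212"

# Optional: also remove any whitespace left behind
no_space <- gsub("\\s+", "", strict)
no_space
#> [1] ";2001" "181;"  "212"
```

If the 3-8 digit limit does hold in your real data, replacing \\d+ with \\d{3,8} should tighten the match further.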