I have a dataset in which one of the variables has a non-uniform format, and I need to write R code that removes the part of each string that follows a specific pattern.
There are related questions on replacing patterns, such as Extract a string between patterns/delimiters in R, Replace patterns separated by delimiter in R, and Remove part of a string, but they do not address the issue with my data.
This is what the variable (c) looks like; below are the options I tried, along with their results.
c <- c("1998/123; 2001","181;2002/12","212")
c1 <- gsub("[0-9]/[0-9]", "", c) # returns 19923;2001, 181;2002, 212
c2 <- gsub("[0-9]/*", "", c) # returns "; ", ";", ""
c3 <- gsub("[0-9][0-9][0-9][0-9]/", "", c) # returns 123;2001, 181;12, 212
c4 <- gsub("*[0-9]/[0-9]*", "", c) # returns 199, 200, 212
c5 <- gsub(" */* ", "", c) # no change
library(stringr) # provides str_replace_all() and fixed()
c6 <- str_replace_all(c,"/","") # returns 1998123, 200212, 212
c7 <- grep(fixed("/"), c, invert=TRUE, value = TRUE) # returns 212
a) There can be 3-8 digits after the forward slash. But there can only be 4 digits before the forward slash.
b) Each sub-string is separated by a semicolon delimiter.
c) I want to replace the substrings that contain a forward slash with an empty string. So, my result should be c(";2001", "181;", "212").
Kindly let me know where I am making the mistake. Any suggestions are very welcome. Thanks.
CodePudding user response:
As the numbers before and after the forward slash have multiple digits, you could use + (one or more) or * (zero or more) in your first approach to remove all of them:
c <- c("1998/123; 2001","181;2002/12","212")
gsub("\\d+\\/\\d+", "", c)
#> [1] "; 2001" "181;" "212"
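If the digit counts from the question need to be enforced, or if the whole semicolon-separated field should be dropped whenever it contains a slash, the pattern could be tightened along these lines. This is only a sketch: the names x, strict, and by_field are illustrative and not from the original post. Note that the sample value "2002/12" has only two digits after the slash, so a strict 3-8 quantifier leaves it untouched.

```r
# Sample data from the question (renamed to x to avoid masking base::c)
x <- c("1998/123; 2001", "181;2002/12", "212")

# Strict variant (assumption: rule a) must hold exactly):
# exactly 4 digits before the slash, 3 to 8 digits after it.
strict <- gsub("\\d{4}/\\d{3,8}", "", x)

# Field variant: remove any run of non-semicolon characters
# that contains a slash, whatever the digit counts are.
by_field <- gsub("[^;]*/[^;]*", "", x)

print(strict)
#> [1] "; 2001"      "181;2002/12" "212"
print(by_field)
#> [1] "; 2001" "181;"   "212"
```

The field variant matches the result requested in the question because it keys on the delimiter rather than on how many digits surround the slash.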