Trying to write a regular expression to remove all text after date in any given string

Time:10-29

I need to remove all text that follows the date part of any given string. The date part is a duration: two dates separated by a hyphen.

For example, x = "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22 - Video Package - XXXX - Optimize"

The expression I'm currently using: gsub("[0-9 A-Z a-z][22] \\- [0-9 A-Z a-z][22].*", "\\1", x)

Output: AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22

However, the spaces before and after the hyphen are not always present, for example:

x = "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize"

The regex above doesn't work in this case.
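A short sketch of why the current expression is fragile (this is an explanatory aside, not part of the original question): in a regex, `[22]` is a bracket expression that matches a single character "2", not the literal text "22", and the spaces around the hyphen are hard-coded. Using `\\s*-\\s*` instead allows zero or more whitespace characters on either side of the hyphen. The vector `seps` is just an illustrative set of separators:

```r
# [22] is a bracket expression: it matches ONE character from the set {"2"}.
grepl("^[22]$", "2")   # TRUE
grepl("^[22]$", "22")  # FALSE

# "\\s*-\\s*" allows zero or more whitespace characters on each side of the hyphen,
# so it accepts both "20Jan22 - 30Jan22" and "20Jan22-30Jan22" style separators.
seps <- c(" - ", "-", " -", "- ")
grepl("^\\s*-\\s*$", seps)  # TRUE TRUE TRUE TRUE
```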

Answer:

I would use sub() here with a capture group:

x <- "AB - CEPC - Telephone_BAU_CPM_Link - 2Jan22-30Jan22 - Video Package - XXXX - Optimize"
output <- sub("(.*\\b\\d{1,2}[A-Z][a-z]{2}\\d{2}\\s*-\\s*\\d{1,2}[A-Z][a-z]{2}\\d{2})\\b.*", "\\1", x)
output

[1] "AB - CEPC - Telephone_BAU_CPM_Link - 2Jan22-30Jan22"
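A usage sketch (not part of the original answer) applying the same pattern to both spellings from the question at once. Because sub() is vectorised over its input, one call handles the spaced and unspaced date ranges:

```r
# Both spellings of the date range from the question
x <- c(
  "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22 - Video Package - XXXX - Optimize",
  "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize"
)

# Same pattern as in the answer: capture everything up to the end of the second date,
# with optional whitespace around the hyphen, and drop the rest.
pattern <- "(.*\\b\\d{1,2}[A-Z][a-z]{2}\\d{2}\\s*-\\s*\\d{1,2}[A-Z][a-z]{2}\\d{2})\\b.*"
sub(pattern, "\\1", x)
# [1] "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22"
# [2] "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22"
```

Note that the spacing inside the kept part is preserved as it appeared in the input; only the text after the second date is removed.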

Answer:

Here is my idea. It is very similar to the other answer, but I modified it slightly in case the date is not always 2 digits, 3 letters, 2 digits. You might have "5Jan22-10Jan22", for instance. If it is always written as "05Jan22-10Jan22", this isn't an issue.

x = "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize"

sub("(^.*\\d{1,2}[A-Z][a-z]{2}\\d{2}.*\\d{1,2}[A-Z][a-z]{2}\\d{2}).*$", "\\1", x)
#> [1] "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22"
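One property worth checking with this approach: sub() returns a string unchanged when the pattern does not match, so entries without a date range pass through as-is. A small sketch (not from the original answer); the second input string is a hypothetical example with no dates, and grepl() is used to flag which entries actually contained a range:

```r
pattern2 <- "(^.*\\d{1,2}[A-Z][a-z]{2}\\d{2}.*\\d{1,2}[A-Z][a-z]{2}\\d{2}).*$"

x <- c(
  "AB - CEPC - Telephone_BAU_CPM_Link - 5Jan22-10Jan22 - Video Package - XXXX - Optimize",
  "AB - CEPC - Telephone_BAU_CPM_Link - Video Package - XXXX - Optimize"  # hypothetical: no dates
)

# Strings that match are truncated after the second date; others are returned unchanged.
out <- sub(pattern2, "\\1", x)
out
# [1] "AB - CEPC - Telephone_BAU_CPM_Link - 5Jan22-10Jan22"
# [2] "AB - CEPC - Telephone_BAU_CPM_Link - Video Package - XXXX - Optimize"

# Flag which entries contained a date range at all
has_range <- grepl(pattern2, x)
has_range
# [1]  TRUE FALSE
```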
Tags: r