I need to remove all the text that follows the date part of a given string. The date part is a date range: two dates with a hyphen in between.
For example, x = "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22 - Video Package - XXXX - Optimize"
Expression I'm currently using: gsub("[0-9 A-Z a-z][22] \- [0-9 A-Z a-z][22].*", "\1", x)
Output: AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22
However, the spaces before and after the hyphen might not always be present, for example:
x = "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize"
The regex above doesn't work in this case.
CodePudding user response:
I would use sub() here with a capture group:
x <- "AB - CEPC - Telephone_BAU_CPM_Link - 2Jan22-30Jan22 - Video Package - XXXX - Optimize"
output <- sub("(.*\\b\\d{1,2}[A-Z][a-z]{2}\\d{2}\\s*-\\s*\\d{1,2}[A-Z][a-z]{2}\\d{2})\\b.*", "\\1", x)
output
[1] "AB - CEPC - Telephone_BAU_CPM_Link - 2Jan22-30Jan22"
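As a rough sketch, the same pattern can be wrapped in a small helper and checked against both spacings from the question. The function name trim_after_dates and the use of perl = TRUE are my own assumptions, not part of the answer above; the pattern itself is unchanged apart from being assembled with paste0().

```r
# Hypothetical helper (name is an assumption): keep everything up to and
# including the date range, drop the rest.
trim_after_dates <- function(x) {
  # One date: 1-2 digits, capitalised 3-letter month, 2-digit year (e.g. 2Jan22)
  date <- "\\d{1,2}[A-Z][a-z]{2}\\d{2}"
  # \\s*-\\s* allows zero or more spaces on either side of the hyphen
  pattern <- paste0("^(.*\\b", date, "\\s*-\\s*", date, ")\\b.*$")
  # perl = TRUE is an optional choice here; strings without a match are returned unchanged
  sub(pattern, "\\1", x, perl = TRUE)
}

x <- c(
  "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22 - 30Jan22 - Video Package - XXXX - Optimize",
  "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize"
)
trimmed <- trim_after_dates(x)
print(trimmed)
```

Because sub() is vectorised, the helper works on a whole character vector or a data frame column at once.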
CodePudding user response:
Here is my idea. It is very similar to the other answer, but I modified it a little in case the day is not always two digits, i.e. the date is not always 2 digits, 3 letters, 2 digits. You might have "5Jan22-10Jan22". If it always reads "05Jan22-10Jan22", this wouldn't be an issue.
x = "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize"
sub("(^.*\\d{1,2}[A-Z][a-z]{2}\\d{2}.*\\d{1,2}[A-Z][a-z]{2}\\d{2}).*$", "\\1", x)
#> [1] "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22"
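For comparison, here is a sketch of a different technique (not the one used in either answer): locate the date range with regexpr() and cut the string at the end of the match with substr(). The helper name cut_at_date_range is hypothetical. One difference to be aware of: this keeps text up to the first date range in the string, whereas the greedy .* in the sub() patterns keeps text up to the last one.

```r
# Hypothetical helper (name is an assumption): cut each string right after
# the first date range that regexpr() finds.
cut_at_date_range <- function(x) {
  date <- "[0-9]{1,2}[A-Z][a-z]{2}[0-9]{2}"
  m <- regexpr(paste0(date, "\\s*-\\s*", date), x, perl = TRUE)
  start <- as.integer(m)              # -1 where there is no match
  len <- attr(m, "match.length")      # length of each match
  end <- start + len - 1L             # index of the last matched character
  # Leave strings without a date range untouched
  ifelse(start > 0L, substr(x, 1L, end), x)
}

x <- c(
  "AB - CEPC - Telephone_BAU_CPM_Link - 20Jan22-30Jan22 - Video Package - XXXX - Optimize",
  "AB - CEPC - Telephone_BAU_CPM_Link - 5Jan22 - 10Jan22 - Video Package",
  "no date range in this one"
)
cut <- cut_at_date_range(x)
print(cut)
```

Matching the separator explicitly as optional spaces around a hyphen is stricter than a bare .* between the two dates, which would accept any text between them.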