Formatting and Replacing Multiple Dates within a Single String in R

Time:05-27

I have a question very similar to this one. The difference is that my text can contain multiple dates within one string. All the dates are in the same format, as demonstrated below:

rep <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "

All my sentences are lower case and all dates follow the %B %d %Y format. I'm able to extract all the dates using the following code:

> pattern <-  paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|") %>% 
     regex(ignore_case = TRUE)
> str_extract_all(rep, pattern)
[[1]]
[1] "june 11 2022"   "august 4 2022"  "august 25 2022"

What I want to do is replace every instance of a date formatted %B %d %Y with the format %Y-%m-%d. I've tried something like this:

str_replace_all(rep, pattern, as.character(as.Date(str_extract_all(rep, pattern),format = "%B %d %Y")))

This throws the error: do not know how to convert 'str_extract_all' to class "Date". This makes sense to me since I'm trying to replace multiple different dates and R doesn't know which one to replace it with.

If I change the str_extract_all to just str_extract I get this:

"on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-06-11. on 2022-06-11 there will be a test "

Which, again, makes sense, since str_extract takes the first instance of a date, converts the format, and applies that same date across all instances of a date.
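
For what it's worth, the first error seems to come from the return type rather than from any ambiguity about which date to use: str_extract_all() returns a list with one element per input string, and as.Date() cannot convert a list. A minimal sketch of that, where txt is just a stand-in name for a shortened version of the string above and the %B conversion assumes an English locale:

```r
library(stringr)

# txt is a shortened stand-in for the example string in the question
txt <- "due on august 4 2022. on august 25 2022 there will be a test"
pattern <- regex(paste(month.name, "\\d{1,2} \\d{4}", collapse = "|"),
                 ignore_case = TRUE)

matches <- str_extract_all(txt, pattern)
class(matches)        # "list": one element per input string
class(matches[[1]])   # "character": the dates found in the first string

# The character vector inside the list is what as.Date() can work with
# (assumes an English locale so that %B recognises the month names).
dates <- as.Date(matches[[1]], format = "%B %d %Y")
```

Pulling out the first element with [[1]] gives a plain character vector, which is the shape the conversion needs.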

I would prefer if the solution used the stringr package just because most of my string tidying thus far has been using that package, BUT I am 100% open to any solution that gets the job done.

CodePudding user response:

We can capture the pattern, i.e. one or more word characters (\\w+) followed by a space, then one or two digits (\\d{1,2}), another space, and four digits (\\d{4}), as a group ((...)), and in the replacement pass a function that converts the captured text to Date class

library(stringr)
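# as.character() keeps the replacement a character vector, which newer stringr versions may require
str_replace_all(rep, "(\\w+ \\d{1,2} \\d{4})", function(x) as.character(as.Date(x, "%b %d %Y")))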

-output

[1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "

NOTE: It is better to choose a different object name, as rep is also the name of a base R function
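
A possible refinement, sketched under a few assumptions: \\w+ matches any word, so text such as "room 12 2022" would also be captured and then fail to parse as a date. The version below limits the match to the names in month.name and builds the ISO string by hand, so it does not depend on the session locale. The names txt and to_iso are made up for this illustration and are not part of the original answer.

```r
library(stringr)

# Hypothetical helper: turns strings like "june 11 2022" into "2022-06-11".
# Works on a single match or on a vector of matches.
to_iso <- function(x) {
  parts <- str_split_fixed(x, " ", 3)                       # month name, day, year
  month <- match(tolower(parts[, 1]), tolower(month.name))  # month number 1..12
  sprintf("%s-%02d-%02d", parts[, 3], month, as.integer(parts[, 2]))
}

# txt is used instead of rep to avoid masking the base R function
txt <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "

# Only real month names are allowed in front of the day and year
month_alt <- paste(month.name, collapse = "|")
pattern <- regex(paste0("\\b(?:", month_alt, ") \\d{1,2} \\d{4}\\b"),
                 ignore_case = TRUE)

result <- str_replace_all(txt, pattern, to_iso)
result
#> [1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "
```

Because the helper never calls as.Date(), it will not reject impossible dates such as "june 31 2022"; if validation matters, as.Date() on the result is one way to check.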

CodePudding user response:

You can pass a named vector with multiple replacements to str_replace_all():

library(stringr)

rep <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "
pattern <-  paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|") %>% 
  regex(ignore_case = TRUE)
extracted <- str_extract_all(rep, pattern)[[1]]
replacements <- setNames(as.character(as.Date(extracted, format = "%B %d %Y")), 
                     extracted)
str_replace_all(rep, replacements)
#> [1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "

Created on 2022-05-26 by the reprex package (v2.0.1)
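
Since the question is open to non-stringr solutions, roughly the same thing can probably be done in base R with gregexpr() and the replacement form of regmatches(). This is only a sketch: txt is a stand-in name for rep, and the %B conversion assumes an English locale.

```r
# txt is used instead of rep to avoid masking the base R function
txt <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "

# Month-name alternation followed by a 1-2 digit day and a 4 digit year
pattern <- paste0("(", paste(month.name, collapse = "|"), ") [0-9]{1,2} [0-9]{4}")

# Locate every date, then overwrite each match with its ISO form
m <- gregexpr(pattern, txt, ignore.case = TRUE)
found <- regmatches(txt, m)   # list with one character vector per input string
regmatches(txt, m) <- lapply(found, function(d) {
  format(as.Date(d, format = "%B %d %Y"))   # assumes an English locale for %B
})
txt
```

Each match is replaced by its own converted value, so the three dates stay distinct, and the same code works unchanged on a character vector of several sentences.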
