I have a question very similar to this one. The difference with mine is that I can have text with multiple dates within one string. All the dates are in the same format, as demonstrated below
rep <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "
All my sentences are lower case and all dates follow the %B %d %Y format. I'm able to extract all the dates using the following code:
> pattern <- paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|") %>%
regex(ignore_case = TRUE)
> str_extract_all(rep, pattern)
[[1]]
[1] "june 11 2022" "august 4 2022" "august 25 2022"
What I want to do is replace every date formatted as %B %d %Y with the same date formatted as %Y-%m-%d. I've tried something like this:
str_replace_all(rep, pattern, as.character(as.Date(str_extract_all(rep, pattern),format = "%B %d %Y")))
This throws the error do not know how to convert 'str_extract_all' to class "Date". This makes sense to me since I'm trying to replace multiple different dates and R doesn't know which one to replace it with.
If I change the str_extract_all to just str_extract, I get this:
"on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-06-11. on 2022-06-11 there will be a test "
Which again, makes sense since the str_extract is taking the first instance of a date, converting the format, and applying that same date across all instances of a date.
I would prefer a solution that uses the stringr package, just because most of my string tidying thus far has been done with that package, BUT I am 100% open to any solution that gets the job done.
Answer:
We may capture the pattern, i.e. one or more word characters (\\w+) followed by a space, then one or two digits (\\d{1,2}), followed by a space and then four digits (\\d{4}), as a group ((...)), and in the replacement pass a function that converts the captured group to Date class and returns it as a character string:
library(stringr)
str_replace_all(rep, "(\\w+ \\d{1,2} \\d{4})", function(x) as.character(as.Date(x, "%b %d %Y")))
-output
[1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "
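The function-replacement idea above can also be combined with the month-name pattern from the question, so that only real month names are matched instead of any word followed by digits. The following is a minimal sketch, not a definitive implementation: the names txt, date_pattern and to_iso are made up for illustration, and it assumes an English locale so that %B recognises the month names.

```r
library(stringr)

# `txt` is used instead of `rep` so that base::rep() is not masked
txt <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "

# Match a month name, a 1-2 digit day and a 4 digit year, ignoring case
date_pattern <- regex(
  paste0("\\b(?:", paste(month.name, collapse = "|"), ") \\d{1,2} \\d{4}\\b"),
  ignore_case = TRUE
)

# Receives the matched date strings and returns them as "%Y-%m-%d" text;
# format() on a Date gives a character vector (assumes an English locale)
to_iso <- function(x) format(as.Date(x, format = "%B %d %Y"))

result <- str_replace_all(txt, date_pattern, to_iso)
result
```

Because the replacement function is applied to the matches themselves, each date is converted on its own rather than every match receiving the first converted date.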
NOTE: It is better to give the object a different name, as rep is already the name of a base R function.
Answer:
You can pass a named vector with multiple replacements to str_replace_all():
library(stringr)
rep <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "
pattern <- paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|") %>%
regex(ignore_case = TRUE)
extracted <- str_extract_all(rep, pattern)[[1]]
replacements <- setNames(as.character(as.Date(extracted, format = "%B %d %Y")),
extracted)
str_replace_all(rep, replacements)
#> [1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "
Created on 2022-05-26 by the reprex package (v2.0.1)
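For readers unfamiliar with the named-vector form: each name is treated as a pattern and each value as the replacement for that pattern. A tiny sketch with made-up input, just to show the mechanics:

```r
library(stringr)

# Names are the patterns to find, values are what replaces them
lookup <- c("one" = "1", "three" = "3")

out <- str_replace_all("one two three", lookup)
out
#> [1] "1 two 3"
```

In the answer above, the names are the extracted date strings and the values are their %Y-%m-%d equivalents, which is why each date ends up with its own replacement.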