I'm trying to extract dates from a text string. So far I have had some progress with anydate().
My idea is to extract all dates within a text string into a single string separated by commas, like this:
library(anytime)  # anydate()
library(stringr)  # str_extract_all()
str1 <- "08/07/2022 FC 08/15/2022 yusubclavio derecho"
paste0(anydate(str_extract_all(str1, "[[:alnum:]]+[ /]\\d{2}[ /]\\d{4}")[[1]]), collapse = ", ")
[1] "2022-08-07, 2022-08-15"
My problems start when the date format is DD/MM/YYYY:
str1 <- "22/08/2022 FC yusubclavio derecho"
paste0(anydate(str_extract_all(str1, "[[:alnum:]]+[ /]\\d{2}[ /]\\d{4}")[[1]]), collapse = ", ")
[1] ""
CodePudding user response:
We could use parse_date from parsedate. It should be able to parse most date formats, but a 2-digit year can be an issue, i.e. whether '22' should be parsed as 1922 or as 2022.
library(parsedate)
library(stringr)  # str_extract_all()
as.Date(parse_date(unlist(str_extract_all(str1, "\\d+/\\d+/\\d+"))))
-output
[1] "2022-08-22" "2022-08-22" "2022-08-07" "2022-08-15"
data
str1 <- c("08/22/22 FC yusubclavio derecho", "22/08/2022 FC yusubclavio derecho",
"08/07/2022 FC 08/15/2022 yusubclavio derecho")
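If guessing the format is too risky, a possible alternative is to be explicit about it. The sketch below uses only base R; parse_mixed_date is a hypothetical helper name, not part of parsedate or anytime. It assumes 4-digit years and that month-first should win whenever both readings are valid (so "08/07/2022" is read as August 7), and it falls back to day-first only when the month-first parse fails.

```r
# Hypothetical helper: try MM/DD/YYYY first, then DD/MM/YYYY.
# Assumes 4-digit years; strings that are valid both ways are read month-first.
parse_mixed_date <- function(x) {
  mdy <- as.Date(x, format = "%m/%d/%Y")  # NA when the first field is > 12
  dmy <- as.Date(x, format = "%d/%m/%Y")
  out <- mdy
  failed <- is.na(out)
  out[failed] <- dmy[failed]              # fill gaps with the day-first parse
  out
}

parse_mixed_date(c("22/08/2022", "08/07/2022", "08/15/2022"))
# [1] "2022-08-22" "2022-08-07" "2022-08-15"
```

A date such as 08/07/2022 cannot be disambiguated from the string alone, so the month-first preference here is only a convention to adjust to the data at hand.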
CodePudding user response:
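A base R option might be using scan and grep: scan splits the text on whitespace, and grep keeps only the tokens made up entirely of digits and slashes.
> grep("^(\\d|/)+$", scan(text = str1, what = "", quiet = TRUE), value = TRUE)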
[1] "08/22/22" "22/08/2022" "08/07/2022" "08/15/2022"
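To get one comma-separated string per input element, as in the question, a variant with regmatches and gregexpr may be easier, since it keeps the matches grouped by element. This is only a sketch: the pattern assumes slash-separated dates with a 4-digit year, and the sample str1 here is a shortened version of the data above.

```r
str1 <- c("22/08/2022 FC yusubclavio derecho",
          "08/07/2022 FC 08/15/2022 yusubclavio derecho")

# One character vector of matches per element of str1
hits <- regmatches(str1, gregexpr("\\d{1,2}/\\d{1,2}/\\d{4}", str1))

# Collapse each element's matches into a single comma-separated string
joined <- vapply(hits, paste, character(1), collapse = ", ")
joined
# [1] "22/08/2022"             "08/07/2022, 08/15/2022"
```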