How to extract dates from a text string?

Time:08-02

I'm trying to extract dates from a text string. So far I have made some progress with anydate. My idea is to extract all dates within a text string into a single string separated by commas, like this:

library(stringr)
library(anytime)
str1 = "08/07/2022 FC 08/15/2022 yusubclavio derecho"
paste0(anydate(str_extract_all(str1, "[[:alnum:]]+[ /]\\d{2}[ /]\\d{4}")[[1]]), collapse = ", ")
[1] "2022-08-07, 2022-08-15"

My problems start when the date format is DD/MM/YYYY.

str1 = "22/08/2022 FC yusubclavio derecho"
paste0(anydate(str_extract_all(str1, "[[:alnum:]]+[ /]\\d{2}[ /]\\d{4}")[[1]]), collapse = ", ")
[1] ""

CodePudding user response:

We could use parse_date from the parsedate package. It should be able to parse most date formats, but two-digit years can be an issue, e.g. if the '22' should be parsed as 1922 instead of 2022.

library(parsedate)
library(stringr)
# extract every digits/digits/digits token, then let parse_date guess the format
as.Date(parse_date(unlist(str_extract_all(str1, "\\d+/\\d+/\\d+"))))

-output

[1] "2022-08-22" "2022-08-22" "2022-08-07" "2022-08-15"

data

str1 <- c("08/22/22 FC yusubclavio derecho", "22/08/2022 FC yusubclavio derecho", 
"08/07/2022 FC 08/15/2022 yusubclavio derecho")
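
The two-digit-year caveat above can be made explicit before any parsing happens. The following is only a sketch in base R: `expand_year` and its `pivot` argument are hypothetical names introduced here, not part of parsedate, and the default pivot of 68 is an assumption chosen to mirror what R's `%y` does on input (00 to 68 become 20xx, 69 to 99 become 19xx).

```r
# Hypothetical helper (not from any package): rewrite a two-digit year
# in a "a/b/yy" token as a four-digit year using an explicit pivot.
expand_year <- function(x, pivot = 68L) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) {
    if (nchar(p[3]) == 2) {
      yy <- as.integer(p[3])
      # years up to the pivot are assumed to be 20xx, the rest 19xx
      p[3] <- as.character(if (yy <= pivot) 2000L + yy else 1900L + yy)
    }
    paste(p, collapse = "/")
  }, character(1))
}

expanded <- expand_year(c("08/22/22", "08/22/85", "22/08/2022"))
print(expanded)
```

Tokens that already carry a four-digit year pass through unchanged, so the helper can be applied to the whole vector of extracted tokens before handing them to a parser. If the data are known to contain, say, birth dates, a lower pivot may be more appropriate.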

CodePudding user response:

A base R option might be using scan + grep

> grep("^(\\d|/)+$", scan(text = str1, what = "", quiet = TRUE), value = TRUE)
[1] "08/22/22"   "22/08/2022" "08/07/2022" "08/15/2022"
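
The tokens above are still character strings. To get from them to the comma-separated dates the question asks for, one possible base R sketch is shown below. It assumes month-first (MM/DD) should be tried first with day-first (DD/MM) as the fallback, so a genuinely ambiguous token such as 08/07/2022 is read as 7 August; `parse_with` and `to_date` are hypothetical helper names, not functions from any package.

```r
str1 <- c("08/22/22 FC yusubclavio derecho",
          "22/08/2022 FC yusubclavio derecho",
          "08/07/2022 FC 08/15/2022 yusubclavio derecho")

# Hypothetical helper: parse tokens with a given day/month order,
# using %y for two-digit years and %Y for four-digit years.
parse_with <- function(x, order) {
  short <- nchar(sub(".*/", "", x)) == 2
  out <- as.Date(x, format = paste0(order, "%Y"))
  out[short] <- as.Date(x[short], format = paste0(order, "%y"))
  out
}

# Hypothetical helper: month-first, falling back to day-first when
# the month-first reading is not a valid date (e.g. month 22).
to_date <- function(x) {
  mdy <- parse_with(x, "%m/%d/")
  dmy <- parse_with(x, "%d/%m/")
  mdy[is.na(mdy)] <- dmy[is.na(mdy)]
  mdy
}

# One comma-separated string of ISO dates per input string.
tokens <- regmatches(str1, gregexpr("\\d+/\\d+/\\d+", str1))
result <- vapply(tokens,
                 function(tok) paste(format(to_date(tok)), collapse = ", "),
                 character(1))
print(result)
```

Swapping the two `parse_with` calls in `to_date` would make day-first the preferred reading instead, which may suit data that are mostly DD/MM/YYYY.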