How to parse dates from a string without writing a regular expression?

Time:12-23

The readr package has a function called parse_number that returns the number contained in a string:

readr::parse_number("Hello 2022!")

[1] 2022

Is there a similar function that returns a date from a string? readr has a function called parse_date, but it expects the whole string to be a date and fails on anything else:

readr::parse_date("X2018-01-11_poland")

Warning: 1 parsing failure.
row col   expected             actual
  1  -- date like  X2018-01-11_poland

[1] NA

Desired output:

# the raw string is "X2018-01-11_poland"
2018-01-11

P.S. I am not interested in doing this with a regular expression.

CodePudding user response:

Here is a regex-free idea:

library(readr)
x <- "X2018-01-11_poland"
parse_date(strsplit(x, '_', fixed = TRUE)[[1]][1], format = 'X%Y-%m-%d')
#[1] "2018-01-11"

However, if the "_poland" suffix is also fixed, you can include it in the format string as well:

parse_date(x, format = 'X%Y-%m-%d_poland')
#[1] "2018-01-11"

CodePudding user response:

The lubridate package has parse_date_time2, which is easy to use. Note that it returns a date-time rather than a Date:

library(lubridate)
dstring <- "X2018-01-11_poland"
date <- parse_date_time2(dstring, orders='Ymd')
date
#[1] "2018-01-11 UTC"
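The result above is a date-time (hence the UTC suffix), while the question asks for a plain date. A minimal sketch of one way to get there, assuming the parse behaves as shown in the output above, is to pass the result through base R's as.Date. The names dt and d are only illustrative.

```r
library(lubridate)

dstring <- "X2018-01-11_poland"

# parse_date_time2 returns a POSIXct date-time, in UTC by default
dt <- parse_date_time2(dstring, orders = "Ymd")

# as.Date drops the time-of-day part and leaves a Date object
d <- as.Date(dt)
class(d)
#[1] "Date"
```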

CodePudding user response:

1) This uses only base R and no regular expressions. It assumes that (a) only letters and spaces precede the date, as in the question, which could easily be relaxed by adding further characters to lets, and (b) the date is in the standard Date format (%Y-%m-%d). The input is lower-cased first so that upper-case letters are covered too. chartr translates the ith character of its first argument into the ith character of its second, so here every letter is replaced with a space. as.Date then parses the result. Note that as.Date ignores junk at the end, so it is fine if characters not in lets follow the date.

x <- "X2018-01-11_poland"

lets <- paste(letters, collapse = "")
as.Date(chartr(lets, strrep(" ", nchar(lets)), tolower(x)))
## [1] "2018-01-11"
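To make the chartr step easier to follow, the same code can be unrolled so that the intermediate string is visible. This is only a sketch of the approach above; the names blanks and cleaned are introduced here for illustration.

```r
x <- "X2018-01-11_poland"

# Translation tables: the 26 lower-case letters and 26 spaces
lets   <- paste(letters, collapse = "")
blanks <- strrep(" ", nchar(lets))

# Lower-case the input, then replace every letter with a space
cleaned <- chartr(lets, blanks, tolower(x))
cleaned
## [1] " 2018-01-11_      "

# as.Date skips the leading blank and ignores the trailing "_" and spaces
as.Date(cleaned)
## [1] "2018-01-11"
```

The underscore survives because it is not in lets, which is harmless here since it comes after the date.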

2) If we know that the string always starts with X and the date appears right after it, we can simply include the prefix in the as.Date format string. This also uses only base R and no regular expressions.

as.Date(x, "X%Y-%m-%d")
## [1] "2018-01-11"

3) If you are willing to compromise and use a very simple regular expression, gsub can remove every non-digit character. Here \D matches any non-digit, and backslashes must be doubled within quotes.

as.Date(gsub("\\D", "", x), "%Y%m%d")
## [1] "2018-01-11"
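Splitting that one-liner into two steps shows why the format string has no separators: once the non-digits are gone, only the eight date digits remain. A small sketch, with digits as an illustrative name:

```r
x <- "X2018-01-11_poland"

# Remove every non-digit character; the hyphens disappear as well
digits <- gsub("\\D", "", x)
digits
## [1] "20180111"

# Parse the compact form: four-digit year, two-digit month, two-digit day
as.Date(digits, format = "%Y%m%d")
## [1] "2018-01-11"
```

This relies on the date being the only digits in the string. Any other digit, for instance a version number in the prefix, would end up in digits and would no longer line up with the format.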

CodePudding user response:

Possible alternatives using base R, or stringr and lubridate, assuming the date always occupies characters 2 to 11:

as.Date(substr("X2018-01-11_poland", 2, 11), format = "%Y-%m-%d")
#> [1] "2018-01-11"

library(stringr)
library(lubridate)

ymd(str_sub("X2018-01-11_poland", 2, 11))
#> [1] "2018-01-11"
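Because substr is vectorised, the base R variant extends to a whole character vector, as long as the date sits at the same character positions in every element. A short sketch, in which the second element of x is a made-up example rather than data from the question:

```r
# Hypothetical input: the second string is invented for illustration
x <- c("X2018-01-11_poland", "X2019-03-05_germany")

# Characters 2 to 11 hold the date in every element
dates <- as.Date(substr(x, 2, 11), format = "%Y-%m-%d")
dates
#> [1] "2018-01-11" "2019-03-05"
```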

Created on 2021-12-22 by the reprex package (v2.0.1)
