I have a list of dates stored as strings and I'd like to convert them to date format. I face two main problems:
- The month-day-year separator is not consistent: sometimes it's `_`, sometimes `-`.
- The month and day positions in the strings are not consistent: sometimes the day comes before the month and sometimes the other way around.
I wonder if there's a way to write the regex so that all three strings below are converted to dates.
> mydate <- c('Jan_30_2018','April_3-2018','07_June_2018')
> as.Date(mydate,'%B_%d_%Y')
[1] "2018-01-30" NA NA
> as.Date(mydate,'%B.%d.%Y')
[1] NA NA NA
> as.Date(mydate,'%B*%d*%Y')
[1] NA NA NA
> as.Date(mydate,'%B %d %Y')
[1] NA NA NA
> as.Date(mydate,'%B_%d-%Y')
[1] NA "2018-04-03" NA
CodePudding user response:
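# Normalise "_" to "-", then pick the format by whether the string starts with a letter.
# ifelse() drops the Date class, so the outer as.Date() restores it from the day counts.
as.Date(ifelse(grepl("^[A-Z]", mydate),
               as.Date(gsub("_", "-", mydate), "%B-%d-%Y"),
               as.Date(gsub("_", "-", mydate), "%d-%B-%Y")),
        origin = "1970-01-01")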
[1] "2018-01-30" "2018-04-03" "2018-06-07"
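If you would rather not branch on the first character, another option in base R is to normalise the separators and then try each candidate format in turn, filling in only the elements that are still NA. This is just a sketch: `parse_mixed_dates` and its `formats` argument are names I made up for illustration, and it assumes an English locale so that `%B` matches names like "Jan" and "June".

```r
# Hypothetical helper (not part of base R): try several formats in order.
parse_mixed_dates <- function(x, formats = c("%B-%d-%Y", "%d-%B-%Y")) {
  x <- gsub("[_-]+", "-", x)                      # normalise "_" and "-" to "-"
  out <- structure(rep(NA_real_, length(x)), class = "Date")  # all-NA Date vector
  for (fmt in formats) {
    todo <- is.na(out)                            # elements not parsed yet
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)   # stays NA where fmt does not match
  }
  out
}

mydate <- c('Jan_30_2018', 'April_3-2018', '07_June_2018')
parse_mixed_dates(mydate)
```

Because the result is built as a Date vector from the start, there is no need for the `origin=` round trip, and further formats can be added to `formats` if more layouts turn up.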
Update
This data.table approach should be noticeably faster if your vector `mydate` is large:
library(data.table)
data.table(d=gsub("_","-",mydate))[
, fifelse(grepl("^[A-Za-z]",d),as.Date(d,"%B-%d-%Y"), as.Date(d,"%d-%B-%Y"))]
CodePudding user response:
As hinted by @rawr in a comment an hour ago, the `anydate()` function from my anytime package was made for just this:
- does not require a format string but checks a number of possible and sensible ones
- does not require all elements of a vector to use the same format
- does use vectorised and compiled operations so it is fast
Example
> anytime::anydate(c('Jan_30_2018','April_3-2018','07_June_2018'))
[1] "2018-01-30" "2018-04-03" "2018-06-07"
>