R - Convert strings to dates

Time:04-02

I have a list of dates stored as strings and I'd like to convert them to date format. I face two main problems:

  1. The month-day-year separator is not consistent: sometimes it's _, sometimes -.
  2. The month and day positions in the strings are not consistent: sometimes the day comes before the month and sometimes the other way around.

I wonder if there's a way to write the format string (or a regex) so that all three strings below are converted to dates.

> mydate <- c('Jan_30_2018','April_3-2018','07_June_2018')
> as.Date(mydate,'%B_%d_%Y')
[1] "2018-01-30" NA           NA          
> as.Date(mydate,'%B.%d.%Y')
[1] NA NA NA
> as.Date(mydate,'%B*%d*%Y')
[1] NA NA NA
> as.Date(mydate,'%B %d %Y')
[1] NA NA NA
> as.Date(mydate,'%B_%d-%Y')
[1] NA           "2018-04-03" NA   

User response:

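# Replace "_" with "-" so one separator is left, then choose the format by
# whether the string starts with a letter (month first) or a digit (day first).
# ifelse() drops the Date class, so the result is converted back via origin.
as.Date(ifelse(grepl("^[A-Z]",mydate),
       as.Date(gsub("_","-",mydate), "%B-%d-%Y"),
       as.Date(gsub("_","-",mydate), "%d-%B-%Y")
       ), origin="1970-01-01")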

[1] "2018-01-30" "2018-04-03" "2018-06-07"
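
If more than two layouts turn up, the same idea can be generalised. Below is a minimal base R sketch, not part of the original answer: parse_mixed_dates() is a hypothetical helper name, and the list of formats is an assumption based on the three example strings. It normalises the separator once, then tries each format only on the elements that have not parsed yet. Month names are matched in the current locale, so an English locale is assumed.

```r
# Hypothetical helper: try several formats per element and keep the
# first one that parses. The formats are assumptions based on the examples.
parse_mixed_dates <- function(x, formats = c("%B-%d-%Y", "%d-%B-%Y")) {
  # Normalise "_" to "-" so a single set of formats covers every string.
  x <- gsub("_", "-", x, fixed = TRUE)
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                        # elements still unparsed
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

mydate <- c('Jan_30_2018', 'April_3-2018', '07_June_2018')
parse_mixed_dates(mydate)
```

Anything that matches none of the formats stays NA, which makes bad inputs easy to spot afterwards.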

Update

This approach should be faster if your vector mydate is large. fifelse() also keeps the Date class, so no conversion from an origin is needed:

library(data.table)

data.table(d=gsub("_","-",mydate))[
, fifelse(grepl("^[A-Za-z]",d),as.Date(d,"%B-%d-%Y"), as.Date(d,"%d-%B-%Y"))]
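
To see why the base R version needs the origin step while the data.table version does not, here is a small sketch (the vector d is made up for illustration). ifelse() strips attributes from its result, so Dates come back as plain day counts, whereas fifelse() preserves the class. The data.table part only runs if the package is installed.

```r
d <- as.Date(c("2018-01-30", "2018-06-07"))

# Base ifelse() returns the underlying day counts without the Date class.
plain <- ifelse(c(TRUE, FALSE), d, d)
class(plain)                                   # "numeric"

# Converting back from the epoch restores the dates.
restored <- as.Date(plain, origin = "1970-01-01")
all(restored == d)

# fifelse() keeps the Date class, so no round trip is needed.
if (requireNamespace("data.table", quietly = TRUE)) {
  kept <- data.table::fifelse(c(TRUE, FALSE), d, d)
  class(kept)                                  # "Date"
}
```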

CodePudding user response:

As hinted by @rawr in a comment an hour ago, the anydate() function from my anytime package was made for just this:

  • does not require a format string but checks a number of possible and sensible ones
  • does not require all elements of a vector to use the same format
  • does use vectorised and compiled operations so it is fast

Example

> anytime::anydate(c('Jan_30_2018','April_3-2018','07_June_2018')) 
[1] "2018-01-30" "2018-04-03" "2018-06-07"   