Home > Mobile >  Standardize dates of different types
Standardize dates of different types

Time:11-18

I have dates of this mixed type, I would like to standardize all of these to the format yyyy-mm-dd and for dates with just the month I would like to use the "average" of the month (e.g. the 15th). For example, October 28, 2021 should become 2021/10/28 and November 2009 should become 2009/11/15. Also what should I do with the missing dates like with position 979?

Here is a reproducible example-

[909] "October 28, 2021"   "April 7, 2014"      "November 2009"      "January 17, 2018"  
 [913] "January 2023"       "February 2012"      "December 2022"      "July 1999"         
 [917] "November 2006"      "June 2011"          "July 2014"          "January 2015"      
 [921] "July 1, 2020"       "October 15, 2018"   "September 27, 2019" "February 14, 2022" 
 [925] "June 28, 2021"      "June 2016"          "March 2013"         "October 2014"      
 [929] "January 2023"       "July 6, 2022"       "January 2014"       "March 22, 2001"    
 [933] "October 10, 2019"   "May 1, 2008"        "December 2008"      "November 2023"     
 [937] "August 2005"        "May 1, 2022"        "January 8, 2014"    "July 2011"         
 [941] "August 15, 2022"    "May 2004"           "November 2012"      "October 1999"      
 [945] "March 2010"         "May 2014"           "October 2006"       "March 1, 2017"     
 [949] "June 25, 2019"      "October 2004"       "September 2016"     "June 10, 2019"     
 [953] "April 4, 2017"      ""                   "August 30, 2018"    "July 1, 2017"      
 [957] "November 14, 2019"  "November 2006"      "September 1, 2022"  "April 2007"        
 [961] "July 12, 2013"      "August 14, 2015"    "March 2013"         "January 2014"      
 [965] "March 2013"         "June 27, 2019"      "April 2008"         "July 2007"         
 [969] "February 2007"      "May 2013"           "April 2011"         "December 2007"     
 [973] "July 2007"          "December 2008"      "May 5, 2017"        "December 2007"     
 [977] "February 27, 2007"  "February 13, 2018"  ""                   "August 2014"       
 [981] "September 9, 2019"  "October 2010"       "January 30, 2013"   "January 2010"      
 [985] "September 15, 2015" "March 2006"         "April 2016"         "March 2014"        
 [989] "April 2010"         "February 20, 2017"  "October 2015"       "March 2012"        
 [993] "December 2014"      "May 4, 2022"        "October 27, 2020"   "September 22, 2017"
 [997] "November 2009"      "July 2003"          "August 2006"        "March 3, 2017"

CodePudding user response:

Something like:

dates <- c("October 28, 2021", "April 7, 2014", "November 2009")

ddd <- function(d){
  if (lengths(strsplit(d, " ")) == 2) {
    d <- paste("15 ", d)
  }
  d <- anytime::anydate(d)
  return(d)
}

lapply(dates, ddd)
#> [[1]]
#> [1] "2021-10-28"
#> 
#> [[2]]
#> [1] "2014-04-07"
#> 
#> [[3]]
#> [1] "2009-11-15"

Created on 2022-11-18 with reprex v2.0.2

The missing values becomes NA, is up to you what to do with them.

Regards, Grzegorz

CodePudding user response:

    library(lubridate); library(stringr); library(dplyr)
    x <- c("", "October 28, 2021", "April 7, 2014", "November 2009")
    
    if_else(str_length(x) == 0, NA_Date_,
           if_else(str_detect(x, ","), mdy(x), my(x,quiet = TRUE)))

[1] NA           "2021-10-28" "2014-04-07"
[4] "2009-11-01"

... a tidyverse approach

  • Related