I have dates of this mixed type, I would like to standardize all of these to the format yyyy-mm-dd
and for dates with just the month I would like to use the "average" of the month (e.g. the 15th). For example, October 28, 2021
should become 2021/10/28
and November 2009
should become 2009/11/15
. Also what should I do with the missing dates like with position 979?
Here is a reproducible example-
[909] "October 28, 2021" "April 7, 2014" "November 2009" "January 17, 2018"
[913] "January 2023" "February 2012" "December 2022" "July 1999"
[917] "November 2006" "June 2011" "July 2014" "January 2015"
[921] "July 1, 2020" "October 15, 2018" "September 27, 2019" "February 14, 2022"
[925] "June 28, 2021" "June 2016" "March 2013" "October 2014"
[929] "January 2023" "July 6, 2022" "January 2014" "March 22, 2001"
[933] "October 10, 2019" "May 1, 2008" "December 2008" "November 2023"
[937] "August 2005" "May 1, 2022" "January 8, 2014" "July 2011"
[941] "August 15, 2022" "May 2004" "November 2012" "October 1999"
[945] "March 2010" "May 2014" "October 2006" "March 1, 2017"
[949] "June 25, 2019" "October 2004" "September 2016" "June 10, 2019"
[953] "April 4, 2017" "" "August 30, 2018" "July 1, 2017"
[957] "November 14, 2019" "November 2006" "September 1, 2022" "April 2007"
[961] "July 12, 2013" "August 14, 2015" "March 2013" "January 2014"
[965] "March 2013" "June 27, 2019" "April 2008" "July 2007"
[969] "February 2007" "May 2013" "April 2011" "December 2007"
[973] "July 2007" "December 2008" "May 5, 2017" "December 2007"
[977] "February 27, 2007" "February 13, 2018" "" "August 2014"
[981] "September 9, 2019" "October 2010" "January 30, 2013" "January 2010"
[985] "September 15, 2015" "March 2006" "April 2016" "March 2014"
[989] "April 2010" "February 20, 2017" "October 2015" "March 2012"
[993] "December 2014" "May 4, 2022" "October 27, 2020" "September 22, 2017"
[997] "November 2009" "July 2003" "August 2006" "March 3, 2017"
CodePudding user response:
Something like:
dates <- c("October 28, 2021", "April 7, 2014", "November 2009")
ddd <- function(d){
if (lengths(strsplit(d, " ")) == 2) {
d <- paste("15 ", d)
}
d <- anytime::anydate(d)
return(d)
}
lapply(dates, ddd)
#> [[1]]
#> [1] "2021-10-28"
#>
#> [[2]]
#> [1] "2014-04-07"
#>
#> [[3]]
#> [1] "2009-11-15"
Created on 2022-11-18 with reprex v2.0.2
The missing values becomes NA, is up to you what to do with them.
Regards, Grzegorz
CodePudding user response:
library(lubridate); library(stringr); library(dplyr)
x <- c("", "October 28, 2021", "April 7, 2014", "November 2009")
if_else(str_length(x) == 0, NA_Date_,
if_else(str_detect(x, ","), mdy(x), my(x,quiet = TRUE)))
[1] NA "2021-10-28" "2014-04-07"
[4] "2009-11-01"
... a tidyverse approach