I'm trying to format some dates in a dataset in R. The dates are stored as integer values. For example, some of the dates in the dataset are 10571, 4786, & 82692, which translate to January 5, 1971; April 7, 1986; & August 26, 1992. How can I change the integer values into dates of the format "%Y-%m-%d" (1971-10-05, 1986-04-07, 1992-08-26) in R?
CodePudding user response:
OK, you'll have to account for integers of different lengths and what each length implies for the date. Assuming the year is always the last two digits and, as you said, falls within the 20th century, the day and the month can each have one or two digits. If both have one digit (four digits in total), we prepend a "0" to each to get the standard format. If there are five digits in total, exactly one of the two is a single digit. By default the month is assumed to be the single digit, so the "0" goes at the beginning. The exception is a value that starts with "10": since a month is never written with a leading 0, that must be October, so the "0" goes in front of the day instead.
Throughout, the strategy is to chop the integer into month, day, and year chunks and prepend the appropriate digits, then recombine the chunks into a string and convert it to a date.
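To make the chop, pad, and recombine steps concrete, here is a minimal sketch for the four-digit value 4786 from the question. It uses base R only, and the variable names `chunks`, `padded`, and `result` are just for illustration.

```r
# Worked example for a single four-digit value (month, day, two-digit year).
date_integer <- 4786

# Chop into month, day, and year chunks: "4", "7", "86".
chunks <- substring(date_integer, c(1, 2, 3), c(1, 2, 4))

# Prepend "0" to the month and the day, and "19" to the year: "04", "07", "1986".
padded <- paste0(c("0", "0", "19"), chunks)

# Recombine into one string and parse it as month-day-year.
result <- as.Date(paste(padded, collapse = ""), format = "%m%d%Y")

format(result, "%Y-%m-%d")
```

The other digit lengths follow the same pattern; only the cut points and the padding change.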
library(dplyr)  # provides case_when() and %>%

# Works on one value at a time (e.g. date_integer <- 4786):
# paste(collapse = "") would merge a whole vector into a single string.
case_when(
  # 4 digits: one-digit month and one-digit day, so prepend 0 to both
  nchar(date_integer) == 4 ~
    substring(date_integer, c(1,2,3), c(1,2,4)) %>%
    paste0(c(0,0,19),.) %>%
    paste(., collapse = "") %>%
    as.Date(., format = "%m%d%Y"),
  # 5 digits: either the month or the day has one digit
  nchar(date_integer) == 5 ~
    case_when(
      # October is the special case, so prepend 0 to the day
      grepl("0", substring(date_integer, 1,2)) ~
        substring(date_integer, c(1,3,4), c(2,3,5)) %>%
        paste0(c("",0,19),.) %>%
        paste(., collapse = "") %>%
        as.Date(., format = "%m%d%Y"),
      # otherwise prepend 0 to the month
      TRUE ~
        substring(date_integer, c(1,2,4), c(1,3,5)) %>%
        paste0(c(0,"",19),.) %>%
        paste(., collapse = "") %>%
        as.Date(., format = "%m%d%Y")
    ),
  # 6 digits: two-digit month and two-digit day, so only the century is added
  nchar(date_integer) == 6 ~
    substring(date_integer, c(1,3,5), c(2,4,6)) %>%
    paste0(c("","",19),.) %>%
    paste(., collapse = "") %>%
    as.Date(., format = "%m%d%Y"),
  # any other length cannot be parsed
  TRUE ~ as.Date(NA)
)
Tested with all of the digit-length variations above, and each one produced the correct date.
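Since the `case_when()` expression above handles one value at a time, a whole column needs a vectorized variant. The following is a rough sketch in base R under the same assumptions (month-day-year order, two-digit 20th-century years, no leading zeros in the stored integers). The helper name `parse_mdy_integer` and the vector `raw_dates` are hypothetical, and the October test here checks for a leading "10" explicitly rather than for any "0" in the first two digits.

```r
# Hypothetical helper: converts integers such as 10571 or 4786 to Date values.
parse_mdy_integer <- function(x) {
  s <- as.character(as.integer(x))
  n <- nchar(s)

  # Five-digit values starting with "10" are read as October plus a one-digit day.
  october <- n == 5 & substr(s, 1, 2) == "10"

  # The month has two digits for six-digit values and for the October case.
  month_width <- ifelse(n == 6 | october, 2, 1)

  month <- substr(s, 1, month_width)
  day   <- substr(s, month_width + 1, n - 2)
  year  <- substr(s, n - 1, n)

  # Zero-pad month and day, prepend the century, and parse as month-day-year.
  padded <- sprintf("%02d%02d19%s", as.integer(month), as.integer(day), year)
  out <- as.Date(padded, format = "%m%d%Y")

  # Anything that is not 4 to 6 digits long is treated as unparseable.
  out[!(n %in% 4:6)] <- NA
  out
}

raw_dates <- c(10571, 4786, 82692)
format(parse_mdy_integer(raw_dates), "%Y-%m-%d")
```

Five-digit values such as 11171 remain ambiguous (January 11 or November 1); this sketch reads them as a one-digit month, which matches the default rule described in the answer.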