I have a data column with two kind of date-format values, the numeric one (es. "38169") and the string one (es. "01/03/2004", always in format "%d/%m/%Y"). I cannot transform them in the same date-format and reconvert them in the standard date-format "%Y-%m-%d".
Forom the example below, I would like to transform the variable date_first in the variable date_clean
Additional information:
The database is imported from Excel. The etherogeneous date format is the result of a string alteration happening in both excel and R
data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
date_first=c(NA,"38169","37926","01/03/2004"))
desidered_data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
date_first=c(NA,"38169","37926","01/03/2004"),
date_clean=c(NA,2004-07-01,2003-11-01,2004-03-01))
> desidered_data
date_all date_first date_clean
1 <NA> <NA> <NA>
2 38169 38169 2004-07-01
3 37926 37926 2003-11-01
4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01
CodePudding user response:
A base R option -
change_mix_date <- function(x) {
#empty date vector to store the results
new_date <- as.Date(NA)
#Check for values that have only numbers in them (excel dates)
inds <- grepl('^\\d $', x)
#Change excel date to date class
new_date[inds] <- as.Date(as.numeric(x[inds]), origin = '1899-12-30')
#Change remaining ones to date class using as.Date
new_date[!inds] <- as.Date(x[!inds], '%d/%m/%Y')
#Return output.
new_date
}
data$date_clean <- change_mix_date(data$date_first)
# date_all date_first date_clean
#1 <NA> <NA> <NA>
#2 38169 38169 2004-07-01
#3 37926 37926 2003-11-01
#4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01
CodePudding user response:
You can use case_when
for this:
data %>%
mutate(date_clean = case_when(grepl("\\d{5}", date_first) ~ as.Date(as.numeric(date_first), origin = "1899-12-30"),
TRUE ~ as.Date(date_first, format = "%d/%m/%Y")))