Home > Net >  A new data conversion issue in R
A new data conversion issue in R

Time:11-12

I have a data column with two kind of date-format values, the numeric one (es. "38169") and the string one (es. "01/03/2004", always in format "%d/%m/%Y"). I cannot transform them in the same date-format and reconvert them in the standard date-format "%Y-%m-%d".

Forom the example below, I would like to transform the variable date_first in the variable date_clean

Additional information:

The database is imported from Excel. The etherogeneous date format is the result of a string alteration happening in both excel and R

data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
                  date_first=c(NA,"38169","37926","01/03/2004"))
                  
desidered_data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
                  date_first=c(NA,"38169","37926","01/03/2004"),
                  date_clean=c(NA,2004-07-01,2003-11-01,2004-03-01))

> desidered_data
                   date_all date_first date_clean
1                      <NA>       <NA>       <NA>
2                     38169      38169 2004-07-01
3                     37926      37926 2003-11-01
4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01


CodePudding user response:

A base R option -

change_mix_date <- function(x) {
  #empty date vector to store the results
  new_date <- as.Date(NA)
  #Check for values that have only numbers in them (excel dates)
  inds <- grepl('^\\d $', x)
  #Change excel date to date class
  new_date[inds] <- as.Date(as.numeric(x[inds]), origin = '1899-12-30')
  #Change remaining ones to date class using as.Date
  new_date[!inds] <- as.Date(x[!inds], '%d/%m/%Y')
  #Return output. 
  new_date
}

data$date_clean <- change_mix_date(data$date_first)

#                   date_all date_first date_clean
#1                      <NA>       <NA>       <NA>
#2                     38169      38169 2004-07-01
#3                     37926      37926 2003-11-01
#4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01

CodePudding user response:

You can use case_when for this:

data %>%
  mutate(date_clean = case_when(grepl("\\d{5}", date_first) ~ as.Date(as.numeric(date_first), origin = "1899-12-30"),
                                TRUE ~ as.Date(date_first, format = "%d/%m/%Y")))
  • Related