Home > Net >  How to convert character to date with two different types of date formats in R?
How to convert character to date with two different types of date formats in R?

Time:04-13

I have a huge dataset with over 2 million obs and all column's classes are character type. I need to convert one of them to date format dd/mm/yyyy, but the dates are written like this:

dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")

I've already tried some explanations I found in other posts but none of them seemed to work for me. One groupe of dates always turns into NA.

CodePudding user response:

You can do something like:

format_ymd  <- as.Date(dates, format = "%Y-%m-%d")
format_dmy  <- as.Date(dates, format = "%d/%m/%Y")
as.Date(ifelse(is.na(format_ymd), format_dmy, format_ymd), origin = "1970-01-01")
# [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

CodePudding user response:

Similar to SamR's approach

data.table::fifelse(
  grepl("^\\d{4}", dates),
  as.Date(dates,"%Y-%m-%d"),
  as.Date(dates, "%d/%m/%Y")
)

CodePudding user response:

1) Base R Use as.Date with the format vector shown. No packages are used.

as.Date(dates, format = ifelse(grepl("/", dates), "%d/%m/%Y", "%Y-%m-%d"))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

2) Base R - 2 Another approach is to convert the dd/mm/yyyy to yyyy-mm-dd and then just use as.Date. No packages are used.

as.Date(sub("(..)/(..)/(....)", "\\3-\\2-\\1", dates))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

3) lubridate With lubridate use parse_date_time and then convert that to Date class.

library(lubridate)

as.Date(parse_date_time(dates, c("ymd", "dmy")))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

4) coalesce We can use coalesce in dplyr. It takes the first non-NA it finds.

library(dplyr)

coalesce(as.Date(dates), as.Date(dates, "%d/%m/%Y"))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

CodePudding user response:

With the clock package, you can supply multiple formats to date_parse() and it will try them in order. It is nice for cases like this where the formats are drastically different.

library(clock)

dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")

# Tries each `format` in order. Stops on first success.
date_parse(
  dates,
  format = c("%Y-%m-%d", "%d/%m/%Y")
)
#> [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

Created on 2022-04-12 by the reprex package (v2.0.1)

  • Related