I have a huge dataset with over 2 million obs and all column's classes are character type. I need to convert one of them to date format dd/mm/yyyy, but the dates are written like this:
dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")
I've already tried some explanations I found in other posts but none of them seemed to work for me. One groupe of dates always turns into NA.
CodePudding user response:
You can do something like:
format_ymd <- as.Date(dates, format = "%Y-%m-%d")
format_dmy <- as.Date(dates, format = "%d/%m/%Y")
as.Date(ifelse(is.na(format_ymd), format_dmy, format_ymd), origin = "1970-01-01")
# [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
CodePudding user response:
Similar to SamR's approach
data.table::fifelse(
grepl("^\\d{4}", dates),
as.Date(dates,"%Y-%m-%d"),
as.Date(dates, "%d/%m/%Y")
)
CodePudding user response:
1) Base R Use as.Date with the format vector shown. No packages are used.
as.Date(dates, format = ifelse(grepl("/", dates), "%d/%m/%Y", "%Y-%m-%d"))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
2) Base R - 2 Another approach is to convert the dd/mm/yyyy to yyyy-mm-dd and then just use as.Date. No packages are used.
as.Date(sub("(..)/(..)/(....)", "\\3-\\2-\\1", dates))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
3) lubridate With lubridate use parse_date_time and then convert that to Date class.
library(lubridate)
as.Date(parse_date_time(dates, c("ymd", "dmy")))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
4) coalesce We can use coalesce in dplyr. It takes the first non-NA it finds.
library(dplyr)
coalesce(as.Date(dates), as.Date(dates, "%d/%m/%Y"))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
CodePudding user response:
With the clock package, you can supply multiple format
s to date_parse()
and it will try them in order. It is nice for cases like this where the formats are drastically different.
library(clock)
dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")
# Tries each `format` in order. Stops on first success.
date_parse(
dates,
format = c("%Y-%m-%d", "%d/%m/%Y")
)
#> [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
Created on 2022-04-12 by the reprex package (v2.0.1)