I have a CSV dataset that contains dates written in one of two forms, either with or without a timestamp, eg: 21/4/1999 09:52:04
or just 16/1/2003
- these are all in the MM/DD/YYYY format regardless of whether there is a timestamp afterwards.
I am importing this data and need to make Boolean queries on the dates. However, importing it as either Date or DateTime renders one or the other formats as NA. For example:
library(readr)
Audit_Review <- read_csv("Audit_Review.txt",
col_types = cols(DigitisationDate = col_date(format = "%d/%m/%Y")))
This will import dates without a timestamp ok, but will give NAs for those with a timestamp. However, if I run the following to import it:
library(readr)
Audit_Review <- read_csv("Audit_Review.txt",
col_types = cols(DigitisationDate = col_date(format = "%d/%m/%Y %H:%M:%S")))
Then the observations without a timestamp are rendered as NAs.
Given that I do not need the timestamps (just the dates will do; I don't need to be any more granular than a specific day) is there a way I can import the data or alter it once imported so they are all in the YYYY-MM-DD format?
CodePudding user response:
Import as it is and then mutate to the same date format.
# A tibble: 2 x 1
time
<chr>
1 21/4/1999 09:52:04
2 16/1/2003
df %>%
mutate(time = time %>%
as.Date(format = "%d/%m/%Y"))
# A tibble: 2 x 1
time
<date>
1 1999-04-21
2 2003-01-16
CodePudding user response:
1) Create test data with both dates and date/times, read it in and then use as.Date.
Lines <- "date\n21/4/1999 09:52:04\n16/1/2003"
DF <- read.csv(text = Lines)
DF$date <- as.Date(DF$date, "%d/%m/%Y")
DF
## date
## 1 1999-04-21
## 2 2003-01-16
2) Alternately define a custom class.
setClass("justDate")
setAs("character", "justDate", function(from) as.Date(from, "%d/%m/%Y"))
read.csv(text = Lines, colClasses = c(date = "justDate"))
## date
## 1 1999-04-21
## 2 2003-01-16