I'm trying to convert a character column with dates to date format. However, the dates are in an ambiguous format. Some entries are of the format %d.%m.%Y
(e.g., "03.02.2021"), while others are %d %b %Y
(e.g., "3 Feb 2021").
I've tried as.Date(tryFormats=c("%d %b %Y", "%d.%m.%Y"))
, but realized that tryFormats
is only flexible for the first entry, so that the entries of type %d %b %Y
are correctly identified but those of %d.%m.%Y
become NA
s, or vice versa. I've also tried the anytime
package, but that produced NA
s in a similar fashion.
I've made sure that the column doesn't contain any NA
s or empty strings, and I don't receive any error message.
CodePudding user response:
Try the parsedate package :
df <-read.table(header=TRUE,text=
"d
03.02.2021
'3 Feb 2021'
13/3/2021
13-3-2020")
df %>% mutate(date=parsedate::parse_date(d))
## d date
##1 03.02.2021 2021-02-03
##2 3 Feb 2021 2021-02-03
##3 13/3/2021 2021-03-13
##4 13-3-2020 2020-03-13
CodePudding user response:
Similar (but expanded) to Roland's suggestion, my answer here (in the (2) section) suggests a way to deal with multiple candidate formats.
## sample data
x <- c("03.02.2021", "3 Feb 2021")
formats <- c("%d.%m.%Y", "%d %b %Y")
dates <- as.Date(rep(NA, length(x)))
for (fmt in formats) {
nas <- is.na(dates)
dates[nas] <- as.Date(x[nas], format=fmt)
}
dates
# [1] "2021-02-03" "2021-02-03"
It is better to have the most-frequent format first in the formats
vector. One could add a quick-escape to the loop if there are many formats, such as
for (fmt in formats) {
nas <- is.na(dates)
if (!any(nas)) break
dates[nas] <- as.Date(x[nas], format=fmt)
}
but I suspect that it really won't be very beneficial unless both formats
and x
are rather large (I have no sizes in mind to quantify "large").
CodePudding user response:
did you try lubridate
?
df <-read.table(header=TRUE,text=
"d
03.02.2021
'3 Feb 2021'
13/3/2021
13-3-2020")
dmy(df$d)
[1] "2021-02-03" "2021-02-03" "2021-03-13" "2020-03-13"
CodePudding user response:
Using anydate
library(anytime)
addFormats(c("%d/m/%Y", '%d-%m-%Y') )
anydate(df$d)
[1] "2021-02-03" "2021-02-03" "2021-03-13" "2020-03-13"