I have a messy dataframe with thousand variables and want to automate conversion of specific columns to dates without having to specify which columns explicitely. All columns to convert have "Date" in their name. Most are mdy
but they also can be dmy
. Some contain errors, or malformatted dates but in a very very minor proportion <0.1%.
I tried:
df %>% select(contains("Date")) %>% as_Date() #Does not work
df %>% select(contains("Date")) %>% mdy() #selecting only the columns with dates, does not work
df %>% select(contains("Date")) %>% parse_date_time( c("mdy", "dmy")) #also does not work
I think I dont get something fundamental.
CodePudding user response:
Here's a solution based on lubridate
:
Toy data:
df <- data.frame(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"),
Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"),
More_Dates = c("27/11/2009", "22-Jan-2013", "20-Nov-1987"))
# define formats:
formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")
A dyplr
solution:
library(dplyr)
library(lubridate)
df %>%
mutate(across(contains("Date"),
~ parse_date_time(., orders = formats))) %>%
mutate(across(contains("Date"),
~ format(., "%d-%m-%Y")))
Date1 Var_Date More_Dates
1 01-03-2015 01-02-2018 27-11-2009
2 31-01-2012 01-08-2016 22-01-2013
3 15-01-1999 17-09-2007 20-11-1987
A base R
solution:
library(lubridate)
df[,grepl("Date", names(df))] <- apply(df[,grepl("Date", names(df))], 2,
function(x) format(parse_date_time(x, orders = my_formats), "%d-%m-%Y"))
CodePudding user response:
We could use parse_date
from parsedate
library(parsedate)
library(dplyr)
df %>%
mutate(across(everything(), parse_date))
Date1 Var_Date More_Dates
1 2015-03-01 2018-01-02 2009-11-27
2 2012-01-31 2016-01-08 2013-01-22
3 1999-01-15 2007-09-17 1987-11-20
data
df <- structure(list(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"
), Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"), More_Dates = c("27/11/2009",
"22-Jan-2013", "20-Nov-1987")),
class = "data.frame", row.names = c(NA,
-3L))