Search and mass convert character columns to date in R with dplyr without explicit specification

Time:10-24

I have a messy data frame with a thousand variables and want to automate the conversion of specific columns to dates without having to specify the columns explicitly. All columns to convert have "Date" in their name. Most are mdy, but some are dmy. A very small proportion (<0.1%) contain errors or malformed dates.

I tried:

df %>% select(contains("Date")) %>% as_Date() #Does not work
df %>%  select(contains("Date"))  %>% mdy() #selecting only the columns with dates, does not work
df %>% select(contains("Date")) %>% parse_date_time( c("mdy", "dmy")) #also does not work

I think I'm missing something fundamental.

CodePudding user response:

Here's a solution based on lubridate:

Toy data:

df <- data.frame(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"), 
                 Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"), 
                 More_Dates = c("27/11/2009", "22-Jan-2013", "20-Nov-1987"))

# define formats:
formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")

A dplyr solution:

library(dplyr)
library(lubridate)
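df %>% 
  # parse every column whose name contains "Date", trying each entry of `formats`
  mutate(across(contains("Date"), 
                ~ parse_date_time(., orders = formats))) %>%
  # format the parsed values as dd-mm-yyyy strings (the columns are character again)
  mutate(across(contains("Date"),
                ~ format(., "%d-%m-%Y")))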
       Date1   Var_Date More_Dates
1 01-03-2015 01-02-2018 27-11-2009
2 31-01-2012 01-08-2016 22-01-2013
3 15-01-1999 17-09-2007 20-11-1987
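
The answer above formats the parsed values back to strings, so the columns end up as character again. If real Date columns are wanted, and the inputs are mostly mdy with some dmy as described in the question, a minimal sketch could look like the following. The data frame toy, its column names, and its values are made up for illustration; the orders c("mdy", "dmy") are an assumption taken from the question, not from the answer. Values that fit neither order become NA, which makes the small share of malformed dates easy to count.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: unambiguous month-first and day-first values, plus one bad value
toy <- data.frame(
  id         = 1:3,
  Start_Date = c("01-31-2012", "31-01-2012", "03/15/1999"),
  End_Date   = c("12-25-2015", "not a date", "25/12/2015"),
  stringsAsFactors = FALSE
)

# Parse every column whose name contains "Date", allowing both orders,
# and keep the result as class Date instead of formatting it back to text.
# quiet = TRUE suppresses the warning about values that fail to parse.
converted <- toy %>%
  mutate(across(contains("Date"),
                ~ as.Date(parse_date_time(.x, orders = c("mdy", "dmy"), quiet = TRUE))))

# Count the values per column that could not be parsed (they are NA now)
failures <- converted %>%
  summarise(across(contains("Date"), ~ sum(is.na(.x))))
```

Ambiguous strings such as "01-02-2018" fit both orders, and which one lubridate picks depends on its format guessing, so it is worth spot-checking a few of those values after the conversion.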

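A base R solution (lubridate is still used for the parsing):

library(lubridate)
# logical index of the columns whose name contains "Date"
date_cols <- grepl("Date", names(df))
df[, date_cols] <- apply(df[, date_cols, drop = FALSE], 2, 
                  function(x) format(parse_date_time(x, orders = formats), "%d-%m-%Y"))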

CodePudding user response:

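We could use parse_date from the parsedate package, which guesses the format of each value. Note that in the output below, ambiguous values such as "01-02-2018" are read month-first.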

library(parsedate)
library(dplyr)
df %>%
    # restrict to the "Date" columns; in the toy data this is every column
    mutate(across(contains("Date"), parse_date))
       Date1   Var_Date More_Dates
1 2015-03-01 2018-01-02 2009-11-27
2 2012-01-31 2016-01-08 2013-01-22
3 1999-01-15 2007-09-17 1987-11-20

data

df <- structure(list(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"
), Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"), More_Dates = c("27/11/2009", 
"22-Jan-2013", "20-Nov-1987")),
 class = "data.frame", row.names = c(NA, 
-3L))