Trying to convert multiple character variables to datetimes. Simplified example:
#create df/tibble with two "datetime" columns still as character
df=tibble(date1=c("2013-11-26 00:10:12.536","2013-11-26 23:04:32.512","2014-02-19 23:34:44.459"),
date2=c("2013-11-26 07:06:40.720","2013-11-27 07:09:50.552","2014-02-20 08:00:03.975"))
datetimeFormat="%Y-%m-%d %H:%M:%OS"
#OK: converting a single var using $
df_temp=df
df_temp$date1=as_datetime(df_temp$date1,format = datetimeFormat)
#not OK: converting a single var using indexing (presumably because df_temp[,"date1"] is still a tibble)
df_temp=df
df_temp[,"date1"]=as_datetime(df_temp[,"date1"],format = datetimeFormat)
#also not OK: converting multiple variables in one go
datetimeVars=c("date1","date2")
df_temp=df
df_temp[,datetimeVars]=as_datetime(df_temp[,datetimeVars],format = datetimeFormat)
How can I convert multiple character columns to datetime at once, specifically using a variable containing the variable names (like datetimeVars above)?
Some context:
- my source csv files are not uniform and contain a variable - often large - number of datetimes-to-be (in custom format as example above). I can determine which variables should become datetime from their names
- read_csv does not consistently recognize the relevant variables as datetime
- read_csv does not appear to allow setting variable type for multiple variables at once, so can't do something like:
df=read_csv("myFile.csv",col_types=cols(datetimeVars=col_datetime(format=datetimeFormat)))
I also can't specify/hardcode variable type for each relevant variable likecols(date1=col_datetime(),date2=col_datettime, date3=...)
because the number of datetime variables isn't known ahead of time
So currently stuck at both the levels of import (read_csv) and conversion (as_datetime). Suggestions welcome.
CodePudding user response:
Tackling the conversion part since the import highly depends on the files and the included formats.
Convert to date
class using as.POSIXct
(keep in mind that the date
class is always shown in the format printed but keeps more info in the class object - see Reading below).
library(dplyr)
datetimeVars <- c("date1", "date2")
df_date <- df %>%
summarise(across(all_of(datetimeVars), as.POSIXct))
df_date
# A tibble: 3 × 2
date1 date2
<dttm> <dttm>
1 2013-11-26 00:10:12 2013-11-26 07:06:40
2 2013-11-26 23:04:32 2013-11-27 07:09:50
3 2014-02-19 23:34:44 2014-02-20 08:00:03
or with column names matching a starting pattern (starts_with()
)
datetimeVars <- c("date")
df_date <- df %>%
summarise(across(starts_with(datetimeVars), as.POSIXct))
df_date
# A tibble: 3 × 2
date1 date2
<dttm> <dttm>
1 2013-11-26 00:10:12 2013-11-26 07:06:40
2 2013-11-26 23:04:32 2013-11-27 07:09:50
3 2014-02-19 23:34:44 2014-02-20 08:00:03
Reading your desired format from date
class with strftime
df_date %>%
summarise(across(starts_with("date"), strftime, format="%Y-%m-%d %H:%M:%OS3"))
# A tibble: 3 × 2
date1 date2
<chr> <chr>
1 2013-11-26 00:10:12.536 2013-11-26 07:06:40.720
2 2013-11-26 23:04:32.512 2013-11-27 07:09:50.552
3 2014-02-19 23:34:44.459 2014-02-20 08:00:03.974