Home > Enterprise >  How to format any column with 'date' in the title as a date
How to format any column with 'date' in the title as a date

Time:08-07

I have imported an excel spreadsheet into R and the data frame has numerous columns with date in the title. I can format a named column as aa date as follows:

df$date <- as.Date(as.numeric(df$date), origin = "1899-12-30")

How would I do this for all columns with 'date' in the title? Here's an example data frame, though it doesn't have anywhere near as many columns as the real thing. Ideally the answer would use dplyr.

df <- structure(list(source = c("Track", "Track", "Track", "Track", 
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

CodePudding user response:

We can use across and contains to select all variables that contain the string "date".

library(tidyverse)
df <- structure(list(source = c("Track", "Track", "Track", "Track", 
                                "Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
                                ), collection_date = c("39646", "39654", "39643", "39644", "40389"
                                ), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
                                ), received_date = c("39651", "39660", "39685", "39685", "40421"
                                ), storage_date = c("39653", "39744", "39685", "39685", "40421"
                                )), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
                                ))

df <- df %>%
  mutate(across(contains("date"), ~as.Date(as.numeric(.x), origin = "1899-12-30")))

head(df)
#> # A tibble: 5 × 6
#>   source sample_type collection_date collection_date2 received_date storage_date
#>   <chr>  <chr>       <date>          <date>           <date>        <date>      
#> 1 Track  SQC         2008-07-17      2008-07-17       2008-07-22    2008-07-24  
#> 2 Track  DNA         2008-07-25      2008-07-25       2008-07-31    2008-10-23  
#> 3 Track  PBMC        2008-07-14      2008-07-14       2008-08-25    2008-08-25  
#> 4 Track  PBMC        2008-07-15      2008-07-15       2008-08-25    2008-08-25  
#> 5 Track  PBMC        2010-07-30      2010-07-30       2010-08-31    2010-08-31

CodePudding user response:

Here is an alternative approach. The janitor package has a own function for this excel_numeric_to_date:

library(dplyr)
library(janitor)
df %>% 
  mutate(across(contains("date"), ~excel_numeric_to_date(as.numeric(.))))

  source sample_type collection_date collection_date2 received_date storage_date
  <chr>  <chr>       <date>          <date>           <date>        <date>      
1 Track  SQC         2008-07-17      2008-07-17       2008-07-22    2008-07-24  
2 Track  DNA         2008-07-25      2008-07-25       2008-07-31    2008-10-23  
3 Track  PBMC        2008-07-14      2008-07-14       2008-08-25    2008-08-25  
4 Track  PBMC        2008-07-15      2008-07-15       2008-08-25    2008-08-25  
5 Track  PBMC        2010-07-30      2010-07-30       2010-08-31    2010-08-31  

CodePudding user response:

I would use a loop for this:

for (col in grep('date', names(df))) {
  df[[col]] <- as.Date(as.numeric(df[[col]]), origin="1899-12-30")
}
  • Related