I have imported an excel spreadsheet into R and the data frame has numerous columns with date in the title. I can format a named column as aa date as follows:
df$date <- as.Date(as.numeric(df$date), origin = "1899-12-30")
How would I do this for all columns with 'date' in the title? Here's an example data frame, though it doesn't have anywhere near as many columns as the real thing. Ideally the answer would use dplyr.
df <- structure(list(source = c("Track", "Track", "Track", "Track",
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
CodePudding user response:
We can use across
and contains
to select all variables that contain the string "date".
library(tidyverse)
df <- structure(list(source = c("Track", "Track", "Track", "Track",
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
df <- df %>%
mutate(across(contains("date"), ~as.Date(as.numeric(.x), origin = "1899-12-30")))
head(df)
#> # A tibble: 5 × 6
#> source sample_type collection_date collection_date2 received_date storage_date
#> <chr> <chr> <date> <date> <date> <date>
#> 1 Track SQC 2008-07-17 2008-07-17 2008-07-22 2008-07-24
#> 2 Track DNA 2008-07-25 2008-07-25 2008-07-31 2008-10-23
#> 3 Track PBMC 2008-07-14 2008-07-14 2008-08-25 2008-08-25
#> 4 Track PBMC 2008-07-15 2008-07-15 2008-08-25 2008-08-25
#> 5 Track PBMC 2010-07-30 2010-07-30 2010-08-31 2010-08-31
CodePudding user response:
Here is an alternative approach. The janitor
package has a own function for this excel_numeric_to_date
:
library(dplyr)
library(janitor)
df %>%
mutate(across(contains("date"), ~excel_numeric_to_date(as.numeric(.))))
source sample_type collection_date collection_date2 received_date storage_date
<chr> <chr> <date> <date> <date> <date>
1 Track SQC 2008-07-17 2008-07-17 2008-07-22 2008-07-24
2 Track DNA 2008-07-25 2008-07-25 2008-07-31 2008-10-23
3 Track PBMC 2008-07-14 2008-07-14 2008-08-25 2008-08-25
4 Track PBMC 2008-07-15 2008-07-15 2008-08-25 2008-08-25
5 Track PBMC 2010-07-30 2010-07-30 2010-08-31 2010-08-31
CodePudding user response:
I would use a loop for this:
for (col in grep('date', names(df))) {
df[[col]] <- as.Date(as.numeric(df[[col]]), origin="1899-12-30")
}