I read in an excel file which contains a column date of death that contains both date and strings. It looked fine in the excel, but when it is read into R, all of the dates got converted into 5 digits - looking something like this:
#> date of death
#> <chr>
#> 1 44673
#> 2 44674
#> 3 alive
#> 4 not known
#> 5 NA
Is there a way where
- I could read in the excel file in a way that it doesn't get converted to the 5 digits?
- If 1 is not possible, is there a way to convert that column into dates but for the numbers only?
CodePudding user response:
If we want to keep the string as is, then we may convert to Date
class on the numeric elements after reading and then coalesce
with the original column. Note that column can have only a single type, so we may need to return as character
type as there are strings
library(dplyr)
df1 <- df1 %>%
mutate(`date of death` = coalesce(as.character(janitor::excel_numeric_to_date(
as.numeric(`date of death`))), `date of death`))
-output
df1
# A tibble: 6 × 2
`date of death` col2
<chr> <dbl>
1 2022-04-22 5
2 2022-04-23 3
3 alive 9
4 not known 10
5 NA 11
Tried with read.xlsx
(from openxlsx
) and read_excel
(from readxl
). Based on the example created, read_excel
does convert to Date
class when specified the col_types
, but that will also result in NA
for the other string values in the same column. However, read.xlsx
with detectDates
does the conversion
library(openxlsx)
df1 <- read.xlsx(file.choose(), detectDates = TRUE)
df1
date.of.death col2
1 2022-04-22 5
2 2022-04-23 3
3 alive 9
4 not known 10
5 <NA> 11
> sapply(df1, class)
date.of.death col2
"character" "numeric"
To keep the spaces in the column names, we may need to specify the check.names
and the sep.names
df1 <- read.xlsx(file.choose(), detectDates = TRUE,
check.names = FALSE, sep.names = " ")