Home > Software engineering >  Converting variable with 5 digit numbers and dates into date values
Converting variable with 5 digit numbers and dates into date values

Time:10-16

I have the following data, which contains some date values as 5 digit character values. When I try to convert to date, the correct date changes to NA value.

dt <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2),
  Registrationdate=c('2019-01-09','2019-01-09','2019-01-09','2019-01-09','2019-01-09',
'2019-01-09',"44105","44105","44105","44105","44105"))

Expected value

   id Registrationdate
1   1       2019-01-09
2   1       2019-01-09
3   1       2019-01-09
4   1       2019-01-09
5   1       2019-01-09
6   1       2019-01-09
7   2       2020-10-01
8   2       2020-10-01
9   2       2020-10-01
10  2       2020-10-01
11  2       2020-10-01

I tried using

library(openxlsx)
dt$Registrationdate <- convertToDate(dt$Registrationdate, origin = "1900-01-01")

But I got

1   1             <NA>
2   1             <NA>
3   1             <NA>
4   1             <NA>
5   1             <NA>
6   1             <NA>
7   2       2020-10-01
8   2       2020-10-01
9   2       2020-10-01
10  2       2020-10-01
11  2       2020-10-01

CodePudding user response:

Here's one approach using a mix of dplyr and base R:

library(dplyr, warn = FALSE)


 dt |> 
  mutate(Registrationdate = if_else(grepl("-", Registrationdate), 
  as.Date(Registrationdate),
  openxlsx::convertToDate(Registrationdate, origin = "1900-01-01")))
#> Warning in openxlsx::convertToDate(Registrationdate, origin = "1900-01-01"): NAs
#> introduced by coercion
#>    id Registrationdate
#> 1   1       2019-01-09
#> 2   1       2019-01-09
#> 3   1       2019-01-09
#> 4   1       2019-01-09
#> 5   1       2019-01-09
#> 6   1       2019-01-09
#> 7   2       2020-10-01
#> 8   2       2020-10-01
#> 9   2       2020-10-01
#> 10  2       2020-10-01
#> 11  2       2020-10-01

Created on 2022-10-15 with reprex v2.0.2

CodePudding user response:

df1 <- dt %>% 
  mutate(`Registrationdate` = coalesce(as.character(janitor::excel_numeric_to_date(
    as.numeric(`Registrationdate`))), `Registrationdate`))

 id Registrationdate
1   1       2019-01-09
2   1       2019-01-09
3   1       2019-01-09
4   1       2019-01-09
5   1       2019-01-09
6   1       2019-01-09
7   2       2020-10-01
8   2       2020-10-01
9   2       2020-10-01
10  2       2020-10-01
11  2       2020-10-01
  • Related