Home > database >  Convert character column to date with mixed numbers and date in R
Convert character column to date with mixed numbers and date in R

Time:05-06

I have to combine a lot of analyst spreadsheets from excel and it seems like every week they find a new way to infuriate me. Here's my latest problem:

data <- tibble(date_column = c(44673, 44674, "2022-04-25"))

# A tibble: 3 x 1
  date_column
  <chr>      
1 44673      
2 44674      
3 2022-04-25 

I've tried a variety of ways to test if it could be a number and then converting that to date one way and if not converting it another way, like this:

library(tidyverse)
library(lubridate)

data %>% 
  mutate(date_column = case_when(
    !is.na(as.numeric(date_column)) ~ as.Date(date_column, origin = "1899-12-30"),
    is.na(as.numeric(date_column)) ~ parse_date_time(date_column, orders = c("mdY","Ymd T", "Ymd"))))

# which throws a 

Error: Problem with `mutate()` column `date_column`.
i `date_column = case_when(...)`.
x character string is not in a standard unambiguous format
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Problem with `mutate()` column `date_column`.
i `date_column = case_when(...)`.
i NAs introduced by coercion 

I feel like this must be a pretty common problem that has been solved before. Help greatly appreciated.

CodePudding user response:

Try janitor convert-to-date:

library(tidyverse)
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

tribble(~date,
        "44673",
        "44674",
        "2022-04-25") |> 
  mutate(clean_date = convert_to_date(date))
#> # A tibble: 3 × 2
#>   date       clean_date
#>   <chr>      <date>    
#> 1 44673      2022-04-22
#> 2 44674      2022-04-23
#> 3 2022-04-25 2022-04-25

Created on 2022-05-05 by the reprex package (v2.0.1)

CodePudding user response:

A possible solution, based on openxlsx::convertToDate:

library(tidyverse)
library(openxlsx)

data <- tibble(date_column = c(44673, 44674, "2022-04-25"))

coalesce(suppressWarnings(convertToDate(data$date_column)) %>% 
   as.character(), data$date_column)

#> [1] "2022-04-22" "2022-04-23" "2022-04-25"
  • Related