Home > Mobile >  Change incorrect dates to provided date
Change incorrect dates to provided date

Time:09-28

Based on the data and code below, how can I change every incorrect date to 01/01/year from year column?

Data:

df <- structure(list(Year = c("2022", "2022", "2022", "2022", "2022", 
    "2022", "2005", "2005", "2020", "2021"), `Reference Number` = c("14784", 
    "14784", "14785", "14785", "14786", "14786", "14787", "14787", 
    "14788", "14788"), `Request Date` = c("1/6/2022", "1/6/2022", 
    "11/19/2022", "Happy New Year", "1899-12-31 02:40:00", "Ongoing", "1//12/05", 
    "01/14/205", "1/25/20`22", NA)), row.names = c(NA, -10L
    ), class = c("tbl_df", "tbl", "data.frame"))

df

# A tibble: 10 × 3
   Year  `Reference Number` `Request Date`     
   <chr> <chr>              <chr>              
 1 2022  14784              1/6/2022           
 2 2022  14784              1/6/2022           
 3 2022  14785              11/19/2022         
 4 2022  14785              Happy New Year     
 5 2022  14786              1899-12-31 02:40:00
 6 2022  14786              Ongoing            
 7 2005  14787              1//12/05           
 8 2005  14787              01/14/205          
 9 2020  14788              1/25/20`22         
10 2021  14788              NA      

Desired output:

desired <- structure(list(Year = c("2022", "2022", "2022", "2022", "2022", 
        "2022", "2005", "2005", "2022", "2021"), `Reference Number` = c("14784", 
        "14784", "14785", "14785", "14786", "14786", "14787", "14787", 
        "14788", "14788"), `Request Date` = c("1/6/2022", "1/6/2022", 
        "11/19/2022", "01/01/2022", "01/01/2022", "01/01/2022", "01/01/2005", 
        "01/01/2005", "01/01/2022", "01/01/2021")), row.names = c(NA, -10L
        ), class = c("tbl_df", "tbl", "data.frame"))

desired

# A tibble: 10 × 3
   Year  `Reference Number` `Request Date`
   <chr> <chr>              <chr>         
 1 2022  14784              1/6/2022      
 2 2022  14784              1/6/2022      
 3 2022  14785              11/19/2022    
 4 2022  14785              01/01/2022    
 5 2022  14786              01/01/2022    
 6 2022  14786              01/01/2022    
 7 2005  14787              01/01/2005    
 8 2005  14787              01/01/2005    
 9 2022  14788              01/01/2022    
10 2021  14788              01/01/2021  

Code:

library(tidyverse)

df = df %>% mutate(`Reference Date` = .....`Reference Date`)

CodePudding user response:

To illustrate the difficulty (impossibility?) of addressing your question satisfactorily, consider the following

library(anytime)
df %>%
    # Remove duplicate separators
    mutate(`Request Date` = str_replace_all(`Request Date`, "//", "/")) %>%
    # Substition logic
    mutate(Date = if_else(
        str_detect(`Request Date`, "^. /. /. $"), 
        anydate(`Request Date`), 
        as.Date(sprintf("%s-01-01", Year))))
## A tibble: 10 × 4
#   Year  `Reference Number` `Request Date`      Date      
#   <chr> <chr>              <chr>               <date>    
# 1 2022  14784              1/6/2022            2022-01-06
# 2 2022  14784              1/6/2022            2022-01-06
# 3 2022  14785              11/19/2022          2022-11-19
# 4 2022  14785              Happy New Year      2022-01-01
# 5 2022  14786              1899-12-31 02:40:00 2022-01-01
# 6 2022  14786              Ongoing             2022-01-01
# 7 2005  14787              1/12/05             NA        
# 8 2005  14787              01/14/205           NA        
# 9 2020  14788              1/25/20`22          NA        
#10 2021  14788              NA                  NA      

Here we're using anytime::anydate which tries to guess a date from an arbitrary date-like string.

The problem are ambiguities: Even if we remove duplicate separators, what date does "1/12/05" correspond to? 1 Dec 2005? 12 January 2005? I don't think this can be answered in a way that it addresses all edge cases automatically. Instead you'll need some heuristic (probably realised through some case_when or nested if_else statements) to get the best-possible solution for your specific situation.

  • Related