Home > OS >  How to filter date numbers, incomplete dates, and NAs from database and convert to uniform date clas
How to filter date numbers, incomplete dates, and NAs from database and convert to uniform date clas

Time:01-14

I have a large database with a date column that has date numbers coming from Excel, incomplete dates that are missing the year (but year is in another column), and some cells with missing date. I found out how to change format of the dates, but the problem is how to filter the three types of cells I have in the date variable (that is date numbers from excel, incomplete dates, and empty cell). I managed to do it by filtering a by a created column (value) that I DON'T have in the real database.

This is my original database:

enter image description here

This is what I required end result:

enter image description here

What I managed to do was to filter the dataset with the fictitious value column and convert the date to the required format. This is what I did:

library(dplyr)

data_a <- read.csv(text = "
year,date,value
2018,43238,1
2017,43267,2
2020,7/25,3
2018,,4
2013,,5
2000,8/23,6
2000,9/21,7")

data_b <- data_a %>% 
  filter(value %in% c(1,2)) %>%
  mutate(data_formatted = as.Date(as.numeric(date), origin = "1899-12-30"))

data_c <- data_a %>%
  filter(value %in% c(3, 6, 7)) %>%
  mutate(data_formatted = as.Date(paste0(year, "/", date)))

data_d <- data_a %>% 
  filter(value %in% c(4, 5)) %>%
  mutate(data_formatted = NA)

data_final <- rbind(data_b, data_c, data_d)  

I need to do the same all at once WITHOUT using the value column.

CodePudding user response:

You can use do conditional for the scenarios and apply different functions to convert to date.

Code

library(dplyr)
library(stringr)
library(lubridate)

data_a %>% 
  mutate(
    data_formatted = case_when(
      !str_detect(date,"/") ~ as.Date(as.numeric(date), origin = "1899-12-30"),
      TRUE ~ ymd(paste0(year, "/", date))
    ) 
  )

Output

  year  date value data_formatted
1 2018 43238     1     2018-05-18
2 2017 43267     2     2018-06-16
3 2020  7/25     3     2020-07-25
4 2018           4           <NA>
5 2013           5           <NA>
6 2000  8/23     6     2000-08-23
7 2000  9/21     7     2000-09-21

CodePudding user response:

Please try

data_a2 <- data_a %>% mutate(date2=as.numeric(ifelse(str_detect(date,'\\/'), '',date)), 
           date2_=as.numeric(as.Date(ifelse(str_detect(date,'\\/'), paste0(year,'/',date),''), format='%Y/%m/%d')),
           date_formatted=as.Date(coalesce(date2,date2_), origin = "1970-01-01")) %>% 
           dplyr::select(-date2,-date2_)
  • Related