I have a large database with a date column that has date numbers coming from Excel, incomplete dates that are missing the year (but year is in another column), and some cells with missing date. I found out how to change format of the dates, but the problem is how to filter the three types of cells I have in the date
variable (that is date numbers from excel, incomplete dates, and empty cell). I managed to do it by filtering a by a created column (value
) that I DON'T have in the real database.
This is my original database:
This is what I required end result:
What I managed to do was to filter the dataset with the fictitious value
column and convert the date
to the required format. This is what I did:
library(dplyr)
data_a <- read.csv(text = "
year,date,value
2018,43238,1
2017,43267,2
2020,7/25,3
2018,,4
2013,,5
2000,8/23,6
2000,9/21,7")
data_b <- data_a %>%
filter(value %in% c(1,2)) %>%
mutate(data_formatted = as.Date(as.numeric(date), origin = "1899-12-30"))
data_c <- data_a %>%
filter(value %in% c(3, 6, 7)) %>%
mutate(data_formatted = as.Date(paste0(year, "/", date)))
data_d <- data_a %>%
filter(value %in% c(4, 5)) %>%
mutate(data_formatted = NA)
data_final <- rbind(data_b, data_c, data_d)
I need to do the same all at once WITHOUT using the value
column.
CodePudding user response:
You can use do conditional for the scenarios and apply different functions to convert to date.
Code
library(dplyr)
library(stringr)
library(lubridate)
data_a %>%
mutate(
data_formatted = case_when(
!str_detect(date,"/") ~ as.Date(as.numeric(date), origin = "1899-12-30"),
TRUE ~ ymd(paste0(year, "/", date))
)
)
Output
year date value data_formatted
1 2018 43238 1 2018-05-18
2 2017 43267 2 2018-06-16
3 2020 7/25 3 2020-07-25
4 2018 4 <NA>
5 2013 5 <NA>
6 2000 8/23 6 2000-08-23
7 2000 9/21 7 2000-09-21
CodePudding user response:
Please try
data_a2 <- data_a %>% mutate(date2=as.numeric(ifelse(str_detect(date,'\\/'), '',date)),
date2_=as.numeric(as.Date(ifelse(str_detect(date,'\\/'), paste0(year,'/',date),''), format='%Y/%m/%d')),
date_formatted=as.Date(coalesce(date2,date2_), origin = "1970-01-01")) %>%
dplyr::select(-date2,-date2_)