Home > Mobile >  Formatting Dates in 4 Different Datasets
Formatting Dates in 4 Different Datasets

Time:12-08

First post on here so please bare with me I apologise in advance for any errors but would really appreciate some help.

I have 4 different Datasets I am trying to plot on to 2 double axis line charts, for this to happen the dates need to be in the same format (see attached images of data sets: Dataset 1, Dataset 2, Dataset 3, Dataset 4)

I would like the universal format to be "31-01-2020".

For Dataset 1 2020 Jan = 31-01-2020, For Dataset 2 this seems pretty simple they just need reversing in order so 2020-01-31 = 31-01-2020, For Dataset 3 Q1 2019 = 31-03-2019 & Q2 2019 = 30-06-2019 etc., For Dataset 4 2020 JAN = 31-01-2020 & 2020 FEB = 29-02-2020 etc.

Is there anyway I can apply the format across all the data sets? Any help would be much appreciate I haven't supplied any code as I don't know where to start with this problem. I have the lubridate package installed.

CodePudding user response:

library(lubridate)

d1 <- c("January 2020", "February 2020")
ceiling_date(my(d1), 'month') %m-% days(1)
# [1] "2020-01-31" "2020-02-29"

d2 <- c("2020-01-31", "2020-02-02")
ceiling_date(ymd(d2), 'month') %m-% days(1)
# [1] "2020-01-31" "2020-02-29"

d3 <- c("Q1 2019", "Q3 2020")
ceiling_date(yq(paste(substr(d3, 4, 7), substr(d3, 1, 2))), 'month') %m-% days(1) # as qy() is not supported
# [1] "2019-01-31" "2020-07-31"

d4 <- c("2020 JAN", "2020 FEB")
ceiling_date(ym(d4), 'month') %m-% days(1)
# [1] "2020-01-31" "2020-02-29"

All above can actually combined IF you are pretty confident about the to be expected formats.

as.Date(ceiling_date(parse_date_time(c(d1, d2, d3, d4), c("%m %y", "%y-%m-%d", "%q %y", "%y %m")), "month") %m-% days(1))

# [1] "2019-12-31" "2020-01-31" "2020-01-31" "2020-02-29" "2018-12-31" "2020-02-29" "2019-12-31" "2020-01-31"

CodePudding user response:

Here is an example how you could do it en-bloc:

  1. simulate your 4 dataframes with the different dates
  2. bind them with bind_rows
  3. define a column Group in long format for later plotting
  4. use case_when with str_detect and month patterns pattern1 and pattern2
library(tidyverse)

# example your dataframes
df1 <- tibble(Date = "January 2020", 
              x = 1)
df2 <- tibble(Date = "2020-01-31", 
              y = 2)
df3 <- tibble(Date = "Q1 2019", 
              z = 3)
df4 <- tibble(Date = "2020 JAN", 
              a = 4)

library(zoo) # for Quartal format
library(lubridate)

pattern1 <- c("January|February|March|April|May|June|July|August|September|October|November|December")
pattern2 <- toupper(c("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"))

# the code:

bind_rows(df1, df2, df3, df4) %>% 
  mutate(across(-Date, ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>% 
  unite(Group, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(value = coalesce(x, y,z,a), .keep="unused") %>% 
  mutate(Date = case_when(str_detect(Date, "Q") ~ as.Date(as.yearqtr(Date, format = "Q%q %Y")),
                          str_detect(Date, pattern1) ~ myd(paste0(Date, " 01")),
                          str_detect(Date, pattern2) ~ ymd(paste0(Date, " 01")),
                          TRUE ~ ymd(Date)
                          
  ))
# A tibble: 4 x 3
  Date       Group value
  <date>     <chr> <dbl>
1 2020-01-01 x         1
2 2020-01-31 y         2
3 2019-01-01 z         3
4 2020-01-01 a         4
  •  Tags:  
  • r
  • Related