First post on here so please bare with me I apologise in advance for any errors but would really appreciate some help.
I have 4 different Datasets I am trying to plot on to 2 double axis line charts, for this to happen the dates need to be in the same format (see attached images of data sets: Dataset 1, Dataset 2, Dataset 3, Dataset 4)
I would like the universal format to be "31-01-2020".
For Dataset 1 2020 Jan = 31-01-2020, For Dataset 2 this seems pretty simple they just need reversing in order so 2020-01-31 = 31-01-2020, For Dataset 3 Q1 2019 = 31-03-2019 & Q2 2019 = 30-06-2019 etc., For Dataset 4 2020 JAN = 31-01-2020 & 2020 FEB = 29-02-2020 etc.
Is there anyway I can apply the format across all the data sets? Any help would be much appreciate I haven't supplied any code as I don't know where to start with this problem. I have the lubridate package installed.
CodePudding user response:
library(lubridate)
d1 <- c("January 2020", "February 2020")
ceiling_date(my(d1), 'month') %m-% days(1)
# [1] "2020-01-31" "2020-02-29"
d2 <- c("2020-01-31", "2020-02-02")
ceiling_date(ymd(d2), 'month') %m-% days(1)
# [1] "2020-01-31" "2020-02-29"
d3 <- c("Q1 2019", "Q3 2020")
ceiling_date(yq(paste(substr(d3, 4, 7), substr(d3, 1, 2))), 'month') %m-% days(1) # as qy() is not supported
# [1] "2019-01-31" "2020-07-31"
d4 <- c("2020 JAN", "2020 FEB")
ceiling_date(ym(d4), 'month') %m-% days(1)
# [1] "2020-01-31" "2020-02-29"
All above can actually combined IF you are pretty confident about the to be expected formats.
as.Date(ceiling_date(parse_date_time(c(d1, d2, d3, d4), c("%m %y", "%y-%m-%d", "%q %y", "%y %m")), "month") %m-% days(1))
# [1] "2019-12-31" "2020-01-31" "2020-01-31" "2020-02-29" "2018-12-31" "2020-02-29" "2019-12-31" "2020-01-31"
CodePudding user response:
Here is an example how you could do it en-bloc:
- simulate your 4 dataframes with the different dates
- bind them with
bind_rows
- define a column
Group
in long format for later plotting - use
case_when
withstr_detect
and month patternspattern1
andpattern2
library(tidyverse)
# example your dataframes
df1 <- tibble(Date = "January 2020",
x = 1)
df2 <- tibble(Date = "2020-01-31",
y = 2)
df3 <- tibble(Date = "Q1 2019",
z = 3)
df4 <- tibble(Date = "2020 JAN",
a = 4)
library(zoo) # for Quartal format
library(lubridate)
pattern1 <- c("January|February|March|April|May|June|July|August|September|October|November|December")
pattern2 <- toupper(c("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"))
# the code:
bind_rows(df1, df2, df3, df4) %>%
mutate(across(-Date, ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
unite(Group, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(value = coalesce(x, y,z,a), .keep="unused") %>%
mutate(Date = case_when(str_detect(Date, "Q") ~ as.Date(as.yearqtr(Date, format = "Q%q %Y")),
str_detect(Date, pattern1) ~ myd(paste0(Date, " 01")),
str_detect(Date, pattern2) ~ ymd(paste0(Date, " 01")),
TRUE ~ ymd(Date)
))
# A tibble: 4 x 3
Date Group value
<date> <chr> <dbl>
1 2020-01-01 x 1
2 2020-01-31 y 2
3 2019-01-01 z 3
4 2020-01-01 a 4