I have a simple dataset which rows need to be grouped by quarter. The code works fine but the issue is that when there are no data for a quarter it doesn't write it and hence not showing on the chart. I would like to add an statement where there are no case for a quarter it put zero instead of skipping it. My current code:
mydata <- dataset %>%
janitor::clean_names() %>%
mutate(test_date = lubridate::dmy(test_date)) %>%
mutate(test_quarter = zoo::as.yearqtr(test_date)) %>%
group_by(test_quarter) %>%
summarise(cases = n())
As an example, the output is something like this with nothing for 2021 Q2 , of course because there are no cases:
I would like the code account for quarters where cases are zero. Any help is appreciated.
CodePudding user response:
You may create a quarterly sequence using seq
function and use it inside tidyr::complete
.
library(dplyr)
dataset %>%
janitor::clean_names() %>%
mutate(test_date = lubridate::dmy(test_date),
test_quarter = zoo::as.yearqtr(test_date)) %>%
count(test_quarter, name = 'cases') %>%
tidyr::complete(test_quarter = seq(min(test_quarter), max(test_quarter), 1/4),
fill = list(cases = 0))
CodePudding user response:
Base R Solution:
# Import data: df1 => data.frame
df1 <- data.frame(
test_quarter = c(
"2019 Q3",
"2019 Q4",
"2020 Q1",
"2020 Q2",
"2020 Q3",
"2020 Q4",
"2020 Q3",
"2021 Q3"
),
cases = c(4, 3, 1, 2, 2, 1, 3, 2)
)
# Create a date vector holding the range:
# date_rng => Date vector
date_rng <- as.Date(
paste0(
with(
df1,
range(
as.integer(
gsub(
"^(\\d{4}).*",
"\\1",
test_quarter
)
)
)
),
c("-01-01", "-12-31")
)
)
# Create a data.frame with each quarter in the range:
# date_rng_df => data.frame
date_rng_df <- transform(
data.frame(
dates = seq.Date(
from = date_rng[1],
to = date_rng[2],
by = "quarter"
)
),
# Derive the quarter vector:
# test_quater => character vector
test_quarter = paste(
gsub(
"^(\\d{4}).*",
"\\1",
dates
),
quarters(dates)
)
)
# Function to generate a sequence between integers:
# seq_int_between => function()
seq_int_between <- function(...){
# Store the inputs as a list variable: pholder => list
pholder <- list(...)
# Extract the range: rng => integer vector
rng <- range(pholder)
# Generate a vector between integer range:
# res => integer vector
res <- seq.int(
from = rng[1],
to = rng[2],
by = 1
)
# Explicitly define the returned object:
# integer vector => env
return(res)
}
# Extend the data.frame appropriately: res => data.frame
res <- transform(
merge(
date_rng_df[
seq_int_between(
match(
df1$test_quarter,
date_rng_df$test_quarter
)
),
"test_quarter",
drop = FALSE
],
df1,
by = "test_quarter",
all.x = TRUE
),
# 0 out NA values: cases => integer vector
cases = ifelse(
is.na(cases),
0,
cases
)
)
# Print to console: data.frame => stdout(console)
res