Home > OS >  Grouping data by quarter in R but accounting for quarters when there are no data
Grouping data by quarter in R but accounting for quarters when there are no data

Time:10-19

I have a simple dataset which rows need to be grouped by quarter. The code works fine but the issue is that when there are no data for a quarter it doesn't write it and hence not showing on the chart. I would like to add an statement where there are no case for a quarter it put zero instead of skipping it. My current code:

    mydata <- dataset %>% 
    janitor::clean_names() %>% 
    mutate(test_date = lubridate::dmy(test_date)) %>% 
  mutate(test_quarter = zoo::as.yearqtr(test_date)) %>% 
    group_by(test_quarter) %>% 
    summarise(cases = n())

As an example, the output is something like this with nothing for 2021 Q2 , of course because there are no cases:

enter image description here

I would like the code account for quarters where cases are zero. Any help is appreciated.

CodePudding user response:

You may create a quarterly sequence using seq function and use it inside tidyr::complete.

library(dplyr)

dataset %>% 
  janitor::clean_names() %>% 
  mutate(test_date = lubridate::dmy(test_date),
         test_quarter = zoo::as.yearqtr(test_date)) %>% 
  count(test_quarter, name = 'cases') %>%
  tidyr::complete(test_quarter = seq(min(test_quarter), max(test_quarter), 1/4), 
                  fill = list(cases = 0))

CodePudding user response:

Base R Solution:

# Import data: df1 => data.frame
df1 <- data.frame(
  test_quarter = c(
    "2019 Q3",
    "2019 Q4",
    "2020 Q1",
    "2020 Q2",
    "2020 Q3",
    "2020 Q4",
    "2020 Q3",
    "2021 Q3"
  ),
  cases = c(4, 3, 1, 2, 2, 1, 3, 2)
)

# Create a date vector holding the range:
# date_rng => Date vector
date_rng <- as.Date(
  paste0(
    with(
      df1, 
      range(
        as.integer(
          gsub(
            "^(\\d{4}).*",
            "\\1",
            test_quarter
          )
        )
      )
    ),
    c("-01-01", "-12-31")
  )
)

# Create a data.frame with each quarter in the range:
# date_rng_df => data.frame
date_rng_df <- transform(
  data.frame(
    dates = seq.Date(
      from = date_rng[1], 
      to = date_rng[2], 
      by = "quarter"
    )
  ),
  # Derive the quarter vector: 
  # test_quater => character vector
  test_quarter = paste(
    gsub(
      "^(\\d{4}).*",
      "\\1", 
      dates
    ),
    quarters(dates)
  )
)

# Function to generate a sequence between integers:
# seq_int_between => function() 
seq_int_between <- function(...){
  # Store the inputs as a list variable: pholder => list
  pholder <- list(...)
  
  # Extract the range: rng => integer vector
  rng <- range(pholder)
  
  # Generate a vector between integer range: 
  # res => integer vector
  res <- seq.int(
    from = rng[1], 
    to = rng[2], 
    by = 1
  )
  
  # Explicitly define the returned object: 
  # integer vector => env
  return(res)
}

# Extend the data.frame appropriately: res => data.frame
res <- transform(
    merge(
      date_rng_df[
        seq_int_between(
          match(
            df1$test_quarter,
            date_rng_df$test_quarter
          )
        ), 
        "test_quarter",
        drop = FALSE
      ],
      df1,
      by = "test_quarter",
      all.x = TRUE
  ),
  # 0 out NA values: cases => integer vector
  cases = ifelse(
    is.na(cases), 
    0, 
    cases
  )
)

# Print to console: data.frame => stdout(console)
res
  • Related