How to hyphenate adjacent time periods in a vector into a grouped string

Time:10-28

Say I have a sequence of year-weeks:

s <- c('2020 WK 01', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2021 WK 45')

I want to show this in a plot title to the user, but the resulting title is too long. My idea is to hyphenate adjacent year-weeks, e.g. the result I expect:

title <- "2020 WK 01, 2021 WK 41 - 43, 2021 WK 45"

Is there an idiomatic way to do this in R?

CodePudding user response:

Here's a base R option -

# Get the week number
week_number <- as.numeric(sub('.*WK\\s+', '', s))

# If the weeks are consecutive, group them into one run, then
# paste the week number of the last value onto the first value.
unname(tapply(s, cumsum(c(TRUE, diff(week_number) > 1)), function(x) {
  if(length(x) > 1) paste(x[1], sub('.*WK\\s+', '', x[length(x)]), sep = '-')
  else x
}))

#[1] "2020 WK 01"    "2021 WK 41-43" "2021 WK 45"  
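
To see why the cumsum(diff(...)) trick groups consecutive weeks, it can help to print the intermediate values. This is only an illustrative breakdown of the same idea; new_run and run_id are names I made up for the example.

```r
s <- c('2020 WK 01', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2021 WK 45')

# Week number taken from the end of each label
week_number <- as.numeric(sub('.*WK\\s+', '', s))
week_number
#[1]  1 41 42 43 45

# TRUE marks the start of a new run: the first element, plus every gap > 1
new_run <- c(TRUE, diff(week_number) > 1)
new_run
#[1]  TRUE  TRUE FALSE FALSE  TRUE

# The running total of run starts gives one id per run of consecutive weeks
run_id <- cumsum(new_run)
run_id
#[1] 1 2 2 2 3
```

tapply() then applies the pasting function once per run_id, which is why weeks 41 to 43 end up in a single element.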

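The above code works fine when all the data is from the same year, but it returns incorrect output if the input spans multiple years, because it ignores the year value. For example, c('2020 WK 41', '2021 WK 42') would be collapsed to "2020 WK 41-42". We can extend the same logic to include the year. I have used dplyr and tidyr from the tidyverse here since they are easy to use.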

library(dplyr)
library(tidyr)

s = c('2020 WK 40', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2022 WK 44')

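tibble(s) %>%
  # Split each label into a numeric year and week number, keeping the original s
  separate(s, c('YEAR', 'WEEK_NUM'), sep = '\\s*WK\\s*',
           convert = TRUE, remove = FALSE) %>%
  arrange(YEAR, WEEK_NUM) %>%
  # Start a new group at every gap in the week numbers, separately for each year
  group_by(YEAR, group = cumsum(c(TRUE, diff(WEEK_NUM) > 1))) %>%
  # Collapse each run to "first label-last week"; single weeks stay as they are
  summarise(title = if(n() > 1) paste(first(s), last(WEEK_NUM), sep = '-') else s,
            .groups = 'drop') %>%
  pull(title)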

#[1] "2020 WK 40"    "2021 WK 41-43" "2022 WK 44"   
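
Both snippets return a vector, while the question asks for a single title string. As a rough sketch, the same logic can be made year-aware in base R and collapsed with paste(..., collapse = ', '). hyphenate_weeks() is a hypothetical helper name, and it assumes the labels are already sorted and follow the 'YYYY WK ww' pattern.

```r
# hyphenate_weeks() is a hypothetical helper, not part of any package.
# Assumes s is sorted and every element looks like 'YYYY WK ww'.
hyphenate_weeks <- function(s) {
  year <- as.numeric(sub('\\s*WK.*', '', s))
  week <- as.numeric(sub('.*WK\\s+', '', s))

  # A new run starts at the first element, when the year changes,
  # or when the week is not exactly one more than the previous week
  new_run <- c(TRUE, diff(year) != 0 | diff(week) != 1)

  # Collapse each run to "first label - last week"; single weeks stay as is
  parts <- tapply(s, cumsum(new_run), function(x) {
    if (length(x) > 1) paste(x[1], sub('.*WK\\s+', '', x[length(x)]), sep = ' - ')
    else x
  })

  # Join the runs into one string for the plot title
  paste(parts, collapse = ', ')
}

hyphenate_weeks(c('2020 WK 01', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2021 WK 45'))
#[1] "2020 WK 01, 2021 WK 41 - 43, 2021 WK 45"
```

Note that this sketch never joins weeks across a year boundary, so the last week of one year and WK 01 of the next stay separate.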