Home > Software engineering >  Finding sets of consecutive numbers by group in R
Finding sets of consecutive numbers by group in R

Time:01-20

What would be the easiest way to find sets of consecutive numbers in a dataframe by group? An example of the data I am using has the following format and is associated with heat waves

  year  day  temp
  2011  150  96 
  2011  151  96
  2011  152  97
  2011  166  98
  2011  167  99
  2011  168  98
  2011  169  99
  2012  177  96
  2012  178  96
  2012  179  95
  2012  180  98
  2012  195  100
  2012  196  99
  2012  197  99

For each year I want to count how many times there are a stretch of consecutive days and how long that period is - and the average temperature for each set too. So the following is the desired output

year  num.hw  length.hw  avg.temp
2011  1       3          96
2011  2       4          98.5
2012  1       4          97
2012  2       3          99

I have tried using dplyr and creating an indicator based on the difference in the day to detect gaps but that is not quite getting me where I need. What is the best way to do this? can be dplyr or data.table or another method.

CodePudding user response:

You can use cumsum(c(1, diff(day) != 1)) to generate the groups of consecutive days.

library(dplyr)

df %>%
  group_by(year) %>%
  mutate(num.hw = cumsum(c(1, diff(day) != 1))) %>%
  group_by(year, num.hw) %>%
  summarise(length.hw = n(), avg.temp = mean(temp), .groups = 'drop')

# # A tibble: 4 × 4
#    year num.hw length.hw avg.temp
#   <int>  <dbl>     <int>    <dbl>
# 1  2011      1         3     96.3
# 2  2011      2         4     98.5
# 3  2012      1         4     96.2
# 4  2012      2         3     99.3

Data
df <- read.table(text = "
  year  day  temp
  2011  150  96 
  2011  151  96
  2011  152  97
  2011  166  98
  2011  167  99
  2011  168  98
  2011  169  99
  2012  177  96
  2012  178  96
  2012  179  95
  2012  180  98
  2012  195  100
  2012  196  99
  2012  197  99", header = TRUE)
  • Related