What would be the easiest way to find sets of consecutive numbers in a dataframe by group? An example of the data I am using has the following format and is associated with heat waves
year day temp
2011 150 96
2011 151 96
2011 152 97
2011 166 98
2011 167 99
2011 168 98
2011 169 99
2012 177 96
2012 178 96
2012 179 95
2012 180 98
2012 195 100
2012 196 99
2012 197 99
For each year I want to count how many times there are a stretch of consecutive days and how long that period is - and the average temperature for each set too. So the following is the desired output
year num.hw length.hw avg.temp
2011 1 3 96
2011 2 4 98.5
2012 1 4 97
2012 2 3 99
I have tried using dplyr
and creating an indicator based on the difference in the day to detect gaps but that is not quite getting me where I need. What is the best way to do this? can be dplyr
or data.table
or another method.
CodePudding user response:
You can use cumsum(c(1, diff(day) != 1))
to generate the groups of consecutive days.
library(dplyr)
df %>%
group_by(year) %>%
mutate(num.hw = cumsum(c(1, diff(day) != 1))) %>%
group_by(year, num.hw) %>%
summarise(length.hw = n(), avg.temp = mean(temp), .groups = 'drop')
# # A tibble: 4 × 4
# year num.hw length.hw avg.temp
# <int> <dbl> <int> <dbl>
# 1 2011 1 3 96.3
# 2 2011 2 4 98.5
# 3 2012 1 4 96.2
# 4 2012 2 3 99.3
Data
df <- read.table(text = "
year day temp
2011 150 96
2011 151 96
2011 152 97
2011 166 98
2011 167 99
2011 168 98
2011 169 99
2012 177 96
2012 178 96
2012 179 95
2012 180 98
2012 195 100
2012 196 99
2012 197 99", header = TRUE)