I have daily time-series data for 60 years about the presence and absence of rainfall for 400 stations. The data is in the following format where, in the second column, 1 indicates presence and 0 indicate absence:
Date Rainfall
---------------------
1981-01-01 0
1981-01-02 0
1981-01-03 0
1981-01-04 1
1981-01-05 0
1981-01-06 1
1981-01-07 1
1981-01-08 1
1981-01-09 0
1981-01-10 0
1981-01-11 1
1981-01-12 1
1981-01-13 1
1981-01-14 1
1981-01-15 1
1981-01-16 0
.......... .
Now I have to calculate the number of consecutive wet days for each year when at least 3 consecutive days received rainfall and the longest consecutive days of rainfall in a year. If 3 or more than 3 consecutive days (any number) received rainfall I will consider it as a single event.
My output will be like this
Year No of consecutive wet-days longest consecutive wet-days
1981 2 5
.
.
How can we do this in R? If I can solve for a station, I can iterate for all stations in R.
Thank you in advance for your help :)
CodePudding user response:
Another possible solution (I thank @DarrenTsai for his comments, which have improved this solution):
library(tidyverse)
library(lubridate)
df %>%
group_by(Year = year(ymd(Date))) %>%
mutate(x = list(rle(Rainfall))) %>%
summarise(ncons = sum(x[[1]]$lengths >= 3 & x[[1]]$values == 1),
longest = ifelse(sum(x[[1]]$values == 1) == 0, 0,
max(x[[1]]$lengths[x[[1]]$values == 1])))
#> # A tibble: 2 × 3
#> Year ncons longest
#> <dbl> <int> <int>
#> 1 1981 2 5
#> 2 1982 2 4
CodePudding user response:
You can create event with rle
.
library(dplyr)
df <- df %>%
mutate(event = with(rle(Rainfall), rep(seq_along(lengths), lengths)))
Date Rainfall event
1 1981-01-01 0 1
2 1981-01-02 0 1
3 1981-01-03 0 1
4 1981-01-04 1 2
5 1981-01-05 0 3
6 1981-01-06 1 4
7 1981-01-07 1 4
8 1981-01-08 1 4
9 1981-01-09 0 5
10 1981-01-10 0 5
11 1981-01-11 1 6
12 1981-01-12 1 6
13 1981-01-13 1 6
14 1981-01-14 1 6
15 1981-01-15 1 6
16 1981-01-16 0 7
With that you can tally up the number of consecutive days for each rainfall event.
df %>% filter(Rainfall == 1) %>% group_by(event) %>% tally()
# A tibble: 3 x 2
event n
<int> <int>
1 2 1
2 4 3
3 6 5
Further wrangling with Year and counting the tallied up rain event will give you your expected summary.
CodePudding user response:
A simple solution using only {base} R functions, particularly diff
and tapply
. The summary statistics pertain to events with a start date in that year.
date <- seq(as.Date("2000/1/1"), as.Date("2010/1/1"), "days")
rainfall <- sample(c(0,1),length(date), replace = T)
df <- data.frame(date,rainfall)
events <- df$date[which(c(1,diff(df$rainfall)) > 0)]
event.lengths <- tapply(df$rainfall, cumsum(c(1, diff(df$rainfall) > 0 )), sum)
df.events <- data.frame(events,event.lengths)
Total_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"),sum)
Max_consecutive_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"),max)
Year <- names(Total_rain_days)
output <- data.frame(Year, Total_rain_days, Max_consecutive_rain_days)
output
> output
Year Total_rain_days Max_consecutive_rain_days
2000 2000 192 8
2001 2001 175 9
2002 2002 196 13
2003 2003 193 9
2004 2004 164 11
2005 2005 183 7
2006 2006 196 10
2007 2007 176 7
2008 2008 179 7
2009 2009 178 9
2010 2010 1 1
CodePudding user response:
Another possible solution, Though I think already better solutions have been suggested
library(lubridate)
library(dplyr)
library(tibble)
rainfall_data <- tibble::tribble(
~ date, ~rainfall,
"1981-01-01", 0,
"1981-01-02", 0,
"1981-01-03", 0,
"1981-01-04", 1,
"1981-01-05", 0,
"1981-01-06", 1,
"1981-01-07", 1,
"1981-01-08", 1,
"1981-01-09", 0,
"1981-01-10", 0,
"1981-01-11", 1,
"1981-01-12", 1,
"1981-01-13", 1,
"1981-01-14", 1,
"1981-01-15", 1,
"1981-01-16", 0
)
rainfall_data %>%
mutate(
csum = ave(rainfall, cumsum(rainfall == 0), FUN = cumsum),
event = ave(csum, cumsum(rainfall == 0), FUN = max)
) %>%
filter(event >= 3) %>%
distinct(event, .keep_all = TRUE) %>%
group_by(year = year(ymd(date))) %>%
summarise(
No_of_consecutive_wet_days = n(),
longest_consecutive_wet_days = max(event)
)
#> # A tibble: 1 × 3
#> year No_of_consecutive_wet_days longest_consecutive_wet_days
#> <dbl> <int> <dbl>
#> 1 1981 2 5
Created on 2022-07-06 by the reprex package (v2.0.1)