Home > Enterprise >  Presence in consecutive day by year in daily time-series data in R
Presence in consecutive day by year in daily time-series data in R

Time:07-07

I have daily time-series data for 60 years about the presence and absence of rainfall for 400 stations. The data is in the following format where, in the second column, 1 indicates presence and 0 indicate absence:

Date         Rainfall
---------------------
1981-01-01   0
1981-01-02   0
1981-01-03   0
1981-01-04   1
1981-01-05   0
1981-01-06   1
1981-01-07   1
1981-01-08   1
1981-01-09   0
1981-01-10   0
1981-01-11   1
1981-01-12   1
1981-01-13   1
1981-01-14   1
1981-01-15   1
1981-01-16   0
..........   .

Now I have to calculate the number of consecutive wet days for each year when at least 3 consecutive days received rainfall and the longest consecutive days of rainfall in a year. If 3 or more than 3 consecutive days (any number) received rainfall I will consider it as a single event.

My output will be like this

Year      No of consecutive wet-days   longest consecutive wet-days
1981      2                            5
.
.

How can we do this in R? If I can solve for a station, I can iterate for all stations in R.

Thank you in advance for your help :)

CodePudding user response:

Another possible solution (I thank @DarrenTsai for his comments, which have improved this solution):

library(tidyverse)
library(lubridate)

df %>% 
  group_by(Year = year(ymd(Date))) %>%
  mutate(x = list(rle(Rainfall))) %>% 
  summarise(ncons = sum(x[[1]]$lengths >= 3 & x[[1]]$values == 1),
            longest = ifelse(sum(x[[1]]$values == 1) == 0, 0, 
                max(x[[1]]$lengths[x[[1]]$values == 1])))

#> # A tibble: 2 × 3
#>    Year ncons longest
#>   <dbl> <int>   <int>
#> 1  1981     2       5
#> 2  1982     2       4

CodePudding user response:

You can create event with rle.

library(dplyr)
df <- df %>%
  mutate(event = with(rle(Rainfall), rep(seq_along(lengths), lengths))) 

         Date Rainfall event
1  1981-01-01        0     1
2  1981-01-02        0     1
3  1981-01-03        0     1
4  1981-01-04        1     2
5  1981-01-05        0     3
6  1981-01-06        1     4
7  1981-01-07        1     4
8  1981-01-08        1     4
9  1981-01-09        0     5
10 1981-01-10        0     5
11 1981-01-11        1     6
12 1981-01-12        1     6
13 1981-01-13        1     6
14 1981-01-14        1     6
15 1981-01-15        1     6
16 1981-01-16        0     7

With that you can tally up the number of consecutive days for each rainfall event.

df %>% filter(Rainfall == 1) %>% group_by(event) %>% tally()
# A tibble: 3 x 2
  event     n
  <int> <int>
1     2     1
2     4     3
3     6     5

Further wrangling with Year and counting the tallied up rain event will give you your expected summary.

CodePudding user response:

A simple solution using only {base} R functions, particularly diff and tapply. The summary statistics pertain to events with a start date in that year.

date <- seq(as.Date("2000/1/1"), as.Date("2010/1/1"), "days")
rainfall <- sample(c(0,1),length(date), replace = T)
df <- data.frame(date,rainfall)

events <- df$date[which(c(1,diff(df$rainfall)) > 0)]
event.lengths <- tapply(df$rainfall, cumsum(c(1, diff(df$rainfall) > 0 )), sum)
df.events <- data.frame(events,event.lengths)

Total_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"),sum)
Max_consecutive_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"),max)
Year <- names(Total_rain_days)

output <- data.frame(Year, Total_rain_days, Max_consecutive_rain_days)
output

> output
     Year Total_rain_days Max_consecutive_rain_days
2000 2000             192                         8
2001 2001             175                         9
2002 2002             196                        13
2003 2003             193                         9
2004 2004             164                        11
2005 2005             183                         7
2006 2006             196                        10
2007 2007             176                         7
2008 2008             179                         7
2009 2009             178                         9
2010 2010               1                         1

CodePudding user response:

Another possible solution, Though I think already better solutions have been suggested

library(lubridate)
library(dplyr)
library(tibble)


rainfall_data <- tibble::tribble(
      ~ date, ~rainfall,
       "1981-01-01",   0,
       "1981-01-02",   0,
       "1981-01-03",   0,
       "1981-01-04",   1,
       "1981-01-05",   0,
       "1981-01-06",   1,
       "1981-01-07",   1,
       "1981-01-08",   1,
       "1981-01-09",   0,
       "1981-01-10",   0,
       "1981-01-11",   1,
       "1981-01-12",   1,
       "1981-01-13",   1,
       "1981-01-14",   1,
       "1981-01-15",   1,
       "1981-01-16",   0
)


rainfall_data %>%
    mutate(
        csum = ave(rainfall, cumsum(rainfall == 0), FUN = cumsum),
        event = ave(csum, cumsum(rainfall == 0), FUN = max)
    ) %>%
    filter(event >= 3) %>%
    distinct(event, .keep_all = TRUE) %>%
    group_by(year = year(ymd(date))) %>%
    summarise(
        No_of_consecutive_wet_days = n(),
        longest_consecutive_wet_days = max(event)
    )
#> # A tibble: 1 × 3
#>    year No_of_consecutive_wet_days longest_consecutive_wet_days
#>   <dbl>                      <int>                        <dbl>
#> 1  1981                          2                            5

Created on 2022-07-06 by the reprex package (v2.0.1)

  • Related