Home > Net >  Processing data.frame that needs order and cumulative days
Processing data.frame that needs order and cumulative days

Time:11-05

With the small reproducible example below, I'd like to identify the dplyr approach to arrive at the data.frame shown at the end of this note. The features of the dplyr output is that it will ensure that the data.frame is sorted by date (note that the dates 1999-04-13 and 1999-03-12 are out of order) and that it then "accumulate" the number of days within each wy grouping (wy = "water year"; Oct 1-Sep 30) that Q is above a threshold of 3.0.

dat <- read.table(text="
Date wy Q
1997-01-01 1997 9.82
1997-02-01 1997 3.51
1997-02-02 1997 9.35
1997-10-04 1998 0.93
1997-11-01 1998 1.66
1997-12-02 1998 0.81
1998-04-03 1998 5.65
1998-05-05 1998 7.82
1998-07-05 1998 6.33
1998-09-06 1998 0.55
1998-09-07 1998 4.54
1998-10-09 1999 6.50
1998-12-31 1999 2.17
1999-01-01 1999 5.67
1999-04-13 1999 5.66
1999-03-12 1999 4.67
1999-06-05 1999 3.34
1999-09-30 1999 1.99
1999-11-06 2000 5.75
2000-03-04 2000 6.28
2000-06-07 2000 0.81
2000-07-06 2000 9.66
2000-09-09 2000 9.08
2000-09-21 2000 6.72", header=TRUE)
dat$Date <- as.Date(dat$Date)

mdat <- dat %>%
          group_by(wy) %>% 
          filter(Q > 3) %>% 
          ?

Desired results:

      Date   wy    Q abvThreshCum
1997-01-01 1997 9.82            1
1997-02-01 1997 3.51            2
1997-02-02 1997 9.35            3
1997-10-04 1998 0.93            0
1997-11-01 1998 1.66            0
1997-12-02 1998 0.81            0
1998-04-03 1998 5.65            1
1998-05-05 1998 7.82            2
1998-07-05 1998 6.33            3
1998-09-06 1998 0.55            3
1998-09-07 1998 4.54            4
1998-10-09 1999 6.50            1
1998-12-31 1999 2.17            1
1999-01-01 1999 5.67            2
1999-03-12 1999 4.67            3
1999-04-13 1999 5.66            4
1999-06-05 1999 3.34            5
1999-09-30 1999 1.99            5
1999-11-06 2000 5.75            1
2000-03-04 2000 6.28            2
2000-06-07 2000 0.81            2
2000-07-06 2000 9.66            3
2000-09-09 2000 9.08            4
2000-09-21 2000 6.72            5

CodePudding user response:

library(dplyr)
dat %>%
  arrange(Date) %>%
  group_by(wy) %>%
  mutate(abv = cumsum(Q > 3)) %>%
  ungroup()
# # A tibble: 24 x 4
#    Date          wy     Q   abv
#    <date>     <int> <dbl> <int>
#  1 1997-01-01  1997  9.82     1
#  2 1997-02-01  1997  3.51     2
#  3 1997-02-02  1997  9.35     3
#  4 1997-10-04  1998  0.93     0
#  5 1997-11-01  1998  1.66     0
#  6 1997-12-02  1998  0.81     0
#  7 1998-04-03  1998  5.65     1
#  8 1998-05-05  1998  7.82     2
#  9 1998-07-05  1998  6.33     3
# 10 1998-09-06  1998  0.55     3
# # ... with 14 more rows

CodePudding user response:

data.table approach

library(data.table)
setDT(dat, key = "Date")[, abvThreshCum := cumsum(Q > 3), by = .(wy)]
  • Related