Home > Enterprise >  How to find the Consecutive Days from a list of dates or data frame
How to find the Consecutive Days from a list of dates or data frame

Time:05-14

I need to find consecutive days from the data frame but I am having some trouble trying to find them. I tried R, Excel, and Python but still couldn't find the solution.

My data looks like this

Date        PRECTOT
1982/2/1    0.1
1982/2/2    0.14
1982/2/3    0
1982/2/6    0
1982/2/7    0
1982/2/8    0
1982/2/10   0
1982/2/11   0
1982/2/12   0
1982/2/15   0.18
1982/2/16   0
1982/2/20   0.08
1982/2/21   0
1982/2/22   0
1982/2/23   0
1982/2/24   0.03
1982/2/25   0
1982/2/26   0
1982/2/27   0
1982/2/28   0
1982/3/4    0
1982/3/5    0.05
1982/3/8    0.16
1982/3/9    0

My expected output:

Date        PRECTOT  Consecutive
1982/2/1    0.1      3
1982/2/2    0.14    
1982/2/3    0   
1982/2/6    0        3
1982/2/7    0   
1982/2/8    0   
1982/2/10   0        3
1982/2/11   0   
1982/2/12   0   
1982/2/15   0.18     2
1982/2/16   0   
1982/2/20   0.08     9
1982/2/21   0   
1982/2/22   0   
1982/2/23   0   
1982/2/24   0.03    
1982/2/25   0   
1982/2/26   0   
1982/2/27   0   
1982/2/28   0   
1982/3/4    0        2
1982/3/5    0.05    
1982/3/8    0.16     2
1982/3/9    0   

Or

Date        PRECTOT  Consecutive
1982/2/1    0.1      1
1982/2/2    0.14     2
1982/2/3    0        3
1982/2/6    0        1
1982/2/7    0        2
1982/2/8    0        3
1982/2/10   0        1
1982/2/11   0        2
1982/2/12   0        3
1982/2/15   0.18     1
1982/2/16   0        2
1982/2/20   0.08     1
1982/2/21   0        2
1982/2/22   0        3
1982/2/23   0        4
1982/2/24   0.03     5
1982/2/25   0        6
1982/2/26   0        7
1982/2/27   0        8
1982/2/28   0        9
1982/3/4    0        1
1982/3/5    0.05     2
1982/3/8    0.16     1
1982/3/9    0        2

Consecutive values can be formatted, all I need to find out Consecutive Values. I would like to find a solution but I don't know what to do. (R/Python/Excel)

CodePudding user response:

You could try an R approach using dplyr:

library(dplyr)
df %>% 
  group_by(grp = cumsum(c(0, diff(as.Date(Date, "%Y/%m/%d"))) > 1)) %>% 
  mutate(Consecutive = row_number()) %>% 
  ungroup() %>% 
  select(-grp) %>%

This returns

# A tibble: 24 x 3
   Date      PRECTOT Consecutive
   <chr>       <dbl>       <int>
 1 1982/2/1     0.1            1
 2 1982/2/2     0.14           2
 3 1982/2/3     0              3
 4 1982/2/6     0              1
 5 1982/2/7     0              2
 6 1982/2/8     0              3
 7 1982/2/10    0              1
 8 1982/2/11    0              2
 9 1982/2/12    0              3
10 1982/2/15    0.18           1
11 1982/2/16    0              2
12 1982/2/20    0.08           1
13 1982/2/21    0              2
14 1982/2/22    0              3
15 1982/2/23    0              4
16 1982/2/24    0.03           5
17 1982/2/25    0              6
18 1982/2/26    0              7
19 1982/2/27    0              8
20 1982/2/28    0              9
21 1982/3/4     0              1
22 1982/3/5     0.05           2
23 1982/3/8     0.16           1
24 1982/3/9     0              2

CodePudding user response:

I'll use pandas.

Suppose df is an input table you have and Date column has pd.Timestamp type. I want to apply rolling to Date column and it supports only operations with numeric types, so I'll transform it to nanoseconds first:

from operator import attrgetter

date_ns = df.Date.apply(attrgetter("value")) # df.Date.dt.value will not work :(

Then for each date in Date column i'll map 1 if the previous date in terms of Date column is a previous day in terms of the real calendar:

ONE_DAY = pd.Timedelta(days=1).value

def apply_func(x):
   if len(x) == 2 and x[1] - x[0] == ONE_DAY:
       return 1
   return 0

prev_is_prev = date_ns.rolling(2, min_periods=1).apply(apply_func)

All we need now is cumsum:

df["Consecutive"] = prev_is_prev.cumsum()   1
  • Related