I need to find consecutive days from the data frame but I am having some trouble trying to find them. I tried R, Excel, and Python but still couldn't find the solution.
My data looks like this
Date PRECTOT
1982/2/1 0.1
1982/2/2 0.14
1982/2/3 0
1982/2/6 0
1982/2/7 0
1982/2/8 0
1982/2/10 0
1982/2/11 0
1982/2/12 0
1982/2/15 0.18
1982/2/16 0
1982/2/20 0.08
1982/2/21 0
1982/2/22 0
1982/2/23 0
1982/2/24 0.03
1982/2/25 0
1982/2/26 0
1982/2/27 0
1982/2/28 0
1982/3/4 0
1982/3/5 0.05
1982/3/8 0.16
1982/3/9 0
My expected output:
Date PRECTOT Consecutive
1982/2/1 0.1 3
1982/2/2 0.14
1982/2/3 0
1982/2/6 0 3
1982/2/7 0
1982/2/8 0
1982/2/10 0 3
1982/2/11 0
1982/2/12 0
1982/2/15 0.18 2
1982/2/16 0
1982/2/20 0.08 9
1982/2/21 0
1982/2/22 0
1982/2/23 0
1982/2/24 0.03
1982/2/25 0
1982/2/26 0
1982/2/27 0
1982/2/28 0
1982/3/4 0 2
1982/3/5 0.05
1982/3/8 0.16 2
1982/3/9 0
Or
Date PRECTOT Consecutive
1982/2/1 0.1 1
1982/2/2 0.14 2
1982/2/3 0 3
1982/2/6 0 1
1982/2/7 0 2
1982/2/8 0 3
1982/2/10 0 1
1982/2/11 0 2
1982/2/12 0 3
1982/2/15 0.18 1
1982/2/16 0 2
1982/2/20 0.08 1
1982/2/21 0 2
1982/2/22 0 3
1982/2/23 0 4
1982/2/24 0.03 5
1982/2/25 0 6
1982/2/26 0 7
1982/2/27 0 8
1982/2/28 0 9
1982/3/4 0 1
1982/3/5 0.05 2
1982/3/8 0.16 1
1982/3/9 0 2
Consecutive values can be formatted, all I need to find out Consecutive Values. I would like to find a solution but I don't know what to do. (R/Python/Excel)
CodePudding user response:
You could try an R approach using dplyr
:
library(dplyr)
df %>%
group_by(grp = cumsum(c(0, diff(as.Date(Date, "%Y/%m/%d"))) > 1)) %>%
mutate(Consecutive = row_number()) %>%
ungroup() %>%
select(-grp) %>%
This returns
# A tibble: 24 x 3
Date PRECTOT Consecutive
<chr> <dbl> <int>
1 1982/2/1 0.1 1
2 1982/2/2 0.14 2
3 1982/2/3 0 3
4 1982/2/6 0 1
5 1982/2/7 0 2
6 1982/2/8 0 3
7 1982/2/10 0 1
8 1982/2/11 0 2
9 1982/2/12 0 3
10 1982/2/15 0.18 1
11 1982/2/16 0 2
12 1982/2/20 0.08 1
13 1982/2/21 0 2
14 1982/2/22 0 3
15 1982/2/23 0 4
16 1982/2/24 0.03 5
17 1982/2/25 0 6
18 1982/2/26 0 7
19 1982/2/27 0 8
20 1982/2/28 0 9
21 1982/3/4 0 1
22 1982/3/5 0.05 2
23 1982/3/8 0.16 1
24 1982/3/9 0 2
CodePudding user response:
I'll use pandas
.
Suppose df
is an input table you have and Date
column has pd.Timestamp
type. I want to apply rolling
to Date
column and it supports only operations with numeric types, so I'll transform it to nanoseconds first:
from operator import attrgetter
date_ns = df.Date.apply(attrgetter("value")) # df.Date.dt.value will not work :(
Then for each date in Date
column i'll map 1
if the previous date in terms of Date
column is a previous day in terms of the real calendar:
ONE_DAY = pd.Timedelta(days=1).value
def apply_func(x):
if len(x) == 2 and x[1] - x[0] == ONE_DAY:
return 1
return 0
prev_is_prev = date_ns.rolling(2, min_periods=1).apply(apply_func)
All we need now is cumsum
:
df["Consecutive"] = prev_is_prev.cumsum() 1