Home > Software design >  How to detect three consecutive days in Pandas DataFrame?
How to detect three consecutive days in Pandas DataFrame?

Time:04-01

I have DataFrame with index in Datetime format. I need to keep the days which are adjacent to at least 2 additional days (I mean 3 consecutive days should come together). Please, share your solution.

For example,

Date
2021.11.08 #<-
2021.11.09 #<-
2021.11.10 #<-
2021.11.12
2021.11.13
2021.11.16 #<-
2021.11.17 #<-
2021.11.18 #<-
2021.11.19 #<-
2021.11.22
2021.11.23

<- to be selected

CodePudding user response:

Try with groupby:

#convert to datetime
df["Date"] = pd.to_datetime(df["Date"], format="%Y.%m.%d")

#check if adjacent rows are 1 day apart
adjacent = df["Date"].diff().dt.days.fillna(1).eq(1)

#get sequences with a minimum length of 3
mask = df.groupby(adjacent.ne(adjacent.shift()).cumsum())["Date"].transform('count').ge(3)
output = df[mask|mask.shift(-1)]

>>> output
        Date
0 2021-11-08
1 2021-11-09
2 2021-11-10
5 2021-11-16
6 2021-11-17
7 2021-11-18
8 2021-11-19

CodePudding user response:

With groupby filter

oneday = pd.offsets.Day(1)
diff = df.Date.diff().bfill()

df.groupby(
    diff.ne(oneday).cumsum()
).filter(lambda d: len(d) > 2)


        Date
0 2021-11-08
1 2021-11-09
2 2021-11-10
5 2021-11-16
6 2021-11-17
7 2021-11-18
8 2021-11-19

CodePudding user response:

Using slicing with a mask:

N=3

# find start of groups
m = ~pd.to_datetime(df['Date']).diff().eq('1d')

# check size and keep if ≥ N
df[m.groupby(m.cumsum()).transform('size').ge(N)]

Output:

         Date
0  2021.11.08
1  2021.11.09
2  2021.11.10
5  2021.11.16
6  2021.11.17
7  2021.11.18
8  2021.11.19
keep every second element
N = 3
m = ~pd.to_datetime(df['Date']).diff().eq('1d')

g = m.groupby(m.cumsum())

m1 = g.transform('size').ge(N)
m2 = g.cumcount().mod(2) # odd lines

df[m1&m2]

Output:

         Date
1  2021.11.09
6  2021.11.17
8  2021.11.19

NB. If you only want the second and not every second, use eq(1) in place of mod(2)

  • Related