I have a DataFrame with an index in Datetime format. I need to keep only the days that are adjacent to at least 2 other days, i.e. days that are part of a run of at least 3 consecutive days. Please share your solution.
For example,
Date
2021.11.08 #<-
2021.11.09 #<-
2021.11.10 #<-
2021.11.12
2021.11.13
2021.11.16 #<-
2021.11.17 #<-
2021.11.18 #<-
2021.11.19 #<-
2021.11.22
2021.11.23
(#<- marks the rows to be selected)
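For reference, the example can be reproduced with a small setup sketch like the one below; it assumes Date is a plain string column (which is how the answers below treat it) rather than the index:
import pandas as pd

df = pd.DataFrame({"Date": [
    "2021.11.08", "2021.11.09", "2021.11.10",
    "2021.11.12", "2021.11.13",
    "2021.11.16", "2021.11.17", "2021.11.18", "2021.11.19",
    "2021.11.22", "2021.11.23",
]})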
CodePudding user response:
Try with groupby:
#convert to datetime
df["Date"] = pd.to_datetime(df["Date"], format="%Y.%m.%d")
#True where a row is exactly 1 day after the previous row
adjacent = df["Date"].diff().dt.days.eq(1)
#label consecutive runs of equal "adjacent" values
runs = adjacent.ne(adjacent.shift()).cumsum()
#a run of at least 2 adjacent rows means at least 3 consecutive days
mask = adjacent & df.groupby(runs)["Date"].transform('count').ge(2)
#shift(-1) pulls in the first day of each qualifying block
output = df[mask | mask.shift(-1, fill_value=False)]
>>> output
Date
0 2021-11-08
1 2021-11-09
2 2021-11-10
5 2021-11-16
6 2021-11-17
7 2021-11-18
8 2021-11-19
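To see what the intermediate series look like on this sample, here is a small inspection sketch (assuming df holds the strings from the question and Date has been converted with pd.to_datetime as above):
# True where a row is exactly one day after the previous row
adjacent = df["Date"].diff().dt.days.eq(1)
# run labels: a new label whenever the "adjacent" flag changes
runs = adjacent.ne(adjacent.shift()).cumsum()
print(pd.concat([df["Date"], adjacent.rename("adjacent"), runs.rename("run")], axis=1))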
CodePudding user response:
With groupby + filter:
oneday = pd.offsets.Day(1)
# backfill the first NaT so the first row joins the first run
diff = df.Date.diff().bfill()
# start a new group whenever the gap is not exactly one day,
# then keep only groups longer than 2 rows
df.groupby(
    diff.ne(oneday).cumsum()
).filter(lambda d: len(d) > 2)
Date
0 2021-11-08
1 2021-11-09
2 2021-11-10
5 2021-11-16
6 2021-11-17
7 2021-11-18
8 2021-11-19
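Note that this snippet assumes df["Date"] has already been converted to datetime (e.g. with pd.to_datetime as in the first answer). If it helps, the group labels that filter() receives can be inspected like this:
oneday = pd.offsets.Day(1)
diff = df.Date.diff().bfill()  # backfill the first NaT so row 0 joins the first run
print(diff.ne(oneday).cumsum().tolist())
# for the sample data this should print [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]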
CodePudding user response:
Using boolean indexing with a mask:
N=3
# find start of groups
m = ~pd.to_datetime(df['Date']).diff().eq('1d')
# check size and keep if ≥ N
df[m.groupby(m.cumsum()).transform('size').ge(N)]
Output:
Date
0 2021.11.08
1 2021.11.09
2 2021.11.10
5 2021.11.16
6 2021.11.17
7 2021.11.18
8 2021.11.19
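Since the question mentions a Datetime index, the same idea can be adapted to that case; a sketch, assuming df.index is already a DatetimeIndex:
N = 3
# build a Series from the index so diff/groupby work the same way
idx = pd.Series(df.index)
m = ~idx.diff().eq('1d')
keep = m.groupby(m.cumsum()).transform('size').ge(N)
df[keep.values]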
To keep only every second element of each qualifying group:
N = 3
m = ~pd.to_datetime(df['Date']).diff().eq('1d')
g = m.groupby(m.cumsum())
m1 = g.transform('size').ge(N)
m2 = g.cumcount().mod(2) # odd positions (0-based) within each group
df[m1&m2]
Output:
Date
1 2021.11.09
6 2021.11.17
8 2021.11.19
NB. If you only want the second element of each group rather than every second one, use eq(1) in place of mod(2).
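For example, the eq(1) variant would look like this (same assumptions as the block above):
m2 = g.cumcount().eq(1)   # only the second row of each group (position 1, zero-based)
df[m1 & m2]
# which keeps only 2021.11.09 and 2021.11.17 for the sample data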