I have a pandas data frame with dates. I need to know if every other date pair is consecutive.
2 1988-01-01
3 2015-01-31
4 2015-02-01
5 2015-05-31
6 2015-06-01
7 2021-11-16
11 2021-11-17
12 2022-10-05
8 2022-10-06
9 2022-10-12
10 2022-10-13
# How to build this example dataframe
df=pd.DataFrame({'date':pd.to_datetime(['1988-01-01','2015-01-31','2015-02-01', '2015-05-31','2015-06-01', '2021-11-16', '2021-11-17', '2022-10-05', '2022-10-06', '2022-10-12', '2022-10-13'])})
Each pair should be consecutive. I have tried different sorting but everything I see relates to the entire series being consecutive. I need to compare each pair of dates after the first date.
cb_gap = cb_sorted.sort_values('dates').groupby('dates').diff() > pd.to_timedelta('1 day')
What I need to see is this...
2 1988-01-01 <- Ignore the start date
3 2015-01-31 <- these dates have no gap
4 2015-02-01
5 2015-05-31 <- these dates have no gap
6 2015-06-01
7 2021-11-16 <- these have a gap!!!!
11 2021-11-18
12 2022-10-05 <- these have no gap
8 2022-10-06
9 2022-10-12
CodePudding user response:
here is one way to do it
btw, what is your expected output? the answer get you the difference b/w the consecutive dates skipping the first row and populate diff column
# make date into datetime
df['date'] = pd.to_datetime(df['date'])
# create two intermediate DF skipping the first and taking alternate values
# and concat them along x-axis
df2=pd.concat([df.iloc[1:].iloc[::2].reset_index()[['id','date']],
df.iloc[2:].iloc[::2].reset_index()[['id','date']]
],axis=1 )
# take the difference of second date from the first one
df2['diff']=df2.iloc[:,3]-df2.iloc[:,1]
df2
id date id date diff
0 3 2015-01-31 4 2015-02-01 1 days
1 5 2015-05-31 6 2015-06-01 1 days
2 7 2021-11-16 11 2021-11-17 1 days
3 12 2022-10-05 8 2022-10-06 1 days
4 9 2022-10-12 10 2022-10-13 1 days
CodePudding user response:
One way is to use shift and compute differences.
pd.DataFrame({'date':df.date,'diff':df.date.shift(-1)-df.date})[1::2]
returns
date diff
1 2015-01-31 1 days
3 2015-05-31 1 days
5 2021-11-16 1 days
7 2022-10-05 1 days
9 2022-10-12 1 days
It is also faster
Method | Timeit |
---|---|
Naveed's | 4.23 ms |
This one | 0.93 ms |