Home > Back-end >  Python - Tell if there is a non consecutive date in pandas dataframe
Python - Tell if there is a non consecutive date in pandas dataframe

Time:11-18

I have a pandas data frame with dates. I need to know if every other date pair is consecutive.

2    1988-01-01
3    2015-01-31
4    2015-02-01
5    2015-05-31
6    2015-06-01
7    2021-11-16
11   2021-11-17
12   2022-10-05
8    2022-10-06
9    2022-10-12
10   2022-10-13
# How to build this example dataframe
df=pd.DataFrame({'date':pd.to_datetime(['1988-01-01','2015-01-31','2015-02-01', '2015-05-31','2015-06-01', '2021-11-16', '2021-11-17', '2022-10-05', '2022-10-06', '2022-10-12', '2022-10-13'])})

Each pair should be consecutive. I have tried different sorting but everything I see relates to the entire series being consecutive. I need to compare each pair of dates after the first date.

cb_gap = cb_sorted.sort_values('dates').groupby('dates').diff() > pd.to_timedelta('1 day')

What I need to see is this...

2    1988-01-01 <- Ignore the start date

3    2015-01-31 <- these dates have no gap
4    2015-02-01

5    2015-05-31 <- these dates have no gap
6    2015-06-01

7    2021-11-16 <- these have a gap!!!!
11   2021-11-18

12   2022-10-05 <- these have no gap
8    2022-10-06

9    2022-10-12

CodePudding user response:

here is one way to do it

btw, what is your expected output? the answer get you the difference b/w the consecutive dates skipping the first row and populate diff column

# make date into datetime
df['date'] = pd.to_datetime(df['date'])

# create two intermediate DF skipping the first and taking alternate values
# and concat them along x-axis
df2=pd.concat([df.iloc[1:].iloc[::2].reset_index()[['id','date']],
           df.iloc[2:].iloc[::2].reset_index()[['id','date']]
          ],axis=1  )

# take the difference of second date from the first one
df2['diff']=df2.iloc[:,3]-df2.iloc[:,1]
df2

    id       date   id       date   diff
0   3   2015-01-31  4   2015-02-01  1 days
1   5   2015-05-31  6   2015-06-01  1 days
2   7   2021-11-16  11  2021-11-17  1 days
3   12  2022-10-05  8   2022-10-06  1 days
4   9   2022-10-12  10  2022-10-13  1 days

CodePudding user response:

One way is to use shift and compute differences.

pd.DataFrame({'date':df.date,'diff':df.date.shift(-1)-df.date})[1::2]

returns

        date   diff
1 2015-01-31 1 days
3 2015-05-31 1 days
5 2021-11-16 1 days
7 2022-10-05 1 days
9 2022-10-12 1 days

It is also faster

Method Timeit
Naveed's 4.23 ms
This one 0.93 ms
  • Related