Home > Net >  Check if dates are in sequence in pandas column
Check if dates are in sequence in pandas column

Time:12-23

if I have dataframe :

Class   Dates
1       2022.01.12   
2       2022.01.13
3       2022.01.15
4       2022.01.20
5       2022.01.21
6       2022.01.22
7       2022.01.22

And I want to get

Class   Dates          Notes
1       2022.01.12      Min
2       2022.01.13      Max
3       2022.01.15     Singledate
4       2022.01.20      Min
5       2022.01.21
6       2022.01.22
7       2022.01.22       Max
8       2022.01.30     Singledate

Note column consist of period information. If there is a time range period (marked by IF th dates are consecutive) so put min on first day, and max on end of the period. If its only single dates/no consecutive then write 'singledate'.

Tried filling 'Notes' with this code, but seems lead to nowhere and got stuck

for idx, dates in df['Dates]:
    df['Notes']='min
    if df['Dates'].diff() == 1 :
       df['Notes']=''
    elif :
        df['Notes']='single'
        df['Notes'][idx-1] = 'max'

CodePudding user response:

import pandas as pd

    
df['Dates'] = pd.to_datetime(df['Dates'], errors='raise')
df['Dif'] = df['Dates'].diff().dt.days
df['Dif'] = df['Dif'].replace(0, 1)
df['Notes'] = ''

def my_func(x):
    a = df[x:][df.loc[x:, 'Dif'] != 1]
    if len(a) > 0:
        df.loc[[x - 1, a.index[0] - 1], 'Notes'] = ['Min', 'Max']
        df.loc[x - 1 : a.index[0] - 1, 'Dif'] = 1
    else:
        df.loc[[x - 1, len(df) - 1], 'Notes'] = ['Min', 'Max']
        df.loc[x - 1 : len(df) - 1, 'Dif'] = 1


[my_func(i) for i in range(1, len(df)) if df.loc[i - 1, 'Dif'] != df.loc[i, 'Dif'] and df.loc[i, 'Dif'] == 1]

df.loc[df[df['Dif'] != 1].index, 'Notes'] = 'Singledate'

print(df)

Output

   Class      Dates  Dif       Notes
0      1 2022-01-12  1.0         Min
1      2 2022-01-13  1.0         Max
2      3 2022-01-15  2.0  Singledate
3      4 2022-01-20  1.0         Min
4      5 2022-01-21  1.0            
5      6 2022-01-22  1.0            
6      7 2022-01-22  1.0         Max
7      8 2022-01-30  8.0  Singledate

The 'Dates' column is converted to date format. A 'Dif' column is created with a difference in days. 0 are replaced with 1.

I used list comprehension, as it is many times faster than a loop. The list comprehension checks the condition: current 'Dif' = 1, past is not equal to current. When the condition is triggered, a function is called that fills in the desired values through loc. At the end, the lines with 'Singledate' are filled.

  • Related