Home > database >  Overlap of dates in pandas
Overlap of dates in pandas

Time:09-02

I have a dataframe as the following one :

Start End Unit Nominal power Unavailable power
2022-07-28 15:00:00 2022-09-23 17:00:00 A 485.0 0
2022-07-29 06:00:00 2022-09-23 17:00:00 A 485.0 0
2022-09-23 17:00:00 2022-10-12 17:00:00 A 485.0 0
2022-10-02 17:00:00 2023-01-13 17:00:00 A 485.0 0
2023-03-29 17:00:00 2022-03-30 05:00:00 A 485.0 150
2023-03-30 05:00:00 2023-04-02 17:00:00 A 485.0 290
2022-05-17 10:00:00 2022-05-26 10:00:00 B 512.0 0
2023-05-19 05:00:00 2023-05-24 17:00:00 B 512.0 0

I wrote a code to detect and drop the overlaps between start dates and end dates on different rows for each unit. But I can note achieve to include a criteria on unavailable power.

What I would like is the following one :

Start End Unit Nominal power Unavailable power
2022-07-28 15:00:00 2023-01-13 17:00:00 A 485.0 0
2023-03-29 17:00:00 2022-03-30 05:00:00 A 485.0 150
2023-03-30 05:00:00 2023-04-02 17:00:00 A 485.0 290
2022-05-17 10:00:00 2022-05-26 10:00:00 B 512.0 0

I wrote this code to succeed :

def overlapping(data):
    i=100000000
    j=0
    df = data.sort_values(['Unit', 'Start'])
    g = df['Start'].gt(df['End'].shift()).groupby(df['Unit']).cumsum()
    df_out = df.groupby(['Unit', g], as_index=False).agg({'Start': 'min', 'End': 'max'})
    while i > j:
        i = len(df_out)
        df = data.sort_values(['Unit', 'Start'])
        g = df['Start'].gt(df['End'].shift()).groupby(df['Unit']).cumsum()
        df_out = df.groupby(['Unit', g], as_index=False).agg({'Start': 'min', 'End': 'max'})
        j = len(df_out) 
    return df_out

But unfortunately it doesn't take into account the "Unavailable criteria". It gives the following table :

Start End Unit
2022-07-28 15:00:00 2023-01-13 17:00:00 A
2023-03-29 17:00:00 2023-04-02 17:00:00 A
2022-05-17 10:00:00 2023-05-24 10:00:00 B

Any idea of this?

CodePudding user response:

Check Below code, What is not clear is how in your output last row has END date as 2022-05-26 10:00:00 instead of 2023-05-24 17:00:00

df['Group_Col'] = (df['Unavailable power'].shift().bfill() != df['Unavailable power']).cumsum()

(df.groupby(['Unit','Nominal power','Group_Col','Unavailable power']).
 agg({'Start':'min','End':'max'}).
 reset_index()[['Start',    'End',  'Unit', 'Nominal power', 'Unavailable power']]
 )

Output:

enter image description here

  • Related