I have a dataframe as the following one :
Start | End | Unit | Nominal power | Unavailable power |
---|---|---|---|---|
2022-07-28 15:00:00 | 2022-09-23 17:00:00 | A | 485.0 | 0 |
2022-07-29 06:00:00 | 2022-09-23 17:00:00 | A | 485.0 | 0 |
2022-09-23 17:00:00 | 2022-10-12 17:00:00 | A | 485.0 | 0 |
2022-10-02 17:00:00 | 2023-01-13 17:00:00 | A | 485.0 | 0 |
2023-03-29 17:00:00 | 2022-03-30 05:00:00 | A | 485.0 | 150 |
2023-03-30 05:00:00 | 2023-04-02 17:00:00 | A | 485.0 | 290 |
2022-05-17 10:00:00 | 2022-05-26 10:00:00 | B | 512.0 | 0 |
2023-05-19 05:00:00 | 2023-05-24 17:00:00 | B | 512.0 | 0 |
I wrote a code to detect and drop the overlaps between start dates and end dates on different rows for each unit. But I can note achieve to include a criteria on unavailable power.
What I would like is the following one :
Start | End | Unit | Nominal power | Unavailable power |
---|---|---|---|---|
2022-07-28 15:00:00 | 2023-01-13 17:00:00 | A | 485.0 | 0 |
2023-03-29 17:00:00 | 2022-03-30 05:00:00 | A | 485.0 | 150 |
2023-03-30 05:00:00 | 2023-04-02 17:00:00 | A | 485.0 | 290 |
2022-05-17 10:00:00 | 2022-05-26 10:00:00 | B | 512.0 | 0 |
I wrote this code to succeed :
def overlapping(data):
i=100000000
j=0
df = data.sort_values(['Unit', 'Start'])
g = df['Start'].gt(df['End'].shift()).groupby(df['Unit']).cumsum()
df_out = df.groupby(['Unit', g], as_index=False).agg({'Start': 'min', 'End': 'max'})
while i > j:
i = len(df_out)
df = data.sort_values(['Unit', 'Start'])
g = df['Start'].gt(df['End'].shift()).groupby(df['Unit']).cumsum()
df_out = df.groupby(['Unit', g], as_index=False).agg({'Start': 'min', 'End': 'max'})
j = len(df_out)
return df_out
But unfortunately it doesn't take into account the "Unavailable criteria". It gives the following table :
Start | End | Unit |
---|---|---|
2022-07-28 15:00:00 | 2023-01-13 17:00:00 | A |
2023-03-29 17:00:00 | 2023-04-02 17:00:00 | A |
2022-05-17 10:00:00 | 2023-05-24 10:00:00 | B |
Any idea of this?
CodePudding user response:
Check Below code, What is not clear is how in your output last row has END date as 2022-05-26 10:00:00 instead of 2023-05-24 17:00:00
df['Group_Col'] = (df['Unavailable power'].shift().bfill() != df['Unavailable power']).cumsum()
(df.groupby(['Unit','Nominal power','Group_Col','Unavailable power']).
agg({'Start':'min','End':'max'}).
reset_index()[['Start', 'End', 'Unit', 'Nominal power', 'Unavailable power']]
)
Output: