I have a dataframe with times when the court is not free:

import pandas as pd

df = pd.DataFrame(
    [
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'}, 
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'}, 
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'}, 
        {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'}
    ]
)


|    | court_name   | reserved_fr         | reserved_to         |
|---:|:-------------|:--------------------|:--------------------|
|  0 | Court 1      | 2021-11-15T08:00:00 | 2021-11-15T12:00:00 |
|  1 | Court 1      | 2021-11-15T15:00:00 | 2021-11-15T16:00:00 |
|  2 | Court 1      | 2021-11-15T16:00:00 | 2021-11-15T21:00:00 |
|  3 | Court 2      | 2021-11-15T20:00:00 | 2021-11-15T21:00:00 |

If each court's working hours are 7 am to 11 pm, I would like to know when each court is free.

For example, the courts above are free at these times:

Court 1     2021-11-15 07:00:00   2021-11-15 08:00:00
Court 1     2021-11-15 12:00:00   2021-11-15 15:00:00
Court 1     2021-11-15 21:00:00   2021-11-15 23:00:00
Court 2     2021-11-15 07:00:00   2021-11-15 20:00:00
Court 2     2021-11-15 21:00:00   2021-11-15 23:00:00

How can I transform the dataframe into another dataframe in the format above?

CodePudding user response：

Here is a solution that does not hard-code the dates; it only assumes working hours between 7:00 and 23:00:

#expand each reservation into hourly timestamps, stored in one column, date
L = [pd.date_range(s,e, freq='H') 
     for s, e in df[['reserved_fr','reserved_to']].to_numpy()]
df['date'] = L

df1 = df.explode('date').drop_duplicates(['court_name','date'])
print (df1)
  court_name          reserved_fr          reserved_to                date
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 08:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 09:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 10:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 11:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 12:00:00
1    Court 1  2021-11-15T15:00:00  2021-11-15T16:00:00 2021-11-15 15:00:00
1    Court 1  2021-11-15T15:00:00  2021-11-15T16:00:00 2021-11-15 16:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 17:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 18:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 19:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 20:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 21:00:00
3    Court 2  2021-11-15T20:00:00  2021-11-15T21:00:00 2021-11-15 20:00:00
3    Court 2  2021-11-15T20:00:00  2021-11-15T21:00:00 2021-11-15 21:00:00

#add missing hourly values between 7:00 and 23:00 for each court
def f(x):
    r = pd.date_range(x.index.min().normalize() + pd.Timedelta('7H'),
                      x.index.max().normalize() + pd.Timedelta('23H'), freq='H')
    return x.reindex(r)
        
    
s = df1.set_index('date').groupby('court_name')['court_name'].apply(f)

#create groups for missing values and aggregate first with last
mask = s.notna()
df = (mask.cumsum()[~mask].reset_index(name='new')
          .groupby(['court_name','new'])['level_1']
          .agg(['min','max'])
          .reset_index(level=1, drop=True))

#move each gap's edges out by 1 hour to the reservation boundaries, except at 7:00 and 23:00
df['min'] = df['min'].where(df['min'].dt.hour.eq(7), df['min'] - pd.Timedelta('1H'))
df['max'] = df['max'].where(df['max'].dt.hour.eq(23), df['max'] + pd.Timedelta('1H'))

print (df)
                           min                 max
court_name                                        
Court 1    2021-11-15 07:00:00 2021-11-15 08:00:00
Court 1    2021-11-15 12:00:00 2021-11-15 15:00:00
Court 1    2021-11-15 21:00:00 2021-11-15 23:00:00
Court 2    2021-11-15 07:00:00 2021-11-15 20:00:00
Court 2    2021-11-15 21:00:00 2021-11-15 23:00:00
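
The hourly expansion above relies on reservations starting and ending on whole hours. As a possible alternative (not part of the original answer), here is a minimal sketch that walks the sorted reservations per court and day and records the gaps between them. It assumes every reservation lies inside the 7:00-23:00 window and does not span midnight; the names `free_intervals`, `reservations`, `free_fr` and `free_to` are made up for illustration.

```python
import pandas as pd

reservations = pd.DataFrame([
    {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
    {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
    {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
    {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'},
])


def free_intervals(df, open_time='07:00:00', close_time='23:00:00'):
    """Return the gaps between reservations for each court and day.

    Assumption: reservations fall inside the working window and
    do not cross midnight.
    """
    d = df.copy()
    d['reserved_fr'] = pd.to_datetime(d['reserved_fr'])
    d['reserved_to'] = pd.to_datetime(d['reserved_to'])
    d['day'] = d['reserved_fr'].dt.normalize()

    rows = []
    # Sorting first keeps reservations in start order inside each group.
    for (court, day), grp in d.sort_values('reserved_fr').groupby(['court_name', 'day']):
        cursor = day + pd.Timedelta(open_time)    # end of the last busy period so far
        closing = day + pd.Timedelta(close_time)
        for start, end in zip(grp['reserved_fr'], grp['reserved_to']):
            if start > cursor:                    # a gap before this reservation
                rows.append((court, cursor, start))
            cursor = max(cursor, end)             # also absorbs overlapping bookings
        if cursor < closing:                      # free time left before closing
            rows.append((court, cursor, closing))

    return pd.DataFrame(rows, columns=['court_name', 'free_fr', 'free_to'])


free = free_intervals(reservations)
print(free)
```

Because it compares interval boundaries directly, this sketch should also cope with reservations such as 08:30-09:15, which the hourly grid cannot represent.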

CodePudding user response：

Using `pandas`'s built-in `pd.DatetimeIndex` features

Here's a short solution that leans on pandas's existing `pd.DatetimeIndex` functionality to keep the code terse.
This solution assumes reservations are made in one-hour slots and that the court closes at 11 pm sharp (so nobody could book, say, 11 pm to midnight). That is why the date ranges end on hour 22 rather than hour 23.

1) Create new column "reserved_hours" of dtype `pd.DatetimeIndex` for all reservations (per court)

Note: This introduces a little trickiness, though it is easily handled: later on we have to combine all such lists of reservations (stored as `pd.DatetimeIndex` objects) and remove duplicates from them. That functionality is already built into pandas.
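
The rest of this answer is not shown here, so the following is only a rough sketch of how the idea described above might look, not the original author's code. It builds one `pd.DatetimeIndex` of reserved start-hours per reservation, combines them per court with `Index.union` (which also drops duplicates), and subtracts the result from the working hours with `Index.difference`. The names `free_slots`, `HOUR`, `bookings`, `free_fr` and `free_to` are hypothetical, and one-hour slots between 7:00 and 23:00 are assumed.

```python
from functools import reduce

import pandas as pd

HOUR = pd.Timedelta(hours=1)

bookings = pd.DataFrame([
    {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
    {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
    {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
    {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'},
])


def free_slots(df, open_hour=7, close_hour=23):
    rows = []
    for court, grp in df.groupby('court_name'):
        starts = pd.to_datetime(grp['reserved_fr'])
        ends = pd.to_datetime(grp['reserved_to'])

        # Step 1: one DatetimeIndex of reserved start-hours per reservation.
        # A booking 08:00-12:00 occupies the slots starting at 08, 09, 10, 11.
        ranges = [pd.date_range(s, e - HOUR, freq=HOUR) for s, e in zip(starts, ends)]

        # Combine the per-reservation indexes; union() removes duplicates.
        reserved = reduce(lambda a, b: a.union(b), ranges)

        # All bookable start-hours (open_hour .. close_hour - 1) on each day seen.
        days = pd.date_range(reserved.min().normalize(),
                             reserved.max().normalize(), freq='D')
        all_hours = pd.DatetimeIndex(
            [d + pd.Timedelta(hours=h) for d in days for h in range(open_hour, close_hour)]
        )

        # Free start-hours, then runs of consecutive hours merged into intervals.
        free = all_hours.difference(reserved).to_series()
        block = (free.diff() != HOUR).cumsum()
        for _, run in free.groupby(block):
            rows.append((court, run.iloc[0], run.iloc[-1] + HOUR))

    return pd.DataFrame(rows, columns=['court_name', 'free_fr', 'free_to'])


result = free_slots(bookings)
print(result)
```

Ending each range at `e - HOUR` and stopping the working day at hour 22 keeps every timestamp meaning "the slot that starts at this hour", which avoids the later one-hour boundary corrections.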
