I have a dataframe with times when the court is not free:
df = pd.DataFrame(
[
{'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
{'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
{'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
{'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'}
]
)
| | court_name | reserved_fr | reserved_to |
|---:|:-------------|:--------------------|:--------------------|
| 0 | Court 1 | 2021-11-15T08:00:00 | 2021-11-15T12:00:00 |
| 1 | Court 1 | 2021-11-15T15:00:00 | 2021-11-15T16:00:00 |
| 2 | Court 1 | 2021-11-15T16:00:00 | 2021-11-15T21:00:00 |
| 3 | Court 2 | 2021-11-15T20:00:00 | 2021-11-15T21:00:00 |
Each court is open from 7 am to 11 pm, and I would like to know when each court is free.
For example, the courts are free at these times:
Court 1 2021-11-15 07:00:00 2021-11-15 08:00:00
Court 1 2021-11-15 12:00:00 2021-11-15 15:00:00
Court 1 2021-11-15 21:00:00 2021-11-15 23:00:00
Court 2 2021-11-15 07:00:00 2021-11-15 20:00:00
Court 2 2021-11-15 21:00:00 2021-11-15 23:00:00
How can I transform the dataframe into another dataframe in the format above?
CodePudding user response:
A solution that does not hard-code the exact days, for times between 7:00 and 23:00, is:
# expand each reservation into hourly timestamps in a single 'date' column
L = [pd.date_range(s,e, freq='H')
for s, e in df[['reserved_fr','reserved_to']].to_numpy()]
df['date'] = L
df1 = df.explode('date').drop_duplicates(['court_name','date'])
print (df1)
court_name reserved_fr reserved_to date
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 08:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 09:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 10:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 11:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 12:00:00
1 Court 1 2021-11-15T15:00:00 2021-11-15T16:00:00 2021-11-15 15:00:00
1 Court 1 2021-11-15T15:00:00 2021-11-15T16:00:00 2021-11-15 16:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 17:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 18:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 19:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 20:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 21:00:00
3 Court 2 2021-11-15T20:00:00 2021-11-15T21:00:00 2021-11-15 20:00:00
3 Court 2 2021-11-15T20:00:00 2021-11-15T21:00:00 2021-11-15 21:00:00
# add the missing hours between 7:00 and 23:00 (they become NaN rows)
def f(x):
    r = pd.date_range(x.index.min().normalize() + pd.Timedelta('7H'),
                      x.index.max().normalize() + pd.Timedelta('23H'), freq='H')
    return x.reindex(r)
s = df1.set_index('date').groupby('court_name')['court_name'].apply(f)
#create groups for missing values and aggregate first with last
mask = s.notna()
df = (mask.cumsum()[~mask].reset_index(name='new')
.groupby(['court_name','new'])['level_1']
.agg(['min','max'])
.reset_index(level=1, drop=True))
# widen each gap by one hour on both sides so it touches the neighbouring
# reservations, except at opening time (7:00) and closing time (23:00)
df['min'] = df['min'].where(df['min'].dt.hour.eq(7), df['min'] - pd.Timedelta('1H'))
df['max'] = df['max'].where(df['max'].dt.hour.eq(23), df['max'] + pd.Timedelta('1H'))
print (df)
min max
court_name
Court 1 2021-11-15 07:00:00 2021-11-15 08:00:00
Court 1 2021-11-15 12:00:00 2021-11-15 15:00:00
Court 1 2021-11-15 21:00:00 2021-11-15 23:00:00
Court 2 2021-11-15 07:00:00 2021-11-15 20:00:00
Court 2 2021-11-15 21:00:00 2021-11-15 23:00:00
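For comparison, here is a minimal sketch of a different technique from the one above: instead of exploding reservations into hourly rows, it sorts each court's reservations and walks through them, emitting the gap before each one. The function name `free_intervals`, the `day` helper column, the output column names `free_fr` / `free_to`, and the assumption that every reservation starts and ends on the same day are mine, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
        {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'},
    ]
)


def free_intervals(reservations, open_hour=7, close_hour=23):
    """Return the free intervals per court and day (hypothetical helper).

    Assumes each reservation starts and ends on the same calendar day.
    """
    res = reservations.copy()
    res['reserved_fr'] = pd.to_datetime(res['reserved_fr'])
    res['reserved_to'] = pd.to_datetime(res['reserved_to'])
    res['day'] = res['reserved_fr'].dt.normalize()  # midnight of the reservation day

    rows = []
    for (court, day), grp in res.groupby(['court_name', 'day']):
        cursor = day + pd.Timedelta(hours=open_hour)    # earliest time not yet accounted for
        closing = day + pd.Timedelta(hours=close_hour)
        grp = grp.sort_values('reserved_fr')
        for start, end in zip(grp['reserved_fr'], grp['reserved_to']):
            # a gap exists if the next reservation starts after the cursor
            if start > cursor and cursor < closing:
                rows.append((court, cursor, min(start, closing)))
            cursor = max(cursor, end)
        # whatever remains until closing time is free as well
        if cursor < closing:
            rows.append((court, cursor, closing))
    return pd.DataFrame(rows, columns=['court_name', 'free_fr', 'free_to'])


result = free_intervals(df)
print(result)
```

On the sample data this should give the same five intervals as the question asks for. It does not rely on reservations being aligned to whole hours, which the hourly `date_range` approach does.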
CodePudding user response:
Using pandas's built-in pd.DatetimeIndex features
- Here's a short solution that I think makes terse use of the existing pandas DatetimeIndex functionality.
- This solution assumes one-hour reservation slots, and that the court closes at 11 pm sharp (so, to be clear, no one could book 11 pm to midnight); hence you'll see that I set the date ranges to end on the 22nd hour instead.
1) Create a new column "reserved_hours" of dtype pd.DatetimeIndex for all reservations (per court)
Note: This introduces a bit of trickiness with lists, though it is easily handled: later on we have to combine, and remove duplicates from, all such lists of reservations (stored as pd.DatetimeIndex objects). That functionality is already built into pandas.
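The idea in this step can be sketched roughly as follows. This is only my reading of the description, not the answer's own code: the helper `reserved_slots`, the dictionaries `reserved_by_court` and `free_by_court`, and keeping the indexes in a plain dict rather than in a "reserved_hours" column are all assumptions, as is the single-day opening range. Each reservation becomes a `pd.DatetimeIndex` of one-hour slot start times, and `DatetimeIndex.union` combines them per court while dropping duplicates.

```python
from functools import reduce

import pandas as pd

ONE_HOUR = pd.Timedelta(hours=1)

df = pd.DataFrame(
    [
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
        {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'},
    ]
)


def reserved_slots(start, end):
    """Start times of the one-hour slots covered by a reservation (hypothetical helper)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # the last slot starts one hour before the reservation ends
    return pd.date_range(start, end - ONE_HOUR, freq=ONE_HOUR)


reserved_by_court = {}
for court, grp in df.groupby('court_name'):
    per_reservation = [
        reserved_slots(s, e) for s, e in zip(grp['reserved_fr'], grp['reserved_to'])
    ]
    # union() merges the indexes, sorts them and removes duplicate slots
    reserved_by_court[court] = reduce(lambda a, b: a.union(b), per_reservation)

# Assumption: all reservations fall on one day; slots start at 07:00 and the
# last bookable slot starts at 22:00 (closing at 23:00 sharp).
free_by_court = {}
for court, reserved in reserved_by_court.items():
    day = reserved[0].normalize()
    opening = pd.date_range(day + 7 * ONE_HOUR, day + 22 * ONE_HOUR, freq=ONE_HOUR)
    free_by_court[court] = opening.difference(reserved)

print(free_by_court['Court 1'])
```

The result here is a set of free one-hour slot start times per court; turning consecutive slots back into from/to intervals would be a further step.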