I have a data table that looks like this:
ID ARRIVAL_DATE_TIME DISPOSITION_DATE
1 2021-11-07 08:35:00 2021-11-07 17:58:00
2 2021-11-07 13:16:00 2021-11-08 02:52:00
3 2021-11-07 15:12:00 2021-11-07 21:08:00
I want to be able to count the number of patients in our location by date/hour and hour. I imagine I would eventually have to transform this data into a format seen below and then create a pivot table, but I'm not sure how to first transform this data. So for example, ID 1 would have a row for each date/hour and hour between '2021-11-07 08:35:00' and '2021-11-07 17:58:00'.
ID DATE_HOUR_IN_ED HOUR_IN_ED
1 2021-11-07 08:00:00 8:00
1 2021-11-07 09:00:00 9:00
1 2021-11-07 10:00:00 10:00
1 2021-11-07 11:00:00 11:00
...
2 2021-11-07 13:00:00 13:00
2 2021-11-07 14:00:00 14:00
2 2021-11-07 15:00:00 15:00
....
CodePudding user response:
Use to_datetime
with Series.dt.floor
for remove times, then concat
with repeat date_range
and last create DataFrame
by constructor:
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
s = pd.concat([pd.Series(r.ID,pd.date_range(r.ARRIVAL_DATE_TIME,
r.DISPOSITION_DATE, freq='H'))
for r in df.itertuples()])
df1 = pd.DataFrame({'ID':s.to_numpy(),
'DATE_HOUR_IN_ED':s.index,
'HOUR_IN_ED': s.index.strftime('%H:%M')})
print (df1)
ID DATE_HOUR_IN_ED HOUR_IN_ED
0 1 2021-11-07 08:00:00 08:00
1 1 2021-11-07 09:00:00 09:00
2 1 2021-11-07 10:00:00 10:00
3 1 2021-11-07 11:00:00 11:00
4 1 2021-11-07 12:00:00 12:00
5 1 2021-11-07 13:00:00 13:00
6 1 2021-11-07 14:00:00 14:00
7 1 2021-11-07 15:00:00 15:00
8 1 2021-11-07 16:00:00 16:00
9 1 2021-11-07 17:00:00 17:00
10 2 2021-11-07 13:00:00 13:00
11 2 2021-11-07 14:00:00 14:00
12 2 2021-11-07 15:00:00 15:00
13 2 2021-11-07 16:00:00 16:00
14 2 2021-11-07 17:00:00 17:00
15 2 2021-11-07 18:00:00 18:00
16 2 2021-11-07 19:00:00 19:00
17 2 2021-11-07 20:00:00 20:00
18 2 2021-11-07 21:00:00 21:00
19 2 2021-11-07 22:00:00 22:00
20 2 2021-11-07 23:00:00 23:00
21 2 2021-11-08 00:00:00 00:00
22 2 2021-11-08 01:00:00 01:00
23 2 2021-11-08 02:00:00 02:00
24 3 2021-11-07 15:00:00 15:00
25 3 2021-11-07 16:00:00 16:00
26 3 2021-11-07 17:00:00 17:00
27 3 2021-11-07 18:00:00 18:00
28 3 2021-11-07 19:00:00 19:00
29 3 2021-11-07 20:00:00 20:00
30 3 2021-11-07 21:00:00 21:00
Alternative solution:
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
L = [pd.date_range(s,e, freq='H')
for s, e in df[['ARRIVAL_DATE_TIME','DISPOSITION_DATE']].to_numpy()]
df['DATE_HOUR_IN_ED'] = L
df = (df.drop(['ARRIVAL_DATE_TIME','DISPOSITION_DATE'], axis=1)
.explode('DATE_HOUR_IN_ED')
.reset_index(drop=True)
.assign(HOUR_IN_ED = lambda x: x['DATE_HOUR_IN_ED'].dt.strftime('%H:%M')))
CodePudding user response:
Try this:
import pandas as pd
import numpy as np
df = pd.read_excel('test.xls')
df1 = (df.set_index(['ID'])
.assign(DATE_HOUR_IN_ED=lambda x: [pd.date_range(s,d, freq='H')
for s,d in zip(x.ARRIVAL_DATE_TIME, x.DISPOSITION_DATE)])
['DATE_HOUR_IN_ED'].explode()
.reset_index()
)
df1['DATE_HOUR_IN_ED'] = df1['DATE_HOUR_IN_ED'].dt.floor('H')
df1['HOUR_IN_ED'] = df1['DATE_HOUR_IN_ED'].dt.strftime('%H:%M')
print(df1)
Output:
ID DATE_HOUR_IN_ED HOUR_IN_ED
0 1 2021-11-07 08:00:00 08:00
1 1 2021-11-07 09:00:00 09:00
2 1 2021-11-07 10:00:00 10:00
3 1 2021-11-07 11:00:00 11:00
4 1 2021-11-07 12:00:00 12:00
5 1 2021-11-07 13:00:00 13:00
6 1 2021-11-07 14:00:00 14:00
7 1 2021-11-07 15:00:00 15:00
8 1 2021-11-07 16:00:00 16:00
9 1 2021-11-07 17:00:00 17:00
10 2 2021-11-07 13:00:00 13:00
11 2 2021-11-07 14:00:00 14:00
12 2 2021-11-07 15:00:00 15:00
13 2 2021-11-07 16:00:00 16:00
14 2 2021-11-07 17:00:00 17:00
15 2 2021-11-07 18:00:00 18:00
16 2 2021-11-07 19:00:00 19:00
17 2 2021-11-07 20:00:00 20:00
18 2 2021-11-07 21:00:00 21:00
19 2 2021-11-07 22:00:00 22:00
20 2 2021-11-07 23:00:00 23:00
21 2 2021-11-08 00:00:00 00:00
22 2 2021-11-08 01:00:00 01:00
23 2 2021-11-08 02:00:00 02:00
24 3 2021-11-07 15:00:00 15:00
25 3 2021-11-07 16:00:00 16:00
26 3 2021-11-07 17:00:00 17:00
27 3 2021-11-07 18:00:00 18:00
28 3 2021-11-07 19:00:00 19:00
29 3 2021-11-07 20:00:00 20:00