I have a dataframe df
hour calls received appointment
6:48 4 2 2
4:02 21 3 2
12:52 31 7 4
2:14 32 5 2
6:45 13 3 2
The hour
column is string
I want to calculate the sum and group by the hour in the format like 1-2
,2-3
.
My approach would be:
df[['hour','calls','received','appointment']].groupby('hour').sum()
Also, I would like to check for every hour, if data is not present for any hour, fill it by zero.
I want the output as :
hour calls received appointment
0-1 0 0 0
1-2 0 0 0
2-3 32 5 2
3-4 0 0 0
4-5 21 3 2
5-6 0 0 0
6-7 17 5 4
...
CodePudding user response:
You can use pandas.resmaple
base hour
then compute sum on ['calls','received','appointment']
and at the end rename datetime to desired format.
df['time'] = pd.to_datetime(df['hour'])
df = df.set_index('time').resample('H')[['calls','received','appointment']].sum().reset_index()
# rename 2022-07-24 02:00:00 -> (2-3)
df['time'] = df['time'].apply(lambda x: f"{x.hour}-{x.hour 1}")
print(df)
time calls received appointment
0 2-3 32 5 2
1 3-4 0 0 0
2 4-5 21 3 2
3 5-6 0 0 0
4 6-7 17 5 4
5 7-8 0 0 0
6 8-9 0 0 0
7 9-10 0 0 0
8 10-11 0 0 0
9 11-12 0 0 0
10 12-13 31 7 4
CodePudding user response:
You can use pd.cut
on hour
column:
# Create labels: 0-1, 1-2, 2-3, ...
labels = [f"{i}-{i 1}" for i in range(24)]
# Extract the hour part and convert it as int
hours = df['hour'].str.split(':').str[0].astype(int)
# Classify your data. The output is a Series with a 'category' dtype
df['hour'] = pd.cut(hours, range(25), labels=labels, right=False)
# Group by range and sum [...]
out = df.groupby('hour', as_index=False).sum()
[...] and as the grouper is Categorical, all observations are displayed :)
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Output:
>>> out
hour calls received appointment
0 0-1 0 0 0
1 1-2 0 0 0
2 2-3 32 5 2
3 3-4 0 0 0
4 4-5 21 3 2
5 5-6 0 0 0
6 6-7 17 5 4
7 7-8 0 0 0
8 8-9 0 0 0
9 9-10 0 0 0
10 10-11 0 0 0
11 11-12 0 0 0
12 12-13 31 7 4
13 13-14 0 0 0
14 14-15 0 0 0
15 15-16 0 0 0
16 16-17 0 0 0
17 17-18 0 0 0
18 18-19 0 0 0
19 19-20 0 0 0
20 20-21 0 0 0
21 21-22 0 0 0
22 22-23 0 0 0
23 23-24 0 0 0
CodePudding user response:
# Split into separate columns:
df[['hour', 'minute']] = df.hour.str.split(':', expand=True).astype(int)
# Pivot the table, summing hours together:
cols = ['calls', 'received', 'appointment']
df = df.pivot_table(index='hour', values=cols, aggfunc='sum')
# Create a new interval range DataFrame:
new_index = pd.DataFrame(pd.interval_range(0, 13), columns=['hour'])
# Join the interval range and the dataframe:
df = new_index.join(df, how='outer')
# Fill NaN and turn back to integers:
df[cols] = df[cols].fillna(0).astype(int)
# Printing to match your col order:
print(df[['hour'] cols])
Output:
hour calls received appointment
0 (0, 1] 0 0 0
1 (1, 2] 0 0 0
2 (2, 3] 32 5 2
3 (3, 4] 0 0 0
4 (4, 5] 21 3 2
5 (5, 6] 0 0 0
6 (6, 7] 17 5 4
7 (7, 8] 0 0 0
8 (8, 9] 0 0 0
9 (9, 10] 0 0 0
10 (10, 11] 0 0 0
11 (11, 12] 0 0 0
12 (12, 13] 31 7 4