How to group by hour in pandas and check data for every hour?

I have a dataframe df

hour  calls  received appointment
6:48  4        2         2
4:02  21       3         2
12:52 31       7         4
2:14  32       5         2
6:45  13       3         2

The hour column is a string.

I want to group the rows by hour, label each group in a format like 1-2, 2-3, and calculate the sum for each group.

My approach would be:

df[['hour','calls','received','appointment']].groupby('hour').sum()

Also, I would like every hour to appear in the result: if there is no data for an hour, fill it with zeros.

I want the output as:

hour calls received appointment
0-1   0     0         0
1-2   0     0         0
2-3   32    5         2
3-4   0     0         0
4-5   21    3         2
5-6   0     0         0
6-7   17    5         4
...

CodePudding user response：

You can use resample with an hourly frequency, compute the sum of ['calls','received','appointment'], and at the end rename each datetime to the desired format.

df['time'] = pd.to_datetime(df['hour'])
df = df.set_index('time').resample('H')[['calls','received','appointment']].sum().reset_index()

# rename 2022-07-24 02:00:00 -> (2-3)
df['time'] = df['time'].apply(lambda x: f"{x.hour}-{x.hour + 1}")
print(df)

     time  calls  received  appointment
0     2-3     32         5            2
1     3-4      0         0            0
2     4-5     21         3            2
3     5-6      0         0            0
4     6-7     17         5            4
5     7-8      0         0            0
6     8-9      0         0            0
7    9-10      0         0            0
8   10-11      0         0            0
9   11-12      0         0            0
10  12-13     31         7            4
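
The resample output above only spans the hours between the first and last observation (2-3 through 12-13). To get a row for all 24 hours, as the question asks, one option is to reindex the resampled result against a full day of hourly timestamps. This is a minimal sketch, assuming all times fall on a single day; the anchor date `day` and the names `times`, `one_hour` and `full_day` are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "hour": ["6:48", "4:02", "12:52", "2:14", "6:45"],
    "calls": [4, 21, 31, 32, 13],
    "received": [2, 3, 7, 5, 3],
    "appointment": [2, 2, 4, 2, 2],
})
cols = ["calls", "received", "appointment"]

# Anchor every time to one fixed (arbitrary) date so the index is reproducible
day = pd.Timestamp("2022-07-24")
times = day + pd.to_timedelta(df["hour"] + ":00")  # "6:48" -> "6:48:00"

# Sum per hour; a Timedelta is used as the frequency instead of an alias string
one_hour = pd.Timedelta(hours=1)
hourly = df.set_index(times)[cols].resample(one_hour).sum()

# Extend the result to all 24 hours of the day, filling missing hours with 0
full_day = pd.date_range(day, periods=24, freq=one_hour)
hourly = hourly.reindex(full_day, fill_value=0)

# Relabel 2022-07-24 02:00:00 -> "2-3"
hourly.index = [f"{t.hour}-{t.hour + 1}" for t in hourly.index]
print(hourly)
```

Passing a `pd.Timedelta` as the frequency sidesteps the question of which hourly alias string a given pandas version prefers.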

CodePudding user response：

You can use pd.cut on hour column:

# Create labels: 0-1, 1-2, 2-3, ...
labels = [f"{i}-{i + 1}" for i in range(24)]

# Extract the hour part and convert it as int
hours = df['hour'].str.split(':').str[0].astype(int)

# Classify your data. The output is a Series with a 'category' dtype
df['hour'] = pd.cut(hours, range(25), labels=labels, right=False)

# Group by range and sum [...]
out = df.groupby('hour', as_index=False, observed=False).sum()

[...] and because the grouper is Categorical, every category is displayed, including hours with no data :)

observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.

Output:

>>> out
     hour  calls  received  appointment
0     0-1      0         0            0
1     1-2      0         0            0
2     2-3     32         5            2
3     3-4      0         0            0
4     4-5     21         3            2
5     5-6      0         0            0
6     6-7     17         5            4
7     7-8      0         0            0
8     8-9      0         0            0
9    9-10      0         0            0
10  10-11      0         0            0
11  11-12      0         0            0
12  12-13     31         7            4
13  13-14      0         0            0
14  14-15      0         0            0
15  15-16      0         0            0
16  16-17      0         0            0
17  17-18      0         0            0
18  18-19      0         0            0
19  19-20      0         0            0
20  20-21      0         0            0
21  21-22      0         0            0
22  22-23      0         0            0
23  23-24      0         0            0
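
To see what the observed parameter quoted above changes, here is a small sketch on the calls column alone. The variable names (`bins`, `all_bins`, `seen_only`) are illustrative, and the hour values are typed in by hand from the question's data rather than parsed from strings.

```python
import pandas as pd

# Hour part and call counts from the question's five rows
hours = pd.Series([6, 4, 12, 2, 6])
calls = pd.Series([4, 21, 31, 32, 13])

# Same binning as above: 24 left-closed bins labelled "0-1", "1-2", ...
labels = [f"{i}-{i + 1}" for i in range(24)]
bins = pd.cut(hours, list(range(25)), labels=labels, right=False)

# observed=False keeps every category, so empty hours appear with a sum of 0
all_bins = calls.groupby(bins, observed=False).sum()

# observed=True keeps only the categories that actually occur in the data
seen_only = calls.groupby(bins, observed=True).sum()

print(len(all_bins), len(seen_only))
```

Passing observed explicitly means the result does not depend on the default of the installed pandas version.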

CodePudding user response：

# Split into separate columns:
df[['hour', 'minute']] = df.hour.str.split(':', expand=True).astype(int)

# Pivot the table, summing rows that share the same hour:
cols = ['calls', 'received', 'appointment']
df = df.pivot_table(index='hour', values=cols, aggfunc='sum')

# Create a new interval range DataFrame:
new_index = pd.DataFrame(pd.interval_range(0, 13), columns=['hour'])

# Join on the index, so row i of the interval range lines up with hour i:
df = new_index.join(df, how='outer')

# Fill NaN and turn back to integers:
df[cols] = df[cols].fillna(0).astype(int)

# Printing to match your col order:
print(df[['hour'] + cols])

Output:

        hour  calls  received  appointment
0     (0, 1]      0         0            0
1     (1, 2]      0         0            0
2     (2, 3]     32         5            2
3     (3, 4]      0         0            0
4     (4, 5]     21         3            2
5     (5, 6]      0         0            0
6     (6, 7]     17         5            4
7     (7, 8]      0         0            0
8     (8, 9]      0         0            0
9    (9, 10]      0         0            0
10  (10, 11]      0         0            0
11  (11, 12]      0         0            0
12  (12, 13]     31         7            4
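
The intervals above are displayed as right-closed, e.g. (2, 3], and stop at 13, whereas a time such as 2:14 belongs to the left-closed hour [2, 3) and the question asks for every hour of the day. One possible variant, sketched here under the assumption that left-closed labels are acceptable, passes a left-closed 24-hour `IntervalIndex` to `pd.cut` as the bins; the names `intervals` and `binned` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "hour": ["6:48", "4:02", "12:52", "2:14", "6:45"],
    "calls": [4, 21, 31, 32, 13],
    "received": [2, 3, 7, 5, 3],
    "appointment": [2, 2, 4, 2, 2],
})
cols = ["calls", "received", "appointment"]

# Integer hour of each row
hours = df["hour"].str.split(":").str[0].astype(int)

# 24 left-closed intervals: [0, 1), [1, 2), ..., [23, 24)
intervals = pd.interval_range(0, 24, closed="left")

# With an IntervalIndex as bins, pd.cut assigns each hour to its interval
binned = pd.cut(hours, intervals)

# observed=False keeps the empty intervals, which sum to 0
out = df[cols].groupby(binned, observed=False).sum()
print(out)
```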