Efficient way to populate missing indexes from pandas group by

Time:10-05

I counted the number of occurrences of an event per hour of the day in a pandas DataFrame like so:

df_sep.hour.groupby(df_sep.time.dt.hour).size()

Which gives the following result:

time
2     31
3      6
4      7
5      4
6     38
7      9
8      5
9     31
10     8
11     2
12     5
13    30
14     1
15     1
16    28
18     1
20     4
21    29
Name: hour, dtype: int64

For plotting, I would like the series to cover every hour of the day. For example, there are no occurrences at midnight (0), so that hour is missing from the index. For every missing hour, I would like to add that index with a value of zero.

To solve this I created two lists (x and y) using the following loop, but it feels a bit hacky... is there a better way to solve this?

x = []
y = []

for i in range(24):
    if i not in df_sep.hour.groupby(df_sep.time.dt.hour).size().index:
        x.append(i)
        y.append(0)
    else:
        x.append(i)
        y.append(df_sep.hour.groupby(df_sep.time.dt.hour).size().loc[i])

Result:

for i, j in zip(x, y):
    print(i, j)

0 0
1 0
2 31
3 6
4 7
5 4
6 38
7 9
8 5
9 31
10 8
11 2
12 5
13 30
14 1
15 1
16 28
17 0
18 1
19 0
20 4
21 29
22 0
23 0

CodePudding user response:

Use Series.reindex with range(24):

df_sep.hour.groupby(df_sep.time.dt.hour).size().reindex(range(24), fill_value=0)
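
To try the one-liner above without the original data, here is a minimal sketch. The contents of df_sep are made up for illustration (the real frame is not shown in the question); only the groupby and reindex calls come from the post.

```python
import pandas as pd

# Hypothetical sample data standing in for df_sep (assumption: a datetime
# column "time" and an "hour" column derived from it).
df_sep = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-09-01 02:15", "2021-09-01 02:40", "2021-09-01 06:05",
        "2021-09-02 06:30", "2021-09-02 21:10",
    ])
})
df_sep["hour"] = df_sep["time"].dt.hour

# Count events per hour once; hours with no events are absent from the index.
counts = df_sep.hour.groupby(df_sep.time.dt.hour).size()

# Reindex onto all 24 hours, filling the absent hours with zero.
full = counts.reindex(range(24), fill_value=0)

# If separate lists are still wanted for plotting:
x = full.index.tolist()
y = full.tolist()
```

Because reindex aligns on index labels, hours already present keep their counts and only the labels that were missing receive fill_value. The result can also be plotted directly (for example with full.plot.bar()) without building x and y.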