Home > Blockchain >  How to insert missing dates into the index wthin a dataframe with groupby
How to insert missing dates into the index wthin a dataframe with groupby

Time:10-08

So I'm going to try and explain this, but the outputs below make much more sense! I have a dataframe, df, and I want the whole date range to be the min and max of the Date column for each code. Just wondered whether this was possible? The desired output shown below I hope will explain this.

Output:

              Code                Count
Date                            
2021-01-31    H8101                  15
2021-02-28    H8101                   7
2021-03-31    H8101                  19
2021-04-30    H8101                  12
2021-05-31    H8101                  20
2021-06-30    H8101                  12
2021-07-31    H8101                   7
2021-08-31    H8101                  10
2021-09-30    H8101                  16
2021-10-31    H8101                   4
2021-03-31    H8162                   1
2021-05-31    H8162                   2
2021-09-30    H8162                   1
2019-12-31    H8406                   2

Desired Output:

              Code                Count
Date    
2019-12-31    H8101                 NaN
2020-01-31    H8101                 NaN
2020-02-28    H8101                 NaN
2020-03-31    H8101                 NaN
2020-04-30    H8101                 NaN
2020-05-31    H8101                 NaN
2020-06-30    H8101                 NaN
2020-07-31    H8101                 NaN
2020-08-31    H8101                 NaN
2020-09-30    H8101                 NaN
2020-10-31    H8101                 NaN
2020-11-30    H8101                 NaN
2020-12-31    H8101                 NaN                       
2021-01-31    H8101                  15
2021-02-28    H8101                   7
2021-03-31    H8101                  19
2021-04-30    H8101                  12
2021-05-31    H8101                  20
2021-06-30    H8101                  12
2021-07-31    H8101                   7
2021-08-31    H8101                  10
2021-09-30    H8101                  16
2021-10-31    H8101                   4
2019-12-31    H8162                 NaN
2020-01-31    H8162                 NaN
2020-02-28    H8162                 NaN
2020-03-31    H8162                 NaN
2020-04-30    H8162                 NaN
2020-05-31    H8162                 NaN
2020-06-30    H8162                 NaN
2020-07-31    H8162                 NaN
2020-08-31    H8162                 NaN
2020-09-30    H8162                 NaN
2020-10-31    H8162                 NaN
2020-11-30    H8162                 NaN
2020-12-31    H8162                 NaN
2021-01-31    H8162                 NaN
2021-02-28    H8162                 NaN
2021-03-31    H8162                   1
2021-04-30    H8162                 NaN
2021-05-31    H8162                   2
2021-06-30    H8162                 NaN
2021-07-31    H8162                 NaN
2021-08-31    H8162                 NaN
2021-09-30    H8162                   1
2021-10-31    H8162                 NaN
2019-12-31    H8406                   2
2020-01-31    H8406                 NaN
2020-02-28    H8406                 NaN
2020-03-31    H8406                 NaN
2020-04-30    H8406                 NaN
2020-05-31    H8406                 NaN
2020-06-30    H8406                 NaN
2020-07-31    H8406                 NaN
2020-08-31    H8406                 NaN
2020-09-30    H8406                 NaN
2020-10-31    H8406                 NaN
2020-11-30    H8406                 NaN
2020-12-31    H8406                 NaN
2021-01-31    H8406                 NaN
2021-02-28    H8406                 NaN
2021-03-31    H8406                 NaN
2021-04-30    H8406                 NaN
2021-05-31    H8406                 NaN
2021-06-30    H8406                 NaN
2021-07-31    H8406                 NaN
2021-08-31    H8406                 NaN
2021-09-30    H8406                 NaN
2021-10-31    H8406                 NaN

CodePudding user response:

Given df dataframe as shown in question, first create index idx containing full range of daily dates:

idx = pd.date_range(df.Date.min(),df.Date.max(), name='Date')

Then create function reindexing dataframe using idx:

def df_resample(df_):
    return df_.set_index('Date').reindex(idx).resample('M').first()

Finally apply above function to each group of Code in dataframe df

df_result = df.groupby(['Code']).apply(df_resample).drop(columns='Code').reset_index()

The result is:

    Code    Date    Count
0   H8101   2019-12-31  
1   H8101   2020-01-31  
2   H8101   2020-02-29  
3   H8101   2020-03-31  
4   H8101   2020-04-30  
5   H8101   2020-05-31  
6   H8101   2020-06-30  
7   H8101   2020-07-31  
8   H8101   2020-08-31  
9   H8101   2020-09-30  
10  H8101   2020-10-31  
11  H8101   2020-11-30  
12  H8101   2020-12-31  
13  H8101   2021-01-31  15.0
14  H8101   2021-02-28  7.0
15  H8101   2021-03-31  19.0
16  H8101   2021-04-30  12.0
17  H8101   2021-05-31  20.0
18  H8101   2021-06-30  12.0
19  H8101   2021-07-31  7.0
20  H8101   2021-08-31  10.0
21  H8101   2021-09-30  16.0
22  H8101   2021-10-31  4.0
23  H8162   2019-12-31  
24  H8162   2020-01-31  
25  H8162   2020-02-29  
26  H8162   2020-03-31  
27  H8162   2020-04-30  
28  H8162   2020-05-31  
29  H8162   2020-06-30  
30  H8162   2020-07-31  
31  H8162   2020-08-31  
32  H8162   2020-09-30  
33  H8162   2020-10-31  
34  H8162   2020-11-30  
35  H8162   2020-12-31  
36  H8162   2021-01-31  
37  H8162   2021-02-28  
38  H8162   2021-03-31  1.0
39  H8162   2021-04-30  
40  H8162   2021-05-31  2.0
41  H8162   2021-06-30  
42  H8162   2021-07-31  
43  H8162   2021-08-31  
44  H8162   2021-09-30  1.0
45  H8162   2021-10-31  
46  H8406   2019-12-31  2.0
47  H8406   2020-01-31  
48  H8406   2020-02-29  
49  H8406   2020-03-31  
50  H8406   2020-04-30  
51  H8406   2020-05-31  
52  H8406   2020-06-30  
53  H8406   2020-07-31  
54  H8406   2020-08-31  
55  H8406   2020-09-30  
56  H8406   2020-10-31  
57  H8406   2020-11-30  
58  H8406   2020-12-31  
59  H8406   2021-01-31  
60  H8406   2021-02-28  
61  H8406   2021-03-31  
62  H8406   2021-04-30  
63  H8406   2021-05-31  
64  H8406   2021-06-30  
65  H8406   2021-07-31  
66  H8406   2021-08-31  
67  H8406   2021-09-30  
68  H8406   2021-10-31
  • Related