Creating and filling empty dates with zeroes

I have a dataframe df

df=pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/x_restock.csv')
df

I want to fill the missing dates for each Product_ID with restocking_events=0. To start, I have created a date_range dataframe using dfdate=pd.DataFrame({'Date':pd.date_range(simple.Date.min(), simple.Date.max())}) where simple is some master dataframe and min and max dates are '2021-11-13' and '2021-11-30'.

CodePudding user response:

Use:

#added parse_dates for datetimes
df=pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/x_restock.csv', 
               parse_dates=['Date'])
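
To see why parse_dates matters here, a minimal sketch is below. The inline CSV is a hypothetical stand-in for x_restock.csv, and its column names are assumed from the output shown in this answer. Without parse_dates the Date column is read as plain strings, so date arithmetic and matching against pd.date_range would not behave as expected.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for x_restock.csv (assumed columns)
csv_text = """Product_ID,Date,restocking_events
1,2021-11-13,1
1,2021-11-16,2
"""

# Same data read twice: once as-is, once with the Date column parsed
raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

raw_is_datetime = pd.api.types.is_datetime64_any_dtype(raw['Date'])
parsed_is_datetime = pd.api.types.is_datetime64_any_dtype(parsed['Date'])
print(raw_is_datetime, parsed_is_datetime)  # False True
```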

The first solution adds the complete range of dates, from the minimal to the maximal date in the data, for every Product_ID by using DataFrame.reindex with MultiIndex.from_product:

mux = pd.MultiIndex.from_product([df['Product_ID'].unique(),
                                  pd.date_range(df.Date.min(), df.Date.max())], 
                                 names=['Product_ID','Date'])

df1 = df.set_index(['Product_ID','Date']).reindex(mux, fill_value=0).reset_index()
print (df1)
      Product_ID       Date  restocking_events
0        1004746 2021-11-13                  0
1        1004746 2021-11-14                  0
2        1004746 2021-11-15                  0
3        1004746 2021-11-16                  1
4        1004746 2021-11-17                  0
         ...        ...                ...
3379      976460 2021-11-26                  1
3380      976460 2021-11-27                  0
3381      976460 2021-11-28                  0
3382      976460 2021-11-29                  0
3383      976460 2021-11-30                  0

[3384 rows x 3 columns]
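
The reindex approach can be checked on a small frame without downloading the CSV. This is only a sketch: the toy data and the names toy, full_index and filled are made up for illustration. Note that reindex needs each (Product_ID, Date) pair to be unique, so duplicated rows would have to be aggregated first.

```python
import pandas as pd

# Hypothetical toy data: product 1 is missing 2021-11-14,
# product 2 is missing 2021-11-13 and 2021-11-15
toy = pd.DataFrame({
    'Product_ID': [1, 1, 2],
    'Date': pd.to_datetime(['2021-11-13', '2021-11-15', '2021-11-14']),
    'restocking_events': [1, 2, 1],
})

# Every product paired with every day between the global min and max date
full_index = pd.MultiIndex.from_product(
    [toy['Product_ID'].unique(),
     pd.date_range(toy['Date'].min(), toy['Date'].max())],
    names=['Product_ID', 'Date'])

# Rows absent from the original data are created with restocking_events=0
filled = (toy.set_index(['Product_ID', 'Date'])
             .reindex(full_index, fill_value=0)
             .reset_index())

print(filled['restocking_events'].tolist())  # [1, 0, 2, 0, 1, 0]
```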

Another idea is to build a helper DataFrame with every Product_ID and date combination and left-merge the original data onto it:

from itertools import product

dfdate=pd.DataFrame(product(df['Product_ID'].unique(), 
                            pd.date_range(df.Date.min(), df.Date.max())),
                    columns=['Product_ID','Date'])
print (dfdate)
      Product_ID       Date
0        1004746 2021-11-13
1        1004746 2021-11-14
2        1004746 2021-11-15
3        1004746 2021-11-16
4        1004746 2021-11-17
         ...        ...
3379      976460 2021-11-26
3380      976460 2021-11-27
3381      976460 2021-11-28
3382      976460 2021-11-29
3383      976460 2021-11-30

[3384 rows x 2 columns]

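# the downcast argument of fillna is deprecated in recent pandas, so cast explicitly;
# a new name keeps the original df available for the next solution
df_merged = (dfdate.merge(df, how='left')
                   .fillna({'restocking_events': 0})
                   .astype({'restocking_events': int}))
print (df_merged)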
      Product_ID       Date  restocking_events
0        1004746 2021-11-13                  0
1        1004746 2021-11-14                  0
2        1004746 2021-11-15                  0
3        1004746 2021-11-16                  1
4        1004746 2021-11-17                  0
         ...        ...                ...
3379      976460 2021-11-26                  1
3380      976460 2021-11-27                  0
3381      976460 2021-11-28                  0
3382      976460 2021-11-29                  0
3383      976460 2021-11-30                  0

[3384 rows x 3 columns]
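
The same helper-DataFrame idea on toy data, as a sketch only. The names toy, grid and merged are hypothetical. The merge keys are passed explicitly with on=, which is an assumption about what you want, so that an extra shared column cannot silently change the join.

```python
from itertools import product

import pandas as pd

# Hypothetical toy data with gaps in the dates of both products
toy = pd.DataFrame({
    'Product_ID': [1, 1, 2],
    'Date': pd.to_datetime(['2021-11-13', '2021-11-15', '2021-11-14']),
    'restocking_events': [1, 2, 1],
})

# Helper frame: one row per (product, day) combination
grid = pd.DataFrame(
    list(product(toy['Product_ID'].unique(),
                 pd.date_range(toy['Date'].min(), toy['Date'].max()))),
    columns=['Product_ID', 'Date'])

# Left merge keeps every grid row; unmatched rows get NaN, then 0
merged = grid.merge(toy, how='left', on=['Product_ID', 'Date'])
merged['restocking_events'] = merged['restocking_events'].fillna(0).astype(int)

print(merged['restocking_events'].tolist())  # [1, 0, 2, 0, 1, 0]
```

Compared with reindex, the merge route tolerates duplicated (Product_ID, Date) rows in the original data, at the cost of a temporary float column before the cast back to int.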

Or, if you need consecutive dates only within each group (from each product's own first date to its own last date), use Series.asfreq per group:

df2 = (df.set_index('Date')
         .groupby('Product_ID')['restocking_events']
         .apply(lambda x: x.asfreq('D', fill_value=0))
         .reset_index())
print (df2)
      Product_ID       Date  restocking_events
0         112714 2021-11-15                  1
1         112714 2021-11-16                  1
2         112714 2021-11-17                  0
3         112714 2021-11-18                  1
4         112714 2021-11-19                  0
         ...        ...                ...
2209     3630918 2021-11-25                  0
2210     3630918 2021-11-26                  0
2211     3630918 2021-11-27                  0
2212     3630918 2021-11-28                  0
2213     3630918 2021-11-29                  1

[2214 rows x 3 columns]
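
A toy sketch of the per-group variant, to make the difference from the full-range solutions visible. The data and the names toy and per_group are hypothetical. Here each product is filled only between its own first and last date, which is why this result has fewer rows than the 3384 above.

```python
import pandas as pd

# Hypothetical toy data: product 1 spans 13..16 Nov, product 2 spans 14..15 Nov
toy = pd.DataFrame({
    'Product_ID': [1, 1, 2, 2],
    'Date': pd.to_datetime(['2021-11-13', '2021-11-16',
                            '2021-11-14', '2021-11-15']),
    'restocking_events': [1, 2, 1, 1],
})

# asfreq('D') inserts the missing days inside each group's own date span
per_group = (toy.set_index('Date')
                .groupby('Product_ID')['restocking_events']
                .apply(lambda s: s.asfreq('D', fill_value=0))
                .reset_index())

print(per_group['restocking_events'].tolist())  # [1, 0, 0, 2, 1, 1]
```

Product 1 gets two filled days (14 and 15 Nov), while product 2 has no gaps in its own span and stays at two rows.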