Home > Enterprise >  Efficient approach to append zeroes to product id which are not present on a particular date
Efficient approach to append zeroes to product id which are not present on a particular date

Time:09-12

I'm trying to append zeroes to product id's data which are not present for any particular date and my code takes lot of time to append zeroes. Looking for an alternate approach in pandas/numpy.

Here is the sample data:

rpt_date    product_id    total_views   total_cart_adds total_order_unit    total_gmv  
30-07-2022  mp000000006243574   7   1                   0                   0  
30-07-2022  mp000000006292285   1   0                   0                   0  
30-07-2022  mp000000006294016   18  1                   0                   0  
31-07-2022  mp000000006243574   8   2                   0                   0  
31-07-2022  mp000000006292285   5   0                   0                   0

For Eg if data for product id 'mp000000006294016' is not present on 31 or any other day then it should append 0 for the respective columns

Below is my code

df_ans=pd.DataFrame()
for x in prod_df['product_id']:
df2=pd.DataFrame()
    print(prod_df[prod_df['product_id']==x])

    df2 = prod_df[prod_df['product_id']==x]

    if df2.shape[0] == 1:
        formatted_date1 = time.strptime(prod_df['rpt_date'][0], "%d-%m-%Y")
        formatted_date2 = time.strptime('30-07-2022', "%d-%m-%Y")
        if formatted_date1==formatted_date2:
             df2.loc[-1] = ['31-07-2022', x, '0', '0','0','0']  # adding a row
             df2.index = df2.index   1  # shifting index
             df2 = df2.sort_index()
        else:
             df2.loc[-1] = ['30-07-2022', x, '0', '0','0','0']  # adding a row
             df2.index = df2.index   1  # shifting index
             df2 = df2.sort_index()
        print(df2)
    df_ans= pd.concat([df_ans,df2])
    print("***************************************")

CodePudding user response:

You should almost certainly be solving this problem in the query that generated the data to begin with. But one way to work around it is to create a "blank" dataFrame with all the rpt_date and product_id combinations, then concat that to the original data and drop all the duplicate rows:

df_prid = prod_df[['product_id']].drop_duplicates()
df_date = prod_df[['rpt_date']].drop_duplicates()
df_blank = df_date.merge(df_prid, how='cross')
df_blank[['total_views', 'total_cart_adds', 'total_order_unit', 'total_gmv']] = [0,0,0,0]
df_final = pd.concat([prod_df, df_blank]).drop_duplicates(subset=['rpt_date', 'product_id'], keep='first').reset_index()

Output:

   index    rpt_date         product_id  total_views  total_cart_adds  total_order_unit  total_gmv
0      0  30-07-2022  mp000000006243574            7                1                 0          0
1      1  30-07-2022  mp000000006292285            1                0                 0          0
2      2  30-07-2022  mp000000006294016           18                1                 0          0
3      3  31-07-2022  mp000000006243574            8                2                 0          0
4      4  31-07-2022  mp000000006292285            5                0                 0          0
5      5  31-07-2022  mp000000006294016            0                0                 0          0

CodePudding user response:

First convert values to datetimes for correct ordering, create MultiIndex and add missing categories by combinations all rpt_date and product_id by DataFrame.reindex, last convert to original format in Series.dt.strftime:

df['rpt_date'] = pd.to_datetime(df['rpt_date'], dayfirst=True)

mux = pd.MultiIndex.from_product([df['rpt_date'].unique(), df['product_id'].unique()], 
                                 names=['rpt_date','product_id'])
df = df.set_index(['rpt_date','product_id']).reindex(mux, fill_value=0).reset_index()
df['rpt_date'] = df['rpt_date'].dt.strftime("%d-%m-%Y")
print (df)
     rpt_date         product_id  total_views  total_cart_adds  \
0  30-07-2022  mp000000006243574            7                1   
1  30-07-2022  mp000000006292285            1                0   
2  30-07-2022  mp000000006294016           18                1   
3  31-07-2022  mp000000006243574            8                2   
4  31-07-2022  mp000000006292285            5                0   
5  31-07-2022  mp000000006294016            0                0   

   total_order_unit  total_gmv  
0                 0          0  
1                 0          0  
2                 0          0  
3                 0          0  
4                 0          0  
5                 0          0  
  • Related