Show all missing months per Id and fill the Value column with the average of the previous 3 rows

I have a pandas DataFrame like the one below, with 3 columns: [Id, Datetime, Value].

For each individual Id, if any month is missing, a row for that month should be added with Value set to NaN. Each NaN should then be replaced with the average of the 3 rows above it, where rows that were filled earlier also count.

 ```
      Input Dataframe   
        Id     Datetime    Value
       1002    18-01-2021   45
       1002    27-02-2021   36
       1002    19-04-2021   54
       1002    03-07-2021   63
       1002    18-10-2021   72
       1003    15-01-2021   68
       1003    19-04-2021   90
       1004    08-03-2021   14
       1004    03-07-2021   50
       1005    18-05-2021   23
       1005    12-10-2021   108
   
    Output Dataframe
       Id     Datetime    value  
       1002    18-01-2021   45
       1002    27-02-2021   36
      *1002    01-03-2021   32.3      16 45 36=97/3= 32.3 [16 value is 01-12-2020]
       1002    19-04-2021   54
     **1002    01-05-2021   40.7      54 32.3 36= 122.3/3 =40.7
     **1002    03-06-2021   42.3      40.7 54 32.3=127/3 =42.3
       1002    03-07-2021   63
     **1002    01-08-2021   48.6      63 42.3 40.7=146/3=48.6
     **1002    01-09-2021   51.3      48.6 63 42.3= 153.9/3= 51.3
       1002    18-10-2021   72
     **1002    15-11-2021   57.3      72 51.3 48.6=171.9/3=57.3
     **1002    07-12-2021   60.2      57.3 72 51.3=180.6/3=60.2
       1003    01-01-2021   68
       1003    01-02-2021   66      
       1003    01-03-2021   43
       1003    19-04-2021   90
     **1003    01-05-2021   66.3      90 43 66=199/3 = 66.3      
     **1003    01-06-2021   66.4      90 43 66=199/3 = 66.3
     **1003    01-07-2021   74.2      66.4 66.3 90= 222.7/3= 74.2     
     **1003    01-08-2021   68.96     74.2 66.4 66.3=206.9/3= 68.96
     **1003    01-09-2021   69.85     68.96 74.2 66.4=209.56/3=69.85  
     **1003    01-10-2021    71       69.85 68.96 74.2= 213/3=71
       1003    01-11-2021    45           
     **1003    01-12-2021    61.95    45 71 69.85=185.85/3=61.95    
       1004    08-01-2021    36
       1004    01-02-2021    51 
       1004    01-03-2021    41
     **1004    01-04-2021   42.6     41 51 36=128/3=42.6 
     **1004    01-05-2021   44.8     42.6 41 51=134.6/3=44.8 
       1004    01-06-2021   42
       1004    16-07-2021   50
       1004    12-08-2021   14
       1004    02-09-2021   50
       1004    05-10-2021   45
     **1004    01-11-2021   36.3     45 50 14=109/3=36.3
       1004    01-12-2021   50

  ```

The values should be filled per group of the Id column, considering each individual month of the given year.
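A minimal sketch of the first half of the task: adding one NaN row for every month an Id is missing. It assumes day-first date strings as in the sample, at most one row per Id and month, and a target range of January to December 2021. The helper name `add_missing_months` and the `Month` helper column are hypothetical, not part of the original data.

```python
import pandas as pd

def add_missing_months(df, start="2021-01", end="2021-12"):
    """Return df with one row per Id and month; missing months get Value NaN."""
    out = df.copy()
    out["Datetime"] = pd.to_datetime(out["Datetime"], format="%d-%m-%Y")
    out["Month"] = out["Datetime"].dt.to_period("M")
    months = pd.period_range(start, end, freq="M")
    # Full grid of (Id, Month) pairs over the assumed range
    grid = pd.MultiIndex.from_product(
        [out["Id"].unique(), months], names=["Id", "Month"]
    )
    # Assumes each (Id, Month) pair occurs at most once in the input
    out = out.set_index(["Id", "Month"]).reindex(grid).reset_index()
    # Inserted rows have no date yet; use the first day of their month
    out["Datetime"] = out["Datetime"].fillna(out["Month"].dt.to_timestamp())
    return out.drop(columns="Month")

sample = pd.DataFrame({
    "Id": [1002, 1002, 1002, 1005],
    "Datetime": ["18-01-2021", "27-02-2021", "19-04-2021", "18-05-2021"],
    "Value": [45, 36, 54, 23],
})
filled = add_missing_months(sample)
```

Existing rows keep their original dates, and inserted rows are dated the first of the month, which matches most of the expected output above.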

CodePudding user response：

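Try this:

import pandas as pd

# Time-based rolling needs a sorted DatetimeIndex
df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)
df = df.set_index('Datetime').sort_index()
parts = []
for i in df['Id'].unique():
    temp_df = df.loc[df['Id'].eq(i)]
    # rolling() only accepts fixed-width offsets, so '90D' stands in for 3 months; the method is mean(), not avg()
    s = temp_df['Value'].fillna(temp_df['Value'].rolling('90D').mean())
    parts.append(s)
ns = pd.concat(parts)
# This variable will be your wanted series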

Since I have not run this against sample data, I am not sure it will work, and I am sure there is also a way to do it through groupby.

CodePudding user response：

df_a['YearMonth'] = pd.to_datetime(df_a['YearMonth'])
# reindex already inserts NaN for the missing months; filling with the string "NaN" would turn the column into text
df = (df_a.set_index('YearMonth')
      .reindex(pd.date_range('2021-01-01', '2023-12-01', freq='MS'))
      .rename_axis(['YearMonth'])
      .reset_index())

I am able to fill the missing months with NaN, but how do I calculate the NaN values as the average of the 3 rows above?
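One possible approach, sketched under assumptions: because each filled value feeds into the averages that follow it, a built-in rolling mean does not fit directly, and a plain loop per Id is the simplest option. The function name `fill_with_prev3_mean` is hypothetical. The sketch assumes the missing months are already present as NaN rows, that rows are sorted by date within each Id, and that a gap with fewer than 3 earlier values is averaged over whatever is available (the expected output in the question uses a December 2020 value that is not in the sample input, so this rule is a guess).

```python
import numpy as np
import pandas as pd

def fill_with_prev3_mean(values):
    """Fill NaNs in order with the mean of up to 3 preceding values."""
    filled = []
    for v in values:
        if pd.isna(v):
            # Previously filled values are part of the window too
            window = [x for x in filled[-3:] if not pd.isna(x)]
            # Assumption: with no earlier value at all, the gap stays NaN
            v = sum(window) / len(window) if window else np.nan
        filled.append(v)
    return pd.Series(filled, index=values.index)

df = pd.DataFrame({
    "Id": [1002] * 6,
    "Month": ["2021-01", "2021-02", "2021-03", "2021-04", "2021-05", "2021-06"],
    "Value": [45, 36, np.nan, 54, np.nan, np.nan],
})
# groupby keeps each Id's history separate
df["Value"] = df.groupby("Id")["Value"].transform(fill_with_prev3_mean)
```

For this small input the three gaps become 40.5, 43.5 and 46.0. March differs from the 32.3 in the question only because the December 2020 value of 16 is not part of the input here.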