I have a pandas DataFrame with three columns, [Id, Datetime, Value], as shown below.
For each individual Id, any month that is missing should be added as a row with Value set to NaN, and each NaN should then be replaced with the average of the 3 rows above it.
```
Input Dataframe
Id Datetime Value
1002 18-01-2021 45
1002 27-02-2021 36
1002 19-04-2021 54
1002 03-07-2021 63
1002 18-10-2021 72
1003 15-01-2021 68
1003 19-04-2021 90
1004 08-03-2021 14
1004 03-07-2021 50
1005 18-05-2021 23
1005 12-10-2021 108
Output Dataframe
Id Datetime value
1002 18-01-2021 45
1002 27-02-2021 36
*1002 01-03-2021 32.3 16 45 36=97/3= 32.3 [16 value is 01-12-2020]
1002 19-04-2021 54
**1002 01-05-2021 40.7 54 32.3 36= 122.3/3 =40.7
**1002 03-06-2021 42.3 40.7 54 32.3=127/3 =42.3
1002 03-07-2021 63
**1002 01-08-2021 48.6 63 42.3 40.7=146/3=48.6
**1002 01-09-2021 51.3 48.6 63 42.3= 153.9/3= 51.3
1002 18-10-2021 72
**1002 15-11-2021 57.3 72 51.3 48.6=171.9/3=57.3
**1002 07-12-2021 60.2 57.3 72 51.3=180.6/3=60.2
1003 01-01-2021 68
1003 01-02-2021 66
1003 01-03-2021 43
1003 19-04-2021 90
**1003 01-05-2021 66.3 90 43 66=199/3 = 66.3
**1003 01-06-2021 66.4 66.3 90 43=199.3/3 = 66.4
**1003 01-07-2021 74.2 66.4 66.3 90= 222.7/3= 74.2
**1003 01-08-2021 68.96 74.2 66.4 66.3=206.9/3= 68.96
**1003 01-09-2021 69.85 68.96 74.2 66.4=209.56/3=69.85
**1003 01-10-2021 71 69.85 68.96 74.2= 213/3=71
1003 01-11-2021 45
**1003 01-12-2021 61.95 45 71 69.85=185.85/3=61.95
1004 08-01-2021 36
1004 01-02-2021 51
1004 01-03-2021 41
**1004 01-04-2021 42.6 41 51 36=128/3=42.6
**1004 01-05-2021 44.8 42.6 41 51=134.6/3=44.8
1004 01-06-2021 42
1004 16-07-2021 50
1004 12-08-2021 14
1004 02-09-2021 50
1004 05-10-2021 45
**1004 01-11-2021 36.3 45 50 14=109/3=36.3
1004 01-12-2021 50
```
The values should be filled per Id (a groupby on the Id column), treating each month of a given year as its own row.
CodePudding user response:
Try this
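import pandas as pd

# Assumes df holds the Id, Datetime and Value columns, with NaN already in the missing months
df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)
df.set_index('Datetime', inplace=True)
parts = []
for i in df['Id'].unique():
    temp_df = df.loc[df['Id'].eq(i)]
    # 3-row window (current row and the two before it); NaN rows are skipped when averaging
    s = temp_df['Value'].fillna(temp_df['Value'].rolling(3, min_periods=1).mean())
    parts.append(s)
ns = pd.concat(parts)
# ns is the series you want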
Since there is no sample data, I am not sure this will work, and I am sure there is also a way to do it through groupby.
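A groupby version of the same idea might look like the sketch below. The small DataFrame is made-up sample data, not the asker's, and it assumes the missing months are already present as NaN rows. Note that this is a single-pass fill: a NaN is averaged from the real values in its window, so a value filled for one gap does not feed into the next gap.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the gaps are already marked as NaN.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2],
    "Value": [10.0, 20.0, np.nan, 40.0, 5.0, np.nan],
})

# Per-Id rolling mean over the current row and the two rows before it.
# NaN entries are skipped, and min_periods=1 allows shorter windows at the start.
rolled = df.groupby("Id")["Value"].transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)

# Only the NaN positions take the rolling mean; real values are kept as they are.
filled = df["Value"].fillna(rolled)
print(filled.tolist())
```

Using `transform` keeps the result aligned with the original index, so `fillna` can match rows directly.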
CodePudding user response:
df_a['YearMonth'] = pd.to_datetime(df_a['YearMonth'])
df = (df_a.set_index('YearMonth')
          .reindex(pd.date_range('2021-01-01', '2023-12-01', freq='MS'))
          .rename_axis(['YearMonth'])
          .reset_index())
# reindex leaves the months with no data as NaN, so no fillna is needed
I am able to fill the missing months with NaN, but how do I calculate the NaN values as the average of the 3 rows above?
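One possible way to get the "average of the 3 rows above" behaviour, where an already filled value counts toward the next gap, is a plain loop per Id. This is only a sketch under some assumptions: the helper names `fill_with_prev3_mean` and `fill_missing_months` are made up for illustration, each Id has at most one row per month, and the original day of the month is replaced by a monthly period.

```python
import pandas as pd

def fill_with_prev3_mean(values):
    """Replace each NaN with the mean of up to three preceding values,
    counting values that were filled earlier in the same pass."""
    out = []
    for v in values:
        if pd.isna(v):
            window = [x for x in out[-3:] if not pd.isna(x)]
            if window:  # a leading NaN with nothing before it stays NaN
                v = sum(window) / len(window)
        out.append(v)
    return out

def fill_missing_months(df, start, end):
    """Give every Id one row per month between start and end, then fill the gaps."""
    df = df.copy()
    dates = pd.to_datetime(df["Datetime"], format="%d-%m-%Y")
    df["Month"] = dates.dt.to_period("M")
    months = pd.period_range(start, end, freq="M")
    parts = []
    for id_, grp in df.groupby("Id"):
        # Assumes at most one row per Id and month; duplicates would make reindex fail.
        s = grp.set_index("Month")["Value"].reindex(months)
        filled = fill_with_prev3_mean(s.tolist())
        parts.append(pd.DataFrame({"Id": id_, "Month": months, "Value": filled}))
    return pd.concat(parts, ignore_index=True)

# The rows for Id 1002 and 1003 from the input in the question.
sample = pd.DataFrame({
    "Id": [1002, 1002, 1002, 1002, 1002, 1003, 1003],
    "Datetime": ["18-01-2021", "27-02-2021", "19-04-2021", "03-07-2021",
                 "18-10-2021", "15-01-2021", "19-04-2021"],
    "Value": [45, 36, 54, 63, 72, 68, 90],
})
result = fill_missing_months(sample, "2021-01", "2021-12")
print(result)
```

With only the 2021 input rows, March for Id 1002 comes out as 40.5 (the mean of 45 and 36) rather than the 32.3 in the expected output, because the December 2020 value of 16 is not part of the input shown.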