How to match Datetimeindex for all but the year?-CodePudding

I have a dataset with missing values and a Datetimeindex. I would like to fill this values with the mean values of other values reported at the same month, day and hour. If there is no values reported at this specific month/day/hour for all years I would like to get the interpolated value mean values of the nearest hour reported. How can I achieve this? Right now my approach is this:

df_Na = df_Na[df_Na['Generation'].isna()]
df_raw = df_raw[~df_raw['Generation'].isna()]
# reduce to month
same_month = df_raw[df_raw.index.month.isin(df_Na.index.month)]
# reduce to same day
same_day = same_month[same_month.index.day.isin(df_Na.index.day)]
# reduce to hour
same_hour = same_day[same_day.index.hour.isin(df_Na.index.hour)]

df_Na are all missing values I liked to fill and df_raw are all reported values from which I liked to get the mean value. I have a huge dataset which is why I would like to avoid a for loop at all cost.

My Data looks like this: df_Na

                     Generation
2017-12-02 19:00:00         NaN
2021-01-12 00:00:00         NaN
2021-01-12 01:00:00         NaN
..............................
2021-02-12 20:00:00         NaN
2021-02-12 21:00:00         NaN
2021-02-12 22:00:00         NaN

df_raw

                     Generation
2015-09-12 00:00:00         0.0
2015-09-12 01:00:00        19.0
2015-09-12 02:00:00         0.0
..............................
2021-12-11 21:00:00         0.0
2021-12-11 22:00:00       180.0
2021-12-11 23:00:00         0.0

many thanks for all help and advice :)

best fidu13

CodePudding user response：

Use GroupBy.transform with mean for averages per MM-DD HH and replace missing values by DataFrame.fillna:

df = df.fillna(df.groupby(df.index.strftime('%m-%d %H')).transform('mean'))

And then if necessary add DataFrame.interpolate:

df = df.interpolate(method='nearest')