I need to find the monthly climatology of some data that has daily values across several years. The code below sufficiently summarizes what I am trying to do. monthly_mean
holds the averages over all years for specific months. I then need to assign that average in a new column for each day in a specific month over all of the years. For whatever reason, my assignment, df['A Climatology'] = group['A Climatology']
, is only assigning values to the month of December. How can I make the assignment happen for all months?
data = np.random.randint(5,30,size=(365*3,3))
df = pd.DataFrame(data, columns=['A', 'B', 'C'], index=pd.date_range('2021-01-01', periods=365*3))
df['A Climatology'] = np.nan
monthly_mean = df['A'].groupby(df.index.month).mean()
for month, group in df.groupby(df.index.month):
group['A Climatology'] = monthly_mean.loc[month]
df['A Climatology'] = group['A Climatology']
df
CodePudding user response:
Your code is setting the column == to the group, so every iteration of your loop you're setting the df's values only for that group---which is why your df ends on December, the last month in the list.
monthly_mean = df['A'].groupby(df.index.month).mean()
for month, group in df.groupby(df.index.month):
df.loc[lambda df: df.index.month == month, 'A Climatology'] = monthly_mean.loc[month]
Instead, you could directly set the df's values where the month == the iterable month.
CodePudding user response:
merged_df = pd.merge(df,
monthly_mean,
how='left',
left_on=df.index.month,
right_on=monthly_mean.index).drop('key_0', axis=1).set_index(df.index)
A_x B C A Climatology A_y
2021-01-01 12 20 18 NaN 16.752688
2021-01-02 24 26 11 NaN 16.752688
2021-01-03 18 27 15 NaN 16.752688
2021-01-04 18 5 22 NaN 16.752688
2021-01-05 10 15 25 NaN 16.752688
... ... ... ... ... ...
2023-12-27 19 15 11 16.11828 16.118280
2023-12-28 16 23 25 16.11828 16.118280
2023-12-29 6 13 16 16.11828 16.118280
2023-12-30 10 9 14 16.11828 16.118280
2023-12-31 15 22 17 16.11828 16.118280
Or to do this without creating a new data frame:
df = df.reset_index().merge(monthly_mean, how='left', left_on=df.index.month, right_on=monthly_mean.index).set_index('index')
monthly_means
:
1 16.752688
2 16.476190
3 16.795699
4 17.111111
5 17.795699
6 18.111111
7 16.806452
8 15.236559
9 15.600000
10 18.279570
11 16.555556
12 16.118280
Name: A, dtype: float64