Home > Mobile >  Monthly climatology across several years, repeated for each day in that month over all years
Monthly climatology across several years, repeated for each day in that month over all years

Time:12-09

I need to find the monthly climatology of some data that has daily values across several years. The code below sufficiently summarizes what I am trying to do. monthly_mean holds the averages over all years for specific months. I then need to assign that average in a new column for each day in a specific month over all of the years. For whatever reason, my assignment, df['A Climatology'] = group['A Climatology'], is only assigning values to the month of December. How can I make the assignment happen for all months?

data = np.random.randint(5,30,size=(365*3,3))
df = pd.DataFrame(data, columns=['A', 'B', 'C'], index=pd.date_range('2021-01-01', periods=365*3))
df['A Climatology'] = np.nan

monthly_mean = df['A'].groupby(df.index.month).mean()
for month, group in df.groupby(df.index.month):
    group['A Climatology'] = monthly_mean.loc[month]
    df['A Climatology'] = group['A Climatology']
    
df

CodePudding user response:

Your code is setting the column == to the group, so every iteration of your loop you're setting the df's values only for that group---which is why your df ends on December, the last month in the list.

monthly_mean = df['A'].groupby(df.index.month).mean()
for month, group in df.groupby(df.index.month):
    df.loc[lambda df: df.index.month == month, 'A Climatology'] = monthly_mean.loc[month]

Instead, you could directly set the df's values where the month == the iterable month.

CodePudding user response:

merged_df = pd.merge(df, 
             monthly_mean, 
             how='left', 
             left_on=df.index.month, 
             right_on=monthly_mean.index).drop('key_0', axis=1).set_index(df.index)

            A_x B   C   A Climatology   A_y
2021-01-01  12  20  18  NaN             16.752688
2021-01-02  24  26  11  NaN             16.752688
2021-01-03  18  27  15  NaN             16.752688
2021-01-04  18  5   22  NaN             16.752688
2021-01-05  10  15  25  NaN             16.752688
... ... ... ... ... ...
2023-12-27  19  15  11  16.11828        16.118280
2023-12-28  16  23  25  16.11828        16.118280
2023-12-29  6   13  16  16.11828        16.118280
2023-12-30  10  9   14  16.11828        16.118280
2023-12-31  15  22  17  16.11828        16.118280

Or to do this without creating a new data frame:

df = df.reset_index().merge(monthly_mean, how='left', left_on=df.index.month, right_on=monthly_mean.index).set_index('index')

monthly_means:

1     16.752688
2     16.476190
3     16.795699
4     17.111111
5     17.795699
6     18.111111
7     16.806452
8     15.236559
9     15.600000
10    18.279570
11    16.555556
12    16.118280
Name: A, dtype: float64
  • Related