I am creating one dataframe for each unique value in a column. This part works properly.
regions = dataDF['region'].unique().tolist()
df_dict = {name: dataDF.loc[dataDF['region'] == name] for name in regions}
However, now I would like to calculate the average temperature for each row (the mean of tmax and tmin) and then the yearly mean of that average, for every newly created dataframe.
for df in df_dict:
    df['avg'] = (df['tmax'] + df['tmin'])/2
    df = pd.DataFrame(df.groupby(df['date'].dt.year)['avg'].mean())
Thanks for the help in advance.
CodePudding user response:
A dictionary of DataFrames is not necessary; you can take the row-wise mean of tmax and tmin and aggregate it by region and year directly:
out = (dataDF[['tmax', 'tmin']].mean(axis=1)
       .groupby([dataDF['region'], dataDF['date'].dt.year])
       .mean())
Or:
out = (dataDF.assign(avg=dataDF[['tmax', 'tmin']].mean(axis=1),
                     y=dataDF['date'].dt.year)
       .groupby(['region', 'y'])['avg']
       .mean())
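To check the idea on something small, here is a minimal sketch with made-up sample data; only the column names (region, date, tmax, tmin) come from the question, and the values and the `yearly` dictionary are assumptions for illustration. It also shows one way the dictionary loop could be made to work if separate per-region results are still wanted: iterating over a dict yields only its keys, so the sketch uses `.items()` and stores each result under its region name.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dataDF = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south"],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-07-01", "2021-01-01", "2020-01-01", "2021-07-01"]
    ),
    "tmax": [10.0, 30.0, 12.0, 20.0, 35.0],
    "tmin": [0.0, 20.0, 2.0, 10.0, 25.0],
})

# Single groupby over region and year, as in the first snippet above.
out = (dataDF[["tmax", "tmin"]].mean(axis=1)
       .groupby([dataDF["region"], dataDF["date"].dt.year])
       .mean())

# Dictionary-based variant: copy each group so adding a column is safe,
# iterate over (name, frame) pairs, and keep the result per region.
df_dict = {name: group.copy() for name, group in dataDF.groupby("region")}
yearly = {}  # hypothetical name for the per-region results
for name, df in df_dict.items():
    df["avg"] = (df["tmax"] + df["tmin"]) / 2
    yearly[name] = df.groupby(df["date"].dt.year)["avg"].mean()

print(out)
print(yearly["north"])
```

Both variants should give the same yearly means per region; the single groupby returns them in one Series with a (region, year) MultiIndex, which is usually easier to work with than a dictionary of results.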