Apply calculation for dataframe columns for multiple dataframes at the same time

Time:09-28

I am creating a separate dataframe for each unique value in a column. It works properly.

regions = dataDF['region'].unique().tolist()
df_dict = {name: dataDF.loc[dataDF['region'] == name] for name in regions}

However, now I would like to calculate the average of tmax and tmin for each row, and then the yearly mean of that average, for every newly created dataframe.

for df in df_dict:
    df['avg'] = (df['tmax'] + df['tmin'])/2
    df = pd.DataFrame(df.groupby(df['date'].dt.year)['avg'].mean())

Thanks for the help in advance.

CodePudding user response:

A dictionary of DataFrames is not necessary; you can group by region and year in a single aggregation:

out = (dataDF[['tmax', 'tmin']].mean(axis=1)
                               .groupby([dataDF['region'], dataDF['date'].dt.year])
                               .mean())

Or:

out = (dataDF.assign(avg = dataDF[['tmax', 'tmin']].mean(axis=1), 
                     y = dataDF['date'].dt.year)
             .groupby(['region', 'y'])['avg']
             .mean())
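
To make the approach above easier to try, here is a minimal sketch on made-up data. The sample values and the names year and yearly are assumptions for illustration only; the columns region, date, tmax and tmin follow the question. It also shows one possible way to make the dictionary loop work, in case separate per-region results are still wanted: iterating over a dict yields its keys, so the loop needs .items(), and the result has to be stored rather than assigned back to the loop variable.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dataDF = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south"],
    "date": pd.to_datetime(["2020-01-01", "2020-07-01", "2021-01-01",
                            "2020-01-01", "2021-01-01"]),
    "tmax": [10.0, 30.0, 12.0, 20.0, 22.0],
    "tmin": [0.0, 20.0, 2.0, 10.0, 14.0],
})

# Single grouped aggregation: row-wise average, then mean per region and year.
out = (dataDF.assign(avg=dataDF[["tmax", "tmin"]].mean(axis=1),
                     year=dataDF["date"].dt.year)
             .groupby(["region", "year"])["avg"]
             .mean())

# Dictionary variant: one yearly Series per region.
df_dict = {name: group for name, group in dataDF.groupby("region")}
yearly = {}
for name, df in df_dict.items():          # .items() gives (key, DataFrame) pairs
    avg = (df["tmax"] + df["tmin"]) / 2   # row-wise average temperature
    yearly[name] = avg.groupby(df["date"].dt.year).mean()

print(out)
print(yearly["north"])
```

With this data, out.loc[("north", 2020)] and yearly["north"].loc[2020] are both 15.0. One difference to keep in mind: mean(axis=1) skips missing values by default, while (tmax + tmin)/2 gives NaN if either value is missing.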