How to average column values every n rows in pandas-CodePudding

Could you let know how to average column values with same value in the first column as below? I'd like to calculate calculate every n rows of all columns. for example, the average of all the values from a1~a6 with site value '1' equals to 3. (my data has numerous columns and rows)

raw = {'site':['1','1','1','1','2','2','2','2'],
       'a1':['1','2','3','4','5','6','7','8'],
       'a2':['2','3','4','5','6','7','8','9'],
       'a3':['2','3','4','5','6','7','8','9'],
       'a4':['2','3','4','5','6','7','8','9'],
       'a5':['2','3','4','5','6','7','8','9'],
       'a6':['2','3','4','5','6','7','8','9']
      }
df_tmp = pd.DataFrame(raw,columns = ['site','a1','a2','a3','a4','a5','a6'])
print(df_tmp)

Thank you

CodePudding user response：

You may use:

df.mean(axis=0) # by each row
df.mean(axis=1) # by each column

Also you may do the following:

df[[list_of_columns]].mean(axis=0)
df[[list_of_columns]].mean(axis=1)

CodePudding user response：

You can use df_tmp.iloc[row_index, col_index] to slice with index or df_tmp.loc[row_index, list_of_col_name] to slice with col_name and row index.

For this case: df_tmp.loc[0:3,['a1']] returns

df_tmp.iloc[0:3,1:5] returns

  a1 a2 a3 a4
0  1  2  2  2
1  2  3  3  3
2  3  4  4  4

To get the mean value, you basically take the sliced df, and call mean() df_tmp.iloc[0:3,1:5].mean(axis=0) will calculate mean value in respect of each col.

To calculate the mean value of the whole sheet, you can do something like...

df_tmp.iloc[0:3,1:5].mean().mean() returns 68.75

CodePudding user response：

IIUC, DataFrame.melt mean for each site with GroupBy.mean

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated

Output

site
1    3.333333
2    7.333333
Name: value, dtype: float64