Home > Software design >  How to average column values every n rows in pandas
How to average column values every n rows in pandas

Time:02-24

Could you let know how to average column values with same value in the first column as below? I'd like to calculate calculate every n rows of all columns. for example, the average of all the values from a1~a6 with site value '1' equals to 3. (my data has numerous columns and rows)

raw = {'site':['1','1','1','1','2','2','2','2'],
       'a1':['1','2','3','4','5','6','7','8'],
       'a2':['2','3','4','5','6','7','8','9'],
       'a3':['2','3','4','5','6','7','8','9'],
       'a4':['2','3','4','5','6','7','8','9'],
       'a5':['2','3','4','5','6','7','8','9'],
       'a6':['2','3','4','5','6','7','8','9']
      }
df_tmp = pd.DataFrame(raw,columns = ['site','a1','a2','a3','a4','a5','a6'])
print(df_tmp)

Thank you

CodePudding user response:

You may use:

df.mean(axis=0) # by each row
df.mean(axis=1) # by each column

Also you may do the following:

df[[list_of_columns]].mean(axis=0)
df[[list_of_columns]].mean(axis=1)

CodePudding user response:

You can use df_tmp.iloc[row_index, col_index] to slice with index or df_tmp.loc[row_index, list_of_col_name] to slice with col_name and row index.

For this case: df_tmp.loc[0:3,['a1']] returns

  a1
0  1
1  2
2  3
3  4

df_tmp.iloc[0:3,1:5] returns

  a1 a2 a3 a4
0  1  2  2  2
1  2  3  3  3
2  3  4  4  4

To get the mean value, you basically take the sliced df, and call mean() df_tmp.iloc[0:3,1:5].mean(axis=0) will calculate mean value in respect of each col.

a1    41.0
a2    78.0
a3    78.0
a4    78.0

To calculate the mean value of the whole sheet, you can do something like...

df_tmp.iloc[0:3,1:5].mean().mean() returns 68.75

CodePudding user response:

IIUC, DataFrame.melt mean for each site with GroupBy.mean

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated 

Output

site
1    3.333333
2    7.333333
Name: value, dtype: float64
  • Related