Could you let know how to average column values with same value in the first column as below? I'd like to calculate calculate every n rows of all columns. for example, the average of all the values from a1~a6 with site value '1' equals to 3. (my data has numerous columns and rows)
raw = {'site':['1','1','1','1','2','2','2','2'],
'a1':['1','2','3','4','5','6','7','8'],
'a2':['2','3','4','5','6','7','8','9'],
'a3':['2','3','4','5','6','7','8','9'],
'a4':['2','3','4','5','6','7','8','9'],
'a5':['2','3','4','5','6','7','8','9'],
'a6':['2','3','4','5','6','7','8','9']
}
df_tmp = pd.DataFrame(raw,columns = ['site','a1','a2','a3','a4','a5','a6'])
print(df_tmp)
Thank you
CodePudding user response:
You may use:
df.mean(axis=0) # by each row
df.mean(axis=1) # by each column
Also you may do the following:
df[[list_of_columns]].mean(axis=0)
df[[list_of_columns]].mean(axis=1)
CodePudding user response:
You can use df_tmp.iloc[row_index, col_index]
to slice with index or df_tmp.loc[row_index, list_of_col_name]
to slice with col_name and row index.
For this case:
df_tmp.loc[0:3,['a1']]
returns
a1
0 1
1 2
2 3
3 4
df_tmp.iloc[0:3,1:5]
returns
a1 a2 a3 a4
0 1 2 2 2
1 2 3 3 3
2 3 4 4 4
To get the mean value, you basically take the sliced df, and call mean()
df_tmp.iloc[0:3,1:5].mean(axis=0)
will calculate mean value in respect of each col.
a1 41.0
a2 78.0
a3 78.0
a4 78.0
To calculate the mean value of the whole sheet, you can do something like...
df_tmp.iloc[0:3,1:5].mean().mean()
returns 68.75
CodePudding user response:
IIUC, DataFrame.melt
mean for each site with GroupBy.mean
# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()
Or:
# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated
Output
site
1 3.333333
2 7.333333
Name: value, dtype: float64