Home > Blockchain >  Calculating average of values in pandas dataframe, but only at certain values?
Calculating average of values in pandas dataframe, but only at certain values?

Time:06-20

I'm currently working with a pandas dataframe, and I wish to create two average values, one called overload_average, and one called baseline_average. Working with a column strain, I wish to take the average of the first value of the column, and then every 5 values onwards, for overload_average. The other average, baseline_average, would be an average of all the values in between, so values at positions 1-4, 6-9, and so forth. This is what I'm working with so far, and how I thought I could achieve this with iloc. (shrunk for reproducibility)

data = pd.DataFrame({"strain":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]} 

overload_average = data['strain'].iloc[0:5].average()
baseline_average = data['strain'].iloc[1:4].average()

However, I'm struggling to make this work and think of a way to a) make this a repeating statement and b) How to seperate the values for overload (as I don't want the values in between to be included in my average. I'm also thinking I could maybe use numpy for this? Any help would be greatly appreciated!

CodePudding user response:

You can easily get every 5 items with slicing (::5), save this as index and use it to get the difference for your baseline:

idx = data.index[::5]

overload_average = data.loc[idx, 'strain'].mean()
# 7.5
baseline_average = data.loc[data.index.difference(idx), 'strain'].mean()
# 5.0

Or, build a mask using numpy and modulo:

mask = np.arange(len(data))%5

overload_average = data.loc[mask==0, 'strain'].mean()
# 7.5
baseline_average = data.loc[mask!=0, 'strain'].mean()
# 5.0
  • Related