I'm currently working with a pandas dataframe, and I wish to create two average values, one called overload_average
, and one called baseline_average
. Working with a column strain
, I wish to take the average of the first value of the column, and then every 5 values onwards, for overload_average
. The other average, baseline_average
, would be an average of all the values in between, so values at positions 1-4, 6-9, and so forth. This is what I'm working with so far, and how I thought I could achieve this with iloc. (shrunk for reproducibility)
data = pd.DataFrame({"strain":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]}
overload_average = data['strain'].iloc[0:5].average()
baseline_average = data['strain'].iloc[1:4].average()
However, I'm struggling to make this work and think of a way to a) make this a repeating statement and b) How to seperate the values for overload (as I don't want the values in between to be included in my average. I'm also thinking I could maybe use numpy for this? Any help would be greatly appreciated!
CodePudding user response:
You can easily get every 5 items with slicing (::5
), save this as index and use it to get the difference for your baseline:
idx = data.index[::5]
overload_average = data.loc[idx, 'strain'].mean()
# 7.5
baseline_average = data.loc[data.index.difference(idx), 'strain'].mean()
# 5.0
Or, build a mask using numpy
and modulo:
mask = np.arange(len(data))%5
overload_average = data.loc[mask==0, 'strain'].mean()
# 7.5
baseline_average = data.loc[mask!=0, 'strain'].mean()
# 5.0