I need to calculate some measures over a window of my dataframe, with the row of interest at the centre of the window. For example, with a dataset of 10 rows and a window size of 2, when I am at the 5th row I need to compute (say) the mean of the values in the 3rd, 4th, 5th, 6th and 7th rows. At the first row there are no previous rows, so I need to use only the following ones (in the example, the mean of the 1st, 2nd and 3rd rows). If there are some rows but not enough, I need to use all the rows that are present (for example, at the 2nd row I would use the 1st, 2nd, 3rd and 4th).
How can I do that? As the title of my question suggests, my first idea was to count the number of rows preceding and following the current one, but I don't know how to do that. I am not tied to this method, so if you have a better one, feel free to share it.
CodePudding user response:
What you want is a rolling mean with min_periods=1, center=True:
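import pandas as pd

df = pd.DataFrame({'col': range(10)})
N = 2  # number of rows before/after to include
# window of 2*N + 1 rows centred on the current row; min_periods=1 lets edge windows shrink
df['rolling_mean'] = df['col'].rolling(2*N + 1, min_periods=1, center=True).mean()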
Output:
   col  rolling_mean
0    0           1.0
1    1           1.5
2    2           2.0
3    3           3.0
4    4           4.0
5    5           5.0
6    6           6.0
7    7           7.0
8    8           7.5
9    9           8.0
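Since the question asks about measures in general and about how many rows each window really contains, here is a minimal sketch that reuses the same rolling window for other aggregations. The column names n_rows and rolling_max are only illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col': range(10)})
N = 2  # rows before/after to include, as in the answer above

# Build the centred window once and reuse it for several measures
roll = df['col'].rolling(2*N + 1, min_periods=1, center=True)
df['n_rows'] = roll.count()      # number of rows actually present in each window
df['rolling_max'] = roll.max()   # any other rolling aggregation works the same way

print(df)
```

The n_rows column should be smaller near the two ends of the frame, which is where the window gets truncated.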
CodePudding user response:
I assume that you have the target_row and window_size numbers as input. You are trying to do an operation on a window of window_size rows either side of the target_row in a dataframe df, and I gather from your question that you already know that you can't just grab +/- the window size, because it might run past the ends of the dataframe. Instead, clamp the start and end rows to the dataframe bounds, and then pull out the window you want:
start_row = max(target_row - window_size, 0)  # clamp at the first row
end_row = min(target_row + window_size, len(df) - 1)  # clamp at the last row
window = df.iloc[start_row:end_row + 1, :]  # iloc excludes the stop, hence + 1
Then you can perform whatever operation you want on the window, such as taking an average with window.mean().
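To apply this slicing approach to every row, one option is to wrap it in a small helper. The sketch below assumes target_row is a zero-based row position, and the name centered_window is hypothetical, chosen only for illustration.

```python
import pandas as pd

def centered_window(df, target_row, window_size):
    """Hypothetical helper: rows within window_size positions of target_row,
    clipped to the bounds of the dataframe (target_row is zero-based)."""
    start_row = max(target_row - window_size, 0)
    end_row = min(target_row + window_size, len(df) - 1)
    return df.iloc[start_row:end_row + 1, :]

df = pd.DataFrame({'col': range(10)})
window_size = 2

# Mean of the clipped window around every row position
manual_means = [
    centered_window(df, i, window_size)['col'].mean()
    for i in range(len(df))
]
print(manual_means)
```

A Python-level loop like this will generally be slower than a built-in rolling aggregation on large frames, but it lets you run arbitrary code on each window.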