Python Pandas: calculate median for every row over every n rows (like overlapping groups)

Time:11-11

I have a dataframe like this, but much bigger:

Index    Duration
1        100          
2        300
3        350
4        200
5        500
6        1000
7        350
8        200
9        400

I want to add a new column that holds the median of 3 consecutive rows, computed for every row. Like this:

Index    Duration     Median
1        100          
2        300          300
3        350          300
4        200          350
5        500          500
6        1000         500
7        350          350
8        200          350
9        400

So each median is computed over 3 rows, but the window must always consist of the row itself plus one row before and one row after it, so that the row where the median is written sits in the middle. Because of that, the first and last rows have to stay empty. The rest of the dataframe has to stay as it is.

Here is an example of what it looks like in Excel:

[Two Excel screenshots; images not available]

In Excel it's easy: you can just drag the formula down the rows. I tried many solutions I found here, but none of them does what I want.

For example, I tried this, but it just produces 3 separate groups, which I don't want:

df.groupby(np.arange(len(df))//3).median()

#output:

Index     Duration     Median
1         100          500
2         300          350
3         350          350

I want the groups to overlap as shown above. I hope my problem is clear and that you can help me with it.
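As a side note on the groupby attempt: the integer-division keys assign each row to exactly one block, so the groups cannot overlap. A minimal sketch of the keys, assuming the 9-row sample above (`n_rows` and `keys` are names made up for illustration):

```python
import numpy as np

n_rows = 9  # number of rows in the sample dataframe (assumption for this sketch)

# Integer division by 3 labels rows 0-2 as group 0, rows 3-5 as group 1, and so on.
keys = np.arange(n_rows) // 3
print(keys.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Every row carries a single label, so groupby returns one median per block instead of one per row.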

CodePudding user response:

Use Series.rolling with the center=True parameter:

df['Median'] = df['Duration'].rolling(3, center=True).median()
print(df)
   Index  Duration  Median
0      1       100     NaN
1      2       300   300.0
2      3       350   300.0
3      4       200   350.0
4      5       500   500.0
5      6      1000   500.0
6      7       350   350.0
7      8       200   350.0
8      9       400     NaN
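For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample data from the question. The NaN values at both ends appear because, for an integer window, min_periods defaults to the window size. The `Median_filled` column is a hypothetical extra that the question did not ask for; it only shows what happens if shorter windows are accepted at the edges.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Index": range(1, 10),
    "Duration": [100, 300, 350, 200, 500, 1000, 350, 200, 400],
})

# Centered window of 3 rows: the row itself plus one neighbour on each side.
# min_periods defaults to the window size, so the first and last rows are NaN.
df["Median"] = df["Duration"].rolling(3, center=True).median()

# Optional variant (hypothetical column name): allow incomplete windows at the
# edges, so the first and last rows use the two rows that are available.
df["Median_filled"] = (
    df["Duration"].rolling(3, center=True, min_periods=1).median()
)

print(df)
```

If the empty first and last rows are what you need, as in the question, the default is the one to keep.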

Another option is to compute the ordinary (trailing) rolling median and shift the result up by 1 row:

df['Median'] = df['Duration'].rolling(3).median().shift(-1)
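The shift trick should carry over to other odd window sizes by shifting up by half the window, rounded down. The sketch below checks that against center=True on the sample data; `n`, `centered` and `shifted` are names chosen for illustration. For even window sizes the two approaches may not line up, so this sticks to odd `n`.

```python
import pandas as pd

# Duration values copied from the question.
duration = pd.Series(
    [100, 300, 350, 200, 500, 1000, 350, 200, 400], name="Duration"
)

n = 5  # hypothetical odd window size

# Centered window: the result is written on the middle row of each window.
centered = duration.rolling(n, center=True).median()

# Trailing window: the result is written on the last row of each window,
# so it is moved up by n // 2 rows to land on the middle row.
shifted = duration.rolling(n).median().shift(-(n // 2))

# Series.equals treats NaN values in the same positions as equal.
print(centered.equals(shifted))
```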

CodePudding user response:

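You should consider the rolling-window functionality of pandas.

The general form is: df.rolling(window, center=True or False).median(), where window is the number of rows in each window.

With center=False (the default) the result is written on the last row of each window; with center=True it is written on the middle row, which is what you are asking for.
In your case:

df['Median'] = df['Duration'].rolling(3, center=True).median()

I hope it helps.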
