Home > database >  Find the maximum of the rows in each specific window
Find the maximum of the rows in each specific window

Time:10-18

I have a large dataframe, I want to find the maximum of each row in a window size of 5. Here is a simple example of df with shape 3*15. For the first row, 10 is the maximum in the first 5 values, for the second window, the maximum is 20 and for the third window at row 1, the maximum is 101.

import pandas as pd
df = pd.DataFrame()
df['a'] = [1, 0, 1]
df['b'] = [10,12,4]
df['c'] = [10, 0, 6]
df['d'] = [10,102,1]
df['aa'] = [-1, 10, 21]
df['bb'] = [10,12,31]
df['ac'] = [-1, 0, 41]
df['bv'] = [20,121,11]
df['a1'] = [3, 10, 1]
df['bx'] = [10,12,51]
df['ad'] = [6, 10, 1]
df['bn'] = [101,12,31]
df['aq'] = [2, 0, 1]
df['bp'] = [1,112,12]
df['ak'] = [23, 0, 1]

    a   b   c   d   aa  bb  ac  bv  a1  bx  ad  bn  aq  bp  ak
0   1   10  10  10  -1  10  -1  20  3   10  6   101 2   1   23
1   0   12  0   102 10  12  0   121 10  12  10  12  0   112 0
2   1   4   6   1   21  31  41  11  1   51  1   31  1   12  1

Output:

    v1    v2    v3
 0  10    20     101
 1  102    121  112
 2  21    51    31

Could you please help me with that?

CodePudding user response:

I would use numpy to reshape it to be 5 wide than use numpy max, and finally reshape it back to be 3 wide

print(df.values.reshape((-1,5)).max(axis=1).reshape(-1,3))

[[ 10  20 101]
 [102 121 112]
 [ 21  51  31]]

CodePudding user response:

You can first split the dataframe column into 3 parts

split = [df.columns[i*5:(i 1)*5] for i in range(3)]

Then, with the help of df.loc, you can get the desired dataframe.

pd.DataFrame([df.loc[:, i].max(axis=1) for i in split], index = ['v1','v2','v3'])
 
      0    1   2
v1   10  102  21
v2   20  121  51
v3  101  112  31

CodePudding user response:

If you are looking to generalise it for any number of columns or window size:

from tqdm import tqdm
tqdm.pandas()

w = 5 # window size

df.progress_apply(lambda row: pd.Series([max(sub) for sub in arr.reshape(int(len(row)/w), w)], index=["v" str(i) for i in range(1, int(len(row)/w) 1)]), axis=1)

>>    v1  v2   v3
>> 0  10  20  101
>> 1  10  20  101
>> 2  10  20  101

Plus you mentioned that you've a large dataframe - progress_apply will help to track progress, if the dataset is insanely large.

  • Related