Build an array from a dataframe based on a sliding window-CodePudding

I have a dataframe with three columns. I want to build an array from this dataframe. I want to use a window to move across the rows. I want to start from row 0 until end, and put the value of three columns in one row of array based on the column b in data frame.

For example, with window size of 3, I put the 1,2,3 which are the three value in the first window and in column a put with the three value in column c, -1,1,5 and at the last column the value of column b is 1. This process will contintue until the column b changed. Then, for the next value of column b we have the same process. Thank you in advance Here is the dataframe which I have:

import pandas as pd
df = pd.DataFrame()
df ['a'] = [1,2,3,4,5,6,7,8,9,10, 1,2,3]
df ['b'] = [1,1,1,1,2,2,2,2,7,7,7,7,7]
df ['c'] = [-1,1,5,6,7,3,1,6,8,3,9,10,1]
df

     a  b   c
0    1  1  -1
1    2  1   1
2    3  1   5
3    4  1   6
4    5  2   7
5    6  2   3
6    7  2   1
7    8  2   6
8    9  7   8
9   10  7   3
10   1  7   9
11   2  7  10
12   3  7   1

The array which I want is:

array([[ 1,  2,  3, -1,  1,  5,  1],
       [ 2,  3,  4,  1,  5,  6,  1],
       [ 5,  6,  7,  7,  3,  1,  2],
       [ 6,  7,  8,  3,  1,  6,  2],
       [ 9, 10,  1,  8,  3,  9,  7],
       [10,  1,  2,  3,  9, 10,  7],
       [ 1,  2,  3,  9, 10,  1,  7]])

I tried, but I think that I am far away from a solution.

w = 3

A = np.zeros((w,len(df)-w 1))

for i in range(len(df)-w 1):
    A[:,i] = df.iloc[i:w i,0]

A = A.T

CodePudding user response：

IIUC, you could use a GroupBy.apply and a bit of numpy:

# group by consecutive b
group = df['b'].diff().abs().gt(0).cumsum()
# or if simply groupby by 'b' overall (not consecutive)
# group = 'b'

def flatten_group(g):
    # pandas' rolling only aggregates, doing a custom rolling
    # that flattens the columns a & c and add the 1st value of b
    return [np.r_[g.iloc[i:i 3][['a', 'c']].to_numpy().ravel('F'),
                  [g['b'].iloc[0]]]
            for i in range(len(g)-2)]

# apply per group and stack as numpy array
a = np.vstack(df.groupby(group).apply(flatten_group).explode())

output:

array([[ 1,  2,  3, -1,  1,  5,  1],
       [ 2,  3,  4,  1,  5,  6,  1],
       [ 5,  6,  7,  7,  3,  1,  2],
       [ 6,  7,  8,  3,  1,  6,  2],
       [ 9, 10,  1,  8,  3,  9,  7],
       [10,  1,  2,  3,  9, 10,  7],
       [ 1,  2,  3,  9, 10,  1,  7]])

CodePudding user response：

another way:

res = []
for n,g in df.groupby('b'):
    for r in g[['a','c']].rolling(3):
        if len(r)==3:
            res.append(r.T.values.ravel().tolist() [n])

np.array(res)
'''
array([[ 1,  2,  3, -1,  1,  5,  1],
       [ 2,  3,  4,  1,  5,  6,  1],
       [ 5,  6,  7,  7,  3,  1,  2],
       [ 6,  7,  8,  3,  1,  6,  2],
       [ 9, 10,  1,  8,  3,  9,  7],
       [10,  1,  2,  3,  9, 10,  7],
       [ 1,  2,  3,  9, 10,  1,  7]])