Fast method for generating subsequences from initial data

I have a DataFrame and I'd like to build the subsequences (sliding windows) of its data:

import pandas as pd

d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})

x = []
window = 3
for i in range(0, len(d) - window + 1):
    x.append(d[i: i + window].t.values)

pd.DataFrame(x, columns=['t1', 't2', 't3'])

I get this result:

    t1  t2  t3
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6

It works, but it is very slow for a large DataFrame. Is it possible to make the procedure faster?

CodePudding user response：

You can use NumPy's sliding_window_view (available in NumPy 1.20 and later):

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

W = 3
pd.DataFrame(sliding_window_view(d['t'], W),
             columns=[f't{i+1}' for i in range(W)])

#   t1  t2  t3
#0   1   2   3
#1   2   3   4
#2   3   4   5
#3   4   5   6
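
To reuse this for other window sizes, the same call can be wrapped in a small function. This is only a sketch: the name make_windows, the bounds check, and the explicit copy are my own additions, not part of the answer above. The copy is there because sliding_window_view returns a read-only view of the original array, and a frame that owns its data seemed the safer default.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series: pd.Series, window: int) -> pd.DataFrame:
    """Return one row per run of `window` consecutive values (hypothetical helper)."""
    values = series.to_numpy()
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(series)")
    # sliding_window_view builds a read-only view; no data is copied at this point.
    view = sliding_window_view(values, window)
    columns = [f"t{i + 1}" for i in range(window)]
    # Copy so the resulting frame owns writable data (a cautious choice, not a requirement).
    return pd.DataFrame(view.copy(), columns=columns)


d = pd.DataFrame({"t": [1, 2, 3, 4, 5, 6]})
windows = make_windows(d["t"], 3)
print(windows)
```

If memory matters more than writability, dropping the .copy() call should avoid materializing the windows up front, though pandas may still copy later.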

CodePudding user response：

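You can use this trick with pandas. The callback's only job is to append each complete window to a list; its return value (0) is discarded:

lst = []
d.rolling(3).apply(lambda x: lst.append(x.apply(int).tolist()) or 0)
result = pd.DataFrame.from_records(lst, columns=['t1','t2','t3'])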

Here is the result:

   t1  t2  t3
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6
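
Since the question is about speed, it may be worth timing the three approaches on your own data before picking one. Below is a rough harness, not a benchmark: the function names, the 2,000-row test frame, and the repeat count are arbitrary choices of mine, and the rolling variant uses raw=True on the column for simplicity rather than the exact call above. I would expect the sliding_window_view version to come out ahead because it does no per-window Python work, but the numbers will depend on your machine and data size.

```python
import timeit

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

WINDOW = 3
d = pd.DataFrame({"t": np.arange(2_000)})  # arbitrary test size


def with_loop(frame):
    # The original approach: slice the frame once per window.
    rows = []
    for i in range(len(frame) - WINDOW + 1):
        rows.append(frame[i:i + WINDOW].t.values)
    return np.array(rows)


def with_rolling(frame):
    # The rolling trick: the callback collects each complete window as a side effect.
    rows = []
    frame["t"].rolling(WINDOW).apply(
        lambda x: rows.append(x.astype(int).tolist()) or 0, raw=True
    )
    return np.array(rows)


def with_view(frame):
    # The NumPy approach: a strided view over the underlying array.
    return sliding_window_view(frame["t"].to_numpy(), WINDOW)


for fn in (with_loop, with_rolling, with_view):
    seconds = timeit.timeit(lambda: fn(d), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```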