Using a sliding window to generate a single dataframe using pandas

Time:03-07

I want to apply a sliding window of size 3 to the dataframe below and return a new dataframe containing the windowed data.

datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66

My desired output is as below:

           datetime  val
2008-11-01 00:00:00   14
2008-11-01 00:00:01   11
2008-11-01 00:00:02   22
2008-11-01 00:00:01   11
2008-11-01 00:00:02   22
2008-11-01 00:00:09   56
2008-11-01 00:00:02   22
2008-11-01 00:00:09   56
2008-11-01 00:00:10   32
2008-11-01 00:00:09   56
2008-11-01 00:00:10   32
2008-11-01 00:00:12   11
2008-11-01 00:00:10   32
2008-11-01 00:00:12   11
2008-11-01 00:00:13   95
2008-11-01 00:00:12   11
2008-11-01 00:00:13   95
2008-11-01 00:00:15   77
2008-11-01 00:00:13   95
2008-11-01 00:00:15   77
2008-11-01 00:00:16   49
2008-11-01 00:00:15   77
2008-11-01 00:00:16   49
2008-11-01 00:00:17   66

I have tried the code below, which generates the desired windows (output shown after the code), but the result is not in the desired format: each window is printed as a separate dataframe with its own header row. How can I convert the current output into the desired single dataframe, i.e., one with a single header row at the top?

import pandas as pd 
import numpy as np 

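def df_sliding_windows(data, window=0):
    # Yield each run of `window` consecutive rows as its own dataframe slice.
    # Uses the `data` argument rather than the global `df`.
    for i in range(0, len(data) - window + 1):
        yield data.iloc[i : i + window]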

if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])

    df_slide_windows = df_sliding_windows(df, 3)
    for j in df_slide_windows:
        print(j)

             datetime  val
0 2008-11-01 00:00:00   14
1 2008-11-01 00:00:01   11
2 2008-11-01 00:00:02   22
             datetime  val
1 2008-11-01 00:00:01   11
2 2008-11-01 00:00:02   22
3 2008-11-01 00:00:09   56
             datetime  val
2 2008-11-01 00:00:02   22
3 2008-11-01 00:00:09   56
4 2008-11-01 00:00:10   32
             datetime  val
3 2008-11-01 00:00:09   56
4 2008-11-01 00:00:10   32
5 2008-11-01 00:00:12   11
             datetime  val
4 2008-11-01 00:00:10   32
5 2008-11-01 00:00:12   11
6 2008-11-01 00:00:13   95
             datetime  val
5 2008-11-01 00:00:12   11
6 2008-11-01 00:00:13   95
7 2008-11-01 00:00:15   77
             datetime  val
6 2008-11-01 00:00:13   95
7 2008-11-01 00:00:15   77
8 2008-11-01 00:00:16   49
             datetime  val
7 2008-11-01 00:00:15   77
8 2008-11-01 00:00:16   49
9 2008-11-01 00:00:17   66

CodePudding user response:

w = 3
# Row positions of every window laid end to end: 0, 1, 2, 1, 2, 3, ...
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]
df.iloc[inds]
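
For anyone who wants to try this without sample.csv, here is a minimal self-contained sketch of the same index-list idea. The inlined CSV text is the data from the question; the names CSV and windowed, and the final reset_index(drop=True) call, are additions for illustration and not part of the answer above. The reset is only there because df.iloc[inds] keeps the original, now repeated, row labels.

```python
import io

import pandas as pd

# Data copied from the question, inlined so the sketch runs without sample.csv.
CSV = """datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["datetime"])

w = 3
# Row positions of every window laid end to end: 0, 1, 2, 1, 2, 3, ...
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]

# Select the rows by position, then replace the repeated labels with 0..n-1.
windowed = df.iloc[inds].reset_index(drop=True)
print(windowed)
```

With 10 rows and w = 3 there are 8 windows, so windowed has 24 rows.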

CodePudding user response:

Change your code to concatenate the windows produced by the generator into one dataframe with pd.concat:

if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])

    df = pd.concat(df_sliding_windows(df, 3))
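
A small runnable sketch of the pd.concat approach, assuming the generator from the question with its missing + operators restored. It uses only the first five rows of the question's data to stay short. The ignore_index=True argument and the window column are optional extras that the answer above does not include: the first renumbers the rows 0..n-1 instead of keeping repeated labels, and the second is a hypothetical way to record which window each row came from.

```python
import pandas as pd


def df_sliding_windows(data, window):
    """Yield consecutive overlapping slices of `data`, each `window` rows long."""
    for i in range(len(data) - window + 1):
        yield data.iloc[i : i + window]


# First five rows of the question's data, built inline instead of read from sample.csv.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2008-11-01 00:00:00",
                "2008-11-01 00:00:01",
                "2008-11-01 00:00:02",
                "2008-11-01 00:00:09",
                "2008-11-01 00:00:10",
            ]
        ),
        "val": [14, 11, 22, 56, 32],
    }
)

windows = list(df_sliding_windows(df, 3))

# One dataframe with a single header row; rows renumbered 0..n-1.
flat = pd.concat(windows, ignore_index=True)

# Optional: tag every row with the number of the window it belongs to.
labelled = pd.concat(
    (win.assign(window=k) for k, win in enumerate(windows)),
    ignore_index=True,
)

print(flat)
print(labelled)
```

Keeping a window column makes it possible to group the flattened rows back into their windows later, for example with labelled.groupby("window").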