I want to apply a sliding window of size 3 to the dataframe below and return a new dataframe containing the windowed data.
datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
My desired output is as below:
datetime val
2008-11-01 00:00:00 14
2008-11-01 00:00:01 11
2008-11-01 00:00:02 22
2008-11-01 00:00:01 11
2008-11-01 00:00:02 22
2008-11-01 00:00:09 56
2008-11-01 00:00:02 22
2008-11-01 00:00:09 56
2008-11-01 00:00:10 32
2008-11-01 00:00:09 56
2008-11-01 00:00:10 32
2008-11-01 00:00:12 11
2008-11-01 00:00:10 32
2008-11-01 00:00:12 11
2008-11-01 00:00:13 95
2008-11-01 00:00:12 11
2008-11-01 00:00:13 95
2008-11-01 00:00:15 77
2008-11-01 00:00:13 95
2008-11-01 00:00:15 77
2008-11-01 00:00:16 49
2008-11-01 00:00:15 77
2008-11-01 00:00:16 49
2008-11-01 00:00:17 66
I have tried the code below, which generates the desired windows (shown after the code), but the result is not in the desired format: each window is printed as a separate dataframe with its own column header. How can I convert the current output into the desired single dataframe (i.e., one with a single header row at the top)?
import pandas as pd
import numpy as np

def df_sliding_windows(data, window=0):
    # Yield each run of `window` consecutive rows, stepping one row at a time.
    for i in range(0, len(data) - window + 1):
        yield data.iloc[i : i + window]

if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])
    df_slide_windows = df_sliding_windows(df, 3)
    for j in df_slide_windows:
        print(j)
datetime val
0 2008-11-01 00:00:00 14
1 2008-11-01 00:00:01 11
2 2008-11-01 00:00:02 22
datetime val
1 2008-11-01 00:00:01 11
2 2008-11-01 00:00:02 22
3 2008-11-01 00:00:09 56
datetime val
2 2008-11-01 00:00:02 22
3 2008-11-01 00:00:09 56
4 2008-11-01 00:00:10 32
datetime val
3 2008-11-01 00:00:09 56
4 2008-11-01 00:00:10 32
5 2008-11-01 00:00:12 11
datetime val
4 2008-11-01 00:00:10 32
5 2008-11-01 00:00:12 11
6 2008-11-01 00:00:13 95
datetime val
5 2008-11-01 00:00:12 11
6 2008-11-01 00:00:13 95
7 2008-11-01 00:00:15 77
datetime val
6 2008-11-01 00:00:13 95
7 2008-11-01 00:00:15 77
8 2008-11-01 00:00:16 49
datetime val
7 2008-11-01 00:00:15 77
8 2008-11-01 00:00:16 49
9 2008-11-01 00:00:17 66
CodePudding user response:
w = 3
# Row positions of every window, flattened: 0, 1, 2, 1, 2, 3, 2, 3, 4, ...
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]
df.iloc[inds]
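For reference, here is a self-contained sketch of the index-list approach above, run on the sample data from the question. Reading the CSV from an in-memory string and the `reset_index(drop=True)` call are my own additions, not part of the answer; without the reset, the result keeps the repeated original row labels (0, 1, 2, 1, 2, 3, ...).

```python
import io
import pandas as pd

# Sample data from the question, read from a string instead of sample.csv
# (an assumption made so the example runs on its own).
csv_text = """datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

w = 3
# Positions of the rows in each window, one window after another.
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]

# Select the rows by position, then renumber them 0..n-1.
windowed = df.iloc[inds].reset_index(drop=True)
print(windowed)
```

With 10 rows and a window of 3 there are 8 windows, so `windowed` has 24 rows under a single header.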
CodePudding user response:
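Change your code so that the windows are concatenated into one dataframe with pd.concat instead of being printed one by one: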
if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])
    df = pd.concat(df_sliding_windows(df, 3))
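A small sketch of two optional variations on the `pd.concat` call above, in case they are useful. It uses only the first five rows of the sample data, built inline, to stay short. The `ignore_index=True` and `keys=` arguments are standard `pd.concat` parameters, but using them here is my suggestion rather than part of the answer.

```python
import pandas as pd

def df_sliding_windows(data, window):
    # Yield each run of `window` consecutive rows.
    for i in range(len(data) - window + 1):
        yield data.iloc[i : i + window]

# First five rows of the question's sample data, built inline for brevity.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2008-11-01 00:00:00", "2008-11-01 00:00:01", "2008-11-01 00:00:02",
        "2008-11-01 00:00:09", "2008-11-01 00:00:10",
    ]),
    "val": [14, 11, 22, 56, 32],
})

windows = list(df_sliding_windows(df, 3))

# Variation 1: discard the repeated row labels and renumber from 0.
combined = pd.concat(windows, ignore_index=True)

# Variation 2: keep track of which window each row came from
# by adding an outer index level (0, 1, 2, ...).
labelled = pd.concat(windows, keys=list(range(len(windows))))

print(combined)
print(labelled.loc[1])  # rows of the second window
```

The first form matches the desired output in the question; the second is handy if you later need to group or select by window.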