How do I create a timeseries sliding window tensorflow dataset where some features have different ba

Time:11-25

I can currently create a batched sliding-window timeseries dataset that contains ordered 'feature sets' such as 'inputs', 'targets', 'benchmarks', etc. I originally developed my model and dataset so that the targets had the same batch size as all other inputs. That has made it hard to tune the input batch size, and it won't help when it comes time to run this on live data, where I only want a single output sample of shape (1, horizon, targets), or perhaps just (horizon, targets), given an input of shape (samples, horizon, features).

As an overview, I want to take N historical samples of horizon length features at time T, run them through the model and output a single sample of horizon length targets; repeat until the dataset is run through in its entirety.

Assuming a pandas DataFrame of length Z, all resulting Datasets should have a length of Z - horizon. The 'targets' Dataset should have a batch size of 1, and the 'inputs' Dataset should have a batch size of batch_size.
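To make the shapes above concrete, here is a rough NumPy-only sketch (not the tf.data pipeline itself). The names `Z`, `n_features`, `rows` and `windows` are made up for illustration. It also shows that a stride-1 window of length horizon over Z rows produces Z - horizon + 1 windows, so getting exactly Z - horizon would mean dropping one window, for example to leave room for a shifted target.

```python
import numpy as np

Z, horizon, n_features = 100, 5, 2  # illustrative sizes only

# Z rows of n_features values, standing in for the DataFrame columns.
rows = np.arange(Z * n_features, dtype="float32").reshape(Z, n_features)

# Slide a window of `horizon` rows along axis 0 with a step of 1.
# The result has shape (Z - horizon + 1, n_features, horizon).
windows = np.lib.stride_tricks.sliding_window_view(rows, horizon, axis=0)

# Reorder to (window, horizon, features), the layout described above.
windows = windows.transpose(0, 2, 1)

print(windows.shape)  # (96, 5, 2)
```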


Here's a stripped down snippet of what I currently use in order to generate a standard batch size for all feature sets:


import tensorflow as tf
import pandas as pd

horizon = 5
batch_size = 10
columns = {
    "inputs": ["input_1", "input_2"],
    "targets": ["target_1"],
}
batch_options = {
    "drop_remainder": True,
    "deterministic": True,
}

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})

slices = tuple(df[x].astype("float32") for x in columns.values())
data = (
    tf.data.Dataset.from_tensor_slices(slices)
    .window(horizon, shift=1, drop_remainder=True)
    .flat_map(
        lambda *c: tf.data.Dataset.zip(
            tuple(
                col.batch(horizon, **batch_options)
                for col in c
            )
        )
    )
    .batch(
        batch_size,
        **batch_options,
    )
)

CodePudding user response:

We can create two sliding-window datasets, one for the inputs and one for the targets, and zip them.

inputs = df[['input_1', 'input_2']].to_numpy()
labels = df['target_1'].to_numpy()

window_size = 10
stride = 1

# Input windows of window_size consecutive rows.
data1 = (
    tf.data.Dataset.from_tensor_slices(inputs)
    .window(window_size, shift=stride, drop_remainder=True)
    .flat_map(lambda x: x.batch(window_size))
)
# Target windows of a single row each, built from the labels.
data2 = (
    tf.data.Dataset.from_tensor_slices(labels)
    .window(1, shift=stride, drop_remainder=True)
    .flat_map(lambda x: x.batch(1))
)
data = tf.data.Dataset.zip((data1, data2))
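A quick way to sanity-check the zipped dataset is to pull one element and count the rest. The sketch below is self-contained and assumes the same toy DataFrame as in the question. The helper name `sliding_windows` is made up here, and the labels are kept 2-D (`df[["target_1"]]`) so that each target has shape (1, targets); both are illustrative choices rather than part of the answer above.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

d = np.arange(100, dtype="float32")
df = pd.DataFrame({"input_1": d, "input_2": d, "target_1": d})

inputs = df[["input_1", "input_2"]].to_numpy()  # shape (100, 2)
labels = df[["target_1"]].to_numpy()            # shape (100, 1), kept 2-D

window_size = 10
stride = 1


def sliding_windows(values, size, shift):
    """Hypothetical helper: stride-`shift` windows of `size` rows."""
    return (
        tf.data.Dataset.from_tensor_slices(values)
        .window(size, shift=shift, drop_remainder=True)
        .flat_map(lambda w: w.batch(size, drop_remainder=True))
    )


data1 = sliding_windows(inputs, window_size, stride)  # 91 windows
data2 = sliding_windows(labels, 1, stride)            # 100 windows
data = tf.data.Dataset.zip((data1, data2))            # stops at the shorter one

x, y = next(iter(data))
print(x.shape, y.shape)  # (10, 2) (1, 1)

n = sum(1 for _ in data)
print(n)  # 91
```

Note that zipping this way pairs each input window with the target at the window's first row. If the target should instead line up with the last row of each window, one option is to call `data2.skip(window_size - 1)` before zipping.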