I have the following dataframe df:
sales
2015-10-05 -0.462626
2015-10-06 -0.540147
2015-10-07 -0.450222
2015-10-08 -0.448672
2015-10-09 -0.451773
... ...
2019-10-16 -0.594413
2019-10-17 -0.620770
2019-10-18 -0.586660
2019-10-19 -0.586660
2019-10-20 -0.671934
11340 rows × 1 columns
I turn it into a tf.data.Dataset like so:
import numpy as np
import tensorflow as tf

data = np.array(df)
ds = tf.keras.utils.timeseries_dataset_from_array(
    data=data,
    targets=None,
    sequence_length=4,
    sequence_stride=1,
    shuffle=False,
    batch_size=1,
)
The dataset yields records like this:
print(next(iter(ds)))
tf.Tensor(
[[[-0.4626256 ]
  [-0.54014736]
  [-0.4502221 ]
  [-0.44867167]]], shape=(1, 4, 1), dtype=float32)
I use these records to train my ML model, but I need a way to find the dates corresponding to the values I fetch from the dataset. For the example record above, I want to recover the dates of those consecutive values, which the dataframe shows are [2015-10-05, 2015-10-06, 2015-10-07, 2015-10-08]. Ideally, I would also like to get other attributes if the dataframe has several columns. Is there a way to do this?
Answer:
You could try using another dataset as a lookup. That way you can add further attributes if needed:
import pandas as pd
import numpy as np
import tensorflow as tf
df = pd.DataFrame(data={
    'date': ['2015-10-05', '2015-10-06', '2015-10-07', '2015-10-08', '2015-10-09', '2019-10-16', '2019-10-17', '2019-10-18', '2019-10-19', '2019-10-20'],
    'sales': [-0.462626, -0.540147, -0.450222, -0.448672, -0.451773, -0.594413, -0.620770, -0.586660, -0.586660, -0.671934]})
data = np.array(df['sales'])
ds = tf.keras.utils.timeseries_dataset_from_array(
    data=data,
    targets=None,
    sequence_length=4,
    sequence_stride=1,
    shuffle=False,
    batch_size=1,
)
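# Build a dataset of date windows aligned one-to-one with the sales windows.
d = tf.data.Dataset.from_tensor_slices(df['date'].to_numpy()).batch(1)
dates = (d.flat_map(tf.data.Dataset.from_tensor_slices)
         .window(4, shift=1, stride=1)
         .flat_map(lambda x: x.batch(4))
         .batch(1))
# Pair each date window with the sales window at the same position.
d = tf.data.Dataset.zip((dates, ds))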
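def lookup(tensor, dataset):
    # Keep only the (dates, values) pairs whose values equal the given tensor.
    dataset = dataset.filter(lambda x, y: tf.reduce_all(tf.equal(y, tensor)))
    # Take the first match and decode its date strings from bytes.
    return [x.numpy().decode('utf-8') for x in list(dataset.map(lambda x, y: tf.squeeze(x, axis=0)))[0]]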
result = lookup(next(iter(ds)), d)
print(result)
['2015-10-05', '2015-10-06', '2015-10-07', '2015-10-08']
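Note that the lookup above matches on values, so it scans the dataset on every call and returns the first match if two windows happen to hold identical values. Since the dataset is built with shuffle=False, a lighter alternative might be to rely on position instead: the k-th window starts at row k * sequence_stride and spans sequence_length rows. Below is a minimal sketch of that idea in plain Python. The helper name window_rows is hypothetical (it is not a TensorFlow function), and it assumes shuffle=False, batch_size=1, and the default sampling_rate=1 and start_index=0.

```python
# Hypothetical helper, not part of TensorFlow: maps the k-th window produced by
# timeseries_dataset_from_array back to the row positions it was built from.
# Assumes shuffle=False, sampling_rate=1 and start_index=0.
def window_rows(k, sequence_length, sequence_stride=1):
    start = k * sequence_stride
    return list(range(start, start + sequence_length))

# Stand-ins for the dataframe's date index and sales column
# (the first five rows shown in the question).
dates = ['2015-10-05', '2015-10-06', '2015-10-07', '2015-10-08', '2015-10-09']
sales = [-0.462626, -0.540147, -0.450222, -0.448672, -0.451773]

# Row positions behind the first record fetched from the dataset (k = 0).
rows = window_rows(0, sequence_length=4)
first_window_dates = [dates[i] for i in rows]
first_window_sales = [sales[i] for i in rows]
print(first_window_dates)
```

With the real dataframe, you could iterate with enumerate(ds) to get k alongside each record and then call df.iloc[window_rows(k, 4)], which would return the dates together with every other column. This only holds while the dataset is not shuffled; with batch_size greater than 1, each batch would contain several consecutive windows rather than one.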