How to process .csv for time series classification

I was wondering how to process recorded time series data so that I can feed it into an RNN.

I want to take the data of 16 time steps and the labels of 15 of them to make the RNN classify the 16th time step (if that makes any sense). By using every third entry for each sequence, I can cover about 3 seconds of data with a reasonable number of entries per second.
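A minimal sketch of the indexing described here, assuming the rows are evenly spaced in time: with 16 time steps taken from every third row, one sample spans 46 recorded rows, and the label to predict is the one on the last of those rows. `window_rows` is a hypothetical helper, not part of any library.

```python
WINDOW_SIZE = 16  # time steps per sample
STEP = 3          # use every third recorded row


def window_rows(start, window_size=WINDOW_SIZE, step=STEP):
    """Hypothetical helper: row indices used by the window that begins at `start`."""
    return list(range(start, start + (window_size - 1) * step + 1, step))


rows = window_rows(0)
print(len(rows), rows[0], rows[-1])  # 16 0 45
# Row 45 is the time step the RNN should classify; rows 0..45 span 46 recorded rows.
```

Feeding the labels of the first 15 steps in as extra inputs is left out of this sketch.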

Here is a smaller .csv of the recorded data. The columns "Time" and "Mayday" are only there for reference, to make sure that everything is labeled correctly, and can therefore be dropped.
This is what my data looks like after dropping the unrelated columns:
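One detail worth checking when dropping these columns: pandas `DataFrame.drop` returns a new frame instead of modifying the existing one. A small sketch with a made-up frame (the column names other than Time and Mayday, and all values, are hypothetical stand-ins for slim.csv):

```python
import pandas as pd

# Hypothetical stand-in for slim.csv: two reference columns, two feature
# columns and a label column (the real file has more feature columns).
data = pd.DataFrame({
    "Time": [0.00, 0.01, 0.02],
    "Mayday": [0, 0, 1],
    "feat_a": [0.1, 0.2, 0.3],
    "feat_b": [1.0, 1.1, 1.2],
    "label": [0, 0, 2],
})

# drop() returns a new DataFrame; without reassignment `data` is unchanged.
data.drop(columns=["Time", "Mayday"])
print(list(data.columns))  # ['Time', 'Mayday', 'feat_a', 'feat_b', 'label']

# Reassigning the result actually removes the reference columns.
data = data.drop(columns=["Time", "Mayday"])
print(list(data.columns))  # ['feat_a', 'feat_b', 'label']
```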

Here is what I have tried so far in Google Colab. Unfortunately, this approach doesn't work, and I get an "AttributeError: 'tuple' object has no attribute 'shape'" when calling model.fit.

Alternatively, I have also tried this:

import pandas as pd
import tensorflow as tf

data = pd.read_csv("slim.csv", sep=",")
data = data.drop(['Time', 'Mayday'], axis=1)  # drop() returns a copy, so reassign it
dataset = tf.data.Dataset.from_tensor_slices(data)

But from there on, I am not sure how to process the data to get the desired result: calling tf.keras.preprocessing.timeseries_dataset_from_array() on dataset fails with the error message

'TensorSliceDataset' object is not subscriptable
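The Keras docs describe the `data` argument of `timeseries_dataset_from_array` as a NumPy array or eager tensor, i.e. something that can be indexed and sliced, whereas a `tf.data.Dataset` can only be iterated. A minimal pure-NumPy illustration of the difference, using a hypothetical `first_window` helper that slices its input the way any windowing function must:

```python
import numpy as np


def first_window(data, sequence_length):
    """Hypothetical helper: return the first `sequence_length` rows.
    Works on anything that supports slicing, such as a NumPy array."""
    return data[:sequence_length]


array = np.arange(20).reshape(10, 2)
print(first_window(array, 3).shape)  # (3, 2)

# A generator yields rows one at a time but cannot be sliced, so the same
# call fails with a TypeError, much like slicing a TensorSliceDataset does.
rows = (row for row in array)
try:
    first_window(rows, 3)
    message = ""
except TypeError as err:
    message = str(err)
print(message)  # 'generator' object is not subscriptable
```

So the usual route is to pass an array such as `data.values` to `timeseries_dataset_from_array`, rather than a dataset built with `from_tensor_slices`.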

Answer:

Your idea is fine. The problem is that train_target and test_target yield tuples because, as the docs state:

Returns a tf.data.Dataset instance. If targets was passed, the dataset yields tuple (batch_of_sequences, batch_of_targets). If not, the dataset yields only batch_of_sequences.
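To see where the `AttributeError` comes from: each element of a dataset built with `targets` is a `(batch_of_sequences, batch_of_targets)` tuple, and a Python tuple has no `.shape` attribute. A NumPy stand-in (zero-filled arrays; the shapes are assumptions chosen for illustration):

```python
import numpy as np

# Stand-in for one element yielded by a dataset built with `targets`:
# a (batch_of_sequences, batch_of_targets) tuple.
batch_of_sequences = np.zeros((32, 16, 10))
batch_of_targets = np.zeros((32,))
element = (batch_of_sequences, batch_of_targets)

print(hasattr(element, "shape"))  # False: a tuple has no .shape


def pick_targets(x, y):
    """Equivalent of `lambda x, y: y`: keep only the targets."""
    return y


targets = pick_targets(*element)
print(targets.shape)  # (32,)
```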

Since you are only interested in the targets in this case, you can run:

data_set = tf.data.Dataset.zip((train_input, train_target.map(lambda x, y: y)))
test_set = tf.data.Dataset.zip((test_input, test_target.map(lambda x, y: y)))

But note that this will still not work, because your targets have the shape (32, 11) and your model's output shape is (32, 3). So you should ask yourself what exactly you are trying to achieve.
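For reference, `sparse_categorical_crossentropy` expects one integer class index per sample, i.e. targets of shape (32,) for predictions of shape (32, 3) such as a 3-unit softmax output. A minimal hand-written NumPy sketch of that loss (an illustration of the formula, not the Keras implementation):

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs):
    """Mean cross-entropy for integer labels of shape (batch,)
    and predicted probabilities of shape (batch, num_classes)."""
    batch = np.arange(len(labels))
    picked = probs[batch, labels]  # probability assigned to the true class
    return -np.mean(np.log(picked))


probs = np.full((32, 3), 1 / 3)        # uniform softmax output, shape (32, 3)
labels = np.zeros(32, dtype=int)       # one class index per sample, shape (32,)
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))  # 1.0986, i.e. log(3)
```

A (32, 11) target array has no single class index per row, which is why it does not fit this loss.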

Update 1

Try:


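import pandas as pd
import tensorflow as tf

data = pd.read_csv("slim.csv", sep=",")
# drop() returns a new DataFrame, so assign the result back
data = data.drop(['Time', 'Mayday'], axis=1)

window_size = 16
# shift=3: consecutive windows start 3 rows apart
# (pass stride=3 instead to take every third row inside a window)
dataset = tf.data.Dataset.from_tensor_slices(data.values).window(window_size, shift=3, drop_remainder=True)
# turn each window into one (16, n_columns) tensor, then group 32 windows per batch
dataset = dataset.flat_map(lambda window: window.batch(window_size)).batch(32)
# inputs: the first 10 columns of all 16 steps; target: the last column of the 16th step
dataset = dataset.map(lambda x: (x[:, :, :10], x[:, 15, -1]))

model = tf.keras.models.Sequential([
    tf.keras.layers.GRU(input_shape=(None, 10), units=128),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(units=128, activation='tanh'),
    tf.keras.layers.Dense(units=3, activation='softmax')
])

# sparse_categorical_crossentropy expects one class index per sample, shape (batch,)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(dataset, epochs=5)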
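To check what this pipeline produces, here is a NumPy re-creation of its shapes. It is a sketch under assumptions: random stand-in data with 10 feature columns plus one label column, and a hypothetical `windows` helper that mimics `Dataset.window(size, shift, stride, drop_remainder=True)`. It also shows the difference between `shift` (distance between the starts of consecutive windows) and `stride` (distance between rows inside one window).

```python
import numpy as np


def windows(data, size, shift, stride=1):
    """Hypothetical helper mimicking tf.data.Dataset.window(size, shift,
    stride, drop_remainder=True): returns only full windows, stacked."""
    span = (size - 1) * stride + 1  # rows covered by one window
    starts = range(0, len(data) - span + 1, shift)
    return np.stack([data[s:s + span:stride] for s in starts])


rng = np.random.default_rng(0)
data = np.hstack([rng.normal(size=(200, 10)),
                  rng.integers(0, 3, size=(200, 1))])

# shift=3, stride=1 (as above): windows start 3 rows apart, 16 consecutive rows each.
w = windows(data, size=16, shift=3)
x, y = w[:32, :, :10], w[:32, 15, -1]  # first batch, same slicing as the map() above
print(w.shape, x.shape, y.shape)  # (62, 16, 11) (32, 16, 10) (32,)

# stride=3: each window takes every third row and covers 46 rows.
w_strided = windows(data, size=16, shift=3, stride=3)
print(w_strided.shape)  # (52, 16, 11)
```

If the goal from the question is 16 samples spaced three rows apart, passing `stride=3` to `window` matches that; `shift=3` on its own only spaces out the starting points of the windows.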