How can I use a sequence of numbers to predict a single number in TensorFlow?

I am trying to build a machine learning model that predicts a single number from a sequence of numbers. I am using a Sequential model from TensorFlow's Keras API.

My dataset looks roughly like this:

Index	x data	y data
0	`np.ndarray(shape (1209278,) )`	`numpy.float32`
1	`np.ndarray(shape (1211140,) )`	`numpy.float32`
2	`np.ndarray(shape (1418411,) )`	`numpy.float32`
3	`np.ndarray(shape (1077132,) )`	`numpy.float32`
...	...	...

This was my first attempt:

I used a NumPy ndarray of NumPy ndarrays, each containing floats, as my x data, something like this:

array([
    array([3.59280851, 3.60459062, 3.60459062, ..., 4.02911493]),
    array([3.54752101, 3.56740332, 3.56740332, ..., 4.02837855]),
    array([3.61048168, 3.62152741, 3.62152741, ..., 4.02764217])
], dtype=object)

My y data is a NumPy ndarray of floats, which looks something like this:

array([2.9864411, 3.0562437, ... , 2.7750807, 2.8712902], dtype=float32)

But when I tried to train the model with model.fit(), it raised this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
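This first error comes from how NumPy stores the data rather than from the model: sequences of different lengths cannot form a rectangular float array, so the outer array ends up with `dtype=object`, and TensorFlow cannot build a tensor from Python objects. A small sketch with toy data (the lengths are made up):

```python
import numpy as np

# Toy stand-ins for the real sequences (made-up lengths).
seqs = [np.arange(5.0), np.arange(3.0), np.arange(4.0)]

# Unequal lengths cannot form a rectangular float array, so the only way to
# hold them in one ndarray is as an array of Python objects (dtype=object).
x = np.empty(len(seqs), dtype=object)
for i, s in enumerate(seqs):
    x[i] = s
print(x.dtype, x.shape)                # object (3,)

# With equal lengths, NumPy builds an ordinary 2-D float array instead.
same_len = np.array([np.arange(4.0), np.arange(4.0)])
print(same_len.dtype, same_len.shape)  # float64 (2, 4)
```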

I was able to solve this error with the help of a related question I asked: How can I have a series of numpy ndarrays as the input data to train a tensorflow machine learning model?

My latest attempt: because TensorFlow does not seem to be able to convert an ndarray of ndarrays to a tensor, I converted my x data to a list of ndarrays, like this:

[
    array([3.59280851, 3.60459062, 3.60459062, ..., 4.02911493]),
    array([3.54752101, 3.56740332, 3.56740332, ..., 4.02837855]),
    array([3.61048168, 3.62152741, 3.62152741, ..., 4.02764217])
]

I left my y data untouched, as an ndarray of floats. Unfortunately, using a list of ndarrays instead of an ndarray of ndarrays produced this error:

ValueError: Data cardinality is ambiguous:
  x sizes: 1304593, 1209278, 1407624, ...
  y sizes: 46
Make sure all arrays contain the same number of samples.
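When `x` is a Python list, Keras treats each list element as a separate model input, and each input must have the same number of samples (the length of its first axis) as `y`. The function below is a simplified sketch of that comparison on toy data, not Keras's actual code (`first_axis_sizes` is a hypothetical name):

```python
import numpy as np

def first_axis_sizes(inputs):
    """Samples each input would contribute: the length of its first axis."""
    return [np.shape(a)[0] for a in inputs]

# Toy data: three variable-length sequences and three targets.
x_list = [np.zeros(5), np.zeros(3), np.zeros(4)]
y = np.array([2.9, 3.0, 2.8], dtype=np.float32)

x_sizes = first_axis_sizes(x_list)
y_size = len(y)
# Read as three separate inputs with 5, 3 and 4 samples, none of which
# agrees with the 3 targets, so the data cardinality is ambiguous.
consistent = all(s == y_size for s in x_sizes)
print(x_sizes, y_size, consistent)  # [5, 3, 4] 3 False
```

With the real data, the same reading turns each sequence into its own "input" whose sample count is the sequence length (1304593, 1209278, ...), compared against the 46 targets, which is what the message above reports.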

As you can see, the arrays in my x data all have different lengths, but I don't think that should be a problem.

Question:

My guess is that TensorFlow interprets my list of arrays as multiple inputs (see the TensorFlow fit() documentation).

But I don't want to use my x data as multiple inputs. Put simply, I just want my model to predict a number from a sequence of numbers, for example:

array([3.59280851, 3.60459062, 3.60459062, ...]) => 2.8989773
array([3.54752101, 3.56740332, 3.56740332, ...]) => 3.0893357
...

How can I use a sequence of numbers to predict a single number in Tensorflow?

EDIT: I should have mentioned that I want to use an RNN, specifically an LSTM. I have looked at the Keras documentation, but their simplest example uses an Embedding layer, and I don't really know what to do.

All in all, I think my question is fairly general and should be easy to answer if you know how to tackle this problem, unlike me. Thanks in advance!

Answer:

Try something like this:

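import numpy as np
import tensorflow as tf

# train_set is assumed to be a pandas DataFrame with "x data" and "y data" columns.
# np.stack only works if all sequences have the same length (see the ragged version below otherwise).
# [..., None] adds a feature dimension: (samples, timesteps) -> (samples, timesteps, 1) for the LSTM layer.
x_train = np.stack(train_set["x data"].values).astype(np.float32)[..., None]
y_train = np.asarray(train_set["y data"]).astype(np.float32)

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(units=32))  # reads each sequence, returns its final hidden state
model.add(tf.keras.layers.Dense(units=1))  # maps that state to a single predicted number
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
model.fit(x=x_train, y=y_train, epochs=10)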
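The first snippet needs a rectangular array of shape `(samples, timesteps, 1)`, the 3-D `(batch, timesteps, features)` layout Keras recurrent layers take, and such an array only exists when all sequences have the same length. Besides a ragged input, a common workaround is to pad every sequence to the longest length and mask the padded steps. `pad_to_3d` is a hypothetical NumPy helper sketching that idea; Keras also provides `tf.keras.preprocessing.sequence.pad_sequences` and a `tf.keras.layers.Masking` layer for the same purpose. No Embedding layer is involved: Embedding maps integer token IDs to vectors, while these inputs are already real numbers.

```python
import numpy as np

def pad_to_3d(sequences, pad_value=0.0, dtype=np.float32):
    """Post-pad 1-D sequences to the longest length.

    Returns (x, mask): x has shape (samples, max_len, 1), the layout an LSTM
    layer expects, and mask[i, t] is True where x[i, t, 0] is real data.
    """
    max_len = max(len(s) for s in sequences)
    x = np.full((len(sequences), max_len, 1), pad_value, dtype=dtype)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, s in enumerate(sequences):
        x[i, :len(s), 0] = s      # copy the real values, leave the tail padded
        mask[i, :len(s)] = True
    return x, mask

x, mask = pad_to_3d([[3.59, 3.60, 3.61], [3.54, 3.56]])
print(x.shape)            # (2, 3, 1)
print(mask.sum(axis=1))   # [3 2]
```

Feeding the padded `x` to a model that starts with `Masking(mask_value=0.0)` should let the LSTM skip the padded steps, assuming 0.0 never occurs as a real value in the data; the returned `mask` is only for inspection.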

Or with a ragged input for different sequence lengths:

# one (timesteps_i, 1) array per sample; ragged_rank=1 keeps only the time axis ragged
x_train = tf.ragged.constant([np.asarray(v, dtype=np.float32)[:, None] for v in train_set["x data"].values], dtype=tf.float32, ragged_rank=1)
y_train = np.asarray(train_set["y data"]).astype(np.float32)

model = tf.keras.Sequential()
# ragged input: any number of time steps, 1 feature per step
model.add(tf.keras.layers.Input(shape=[None, 1], dtype=tf.float32, ragged=True))
model.add(tf.keras.layers.LSTM(units=32))
model.add(tf.keras.layers.Dense(units=1))
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
model.fit(x=x_train, y=y_train, epochs=10)
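A ragged tensor avoids padding by storing all sequences back to back in one flat array plus a `row_splits` vector marking where each row starts and ends, which is how `tf.RaggedTensor` encodes its rows. The NumPy sketch below builds that encoding by hand for toy data; `to_row_splits` is a hypothetical helper, and `tf.RaggedTensor.from_row_splits(values, row_splits)` accepts the same two arrays.

```python
import numpy as np

def to_row_splits(sequences, dtype=np.float32):
    """Concatenate variable-length 1-D sequences and record row boundaries.

    Row i of the ragged result is values[row_splits[i]:row_splits[i + 1]].
    """
    lengths = np.array([len(s) for s in sequences], dtype=np.int64)
    row_splits = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int64)
    values = np.concatenate([np.asarray(s, dtype=dtype) for s in sequences])
    return values[:, None], row_splits   # trailing axis = 1 feature per step

values, row_splits = to_row_splits([[3.59, 3.60, 3.61], [3.54, 3.56]])
print(values.shape, row_splits)   # (5, 1) [0 3 5]
```

Passing these two arrays to `tf.RaggedTensor.from_row_splits` should give a tensor of shape `(samples, None, 1)`. For sequences with millions of elements this is typically much faster than `tf.ragged.constant`, which walks the nested Python structure element by element.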

Or:

x_train = tf.ragged.constant([np.array(list(v))[..., None] for v in train_set["x data"].values]) # add additional dimension for lstm layer