How can I use a sequence of numbers to predict a single number in Tensorflow?


I am trying to build a machine learning model which predicts a single number from a series of numbers. I am using a Sequential model from the Keras API of TensorFlow.

You can imagine my dataset looking something like this:

Index   x data                            y data
0       np.ndarray (shape: (1209278,))    numpy.float32
1       np.ndarray (shape: (1211140,))    numpy.float32
2       np.ndarray (shape: (1418411,))    numpy.float32
3       np.ndarray (shape: (1077132,))    numpy.float32
...     ...                               ...

This was my first attempt:

I tried using a NumPy ndarray which contains NumPy ndarrays, which in turn contain floats, as my x data, so something like this:

array([
    array([3.59280851, 3.60459062, 3.60459062, ..., 4.02911493])
    array([3.54752101, 3.56740332, 3.56740332, ..., 4.02837855])
    array([3.61048168, 3.62152741, 3.62152741, ..., 4.02764217])
])

My y data is a NumPy ndarray containing floats, which looks something like this:

array([2.9864411, 3.0562437, ... , 2.7750807, 2.8712902], dtype=float32)
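
To make that structure concrete, here is a tiny sketch of how data shaped like mine could be built (the lengths and values here are just placeholders, not my real data):

import numpy as np

# placeholder sequences of different lengths and one float target per sequence
rng = np.random.default_rng(0)
x_data = np.array(
    [rng.random(n).astype(np.float32) for n in (12, 15, 9)],
    dtype=object,
)
y_data = np.array([2.9864411, 3.0562437, 2.7750807], dtype=np.float32)

print(x_data.shape)     # (3,)  -> an ndarray of ndarrays
print(x_data[0].shape)  # (12,) -> one variable-length sequence
print(y_data.dtype)     # float32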

But when I tried to train the model using model.fit(), it yielded this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

I was able to get past this error thanks to a related question I asked: How can I have a series of numpy ndarrays as the input data to train a tensorflow machine learning model?

My latest attempt: Because TensorFlow does not seem to be able to convert an ndarray of ndarrays to a tensor, I tried to convert my x data to a list of ndarrays like this:

[
    array([3.59280851, 3.60459062, 3.60459062, ..., 4.02911493])
    array([3.54752101, 3.56740332, 3.56740332, ..., 4.02837855])
    array([3.61048168, 3.62152741, 3.62152741, ..., 4.02764217])
]
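
The conversion itself was nothing special; a minimal sketch of it (assuming x_data is the object ndarray from before):

x_list = list(x_data)  # a plain Python list of 1-D float ndarrays with different lengths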

I left my y data untouched, i.e. as an ndarray of floats. Sadly, my attempt to use a list of ndarrays instead of an ndarray of ndarrays yielded this error:

ValueError: Data cardinality is ambiguous:
  x sizes: 1304593, 1209278, 1407624, ...
  y sizes: 46
Make sure all arrays contain the same number of samples.

As you can see, my x data consists of arrays which all have different shapes, but I don't think that should be a problem.

Question:

My guess is that TensorFlow tries to use my list of arrays as multiple inputs (see the TensorFlow fit() documentation).

But I don't want my x data to be treated as multiple inputs. Simply put, I just want my model to predict a number from a sequence of numbers, for example like this (sketched in code right after these examples):

  • array([3.59280851, 3.60459062, 3.60459062, ...]) => 2.8989773
  • array([3.54752101, 3.56740332, 3.56740332, ...]) => 3.0893357
  • ...
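
In code, the behaviour I am after would be roughly this (just a sketch, assuming model is an already trained Keras model that takes one sequence per sample):

seq = np.array([3.59280851, 3.60459062, 3.60459062], dtype=np.float32)
prediction = model.predict(seq[None, :, None])  # one sequence in, one number out, shape (1, 1)
print(float(prediction[0, 0]))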

How can I use a sequence of numbers to predict a single number in Tensorflow?

EDIT: Maybe I should have added that I want to use an RNN, specifically an LSTM. I have had a look at the Keras documentation, and in their simplest example they use an Embedding layer, but I don't really know what to do.

All in all, I think my question is pretty general and should be easy to answer if you know how to tackle this kind of problem, unlike me. Thanks in advance!

CodePudding user response:

Try something like this:

import numpy as np
import tensorflow as tf

# add additional dimension for lstm layer
x_train = np.asarray(train_set["x data"].values)[..., None]
y_train = np.asarray(train_set["y data"]).astype(np.float32)

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(units=32))
model.add(tf.keras.layers.Dense(units=1))
model.compile(loss="mean_squared_error", optimizer="adam", metrics="mse")
model.fit(x=x_train,y=y_train,epochs=10)
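
A caveat for this first variant: stacking the column into a dense array only works when every sequence has the same length; if the column holds NumPy arrays, something like np.stack(train_set["x data"].values)[..., None] may be needed to get a proper (samples, timesteps, 1) array. After training, a single prediction could look like this (just a sketch):

print(model.predict(x_train[:1]))  # one sequence in, one predicted number out, shape (1, 1)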

Or with a ragged input for different sequence lengths:

x_train = tf.ragged.constant(train_set["x data"].values[..., None]) # add additional dimension for lstm layer
y_train = np.asarray(train_set["y data"]).astype(np.float32)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=[None, x_train.bounding_shape()[-1]], batch_size=2, dtype=tf.float32, ragged=True))
model.add(tf.keras.layers.LSTM(units=32))
model.add(tf.keras.layers.Dense(units=1))
model.compile(loss="mean_squared_error", optimizer="adam", metrics="mse")
model.fit(x=x_train,y=y_train,epochs=10)
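
For context on the ragged variant: tf.ragged.constant keeps every sequence at its own length instead of padding, and ragged=True on the Input layer tells Keras to expect a tf.RaggedTensor, so the LSTM reads each sequence up to its own last step. A quick sanity check of the ragged input (assuming x_train was built as above):

print(x_train.shape)             # ragged dimensions show up as None
print(x_train.bounding_shape())  # tight bounding shape of the ragged tensor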

Or:

x_train = tf.ragged.constant([np.array(list(v))[..., None] for v in train_set["x data"].values]) # add additional dimension for lstm layer