Why can't TensorFlow predict a third-degree equation?

Time:01-10

I'm new to TensorFlow. I was able to make a simple prediction, but when I made changes it stopped working. Why, and how do I fix it?

I have used this demo. And I was able to solve an equation like this:

y=2x-1

By using this code:

model=Sequential([Dense(units=1,input_shape=[1])])
model.compile(optimizer='sgd',loss='mean_squared_error')

xs=np.array([-1.0,0.0,1.0,2.0])
ys=np.array([-3.0,-1.0,1.0,3.0])

model.fit(xs,ys,epochs=400)

print(model.predict([11,0]))

Then I tried the same concept to solve this equation:

y=3x^3+2x^2+10

This is the new code:

model=Sequential([Dense(units=1,input_shape=[1])])
model.compile(optimizer='sgd',loss='mean_squared_error')

xs=np.array([5.0,6.0,7.0,8.0,10.0])
ys=np.array([435.0,730.0,1137.0,1674.0,3210.0])

model.fit(xs,ys,epochs=1000)

print(model.predict([11,0]))

My question is, how to change my code so that it will solve it correctly?

Answer:

This can be surprising for newcomers, but the model needs many more degrees of freedom than the task itself seems to require. You also need a lot of data to train it.

For the equation y=2x-1, you only need one weight (the coefficient of x) and one bias (the constant term) to fit the model. For the equation 3x^3 + 2*11^2 + 10, however, one weight and one bias are not enough: a general cubic has four coefficients (one each for x^3, x^2, and x, plus the constant term). Even with enough parameters the fit would be poorly constrained, because an enormous number of weight and bias combinations fit those 5 data points. A model can pass through all 5 points perfectly and still be a totally irrelevant curve that does not generalize to other points. So you need more data points. I would suggest a dataset with at least 1000 points, so that the model has many more constraints to satisfy and is therefore more likely to generalize.
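To make the "many curves fit five points" remark concrete, here is a small sketch. It assumes the cubic 3x^3 + 2x^2 + 10, which reproduces the five (x, y) pairs in the question exactly; the names `cubic` and `impostor` and the constant `c` are made up for illustration.

```python
# Five training points from the question.
xs = [5.0, 6.0, 7.0, 8.0, 10.0]
ys = [435.0, 730.0, 1137.0, 1674.0, 3210.0]

def cubic(x):
    # Assumed underlying curve; it matches all five points above.
    return 3.0 * x**3 + 2.0 * x**2 + 10.0

def impostor(x, c=1.0):
    # Add a term that is zero at every training x, so the fit on the
    # training data is unchanged while the curve differs elsewhere.
    bump = 1.0
    for xi in xs:
        bump *= (x - xi)
    return cubic(x) + c * bump

# Both curves agree with the data at every training point...
for x, y in zip(xs, ys):
    assert cubic(x) == y and impostor(x) == y

# ...but they disagree as soon as we leave the training points.
print(cubic(11.0), impostor(11.0))
```

Any value of `c` gives another curve with zero training error, which is why five points alone cannot pin down a flexible model.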

But even so, you would still have a problem: 3x^3 + 2*11^2 + 10 is not a linear equation, so a linear model cannot fit it. You need more layers in your model (with a non-linear activation between them) to approximate, for example, an x^3 term.
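A single `Dense(units=1)` layer without an activation computes w*x + b, so the best it can ever do is the least-squares straight line. The sketch below uses `np.polyfit` as a stand-in for a fully converged training run (that substitution is an assumption, not what Keras does internally) and shows how far that best line is from the question's data.

```python
import numpy as np

xs = np.array([5.0, 6.0, 7.0, 8.0, 10.0])
ys = np.array([435.0, 730.0, 1137.0, 1674.0, 3210.0])

# Best possible straight line y = w*x + b in the least-squares sense.
w, b = np.polyfit(xs, ys, deg=1)
residuals = ys - (w * xs + b)

print("w =", w, "b =", b)
print("residuals:", residuals)
print("line at x=11:", w * 11.0 + b)
```

The residuals stay in the hundreds no matter how many epochs are used, because the limitation is the model's shape and not the amount of training.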

Even if you bypass this problem (for example, by feeding the model the values of x^3 instead of x), you would still have a problem, because the terms of 3x^3 + 2*11^2 + 10 differ hugely in scale. For example, the constant term 10 would, in a perfect scenario, need up to 10 / learning_rate batches to be reached. SGD's default learning rate is 0.01, so it would take at least 1000 batches to move the bias from its initial value near 0 to 10. The coefficient 3 of the x^3 term, on the other hand, is much closer to its initial value, so it would be reached in a few batches. The result is a convergence problem: the model is still trying to fit the term 10, which is very far from its starting point, while the other terms are already close to their correct values. To overcome this, you need an overparameterized model. Each term is then represented by many small subterms, so the model can fit each term in a few batches.
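The "feed x^3 instead of x" idea can be sketched without a neural network at all. The block below builds the powers of x by hand and solves for the coefficients with closed-form least squares (`np.linalg.lstsq`), which is swapped in here for gradient descent purely for illustration; the feature layout is an assumption of this sketch.

```python
import numpy as np

xs = np.array([5.0, 6.0, 7.0, 8.0, 10.0])
ys = np.array([435.0, 730.0, 1137.0, 1674.0, 3210.0])

# Hand-built features: one column per power of x, plus a column of ones
# that plays the role of the bias.
features = np.column_stack([xs**3, xs**2, xs, np.ones_like(xs)])

# Closed-form least squares stands in for gradient descent here.
coeffs, *_ = np.linalg.lstsq(features, ys, rcond=None)
print("coefficients for x^3, x^2, x, 1:", np.round(coeffs, 6))

# Evaluate the fitted polynomial at a new input.
x_new = 11.0
pred = coeffs @ np.array([x_new**3, x_new**2, x_new, 1.0])
print("prediction at x=11:", pred)
```

With the non-linearity moved into the features, the remaining problem is linear in the coefficients, which is exactly the kind of task a single `Dense(units=1)` layer can represent.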

Lastly, there is still a problem because the ranges of the input x and the target y are very large. SGD, like other optimization algorithms, works better when inputs and targets have a small range, so you need to normalize both. For example, you could scale the input x to the range [0, 1] and the target y to the range [-1, 1]. The gradients then have a much smaller magnitude, so the model can converge faster.
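As a minimal sketch of that scaling step, the block below divides the question's data by its largest absolute value. This max-abs scaling is just one simple choice, and the variable names are illustrative.

```python
import numpy as np

xs = np.array([5.0, 6.0, 7.0, 8.0, 10.0])
ys = np.array([435.0, 730.0, 1137.0, 1674.0, 3210.0])

# Scale factors are computed from the training data only.
x_max = np.max(np.abs(xs))
y_max = np.max(np.abs(ys))

# Dividing out of place keeps the raw arrays available for later use.
xs_scaled = xs / x_max
ys_scaled = ys / y_max

# Undoing the scaling recovers the original targets.
ys_restored = ys_scaled * y_max
print(xs_scaled, ys_scaled)
```

A model trained on the scaled arrays also produces scaled outputs, so its predictions have to be multiplied by `y_max` before they are compared with the original y values.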

Putting all this together, I would suggest you use a model like this:

import numpy as np
import tensorflow as tf # needed for tf.keras.callbacks below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def f(x):
  return 3.0 * x ** 3.0 + 2.0 * 11.0 ** 2 + 10.0

x_train = np.linspace(-5, 5, 100_000) # very big training set
X = x_train.copy() # keep an unnormalized copy; x_train is scaled in place below
y_train = f(x_train)

# calculate the normalization factor for the x and y data
# simple scaling to [-1, 1] range
x_max = np.max(np.abs(x_train))
y_max = np.max(np.abs(y_train))

# normalize the data
x_train /= x_max
y_train /= y_max

# create test data that lies slightly outside the training range,
# so we can see how the model generalizes to unseen data ([-6, -5] and [5, 6])
x_test = np.concatenate([
  np.linspace(-6, -5, 1000),
  np.linspace(5, 6, 1000)
])
y_test = f(x_test)
# normalize the data by the same factor
x_test /= x_max
y_test /= y_max
###################################
activation = 'linear' # 'linear', 'relu', 'tanh', 'sigmoid'
NDims = 256 # number of neurons in each layer
dropoutRate = 0.0 # dropout rate. 0.0 means no dropout, try up to ~0.5
layers = [
  Dense(NDims, input_shape=[1], activation=activation), # input layer
]
for _ in range(3): # N hidden layers
  if 0.0 < dropoutRate:
    layers.append(Dropout(dropoutRate))
  layers.append(Dense(NDims, activation=activation))
layers.append(Dense(1)) # output layer

model = Sequential(layers)
model.compile(optimizer='sgd', loss='mean_squared_error')

model.fit(
  x_train, y_train,
  validation_data=(x_test, y_test),
  batch_size=32,
  shuffle=True, # shuffle the training data before each epoch
  epochs=10,
  # for restoring the best model after training
  callbacks=[
    tf.keras.callbacks.ModelCheckpoint(
      'model.h5',
      save_best_only=True,
      monitor='val_loss',
      verbose=1,
    ),
  ]
)
model.load_weights('model.h5') # load the best model
# evaluate the model on the In Distribution data, i.e. data that is very close to the training data
# data from the same distribution as the training data but with noise
noiseStd = np.diff(X).mean() * 1.0
x_idd = X + np.random.normal(0, noiseStd, size=X.shape)
y_idd = f(x_idd)
# normalize the data by the same factor
x_idd /= x_max
y_idd /= y_max
evaluation = model.evaluate(x_idd, y_idd, verbose=1)
# it should be very good
print('Evaluation on ID data: ', evaluation)
########################################################
# evaluate the model on the OOD data, i.e. data that is very far from the training data
x_ood = np.linspace(-100, 100, 100000)
y_ood = f(x_ood)
# normalize the data by the same factor
x_ood /= x_max
y_ood /= y_max
evaluation = model.evaluate(x_ood, y_ood, verbose=1)
# it would be very painful :D NNs typically don't generalize well to OOD data
print('Evaluation on OOD data: ', evaluation)
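
Since the model is trained on scaled data, a single raw input has to be scaled the same way before prediction and the output scaled back afterwards. The helper `predict_raw` below is hypothetical (it is not part of Keras), and `CubeStub` is only a stand-in object with a Keras-like `predict()` so the sketch runs without TensorFlow.

```python
import numpy as np

def predict_raw(model, x_raw, x_max, y_max):
    # Hypothetical helper: scale the input like the training data,
    # run the model, then undo the target scaling.
    x_scaled = np.asarray(x_raw, dtype=float).reshape(-1, 1) / x_max
    y_scaled = model.predict(x_scaled)
    return np.asarray(y_scaled).reshape(-1) * y_max

class CubeStub:
    # Stand-in with a Keras-like predict(), so the sketch runs without
    # TensorFlow; it just returns the cube of the scaled input.
    def predict(self, x):
        return x ** 3

out = predict_raw(CubeStub(), [1.0, 2.0], x_max=2.0, y_max=8.0)
print(out)
```

With the trained model above, the call would look like `predict_raw(model, [11.0], x_max, y_max)`. Note that 11 lies outside the [-5, 5] training range, so it is an out-of-distribution query and a poor result should be expected.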

I highly recommend playing around with this code and model to see how it behaves. For example, you can change the activation function, the number of neurons in each layer, the number of layers, the dropout rate, and so on. I especially encourage you to try the 'relu' activation function.

As you can see, (simple) neural networks aren't well suited to "low-dimensional" problems that have exact solutions. They shine in high-dimensional problems that cannot be solved by exact methods. For example, there is no exact equation that converts an RGB image into a probability of it being a cat or a dog, but a neural network can learn this mapping from training data. Learning is even more efficient there, because each image is represented by many pixels rather than by a single number.
