How to teach a parabolic function to a neural net


I am aiming for a sequential neural network with two neurons capable of reproducing a quadratic function. To do this, I chose the activation function of the first neuron to be lambda x: x**2 and that of the second neuron to be None.

Each neuron outputs A(a*x + b), where A is the activation function, a is the neuron's weight and b is its bias term. The output of the first neuron is passed to the second neuron, and the output of that neuron is the result.

The form of the output of my network is then:

output(x) = a2 * (a1*x + b1)**2 + b2

where a1, b1 are the weight and bias of the first neuron and a2, b2 those of the second.

Training the model means adjusting the weights and biases of each neuron. Choosing a very simple set of parameters, i.e.:

a1 = 1, b1 = 1, a2 = 1, b2 = 1

leads us to a parabola which should be perfectly learnable by the 2-neuron neural net described above:

f(x) = 1 * (1*x + 1)**2 + 1 = x**2 + 2*x + 2
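
As a quick sanity check (sympy is an extra dependency, not used in the original code), expanding the two-neuron form with these parameters confirms that it reproduces the target parabola:

import sympy as sp

x = sp.symbols('x')
a1, b1, a2, b2 = 1, 1, 1, 1                # the simple parameter choice above

# two-neuron output: first neuron squares its input, second is linear
output = a2 * (a1 * x + b1)**2 + b2
print(sp.expand(output))                   # -> x**2 + 2*x + 2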

To implement the neural network, I do:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

Define function to be learned:

f = lambda x: x**2 + 2*x + 2

Generate training inputs and outputs using above function:

np.random.seed(42)
questions = np.random.rand(999)
solutions = f(questions)

Define neural network architecture:

model = tf.keras.Sequential([
  tf.keras.layers.Dense(units=1, input_shape=[1], activation=lambda x: x**2),
  tf.keras.layers.Dense(units=1, activation=None)
])
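
To confirm that this architecture really has only four trainable parameters (one weight and one bias per neuron), we can print a summary:

model.summary()   # should report 2 trainable parameters per Dense layer, 4 in total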

Compile net:

model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(0.1))

Train the model:

history = model.fit(questions, solutions, epochs=999, batch_size = 1, verbose=1)
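
Plotting the loss stored in history is a quick way to check whether training converged; this diagnostic is an addition, not part of the original post:

plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('mean squared error')
plt.show()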

Generate predictions of f(x) using the newly trained model:

np.random.seed(43)
test_questions = np.random.rand(100)
test_solutions = f(test_questions)

test_answers = model.predict(test_questions)
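
To put a number on the quality of the fit, we can also compute the test MSE (an added diagnostic; the exact value depends on the run). Note that model.predict returns an array of shape (100, 1), so it is flattened before comparing:

test_mse = np.mean((test_answers.flatten() - test_solutions)**2)
print(test_mse)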

Visualize result:

plt.figure(figsize=(10,6))
plt.scatter(test_questions, test_solutions, c='r', label='solutions')
plt.scatter(test_questions, test_answers, c='b', label='answers')
plt.legend()

[plot of test_solutions (red) and test_answers (blue)]

The red dots form the parabola the model was supposed to learn; the blue dots form the curve it actually learnt. This approach clearly did not work.

What is wrong with the approach above, and how can I make the neural net actually learn the parabola?

CodePudding user response:

Fix using the proposed architecture

Decreasing the learning rate to 0.001 does the trick; compile like this instead:

model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(0.001))
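
After recompiling with the lower learning rate, the model has to be trained again; a rerun with the same fit and predict calls as in the question (re-create the model first so training starts from fresh random weights):

# assumes the model was rebuilt and then compiled with Adam(0.001) as shown above
history = model.fit(questions, solutions, epochs=999, batch_size=1, verbose=0)
test_answers = model.predict(test_questions)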

Visualize new results:

plt.figure(figsize=(10,6))
plt.scatter(test_questions, test_solutions, c='r', marker='+', s=500, label='solutions')
plt.scatter(test_questions, test_answers, c='b', marker='o', label='answers')
plt.legend()

[plot of test_solutions (red, '+') and test_answers (blue), now overlapping]

Nice fit. To check the actual weights and see exactly which parabola was learnt, we can do:

[np.concatenate([w.flatten() for w in layer.get_weights()]) for layer in model.layers]

Output:

[array([-1.3284513, -1.328055 ], dtype=float32),
 array([0.5667597, 1.0003909], dtype=float32)]

In this output, the first array holds the first neuron's weight and bias (a1, b1) and the second array the second neuron's (a2, b2). We expected 1, 1, 1, 1, but let's plug these values back into the equation:

output(x) = a2 * (a1*x + b1)**2 + b2
          = (a2 * a1**2) * x**2 + (2 * a2 * a1 * b1) * x + (a2 * b1**2 + b2)

Coefficient of x^2 term:

0.5667597*(-1.3284513)**2 # result: 1.0002078022990382

Coefficient of x term:

2*0.5667597*-1.3284513*-1.328055 # result: 1.9998188460235597

Constant term:

0.5667597*(-1.328055)**2 + 1.0003909 # result: 2.000002032736224

I.e. the learnt parabola is:

1.0002078022990382 * x**2 + 1.9998188460235597 * x + 2.000002032736224

Which is pretty close to f, i.e. x**2 + 2*x + 2.

Reassuringly, the difference between the coefficients of the learned parabola and the true parabola is less than the learning rate.
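
The coefficient arithmetic above can be automated. A sketch that pulls the weights out with get_weights() and expands a2*(a1*x + b1)**2 + b2 into polynomial coefficients:

(w1, c1), (w2, c2) = [layer.get_weights() for layer in model.layers]
a1, b1 = w1.item(), c1.item()      # first neuron: weight and bias
a2, b2 = w2.item(), c2.item()      # second neuron: weight and bias

coef_x2 = a2 * a1**2               # coefficient of x**2
coef_x1 = 2 * a2 * a1 * b1         # coefficient of x
coef_x0 = a2 * b1**2 + b2          # constant term
print(coef_x2, coef_x1, coef_x0)   # all close to 1, 2, 2 for a successful run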


Note that we can use an even simpler architecture

i.e.:

model = tf.keras.Sequential([
  tf.keras.layers.Dense(units=1, input_shape=[1],activation=lambda x: x**2),
])

I.e. we have a single neuron with output (a*x + b)**2, and training adjusts a and b. Strictly this form only covers parabolas whose minimum is 0, but over the training interval [0, 1] it can fit f closely. (I did actually try this too, and it worked.)
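
A minimal sketch of trying that variant, with the data and Adam(0.001) settings carried over from above (results will depend on initialization):

simple_model = tf.keras.Sequential([
  tf.keras.layers.Dense(units=1, input_shape=[1], activation=lambda x: x**2),
])
simple_model.compile(loss='mean_squared_error',
                     optimizer=tf.keras.optimizers.Adam(0.001))
simple_model.fit(questions, solutions, epochs=999, batch_size=1, verbose=0)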

CodePudding user response:

To add to @Zabob's answer: you have used the Adam optimizer, which, while considered quite robust, I have found to be sensitive to the initial learning rate, and it can produce unexpected results (as in your case, where it learned an inverted curve). If you change the optimizer to SGD:

model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.SGD(0.01))

Then in less than 100 epochs, you can get an optimized network:
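
A minimal end-to-end rerun with SGD (the 100-epoch figure comes from the claim above; exact results depend on initialization):

model = tf.keras.Sequential([
  tf.keras.layers.Dense(units=1, input_shape=[1], activation=lambda x: x**2),
  tf.keras.layers.Dense(units=1, activation=None)
])
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.SGD(0.01))
model.fit(questions, solutions, epochs=100, batch_size=1, verbose=0)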

[plots of the resulting fit]
