In theory, a neural network with no hidden layers should be equivalent to a logistic regression, yet we see wildly varied results. What makes this even more bewildering is that the test case is incredibly basic, yet the neural network still fails to learn.
[Figure: tensorflow no-hidden-layer neural network]
We have attempted to make the parameters of both models as similar as possible (same number of epochs, no L2 penalty, same loss function, no additional optimizations such as momentum, etc.). The sklearn logistic regression consistently finds the correct decision boundary with minimal variation. The tensorflow neural network is highly variable, and it looks as though the bias is 'struggling' to train.
The code to reproduce the issue is included below. An ideal solution would produce a tensorflow decision boundary very similar to the logistic regression decision boundary.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import Model
from tensorflow.keras.optimizers import SGD
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.linear_model import LogisticRegression
X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])
model = LogisticRegression(penalty='none',
                           solver='sag',
                           max_iter=300,
                           tol=1e-100)
model.fit(X, y)
model.score(X, y)
model.coef_.flatten()[1]  # learned weight for the second feature
model.intercept_          # learned intercept
w_1 = model.coef_.flatten()[0]
w_2 = model.coef_.flatten()[1]
b = model.intercept_
n = np.linspace(0, 3, 10000, endpoint=False)
# decision boundary: w_1*x + w_2*y + b = 0  =>  y = -(w_1/w_2)*x - b/w_2
x_n = -w_1 / w_2 * n - b / w_2
plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.gca().set_aspect('equal')
plt.show()
X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])
optimizer = SGD(learning_rate=0.01,
                momentum=0.0,
                nesterov=False,
                name='SGD')
inputs = Input(shape = (2,), name='inputs')
outputs = Dense(1, activation = 'sigmoid', name = 'outputs')(inputs)
model = Model(inputs = inputs, outputs = outputs, name = 'model')
model.compile(loss = 'bce', optimizer = optimizer, metrics = ['AUC', 'accuracy'])
model.fit(X, y, epochs = 100, verbose=False)
print(model.evaluate(X, y))
weights, bias = model.layers[1].get_weights()
weights = weights.flatten()
w_1 = weights[0]
w_2 = weights[1]
b = bias
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2
plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.grid()
plt.gca().set_aspect('equal')
plt.show()
CodePudding user response:
A simple way to determine whether this is actually a bug is to let the number of epochs in your perceptron grow to some arbitrarily large number (say, 5000). You'll note that the decision boundary approaches that of your logistic regression model.
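For example, a minimal sketch reusing the Keras model defined in the question (the epoch count of 5000 is an arbitrary "large enough" choice):

# train the same no-hidden-layer Keras model for many more epochs
model.fit(X, y, epochs=5000, verbose=False)
weights, bias = model.layers[1].get_weights()
print(weights.flatten(), bias)  # should now be close to the sklearn coef_ / intercept_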
The natural question is why logistic regression needs fewer iterations to reach a near-optimal decision boundary. For strongly convex functions (like in your example), SAG enjoys much faster convergence than SGD. Thus, SGD takes longer to converge to a "globally good" solution (though relatively few iterations to reach a locally good one).
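One rough way to see this (a sketch assuming the Keras model has just been compiled as in the question) is to record the loss each epoch and watch how gradually plain SGD approaches the optimum:

history = model.fit(X, y, epochs=5000, verbose=False)
plt.plot(history.history['loss'])  # loss decreases slowly but steadily under plain SGD
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.show()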