Overfitting a small data set through the ReLU activation


I would like to train a network that uses only the ReLU activation and completely overfits the data. However, no matter which network structure I try (e.g., increasing the number of neurons and layers), I cannot get the loss anywhere near zero.

It is important to emphasize that i) I don't want to use another activation function, and ii) for now, I won't be normalizing the data points.

import numpy as np
import tensorflow as tf

# 50 random samples, each with 10 integer features and one integer target
n = 50
x = np.random.randint(50, 2000, (n, 10))
y = np.random.randint(600, 4000, (n, 1))

k = 16

model = tf.keras.Sequential([  
        tf.keras.layers.Flatten(input_shape=(10,)),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(1) 
    ])

model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))
model.fit(x, y, epochs=1000, batch_size=1, verbose=2)

My current network is quite vanilla, and in my opinion a ReLU network should "easily" memorize/overfit this data, especially considering the size and dimension of the data set. Is there a chance I'm doing something wrong in my code, or what else could be the reason that my network does not converge?
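For reference, even the small k = 16 network above already has far more trainable parameters than there are samples, which can be checked with Keras's count_params():

# Illustrative check: parameter count vs. number of samples
print(model.count_params())   # 1,009 parameters for only n = 50 samples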

CodePudding user response:

The loss seems to go down to zero eventually if you add more neurons and increase the batch size and learning rate.

import numpy as np
import tensorflow as tf
np.random.seed(0)
tf.random.set_seed(0)

n = 50
x = np.random.randint(50, 2000,  (n, 10))
y = np.random.randint(600, 4000,  (n, 1))

k = 100  # wider hidden layers than the original k = 16

model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(10,)),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(k, activation='relu'),
        tf.keras.layers.Dense(1)
    ])

# Higher learning rate (1e-3 instead of 1e-4) and a larger batch size
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))

history = model.fit(x, y, epochs=1000, batch_size=32, verbose=0)

loss = history.history['loss']

for epoch in [1, 10, 50, 100, 500, 1000]:
    print('Epoch: {}, Loss: {:,.4f}'.format(epoch, loss[epoch - 1]))

# Epoch: 1, Loss: 7,371,788.0000
# Epoch: 10, Loss: 1,696,856.7500
# Epoch: 50, Loss: 523,101.4375
# Epoch: 100, Loss: 23,518.8301
# Epoch: 500, Loss: 88.4517
# Epoch: 1000, Loss: 0.0000
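
As a quick sanity check, one way to confirm that the training set has effectively been memorized is to compare the fitted model's predictions with the targets; the snippet below is only an illustrative follow-up using the model trained above.

# Compare predictions with the training targets to confirm the overfit
preds = model.predict(x, verbose=0)
print('Max absolute error: {:.4f}'.format(np.abs(preds - y).max()))
print('Mean absolute error: {:.4f}'.format(np.abs(preds - y).mean()))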