Is there a difference in the weights of hidden layers if I train a model with an output layer of 10


Essentially, I don't have enough RAM to train the model I want from scratch on all 2000 classes at once. Because of that, I was wondering if I could use an output layer of 200 neurons, save the weights after training the model on those 200 classes, then load those same weights and train the model again on another 200 classes, and so on until the model has seen all 2000 classes.

Note that this dataset is only being used to pre-train the model, so that I can then retrain it on another, much smaller, dataset. Essentially, I want to pre-train the model on this big dataset, then swap out the output layer and retrain the last layers of the model on the smaller one.

Does this way of training produce the same hidden-layer weights as training the model once on all 2000 classes?
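Here is roughly what I mean, as a minimal PyTorch sketch (the layer sizes and the random stand-in data are made up for illustration, not taken from my actual setup):

```python
import torch
import torch.nn as nn

# Hypothetical backbone; all sizes here are placeholders.
backbone = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
)

def train_steps(model, x, y, steps=100):
    """Plain SGD training loop on one chunk's data."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Proposed scheme: one fresh 200-class head per chunk, shared backbone.
for _ in range(10):                    # 10 chunks x 200 classes = 2000
    x = torch.randn(512, 64)           # stand-in for one chunk's data
    y = torch.randint(0, 200, (512,))  # chunk-local labels 0..199
    head = nn.Linear(32, 200)          # new output layer each time
    train_steps(nn.Sequential(backbone, head), x, y)

# Afterwards: swap in a small head and retrain only the last layers.
small_head = nn.Linear(32, 10)
model = nn.Sequential(backbone, small_head)
for p in backbone.parameters():
    p.requires_grad = False            # freeze the pre-trained backbone
```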

CodePudding user response:

No. Your weights will be different. This would work only if you were training a linear model whose outputs are learned independently (e.g., one-vs-rest classifiers), because there each class's weights do not depend on the other classes. In a neural network the hidden layers are shared by all outputs, so their weights depend on which classes are present during training, and training on 200-class chunks will not reproduce the weights you would get from training on all 2000 classes at once.
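To see this concretely, here is a toy PyTorch experiment (all sizes invented) comparing the hidden-layer weights after joint training versus the chunked scheme, starting from the same initialization:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

def train_steps(model, x, y, steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

x = torch.randn(256, 8)            # toy inputs
y = torch.randint(0, 4, (256,))    # 4 toy classes

hidden_init = nn.Linear(8, 16)     # shared starting point for fairness

# (a) train once on all 4 classes
hidden_a = copy.deepcopy(hidden_init)
train_steps(nn.Sequential(hidden_a, nn.ReLU(), nn.Linear(16, 4)), x, y)

# (b) train on classes {0,1}, then {2,3}, keeping the hidden layer
hidden_b = copy.deepcopy(hidden_init)
m = y < 2
train_steps(nn.Sequential(hidden_b, nn.ReLU(), nn.Linear(16, 2)), x[m], y[m])
m = y >= 2
train_steps(nn.Sequential(hidden_b, nn.ReLU(), nn.Linear(16, 2)), x[m], y[m] - 2)

# Nonzero: the hidden-layer weights diverge between the two schemes.
print((hidden_a.weight - hidden_b.weight).norm().item())
```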

I also find it rather suspicious that the problem lies in the number of outputs changing from 200 to 2000. That is a 10x increase in the memory use of the final layer, but that layer should not be huge to begin with; maybe your last (penultimate) hidden layer is too large? Even if that previous layer also had 2000 units, the weight matrix would be 2000x2000, i.e. 4,000,000 floats, which is only 16 megabytes in float32.
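As a quick back-of-the-envelope check in Python (assuming float32, i.e. 4 bytes per parameter):

```python
# Rough size of a dense output layer: weight matrix plus bias vector.
def final_layer_megabytes(hidden_units, num_classes, bytes_per_float=4):
    params = hidden_units * num_classes + num_classes
    return params * bytes_per_float / 1e6

print(final_layer_megabytes(2000, 200))   # ~1.6 MB for 200 outputs
print(final_layer_megabytes(2000, 2000))  # ~16 MB for 2000 outputs
```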
