Switch between the heads of a model during inference

I have 200 neural networks which I trained using transfer learning on text. They all share the same weights except for their heads, which are trained on different tasks. Is it possible to merge those networks into a single TensorFlow model such that calling it with input (text, i) returns the prediction for task i? The idea is to store the shared weights only once, to save on model size, and to evaluate only the head of the task we want to predict, to save on computation. The important bit is to wrap all of that into a TensorFlow model, as I want to make it easier to serve on google-ai-platform.

Note: it is fine to train all the heads independently; I just want to put all of them together into a single model for the inference part.

CodePudding user response:

You probably have a model like the following:

from tensorflow.keras import layers, models
from tensorflow.keras.layers import Input

# Create the model
inputs = Input(shape=(height, width, channels), name='data')
x = layers.Conv2D(...)(inputs)
# ...
x = layers.GlobalAveragePooling2D(name='penultimate_layer')(x)
x = layers.Dense(num_class, name='task0', ...)(x)
model = models.Model(inputs=inputs, outputs=[x])
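
For concreteness, here is a minimal runnable instantiation of that structure; the Conv2D settings, input shape, and num_class below are made-up placeholders, not values from your setup:

from tensorflow.keras import layers, models
from tensorflow.keras.layers import Input

height, width, channels = 64, 64, 3   # placeholder input shape
num_class = 10                        # placeholder number of classes

inputs = Input(shape=(height, width, channels), name='data')
x = layers.Conv2D(32, 3, activation='relu')(inputs)
x = layers.GlobalAveragePooling2D(name='penultimate_layer')(x)
x = layers.Dense(num_class, name='task0', activation='softmax')(x)
model = models.Model(inputs=inputs, outputs=[x])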

Until now the model only has one output. You can add multiple outputs at model creation, or later on. You can add a new head like this:

last_layer = model.get_layer('penultimate_layer').output

# collect the outputs of all existing heads, named 'task0', 'task1', ...
output_heads = []
taskID = 0
while True:
    try:
        head = model.get_layer("task" + str(taskID))
        output_heads.append(head.output)
        taskID += 1
    except ValueError:  # raised by get_layer when no such layer exists
        break

# add new head
new_head = layers.Dense(num_class, name='task' + str(taskID), ...)(last_layer)
output_heads.append(new_head)

model = models.Model(inputs=model.input, outputs=output_heads)

Now, since every head has a name, you can load your task-specific weights by referring to the head by name. The weights to load are those of the last layer of the corresponding standalone model (other_model). You should have something like this:

model.get_layer("task0").set_weights(other_model.layers[-1].get_weights())

When you want to obtain predictions, all you need to know is the task ID of the head you want to look at:

taskID = 0  # obtain predictions from head 0
outputs = model(test_data, training=False)
predictions = outputs[taskID]
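
Be aware that calling the merged model this way runs every head. If you want to evaluate only the selected head per request (the computation saving mentioned in the question), one option, sketched here rather than the only way, is to build a per-task sub-model that reuses the shared layers; constructing it copies no weights:

# Sub-model exposing a single head; all backbone layers (and their
# weights) are shared with the merged model, nothing is duplicated.
head_model = models.Model(
    inputs=model.input,
    outputs=model.get_layer("task" + str(taskID)).output)
predictions = head_model(test_data, training=False)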

If you want to train new heads later on, while still sharing the same backbone, you first have to freeze the other heads; otherwise they will be trained as well, and you don't want that:

for layer in model.layers:
    if "task" in layer.name:  # freeze every existing head
        layer.trainable = False
# code to add the new head ...

Training new tasks, i.e. a new set of classes, at a later moment is called task-incremental learning. The major issue with this is catastrophic forgetting: it is pretty easy to lose prior knowledge while training new tasks. Even if the heads are frozen, the backbone obviously isn't, so if you fine-tune the backbone you will have to apply some technique to avoid this.
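
Since your heads were trained independently on top of the same shared weights anyway, one simple way to sidestep forgetting, sketched here under that assumption, is to freeze the backbone entirely and train only the new head (the optimizer and loss below are placeholders):

# Freeze everything that is not a head; the earlier loop already froze
# the existing heads, so only the newly added head stays trainable.
for layer in model.layers:
    if "task" not in layer.name:
        layer.trainable = False

# Train the new head through a single-output sub-model.
new_head_model = models.Model(
    inputs=model.input,
    outputs=model.get_layer('task' + str(taskID)).output)
new_head_model.compile(optimizer='adam',
                       loss='sparse_categorical_crossentropy')
# new_head_model.fit(new_task_inputs, new_task_labels, epochs=...)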
