I have data consisting of two columns (title, label) for binary classification (0/1). I have generated two sets of embeddings (raw text via SBERT, and knowledge-graph embeddings) of sizes (14196, 384) and (6063, 384) respectively. Now I want to concatenate these two embeddings to train a model via Keras Embedding layers, i.e. I am trying to load my generated embeddings as pre-trained weights into the Embedding layers of Keras. I am using the following code.
import tensorflow as tf
from sklearn.model_selection import train_test_split

num_epochs = 20
batch_size = 64
train_text, temp_text, train_labels, temp_labels = train_test_split(df['Title'], df['Label'],
                                                                    random_state=2022,
                                                                    test_size=0.3,
                                                                    stratify=df['Label'])
train_data=tf.data.Dataset.from_tensor_slices((train_text, train_labels))
valid_data = tf.data.Dataset.from_tensor_slices((temp_text, temp_labels))
ip1 = tf.keras.layers.Input((14169))
ip2 = tf.keras.layers.Input((14169))
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(128,activation = 'relu',name='dense_1'))
text_embed = tf.keras.layers.Embedding(14169, 384, input_length=14169,weights=[r_text],trainable=False)(ip1)
KG_embed = tf.keras.layers.Embedding(6063, 384, input_length=6063,weights=[embeddings_rdf_train],trainable=False)(ip2)
# model.add(tf.keras.layers.Embedding(vocab_size, 300, weights=[r_text],
# input_length=max_length, trainable=False))
#Model.add(tf.keras.layers.GlobalAveragePooling2D())
layerlist = [text_embed, KG_embed]
concat = tf.keras.layers.Concatenate(axis = -1)(layerlist)
model.add(tf.keras.models.Model([ip1, ip2], concat))
model.add(tf.keras.layers.Dense(1, activation = 'sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
history_concat = model.fit(train_data,epochs=num_epochs,validation_data=valid_data,verbose=1, batch_size=batch_size)
I am getting the following error:
Epoch 1/20
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-3fb01ce958a6> in <module>()
1 #history = Model.fit(concate_embeddings[0:14169],np.asarray(train_labels.values).astype('float32'), epochs=num_epochs, validation_split=0.1, shuffle=True, batch_size=batch_size)
----> 2 history_concat = model.fit(train_data,epochs=num_epochs,validation_data=valid_data,verbose=1, batch_size=batch_size)
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
ValueError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 859, in train_step
y_pred = self(x, training=True)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.7/dist-packages/keras/engine/input_spec.py", line 228, in assert_input_compatibility
raise ValueError(f'Input {input_index} of layer "{layer_name}" '
ValueError: Exception encountered when calling layer "sequential_13" (type Sequential).
Input 0 of layer "dense_1" is incompatible with the layer: expected min_ndim=2, found ndim=0. Full shape received: ()
Call arguments received:
• inputs=tf.Tensor(shape=(), dtype=string)
• training=True
• mask=None
I don't know much about Keras. Could anyone please help?
CodePudding user response:
Do you know how to convert your text into the indices necessary for each of the embeddings? Without this, your model will never work. And only you have these mappings since you trained the embeddings.
Aside from that, answering the question: you cannot use a Sequential model with two inputs.
Build a two-input model:
#question: do your titles really have 14169 words each? Sounds very weird
ip1 = tf.keras.layers.Input((14169))
ip2 = tf.keras.layers.Input((14169))
text_embed = tf.keras.layers.Embedding(14169, 384, input_length=14169,weights=[r_text],trainable=False)(ip1)
KG_embed = tf.keras.layers.Embedding(6063, 384, input_length=6063,weights=[embeddings_rdf_train],trainable=False)(ip2)
concat = layerlist = [text_embed, KG_embed]
concat = tf.keras.layers.Concatenate(axis = -1)(layerlist)
outputs = tf.keras.layers.Dense(128,activation = 'relu',name='dense_1')(concat)
outputs = tf.keras.layers.Dense(1, activation = 'sigmoid')(outputs)
model = tf.keras.models.Model([ip1, ip2], outputs)
model.compile(...)
model.fit(...)
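For illustration only, here is a minimal end-to-end sketch of such a two-input model, with random placeholder matrices standing in for r_text and embeddings_rdf_train, an assumed sequence length, and a GlobalAveragePooling1D layer added so the model emits one prediction per title. The crucial missing piece is still the mapping from each title to integer indices into the two embedding matrices; only you have that mapping, so the random index arrays below are placeholders:
import numpy as np
import tensorflow as tf

# Placeholder pre-trained matrices (stand-ins for r_text and embeddings_rdf_train).
r_text = np.random.rand(14169, 384).astype("float32")
embeddings_rdf_train = np.random.rand(6063, 384).astype("float32")

seq_len = 10  # assumed number of index positions per title; use your real length

ip1 = tf.keras.layers.Input((seq_len,), dtype="int32")  # indices into r_text
ip2 = tf.keras.layers.Input((seq_len,), dtype="int32")  # indices into embeddings_rdf_train

text_embed = tf.keras.layers.Embedding(14169, 384, weights=[r_text], trainable=False)(ip1)
kg_embed = tf.keras.layers.Embedding(6063, 384, weights=[embeddings_rdf_train], trainable=False)(ip2)

concat = tf.keras.layers.Concatenate(axis=-1)([text_embed, kg_embed])
pooled = tf.keras.layers.GlobalAveragePooling1D()(concat)  # collapse the sequence dimension
outputs = tf.keras.layers.Dense(128, activation="relu")(pooled)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(outputs)
model = tf.keras.models.Model([ip1, ip2], outputs)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc"])

# Placeholder index arrays: one row of indices per title, plus binary labels.
train_text_ids = np.random.randint(0, 14169, size=(1000, seq_len)).astype("int32")
train_kg_ids = np.random.randint(0, 6063, size=(1000, seq_len)).astype("int32")
train_labels = np.random.randint(0, 2, size=(1000,)).astype("float32")

# A two-input model needs data shaped as ((input_1, input_2), label).
train_ds = tf.data.Dataset.from_tensor_slices(
    ((train_text_ids, train_kg_ids), train_labels)
).shuffle(1000).batch(64)

model.fit(train_ds, epochs=1)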
CodePudding user response:
As per @Daniel Möller's suggestion, I am using the code with some modifications to match the shapes of the two different embeddings, following the reference you answered: concatenate-pre-trained-embedding-layer-and-input-layer
ip1 = tf.keras.layers.Input((14169,)) # number of input sentences
ip2 = tf.keras.layers.Input((6063,))
text_embed = tf.keras.layers.Embedding(14169, 384, input_length=14169,weights=[r_text],trainable=False)(ip1) #text embeddings(14196,384)
KG_embed = tf.keras.layers.Embedding(6063, 384, input_length=14169,weights=[embeddings_rdf_train],trainable=False)(ip2) #knowledge graphs embeddings (6063,384)
normal_kg = tf.keras.layers.Dense(14196)(ip2) #trying to match embeddings dimensions
normal_kg = tf.keras.layers.Reshape((14196,1))(normal_kg)
embedding_KG = KG_embed(normal_kg)
concat = layerlist = [text_embed, embedding_KG]
concat = tf.keras.layers.Concatenate(axis = -1)(layerlist)
outputs = tf.keras.layers.Dense(128,activation = 'relu',name='dense_1')(concat)
outputs = tf.keras.layers.Dense(1, activation = 'sigmoid')(outputs)
model = tf.keras.models.Model([ip1, ip2], outputs)
model.summary()
I am getting this error:
TypeError Traceback (most recent call last)
<ipython-input-28-4409dc2687a5> in <module>()
7 normal_kg = tf.keras.layers.Dense(14196)(ip2)
8 normal_kg = tf.keras.layers.Reshape((14196,1))(normal_kg)
----> 9 embedding_KG = KG_embed(normal_kg)
10
11 concat = layerlist = [text_embed, embedding_KG]
TypeError: 'KerasTensor' object is not callable
Could you please tell me how I should do this for my problem, or whether I am on the right track?
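For context on that TypeError: KG_embed above is the output tensor produced by calling the Embedding layer on ip2, not the layer itself, and a KerasTensor cannot be called. A minimal sketch of keeping the layer as its own object so it stays callable (note also that an Embedding layer expects integer indices, so feeding it the output of a Dense/Reshape is itself questionable):
# Keep the Embedding layer as an object so it can be applied to more than one tensor.
KG_embed_layer = tf.keras.layers.Embedding(6063, 384,
                                           weights=[embeddings_rdf_train],
                                           trainable=False)

embedding_KG = KG_embed_layer(ip2)  # layer(tensor) -> KerasTensor
# embedding_KG is a tensor; calling it, e.g. embedding_KG(normal_kg), raises
# "TypeError: 'KerasTensor' object is not callable". Re-use KG_embed_layer instead.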