Use dataset with multiple tensors per item in model with multiple inputs

I have a TensorFlow Dataset with items of the following format:

(
    <tf.Tensor: shape=(14,), dtype=int64, numpy=array([ 1,  2,  3,  4,  5,  4,  6,  7,  8,  9, 10, 11,  9, 12])>, 
    <tf.Tensor: shape=(12,), dtype=int64, numpy=array([ 1,  2,  3,  4,  5,  4,  6,  7,  8,  9, 10, 11])>, 
    <tf.Tensor: shape=(), dtype=int64, numpy=0>
)

created by this code:

import tensorflow as tf
import tensorflow_datasets as tfds

encoder = tfds.deprecated.text.TokenTextEncoder(token_counts)

# Encode both text tensors with the same encoder (runs eagerly inside tf.py_function).
def encode(text0, text1, label):
    return encoder.encode(text0.numpy()), encoder.encode(text1.numpy()), label

def encode_map_fn(text0, text1, label):
    return tf.py_function(encode,
                          inp=[text0, text1, label],
                          Tout=[tf.int64, tf.int64, tf.int64])

ds_train = ds_raw_train.map(encode_map_fn)
ds_train_valid = ds_raw_train_valid.map(encode_map_fn)

It gets batched with the following code, which should not affect the problem:

train_data_batch = ds_train.padded_batch(32, padded_shapes=([-1], [-1], []))

valid_data_batch = ds_train_valid.padded_batch(32, padded_shapes=([-1], [-1], []))

This produces the following output:

(
    <tf.Tensor: shape=(32, 29), dtype=int64, numpy=array([[ 1,  2,  3,  4,  5,  4,  6,  7,  8,  9, 10, 11,  9, 12],...])>, 
    <tf.Tensor: shape=(32, 29), dtype=int64, numpy=array([[ 1,  2,  3,  4,  5,  4,  6,  7,  8,  9, 10, 11],...])>, 
    <tf.Tensor: shape=(32,), dtype=int64, numpy=array([0, ...])>
)

After creating the model with

lstm_layer = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=False))

# TokenTextEncoder reserves two extra ids (padding and out-of-vocabulary tokens).
size_dic = len(token_counts) + 2

emb = tf.keras.layers.Embedding(size_dic, 100, input_length=300)

input1 = tf.keras.Input(shape=(300,))
input2 = tf.keras.Input(shape=(300,))

e1 = emb(input1)
e2 = emb(input2)
x1 = lstm_layer(e1)
x2 = lstm_layer(e2)

mhd = lambda x: tf.keras.backend.abs(x[0] - x[1])
merged = tf.keras.layers.Lambda(function=mhd, output_shape= lambda x: x[0])([x1, x2])
preds = tf.keras.layers.Dense(1)(merged)
model = tf.keras.Model(inputs=(input1, input2), outputs=preds)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])

I try to fit the model with:

history = model.fit(train_data_batch, validation_data=valid_data_batch, epochs=5)

This results in the following error:

ValueError: Layer "model_1" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None) dtype=int64>]

I think the items of the dataset should be in the format ([tf.int64, tf.int64], tf.int64), but trying to set this in Tout results in an error.

Is there a way to either change the dataset to the needed format, change the model to accept the dataset as it is, or get individual iterators for the first, second, and third element of each item in the dataset?

CodePudding user response:

Try remapping the batched dataset so the two text tensors are grouped into a single input tuple. A flat three-element tuple is interpreted by Keras as (inputs, targets, sample_weights), so only the first tensor was reaching the model:

train_data_batch = train_data_batch.map(lambda x1, x2, y: ((x1, x2), y))
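
This regroups each element from (text1, text2, label) into ((text1, text2), label), which matches the model's two Inputs. The validation set needs the same treatment; a minimal sketch of the remaining steps, reusing the names from the question:

# Apply the same regrouping to the validation batches, then fit as before.
valid_data_batch = valid_data_batch.map(lambda x1, x2, y: ((x1, x2), y))

history = model.fit(train_data_batch, validation_data=valid_data_batch, epochs=5)

Alternatively, the regrouping could be done on ds_train and ds_train_valid before batching, in which case padded_shapes would need to be nested to match, e.g. padded_shapes=(([-1], [-1]), []).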