Keras LSTM and PyTorch LSTM give a 5% gap in results on the same data? Can anyone help figure out why?


I wrote the same LSTM text-similarity model in both PyTorch and Keras.
The Keras version gets 88% accuracy on the test set, while the PyTorch version only reaches 83%, but Keras trains noticeably slower. Is there some difference between the LSTM modules?

Keras model
# build model
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dropout, BatchNormalization, Dense, concatenate

input_left = Input(shape=(Max_Seq_Num,))
input_right = Input(shape=(Max_Seq_Num,))

# word embedding
seq_input = Input(shape=(Max_Seq_Num,))

embedding_layer = Embedding(Words_vec.words_num, Embed_Size, input_length=Max_Seq_Num,
                            weights=[Words_vec.vectors],
                            trainable=False, name='embed_layer')(seq_input)

# LSTM
lstm_layer = LSTM(128, dropout=0.1, recurrent_dropout=0.1, name='LSTM')(embedding_layer)

model_encode = Model(seq_input, lstm_layer, name='encode_model')
model_encode.summary()

left_encode = model_encode(input_left)
right_encode = model_encode(input_right)

merge_vec = concatenate([left_encode, right_encode])
drop1 = Dropout(0.1, name='drop1')(merge_vec)
bn1 = BatchNormalization(name='bn1')(drop1)
dence_1 = Dense(128, activation='relu', name='dence1')(bn1)
drop2 = Dropout(0.1, name='drop2')(dence_1)
bn2 = BatchNormalization(name='bn2')(drop2)
predictions = Dense(1, activation='sigmoid', name='pre')(bn2)

model = Model(inputs=[input_left, input_right], outputs=predictions)
model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['acc'])
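
For reference, a typical fit call for this two-input model would look roughly like the following (just a sketch; X_left, X_right and y_labels are placeholder names for the padded index matrices and 0/1 labels, not variables defined above):

# sketch of the training call; X_left, X_right, y_labels are placeholders
model.fit([X_left, X_right], y_labels,
          batch_size=32, epochs=10,
          validation_split=0.1)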


PyTorch model

import torch
import torch.nn as nn

class bilstm_att(nn.Module):
    def __init__(self, embed_shape, vectors):
        super(bilstm_att, self).__init__()
        self.word_embeding = nn.Embedding(embed_shape[0], embed_shape[1])
        self.word_embeding.from_pretrained(torch.from_numpy(vectors), freeze=True)

        self.bilstm_word = nn.Sequential(nn.Dropout(0.1),
                                         nn.LSTM(embed_shape[1], 128, num_layers=1, dropout=0.1,
                                                 bidirectional=False, batch_first=True))

        self.pre = nn.Sequential(nn.Dropout(0.1), nn.BatchNorm1d(256), nn.Linear(256, 128), nn.ReLU(True),
                                 nn.BatchNorm1d(128), nn.Dropout(p=0.1), nn.Linear(128, 1), nn.Sigmoid())

    def encoder(self, x):
        x = self.word_embeding(x)
        x, _ = self.bilstm_word(x)
        # x: 32 * 40 * 256
        x = x[:, -1, :]
        return x

    def forward(self, x1, x2):
        x1 = self.encoder(x1)
        x2 = self.encoder(x2)
        x = torch.cat((x1, x2), dim=1)
        y = self.pre(x).squeeze()
        return y

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss = nn.BCELoss()
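
The training side is the standard loop, roughly as sketched below (assumptions: model is an instance of bilstm_att created before the optimizer line above, and train_loader is a DataLoader yielding (x1, x2, label) batches):

# standard training loop (sketch; train_loader is assumed to yield
# batches of (x1, x2, label) with padded index tensors and 0/1 labels)
for epoch in range(10):
    model.train()
    for x1, x2, label in train_loader:
        optimizer.zero_grad()
        pred = model(x1, x2)                  # sigmoid outputs in (0, 1)
        batch_loss = loss(pred, label.float())
        batch_loss.backward()
        optimizer.step()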