I have a question related to keras model code in R. I have finished training the model and need to predict. Predicting a line is very fast, but my data has 2000,000,000 rows and nearly 200 columns, with a structure like the attached image. Datastructure I don't know if anyone has any suggestions on which method to use so that predict can run quickly and use less memory. I created a matrix according to the table as shown in order to predict, each matrix is 200,000x200 dimensions. Then I use sapply to predict all the remaining matrices. However, even though predict is fast for each matrix, but creating the matrix is slow, so it makes the model run twice or three times as long, and that is not taking into account the sapply step. I wonder if keras has a "smart" way to know that in each of his matrix, the last N columns that are exactly the same? I google and see someone talking about RepeatVector but I don't quite understand and it seems that this is only used for training? I already have the model and just need to predict. Thank you so much everyone!
CodePudding user response:
One of the most performant ways to feed keras models locally is by creating a tf.data.Dataset
object. Please take a look at the tfdatasets
R package for guides and example usage.