Training a network for machine learning purpose, dividing the dataset in portions

Time: 09-14

I have a dataset that is too big to load into RAM all at once. What I am trying to do is train the model on x portions of the dataset, one after the other, so that the final model has been trained on the whole dataset, as follows:

import math

num_divisione_dataset = 4
div_tr = len(x_tr) // num_divisione_dataset
div_val = 2160 // num_divisione_dataset
num_training = math.ceil(100 / num_divisione_dataset)

for i in range(num_divisione_dataset):

    model.fit(
        x_tr[div_tr*i:div_tr*(i+1)], y_tr[div_tr*i:div_tr*(i+1)],
        batch_size = 32,
        callbacks = [model_checkpoint_callback],
        validation_data = (x_val, y_val),
        epochs = 25
    )

Is this a correct way to train a model?

CodePudding user response:

The batch_size = 32 already trains the model in batches of size 32. It seems you have two levels of batching: one that you built yourself and another that is provided by TensorFlow.

The problem with your batching is epochs=25. Within one epoch, TensorFlow alternates over its batches, and in the next epoch it loops over them again. With your outer loop, however, you first train 25 epochs on your first portion, then 25 epochs on your second portion, and so on.
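
If you want to keep the manual portioning, a minimal sketch of one way to avoid that (reusing the variables from the question, which are assumed to be defined as shown there) is to swap the loops so that every portion is visited once per pass:

for epoch in range(25):
    for i in range(num_divisione_dataset):
        model.fit(
            x_tr[div_tr*i:div_tr*(i+1)], y_tr[div_tr*i:div_tr*(i+1)],
            batch_size = 32,
            callbacks = [model_checkpoint_callback],
            validation_data = (x_val, y_val),
            epochs = 1   # one pass per portion; the outer loop provides the 25 epochs
        )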

I'm not sure this is best solved in software. It might be easier to simply ignore the lack of RAM and let the OS swap to disk, and buying more RAM could be another viable route. A possible software route, however, would be an input pipeline that streams the data from disk, for example with tf.data.
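
A rough sketch of such a pipeline (only an illustration: the file names x_part_*.npy / y_part_*.npy, the placeholder num_features, and the dtypes are assumptions, not something from the question) could stream one portion at a time with tf.data.Dataset.from_generator:

import numpy as np
import tensorflow as tf

def portion_generator():
    # Hypothetical layout: each portion is saved on disk as a pair of .npy
    # files, so only one portion is held in memory at any time.
    for i in range(num_divisione_dataset):
        x = np.load(f"x_part_{i}.npy").astype(np.float32)
        y = np.load(f"y_part_{i}.npy").astype(np.float32)
        for sample, label in zip(x, y):
            yield sample, label

train_dataset = tf.data.Dataset.from_generator(
    portion_generator,
    output_signature=(
        tf.TensorSpec(shape=(num_features,), dtype=tf.float32),  # adjust to your data
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
).batch(32).prefetch(tf.data.AUTOTUNE)

model.fit(train_dataset,
          validation_data=(x_val, y_val),
          callbacks=[model_checkpoint_callback],
          epochs=25)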

CodePudding user response:

Put your data in a CSV file. Then use make_csv_dataset to load it in batches and pass it to model.fit. Make sure to set num_epochs=1, otherwise the dataset will loop forever.

The TensorFlow documentation has an example of how to use it.

A minimal example would be:

import tensorflow as tf

def get_dataset(batch_size = 5):
    # num_epochs=1 so the CSV is read once per epoch instead of looping forever
    return tf.data.experimental.make_csv_dataset(
        DATASET_PATH,
        batch_size = batch_size,
        label_name = LABEL_NAME,
        num_epochs = 1)

dataset = get_dataset(batch_size=BATCH_SIZE)
# take/skip count batches here, since make_csv_dataset already batches the data;
# it also shuffles by default, so pass shuffle=False above if you need a
# deterministic train/validation split
train_dataset = dataset.take(train_size)
val_dataset = dataset.skip(train_size)
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    callbacks=callbacks,
    epochs=EPOCHS,
)
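
As a side note (a performance suggestion, not part of the original answer), the streamed dataset can usually be combined with prefetching so that reading the CSV overlaps with training, e.g. train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE).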