Why does PyTorch report CUDA out of memory, and why doesn't empty_cache help?


import torch
from transformers import BertModel

device = torch.device("cuda:0")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.to(device)

train_hidden_states = []
model.eval()

for batch in train_dataloader:
    b_input_ids = batch[0].to(device)
    b_input_mask = batch[1].to(device)

    with torch.no_grad():
        output = model(b_input_ids,
                       token_type_ids=None,
                       attention_mask=b_input_mask,
                       )
        # output[2] holds the hidden states of all 13 layers
        # (embeddings + 12 encoder layers); index 12 is the last layer.
        hidden_states = output[2][12]
        train_hidden_states.append(hidden_states)

Here I am trying to get the last-layer embeddings of the BERT model for the data in train_dataloader.

The thing is that CUDA runs out of memory after 14 batches.

I tried to empty the cache, but it only decreases the GPU memory usage a little.

with torch.cuda.device('cuda:0'):
    torch.cuda.empty_cache()
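
For reference, a minimal sketch (using the standard torch.cuda utilities) of how I compared the memory held by live tensors with the memory cached by the allocator:

import torch

allocated = torch.cuda.memory_allocated("cuda:0")  # memory occupied by live tensors
reserved = torch.cuda.memory_reserved("cuda:0")    # memory held by the caching allocator
print(f"allocated: {allocated / 1024**2:.1f} MiB, reserved: {reserved / 1024**2:.1f} MiB")

# empty_cache() can only return the reserved-but-unallocated portion to the driver.
torch.cuda.empty_cache()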

What could be the problem?

CodePudding user response:

You are storing GPU tensors in the train_hidden_states list, so every batch's activations stay allocated on the device. Move them to the CPU before appending: train_hidden_states.append(hidden_states.cpu()).
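
For example, a minimal sketch of the same loop with that fix applied (assuming the same model, device, and train_dataloader as in the question):

train_hidden_states = []
model.eval()

for batch in train_dataloader:
    b_input_ids = batch[0].to(device)
    b_input_mask = batch[1].to(device)

    with torch.no_grad():
        output = model(b_input_ids,
                       token_type_ids=None,
                       attention_mask=b_input_mask,
                       )
        # Copy the last layer's hidden states to CPU memory so the GPU tensor
        # can be freed once the batch goes out of scope.
        train_hidden_states.append(output[2][12].cpu())

With the results kept on the CPU, GPU memory stays at roughly one batch's worth of activations; you can stack them afterwards with torch.cat(train_hidden_states, dim=0) if you need a single tensor.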
