I am training on the CIFAR-10 dataset. The code forward-propagates the input through several networks, averages their outputs, and then backpropagates. Every call to forward() needs close to 7 GB of GPU memory, and the usage multiplies over the iterations, while my Titan only has 12 GB; so the second forward computation already reports an out-of-memory error and the program terminates abnormally before it can go on to the next step.
What I have tried for this problem:
1) Ruled out a coding error: the code was downloaded (via GitHub), so a mistake in the code itself is unlikely.
2) Ruled out a version error: I suspected the PyTorch version, but versions 0.0.12, 0.2.0, 0.3.0 and 0.4.0 all fail to solve the problem.
3) Ruled out loading the whole dataset at once, because my earlier code already uses a DataLoader:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64,
                                           shuffle=True, num_workers=4, drop_last=True)
What could be causing this, and how can I fix it?
My guess: because PyTorch builds a dynamic graph, every forward computation produces a Variable whose result is kept in GPU memory; when the computation runs again, the new result is kept as well, and since each forward pass is already very expensive, the usage multiplies. Could the problem be solved in one of the following two ways?
1) Keep each forward output in host memory, average them there, and then move the averaged output back into GPU memory for the subsequent backpropagation (sketched below).
2) Convert all the Variables produced along the way into Tensors.
These two methods are only ideas; I do not know whether they are feasible, and even if they are, I do not know how to implement them.
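Roughly, what I have in mind for 1) is something like the sketch below, reusing the nets list and num from my forward() further down; I have no idea whether this is valid, or whether moving each output to the CPU actually frees anything on the GPU:

# Sketch of idea 1): average the per-net outputs in host memory,
# then move the average back to the GPU for backward().
def forward(self, x):
    outs_cpu = []
    for n in range(self.num):
        out_n = nn.LogSoftmax()(self.nets[n](x))
        outs_cpu.append(out_n.cpu())      # copy each network's result to host memory
    avg = outs_cpu[0]
    for o in outs_cpu[1:]:
        avg = avg + o
    avg = avg / self.num
    return avg.cuda()                     # move the averaged output back for backward()

For 2), the only thing I know of is pulling the raw Tensor out of a Variable with .data, which my training loop below already does for the loss and score statistics, but a plain Tensor cannot be backpropagated through, so I do not see how that idea would work for the outputs themselves.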
The training loop looks like this:

for i, (images, labels) in enumerate(train_loader):
    images = Variable(images).cuda()
    labels = Variable(labels).cuda()
    out = net(images)
    loss = criterion(out, labels) / UPDATE_EVERY
    loss.backward()
    # accumulate gradients over UPDATE_EVERY mini-batches before stepping
    if (i + 1) % UPDATE_EVERY == 0:
        optimizer.step()
        optimizer.zero_grad()
    loss_epoch = loss_epoch + loss.data[0]
    score_epoch = score_epoch + compute_score(out.data, labels.data)
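One more thing I wondered about: should I drop out and loss explicitly at the end of every iteration? The two lines below are all I can think of; the del statement and torch.cuda.empty_cache() are purely my guess, and empty_cache() only exists in the newer versions I tried.

    # at the end of the loop body above, after the statistics are updated:
    del out, loss                 # drop the references to this iteration's results
    torch.cuda.empty_cache()      # release unused cached GPU memory held by the allocator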
# forward() of the ensemble model: average the LogSoftmax outputs of all member nets
def forward(self, x):
    out = nn.LogSoftmax()(self.nets[0](x))
    for n in range(1, self.num):
        out = out + nn.LogSoftmax()(self.nets[n](x))
    out = out / self.num
    return out
CodePudding user response:
This depends on understanding how PyTorch manages memory at the source level.
CodePudding user response:
If you want to confirm whether LogSoftmax is the problem, you can write a small piece of code to test it.
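For example, something along these lines; this is only a sketch, and it assumes one of the 0.4-era builds you tried, since torch.cuda.memory_allocated() is not available in the older versions:

import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.randn(64, 10).cuda())
print(torch.cuda.memory_allocated())        # baseline after allocating x
out = nn.LogSoftmax(dim=1)(x)
print(torch.cuda.memory_allocated())        # after a single LogSoftmax
for _ in range(10):
    out = out + nn.LogSoftmax(dim=1)(x)     # accumulate the way the ensemble forward() does
print(torch.cuda.memory_allocated())        # check how much the usage grows

If the numbers only grow by roughly the size of out each time, LogSoftmax itself is probably not where the 7 GB per forward pass comes from, and the memory is more likely held by the intermediate activations of the individual nets.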