tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory [Op:AddV2]-CodePudding

Hi I am a beginner in DL and tensorflow,

I created a CNN (you can see the model below)

model = tf.keras.Sequential()

model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=7, activation="relu", input_shape=[512, 640, 3]))
model.add(tf.keras.layers.MaxPooling2D(2))
model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.MaxPooling2D(2))
model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.MaxPooling2D(2))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(2, activation='softmax'))

optimizer = tf.keras.optimizers.SGD(learning_rate=0.2) #, momentum=0.9, decay=0.1)
model.compile(optimizer=optimizer, loss='mse', metrics=['accuracy'])

I tried building and training it with the cpu and it was completed successfully (but very slowly) so I decided to install tensorflow-gpu. Installed everything as instructed in https://www.tensorflow.org/install/gpu).

But now when I am trying to build the model this error comes up:

> Traceback (most recent call last):   File
> "C:/Users/thano/Documents/Py_workspace/AI_tensorflow/fire_detection/main.py",
> line 63, in <module>
>     model = create_models.model1()   File "C:\Users\thano\Documents\Py_workspace\AI_tensorflow\fire_detection\create_models.py",
> line 20, in model1
>     model.add(tf.keras.layers.Dense(128, activation='relu'))   File "C:\Python37\lib\site-packages\tensorflow\python\training\tracking\base.py",
> line 530, in _method_wrapper
>     result = method(self, *args, **kwargs)   File "C:\Python37\lib\site-packages\keras\engine\sequential.py", line 217,
> in add
>     output_tensor = layer(self.outputs[0])   File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 977,
> in __call__
>     input_list)   File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 1115,
> in _functional_construction_call
>     inputs, input_masks, args, kwargs)   File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 848,
> in _keras_tensor_symbolic_call
>     return self._infer_output_signature(inputs, args, kwargs, input_masks)   File
> "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 886,
> in _infer_output_signature
>     self._maybe_build(inputs)   File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 2659,
> in _maybe_build
>     self.build(input_shapes)  # pylint:disable=not-callable   File "C:\Python37\lib\site-packages\keras\layers\core.py", line 1185, in
> build
>     trainable=True)   File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 663,
> in add_weight
>     caching_device=caching_device)   File "C:\Python37\lib\site-packages\tensorflow\python\training\tracking\base.py",
> line 818, in _add_variable_with_custom_getter
>     **kwargs_for_getter)   File "C:\Python37\lib\site-packages\keras\engine\base_layer_utils.py", line
> 129, in make_variable
>     shape=variable_shape if variable_shape else None)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 266, in __call__
>     return cls._variable_v1_call(*args, **kwargs)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 227, in _variable_v1_call
>     shape=shape)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 205, in <lambda>
>     previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\variable_scope.py",
> line 2626, in default_variable_creator
>     shape=shape)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 270, in __call__
>     return super(VariableMetaclass, cls).__call__(*args, **kwargs)   File
> "C:\Python37\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py",
> line 1613, in __init__
>     distribute_strategy=distribute_strategy)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py",
> line 1740, in _init_from_args
>     initial_value = initial_value()   File "C:\Python37\lib\site-packages\keras\initializers\initializers_v2.py",
> line 517, in __call__
>     return self._random_generator.random_uniform(shape, -limit, limit, dtype)   File
> "C:\Python37\lib\site-packages\keras\initializers\initializers_v2.py",
> line 973, in random_uniform
>     shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)   File
> "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py",
> line 206, in wrapper
>     return target(*args, **kwargs)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\random_ops.py",
> line 315, in random_uniform
>     result = math_ops.add(result * (maxval - minval), minval, name=name)   File
> "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py",
> line 206, in wrapper
>     return target(*args, **kwargs)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\math_ops.py",
> line 3943, in add
>     return gen_math_ops.add_v2(x, y, name=name)   File "C:\Python37\lib\site-packages\tensorflow\python\ops\gen_math_ops.py",
> line 454, in add_v2
>     _ops.raise_from_not_ok_status(e, name)   File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py",
> line 6941, in raise_from_not_ok_status
>     six.raise_from(core._status_to_exception(e.code, message), None)   File "<string>", line 3, in raise_from
> tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed
> to allocate memory [Op:AddV2]

Any ideas what might be the problem?

CodePudding user response：

The error is telling you that it couldn't allocate as much VRAM as you are using. The easiest way to overcome this kind of problem is to reduce to batch-size to a number that fits on your GPU's VRAM.

CodePudding user response：

The error message you received tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory [Op:AddV2] could indicate that your GPU does not have enough memory for the training job you want to run. What GPU are you using and how much vRAM does it have?

When it comes to "Out Of Memory" (OOM) errors when training, the most straightforward thing to do is to reduce the batch_size hyperparameter.

There's no straightforward way to determine what the largest batch_size you can use while training that will fit your GPU's available vRAM other than trial and error. A general rule however, is to use a power of 2 (e.g. 8, 16, 32).