Home > Blockchain >  DNN library is not found. (Google Colab)
DNN library is not found. (Google Colab)

Time:08-19

I'm working on a ML project using Google Colab and Tensorflow to train a CNN, starting from the EfficientNetV2M model. It used to work just fine until two days ago, when starting the training:

train = model.fit(X, y, epochs=save_every_n_epochs, batch_size=16, verbose=1)

gave the following error:

UnimplementedError                        Traceback (most recent call last)
<ipython-input-5-1b2fb9765100> in <module>
     70     print(f"Training the model for {save_every_n_epochs} epochs")
     71 
---> 72     train = model.fit(X, y, epochs=save_every_n_epochs, batch_size=16, verbose=1)
     73     print("Model trained")
     74 

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53     ctx.ensure_initialized()
     54     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55                                         inputs, attrs, num_outputs)
     56   except core._NotOkStatusException as e:
     57     if name is not None:

UnimplementedError: Graph execution error:

Detected at node 'sequential/efficientnetv2-m/stem_conv/Conv2D' defined at (most recent call last):
    File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
      "__main__", mod_spec)
    File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
      exec(code, run_globals)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
      app.launch_new_instance()
    File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
      app.start()
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
      self.io_loop.start()
    File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
      self.asyncio_loop.run_forever()
    File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
      self._run_once()
    File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
      handle._run()
    File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
      self._context.run(self._callback, *self._args)
    File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
      ret = callback()
    File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
      self.run()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
      yielded = self.gen.send(value)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
      yield gen.maybe_future(dispatch(*args))
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
      yield gen.maybe_future(handler(stream, idents, msg))
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
      user_expressions, allow_stdin,
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
      res = shell.run_cell(code, store_history=store_history, silent=silent)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
      return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
      raw_cell, store_history, silent, shell_futures)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
      return runner(coro)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
      coro.send(None)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
      interactivity=interactivity, compiler=compiler, result=result)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
      if (await self.run_code(code, result,  async_=asy)):
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "<ipython-input-5-1b2fb9765100>", line 72, in <module>
      train = model.fit(X, y, epochs=save_every_n_epochs, batch_size=16, verbose=1)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
      y_pred = self(x, training=True)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/sequential.py", line 374, in call
      return super(Sequential, self).call(inputs, training=training, mask=mask)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 459, in call
      inputs, training=training, mask=mask)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 596, in _run_internal_graph
      outputs = node.layer(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 459, in call
      inputs, training=training, mask=mask)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 596, in _run_internal_graph
      outputs = node.layer(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional/base_conv.py", line 250, in call
      outputs = self.convolution_op(inputs, self.kernel)
    File "/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional/base_conv.py", line 232, in convolution_op
      name=self.__class__.__name__)
Node: 'sequential/efficientnetv2-m/stem_conv/Conv2D'
DNN library is not found.
     [[{{node sequential/efficientnetv2-m/stem_conv/Conv2D}}]] [Op:__inference_train_function_45723]

I wasn't able to train any model since then, always getting this error, also when loading previously stored models. I am able to train when i use an environment without GPU, but it is obviously too slow. I've also tried to change the TF version as suggested in other topics, without any success.

Any suggestions?

CodePudding user response:

Yes there's a similar question from a few hours ago. Apparently this is a problem related to the latest Tensoflow update introduced in Colab (Tensorflow 2.9.1).

As a quick fix you could downgrade Tensorflow. However only downgrading to tf 2.8, as suggested in the linked question wasn't enough to fix the problem in my case.

Try this:

!pip uninstall tensorflow-gpu

!pip install tensorflow-gpu==2.8
!apt install --allow-change-held-packages libcudnn8=8.1.0.77-1 cuda11.2

Also make sure to restart the runtime if it asks you to do so.

  • Related