DeepLabCut stuck at "Starting training...."


I am trying to use testscript.py to check whether my DLC environment works, but it shows an error and then gets stuck at "Starting training....".

Here is what I did:

  1. A Windows 11 computer with an i7-12700 and an RTX 3060
  2. Installed Anaconda
  3. Installed CUDA 11.2 and cuDNN 8.1
  4. Created the environment using the official DEEPLABCUT.yaml file
  5. Activated the environment and ran pip install torch
  6. Checked that TF can use the GPU
  7. Ran testscript.py, which gets stuck at "Starting training..."

Please help me solve this problem. I think it may be caused by some outdated packages.

The packages in the environment:

# packages in environment at C:\App\anaconda3\envs\DEEPLABCUT:
# Name                    Version                   Build  Channel
I am sure TF can use my GPU, as shown by this:

(DEEPLABCUT) C:\Windows\system32>python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:45) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2022-09-11 20:41:50.137638: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-11 20:41:50.481875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 9616 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 200595950863773239
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10083106816
locality {
  bus_id: 1
  links {
  }
}
incarnation: 14570387183940456862
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6"
xla_global_id: 416903419
]

The terminal gets stuck at "Starting training...":

(DEEPLABCUT) C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples>python testscript.py
Loading DLC 2.2.2...
Imported DLC!
On Windows/OSX tensorpack is not tested by default.
Created "C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples\TEST-Alex-2022-09-11\videos"
Created "C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples\TEST-Alex-2022-09-11\labeled-data"
Created "C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples\TEST-Alex-2022-09-11\training-datasets"
Created "C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples\TEST-Alex-2022-09-11\dlc-models"
Copying the videos
Generated "C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples\TEST-Alex-2022-09-11\config.yaml"

A new project with name TEST-Alex-2022-09-11 is created at C:\works\DLC\DLC_script\DeepLabCut-master\DeepLabCut-master\examples and a configurable file (config.yaml) is stored there. Change the parameters in this file to adapt to your project's needs.
 Once you have changed the configuration file, use the function 'extract_frames' to select frames for labeling.
. [OPTIONAL] Use the function 'add_new_videos' to add new videos to your project (at any stage).
Config file read successfully.
Extracting frames based on kmeans ...
Kmeans-quantization based extracting of frames from 0.0  seconds to 8.53  seconds.
Extracting and downsampling... 256  frames from the video.
256it [00:01, 214.77it/s]
Kmeans clustering ... (this might take a while)
Frames were successfully extracted, for the videos listed in the config.yaml file.

You can now label the frames using the function 'label_frames' (Note, you should label frames extracted from diverse videos (and many videos; we do not recommend training on single videos!)).
Plot labels...
Creating images with labels by Alex.
100%|█████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.01it/s]
If all the labels are ok, then use the function 'create_training_dataset' to create the training dataset!
Downloading a ImageNet-pretrained model from https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz....
The training dataset is successfully created. Use the function 'train_network' to start training. Happy training!
CHANGING training parameters to end quickly!
Selecting single-animal trainer
{'all_joints': [[0], [1], [2], [3]],
 'all_joints_names': ['bodypart1', 'bodypart2', 'bodypart3', 'objectA'],
 'alpha_r': 0.02,
 'apply_prob': 0.5,
 'batch_size': 1,
 'contrast': {'clahe': True,
              'claheratio': 0.1,
              'histeq': True,
              'histeqratio': 0.1},
 'convolution': {'edge': False,
                 'emboss': {'alpha': [0.0, 1.0], 'strength': [0.5, 1.5]},
                 'embossratio': 0.1,
                 'sharpen': False,
                 'sharpenratio': 0.3},
 'crop_pad': 0,
 'cropratio': 0.4,
 'dataset': 'training-datasets\\iteration-0\\UnaugmentedDataSet_TESTSep11\\TEST_Alex80shuffle1.mat',
 'dataset_type': 'default',
 'decay_steps': 30000,
 'deterministic': False,
 'display_iters': 2,
 'fg_fraction': 0.25,
 'global_scale': 0.8,
 'init_weights': 'C:\\App\\anaconda3\\envs\\DEEPLABCUT\\lib\\site-packages\\deeplabcut\\pose_estimation_tensorflow\\models\\pretrained\\efficientnet-b0\\model.ckpt',
 'intermediate_supervision': False,
 'intermediate_supervision_layer': 12,
 'location_refinement': True,
 'locref_huber_loss': True,
 'locref_loss_weight': 0.05,
 'locref_stdev': 7.2801,
 'log_dir': 'log',
 'lr_init': 0.0005,
 'max_input_size': 1500,
 'mean_pixel': [123.68, 116.779, 103.939],
 'metadataset': 'training-datasets\\iteration-0\\UnaugmentedDataSet_TESTSep11\\Documentation_data-TEST_80shuffle1.pickle',
 'min_input_size': 64,
 'mirror': False,
 'multi_stage': False,
 'multi_step': [[0.001, 5]],
 'net_type': 'efficientnet-b0',
 'num_joints': 4,
 'optimizer': 'sgd',
 'pairwise_huber_loss': False,
 'pairwise_predict': False,
 'partaffinityfield_predict': False,
 'pos_dist_thresh': 17,
 'project_path': 'C:\\works\\DLC\\DLC_script\\DeepLabCut-master\\DeepLabCut-master\\examples\\TEST-Alex-2022-09-11',
 'regularize': False,
 'rotation': 25,
 'rotratio': 0.4,
 'save_iters': 5,
 'scale_jitter_lo': 0.5,
 'scale_jitter_up': 1.25,
 'scoremap_dir': 'test',
 'shuffle': True,
 'snapshot_prefix': 'C:\\works\\DLC\\DLC_script\\DeepLabCut-master\\DeepLabCut-master\\examples\\TEST-Alex-2022-09-11\\dlc-models\\iteration-0\\TESTSep11-trainset80shuffle1\\train\\snapshot',
 'stride': 8.0,
 'weigh_negatives': False,
 'weigh_only_present_joints': False,
 'weigh_part_predictions': False,
 'weight_decay': 0.0001}
Batch Size is 1
C:\App\anaconda3\envs\DEEPLABCUT\lib\site-packages\tensorflow\python\keras\engine\base_layer_v1.py:1694: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  warnings.warn('`layer.apply` is deprecated and '
2022-09-11 20:18:35.578698: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-11 20:18:35.897924: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-09-11 20:18:35.898044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9616 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Loading ImageNet-pretrained efficientnet-b0
2022-09-11 20:18:36.209153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9616 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Switching to cosine decay schedule with adam!
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\threading.py", line 932, in _bootstrap_inner
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\pose_estimation_tensorflow\core\train.py", line 81, in load_and_enqueue
    batch_np = dataset.next_batch()
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\pose_estimation_tensorflow\datasets\pose_imgaug.py", line 404, in next_batch
    scmap_update = self.get_scmap_update(
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\pose_estimation_tensorflow\datasets\pose_imgaug.py", line 361, in get_scmap_update
    ) = self.compute_target_part_scoremap_numpy(
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\pose_estimation_tensorflow\datasets\pose_imgaug.py", line 498, in compute_target_part_scoremap_numpy
    j_x = np.asscalar(joint_pt[0])
  File "C:\App\anaconda3\envs\DEEPLABCUT\lib\site-packages\numpy\__init__.py", line 311, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'asscalar'
2022-09-11 20:18:38.298353: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
Training parameter:
{'stride': 8.0, 'weigh_part_predictions': False, 'weigh_negatives': False, 'fg_fraction': 0.25, 'mean_pixel': [123.68, 116.779, 103.939], 'shuffle': True, 'snapshot_prefix': 'C:\\works\\DLC\\DLC_script\\DeepLabCut-master\\DeepLabCut-master\\examples\\TEST-Alex-2022-09-11\\dlc-models\\iteration-0\\TESTSep11-trainset80shuffle1\\train\\snapshot', 'log_dir': 'log', 'global_scale': 0.8, 'location_refinement': True, 'locref_stdev': 7.2801, 'locref_loss_weight': 0.05, 'locref_huber_loss': True, 'optimizer': 'adam', 'intermediate_supervision': False, 'intermediate_supervision_layer': 12, 'regularize': False, 'weight_decay': 0.0001, 'crop_pad': 0, 'scoremap_dir': 'test', 'batch_size': 1, 'dataset_type': 'default', 'deterministic': False, 'mirror': False, 'pairwise_huber_loss': False, 'weigh_only_present_joints': False, 'partaffinityfield_predict': False, 'pairwise_predict': False, 'all_joints': [[0], [1], [2], [3]], 'all_joints_names': ['bodypart1', 'bodypart2', 'bodypart3', 'objectA'], 'alpha_r': 0.02, 'apply_prob': 0.5, 'contrast': {'clahe': True, 'claheratio': 0.1, 'histeq': True, 'histeqratio': 0.1, 'gamma': False, 'sigmoid': False, 'log': False, 'linear': False}, 'convolution': {'edge': False, 'emboss': {'alpha': [0.0, 1.0], 'strength': [0.5, 1.5]}, 'embossratio': 0.1, 'sharpen': False, 'sharpenratio': 0.3}, 'cropratio': 0.4, 'dataset': 'training-datasets\\iteration-0\\UnaugmentedDataSet_TESTSep11\\TEST_Alex80shuffle1.mat', 'decay_steps': 30000, 'display_iters': 2, 'init_weights': 'C:\\App\\anaconda3\\envs\\DEEPLABCUT\\lib\\site-packages\\deeplabcut\\pose_estimation_tensorflow\\models\\pretrained\\efficientnet-b0\\model.ckpt', 'lr_init': 0.0005, 'max_input_size': 1500, 'metadataset': 'training-datasets\\iteration-0\\UnaugmentedDataSet_TESTSep11\\Documentation_data-TEST_80shuffle1.pickle', 'min_input_size': 64, 'multi_stage': False, 'multi_step': [[0.001, 5]], 'net_type': 'efficientnet-b0', 'num_joints': 4, 'pos_dist_thresh': 17, 'project_path': 'C:\\works\\DLC\\DLC_script\\DeepLabCut-master\\DeepLabCut-master\\examples\\TEST-Alex-2022-09-11', 'rotation': 25, 'rotratio': 0.4, 'save_iters': 5, 'scale_jitter_lo': 0.5, 'scale_jitter_up': 1.25, 'covering': True, 'elastic_transform': True, 'motion_blur': True, 'motion_blur_params': {'k': 7, 'angle': (-90, 90)}, 'use_batch_norm': False, 'use_drop_out': False}
Starting training....

Answer:

I suspect two issues here:

  1. How long did you wait? My setup has weaker hardware than yours, and it took almost 8 minutes before the first iterations showed up.
  2. Your error message clearly shows that np.asscalar isn't found. Your numpy version is 1.23.3, but np.asscalar has been deprecated since 1.16 and was removed in 1.23. Maybe try downgrading (pip install numpy==1.15 / conda install numpy==1.15) and see if the error persists.

Edit: I just checked the environment file (DEEPLABCUT.yaml) supplied by DLC and verified that no numpy version is specified. Since np.asscalar is still used, you should probably downgrade to a version <1.16.
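To confirm the diagnosis inside the DEEPLABCUT environment, a short check like the one below prints the installed NumPy version and whether np.asscalar is still present. It is a minimal sketch: the helper names are made up for illustration, and the (1, 23) threshold reflects the removal of np.asscalar in NumPy 1.23.

```python
import numpy as np


def numpy_major_minor(version):
    """Return (major, minor) as ints from a version string like '1.23.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_asscalar():
    """True if this NumPy still provides np.asscalar.

    Up to 1.22 the attribute exists (1.16+ only warn when it is called);
    from 1.23 on, attribute access raises AttributeError, so hasattr() is False.
    """
    return hasattr(np, "asscalar")


if __name__ == "__main__":
    print("NumPy version:", np.__version__)
    print("np.asscalar available:", has_asscalar())
    if numpy_major_minor(np.__version__) >= (1, 23):
        print("Code that calls np.asscalar (e.g. pose_imgaug.py) will raise AttributeError.")
```

Run it with python from the activated environment; if it reports np.asscalar as unavailable, the AttributeError in the training thread is expected.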

Answer:

The numpy.asscalar() function was finally removed in NumPy 1.23 (see the Release Notes) after being deprecated since v1.16. I opened an Issue on the repository. Unless you want to send in a Pull Request to fix it, downgrade NumPy to 1.22 or below.
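If you go the Pull Request route instead, the change is small: NumPy implemented np.asscalar(a) as a.item(), so each call can be replaced with .item(). Below is a minimal sketch using a hypothetical asscalar helper and made-up joint coordinates; it is not the actual upstream DeepLabCut patch.

```python
import numpy as np


def asscalar(a):
    """Hypothetical drop-in replacement for the removed np.asscalar.

    np.asscalar(a) was a.item(); wrapping in np.asarray() additionally accepts
    plain Python numbers. Like the original, it raises ValueError if `a` holds
    more than one element.
    """
    return np.asarray(a).item()


# Made-up joint coordinates, standing in for joint_pt in pose_imgaug.py.
joint_pt = np.array([12.5, 40.0], dtype=np.float32)

j_x = asscalar(joint_pt[0])  # was: j_x = np.asscalar(joint_pt[0])
print(type(j_x).__name__, j_x)  # float 12.5
```

Inside the library itself, writing joint_pt[0].item() directly is equivalent and avoids the extra helper.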

conda install -n DEEPLABCUT "numpy<1.23"

BTW, no one should have to wait for slow solves anymore: Mamba has been stable for a long time and solves this problem. Once it is installed, just use mamba instead of conda for most commands.

conda install -n base conda-forge::mamba

mamba install -n DEEPLABCUT "numpy<1.23"


Alternatively, edit the YAML to include the upper bound on numpy:

  - numpy <1.23

and recreate the environment from the updated YAML.

