Error in python scripts/train.py (Unit 3)

Hello,

I’m at Step 5 of Unit 3. When I run the following command:
python scripts/train.py --dataset ./data/train.tfrecord --val_dataset ./data/test.tfrecord --classes ./data/new_names.names --num_classes $num_new_classes --mode fit --transfer darknet --batch_size 16 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes $num_coco_classes

I get the following errors:
2020-04-09 17:37:00.722949: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libnvinfer.so.6’; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/home/user/catkin_ws/src/my_catkin_ws_python3/devel/lib:/home/user/.catkin_ws_python3/devel/lib:/home/simulations/public_sim_ws/devel/lib:/opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/usr/local/cuda-10.2/lib64:/home/user/.catkin_ws_python3/src/TensorRT-7.0.0.11/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu/gazebo-7/plugins
2020-04-09 17:37:00.723129: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libnvinfer_plugin.so.6’; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/home/user/catkin_ws/src/my_catkin_ws_python3/devel/lib:/home/user/.catkin_ws_python3/devel/lib:/home/simulations/public_sim_ws/devel/lib:/opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/usr/local/cuda-10.2/lib64:/home/user/.catkin_ws_python3/src/TensorRT-7.0.0.11/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu/gazebo-7/plugins
2020-04-09 17:37:00.723170: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
list.remove(x): x not in list
Its already removed…/opt/ros/kinetic/lib/python2.7/dist-packages
2020-04-09 17:37:02.144095: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcuda.so.1’; dlerror: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short; LD_LIBRARY_PATH: /opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/home/user/catkin_ws/src/my_catkin_ws_python3/devel/lib:/home/user/.catkin_ws_python3/devel/lib:/home/simulations/public_sim_ws/devel/lib:/opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/opt/ros/kinetic/share/euslisp/jskeus/eus//Linux64/lib:/usr/local/cuda-10.2/lib64:/home/user/.catkin_ws_python3/src/TensorRT-7.0.0.11/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu/gazebo-7/plugins
2020-04-09 17:37:02.144152: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

Any idea how to solve it?

Thanks
Juan

Hi,

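These errors are normal in the RobotIgniteAcademy setup. They come from the system not having a GPU enabled for CUDA processing: the "W" lines are warnings about GPU libraries (CUDA, TensorRT) that cannot be loaded, and the "E" line is TensorFlow failing to initialize CUDA, after which it simply runs on the CPU. The training process should work as expected if you follow the notebook. I just tested it, and the training epochs started without any issue.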
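To tell which of these lines are only warnings and which are actual errors, you can read the severity letter TensorFlow writes right after the timestamp (I = info, W = warning, E = error, F = fatal). Below is a minimal sketch, assuming the log format shown in the question; the function name classify_tf_log is hypothetical and not part of the course scripts. Lines without the TensorFlow prefix, such as the "list.remove(x)" message printed by other code, are skipped.

```python
import re
from collections import Counter

# TensorFlow C++ log prefix: "<date> <time>: <severity> <source file>:<line>] <message>"
# Severity letters: I = info, W = warning, E = error, F = fatal.
LOG_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+?):(\d+)\] (.*)$"
)

SEVERITY_NAMES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def classify_tf_log(lines):
    """Return (severity, source_file, message) tuples for TensorFlow log lines.

    Hypothetical helper: lines that do not match the TensorFlow prefix
    (e.g. output printed by other Python code) are ignored.
    """
    entries = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            severity, source, _lineno, message = match.groups()
            entries.append((SEVERITY_NAMES[severity], source, message))
    return entries


if __name__ == "__main__":
    # Shortened copies of lines from the log in the question.
    sample = [
        "2020-04-09 17:37:00.722949: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'",
        "2020-04-09 17:37:02.144152: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)",
        "list.remove(x): x not in list",
    ]
    entries = classify_tf_log(sample)
    # Count how many lines fall under each severity.
    print(Counter(severity for severity, _, _ in entries))
    for severity, source, message in entries:
        print(f"{severity:8} {source}: {message}")
```

In this log every GPU-related line is a warning except the single cuInit error, and none is fatal. If the warnings are just noise, setting TensorFlow's TF_CPP_MIN_LOG_LEVEL environment variable to 2 before launching the script hides info and warning messages (the cuInit error line would still be shown).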

Please let us know if it doesn’t, and describe the issue in as much detail as possible.