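Tip: press Ctrl+F and search this page for a keyword from your error message to see whether the bug you hit is covered,
or browse the table of contents.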

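Every problem and fix below is one I ran into myself and resolved with the fix described. If something still does not work, feel free to discuss it in the comments.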

Version issues

1. Model weights trained with a different PyTorch version cannot be loaded.

pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

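Cause: the torch versions used for training and testing differ. The model was trained under version 1.x and tested under a different version, 1.m. Starting with PyTorch 1.6, torch.save writes a zip-based file by default, which older releases cannot read.
Solution: load the model under the 1.x version first, then set _use_new_zipfile_serialization=False when saving it again.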

torch.save(model.state_dict(), model_path, _use_new_zipfile_serialization=False)
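The re-save step above can be sketched in Python as follows. This is a minimal sketch, not the only way to do it: the helper names uses_zip_format and resave_legacy are hypothetical, and it assumes the checkpoint is a plain state_dict that torch.load can read under the newer PyTorch version. The format check relies only on the standard library, since the newer torch.save format is a zip archive.

```python
import zipfile


def uses_zip_format(checkpoint_path):
    """Return True if the file is a zip archive (the newer torch.save format)."""
    return zipfile.is_zipfile(checkpoint_path)


def resave_legacy(src_path, dst_path):
    """Load a checkpoint and write it back in the legacy (non-zip) format.

    Run this under the newer PyTorch version that produced src_path.
    """
    import torch  # imported lazily so the format check works without torch

    state = torch.load(src_path, map_location="cpu")
    torch.save(state, dst_path, _use_new_zipfile_serialization=False)
```

A typical use would be to call uses_zip_format on the weights file first, and only call resave_legacy when it returns True and the file has to be read by an older PyTorch.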

2. ValueError: numpy.ufunc size changed, may indicate binary incompatibility.

This happens when the installed numpy is too old for the packages that depend on it; upgrading numpy resolves it.

pip uninstall numpy
pip install numpy
pip install --upgrade numpy
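Before and after upgrading, it can help to confirm which numpy version is actually installed in the active environment. The following is a small sketch using only the standard library (Python 3.8 or later); installed_version and version_tuple are hypothetical helper names, and the tuple parsing deliberately ignores pre-release suffixes.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.19.5' into (1, 19, 5), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


# Prints the numpy version of the current environment, or None if not installed.
print("numpy:", installed_version("numpy"))
```

Comparing version_tuple results before and after the pip commands shows whether the upgrade took effect in the environment you are actually running.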

3. Mask R-CNN training warns: FutureWarning: Input image dtype is bool

Reference blog post: https://blog.csdn.net/qq_39483453/article/details/118598535

This is an issue with scikit-image 0.17.2; switching the package to version 0.16.2 resolves it.

pip install -U scikit-image==0.16.2

4. CUDA version mismatch

Error log:
The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.3). Please make sure to use the same CUDA versions.

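Solution: install multiple CUDA versions side by side

(1) Download CUDA

Go to the official NVIDIA site, then download and install the older CUDA version.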

wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
sudo sh cuda_11.3.0_465.19.01_linux.run

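In the installer, deselect the CUDA Driver component.

Do you want to install a symbolic link at /usr/local/cuda? # whether to symlink the install directory to /usr/local/cuda
Choose yes.

(2) Switch between versions

Edit the CUDA paths in ~/.bashrc: open the file and append the following at the end (change the version number to the CUDA version you want to use):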

export PATH="/usr/local/cuda-11.3/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH"

Run source ~/.bashrc to reload the configuration.

Check with nvcc --version; if it reports the CUDA version you just installed, the installation succeeded.
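The final check can also be scripted. The sketch below, under stated assumptions, parses the output of nvcc --version and reads the CUDA version PyTorch was compiled against from torch.version.cuda, so the two can be compared directly. The helper names are hypothetical, and the regular expression assumes nvcc prints a line containing "release X.Y", as current toolkits do.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract 'major.minor' from `nvcc --version` output, or None if not found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def toolkit_version():
    """Return the CUDA toolkit version of the nvcc on PATH, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def pytorch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda
```

If toolkit_version() and pytorch_cuda_version() return different values, the PATH and LD_LIBRARY_PATH entries in ~/.bashrc are the first thing to check.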

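Environment setup / package installation

1. No module named 'pycocotools'

Reference blog post: "A very simple fix for No module named 'pycocotools'"

pycocotools .whl files for each version

Pick any version from the list above (I installed 2.0.0). When downloading, make sure the wheel matches your Python version: cp36 means Python 3.6, and cp37 and cp38 follow the same pattern.

After downloading, put the file in any folder you like, open a command prompt, change to the directory containing the .whl file, and run the commands below. The argument after install is the full name of the .whl file you downloaded.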

activate tensorflow
E:
cd E:/windows/Downloads
pip install pycocotools_windows-2.0-cp36-cp36m-win_amd64.whl
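To avoid downloading a wheel for the wrong interpreter, the cp tag can be checked from Python first. This is a minimal sketch: the helper names are hypothetical, it assumes the standard wheel file name layout (name-version-python-abi-platform.whl), and it only handles a single CPython tag such as cp36, not compound tags such as py2.py3.

```python
import sys


def current_cp_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp36' for Python 3.6."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_python_tag(filename):
    """Return the python tag of a wheel file name (third field from the end)."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-3]


def wheel_matches_interpreter(filename):
    """True if the wheel's python tag equals the running interpreter's tag."""
    return wheel_python_tag(filename) == current_cp_tag()


# Shows which tag to look for when choosing a wheel for this interpreter.
print("Look for wheels tagged:", current_cp_tag())
```

Run it inside the environment where the package will be installed, since a different conda environment may use a different Python version.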

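2. AttributeError: 'str' object has no attribute 'decode'

Reference blog post: https://blog.csdn.net/qq_41185868/article/details/82079079

This error commonly appears when older Keras/TensorFlow code loads an HDF5 model with h5py 3.x; pinning h5py below 3.0 avoids it.

pip install 'h5py<3.0.0' -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Switching TensorFlow to version 1.x in Google Colab

Reference blog post: https://blog.csdn.net/qq_44262417/article/details/105222696

%tensorflow_version 1.x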

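4. Stuck for a long time at Successfully opened dynamic library libcudnn.so.7

The cause is a mismatch between the TensorFlow and CUDA versions; I was using CUDA 11.6 with TensorFlow 2.2.0.

After some searching, one combination that works is CUDA 11.2 + TensorFlow 2.5.0.

5. Errors when installing packages with pip

Error 1: ImportError: No module named pip._internal.cli.main

Error log: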

Traceback (most recent call last):
  File "/home/mahaofei/anaconda3/envs/linemod/bin/pip", line 6, in <module>
    from pip._internal.cli.main import main
ImportError: No module named pip._internal.cli.main

Solution:

The cause is that the pip installed with the Python 2.7 environment is too old to install many packages correctly.

python -m ensurepip
pip install --upgrade pip

Error 2: Could not find a version that satisfies the requirement

Error log:

ERROR: Could not find a version that satisfies the requirement comm>=0.1.3 (from ipywidgets) (from versions: 0.0.1)
ERROR: No matching distribution found for comm>=0.1.3 (from ipywidgets)

Solution: install from the Tsinghua PyPI mirror, replacing package with the name of the package you need:

pip install package -i https://pypi.tuna.tsinghua.edu.cn/simple

If ipywidgets itself cannot be installed through pip, use conda instead:

conda install -c conda-forge ipywidgets

Data format mismatch

1. TypeError: expected str, bytes or os.PathLike object, not NoneType

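This usually means a path was never specified. The message says that a string, bytes, or path-like object was expected, but None (an unset default) was received. Assigning a value to the variable that holds the path fixes it. The error mostly shows up when running open-source code.

Check the string variable at the location named in the traceback and make sure it receives an actual value rather than None.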
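One way to make this failure easier to diagnose is to validate path arguments as soon as they are read, instead of letting None reach os.path or open. The sketch below is illustrative only; require_path and the argument name data_dir are hypothetical.

```python
import os


def require_path(value, name):
    """Return `value` as a path string, or raise a clear error if it is None."""
    if value is None:
        raise ValueError(
            f"'{name}' is None; set it to a real path "
            "(for example through a command-line flag or a config entry)"
        )
    return os.fspath(value)


# Without the check, the failure surfaces later with the less helpful message:
try:
    os.path.join(None, "train")
except TypeError as exc:
    print("TypeError:", exc)

# With the check, a valid path passes through unchanged.
data_dir = require_path("datasets/coco", "data_dir")
```

Calling require_path right after argument parsing points directly at the unset option, which is usually faster than tracing the TypeError back through library code.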

Multi-GPU issues

1. ValueError: Memory growth cannot differ between GPU devices

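This is a multi-GPU problem. By default all GPUs are used for computation, but set_memory_growth was applied to only some of them, so the GPUs end up in inconsistent modes. A simple fix is to add the following before the line that raises the error:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

Note that os.environ['CUDA_VISIBLE_DEVICES'] = '0' must be set before TensorFlow is first used (in practice, before it is imported); otherwise it has no effect.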
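The ordering requirement can be sketched as follows, together with an alternative that keeps every GPU visible and applies the same memory-growth setting to all of them. This is a sketch under assumptions, not the fix used above: the helper name enable_memory_growth_on_all_gpus is hypothetical, and it assumes a TensorFlow 2.x installation that provides tf.config.list_physical_devices and tf.config.experimental.set_memory_growth.

```python
import os

# Must run before TensorFlow is imported, otherwise the setting is ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def enable_memory_growth_on_all_gpus():
    """Apply the same memory-growth setting to every GPU TensorFlow can see."""
    import tensorflow as tf  # imported only after the environment variable is set

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

With CUDA_VISIBLE_DEVICES set to '0' the loop touches a single GPU; when the environment variable line is removed, the same loop configures every GPU consistently, which also avoids the mismatch.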