SpaCy 3 on a Google Cloud Compute Instance to train a NER Transformer Model

Here you will find a step-by-step guide (last tested and working in July 2021) on how to install and use spaCy 3.0 (and CuPy) on a GPU-powered Google Cloud instance. I wrote this article to spare others whole days of testing and installing packages. I've already wasted them, why should you? ;)
I used this architecture to train a NER Transformer model.
Software versions:
  • cuda v11.2
  • spacy v3.0

GCloud instance creation
    Create a Google Cloud virtual machine instance with this setup:
    GPU machine
    Series: A2
    Machine: a2-highgpu-1g
    GPU: 1 x NVIDIA Tesla A100
    Image: Debian GNU/Linux 10 (buster)
    WARNING: you must increase the default disk size: 10 GB is not enough (at least for my needs). I used 30 GB.
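If you prefer the command line to the web console, the same setup can be sketched as a gcloud call. This is only a sketch under assumptions: the instance name "spacy-gpu" and the zone "us-central1-a" are placeholders I made up (use any zone that offers A2 machines), and the snippet only prints the command so you can review it before running it yourself.

```python
import shlex


def gcloud_create_command(name, zone, disk_gb=30):
    """Build a `gcloud compute instances create` call for the setup above."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--machine-type=a2-highgpu-1g",    # A2 machines come with the A100 attached
        "--image-family=debian-10",
        "--image-project=debian-cloud",
        f"--boot-disk-size={disk_gb}GB",   # the 10 GB default is too small
        "--maintenance-policy=TERMINATE",  # GPU instances cannot be live-migrated
    ]


# Hypothetical name and zone: replace them with your own values.
command = gcloud_create_command("spacy-gpu", "us-central1-a")
print(shlex.join(command))
```

Printing instead of executing keeps the sketch harmless: nothing is created (or billed) until you paste the command into a shell.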

NVIDIA driver installation
    Connect via SSH to the newly created virtual machine, then update the system and install some useful packages with these commands:
    sudo apt-get update && sudo apt-get upgrade
    sudo apt-get -y install pciutils software-properties-common wget g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
    Check that your GPU is visible to the system by listing the NVIDIA PCI devices. You should get at least one line of output.
    If you get none, there is probably a problem with your instance configuration that you need to investigate further.
    lspci | grep -i nvidia
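The same check can be scripted, which is handy if you want to stop a setup script early. A minimal sketch, assuming lspci is on the PATH (it comes with the pciutils package installed above); the function names are mine, not part of any library.

```python
import shutil
import subprocess


def nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA, like `grep -i nvidia`."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


def detect_nvidia_devices():
    """Run lspci when it is available; return an empty list otherwise."""
    if shutil.which("lspci") is None:
        return []
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
    return nvidia_devices(result.stdout)


devices = detect_nvidia_devices()
print("\n".join(devices) if devices else "No NVIDIA device found")
```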
    Let's clean up any previous installations and packages (the patterns are quoted so that the shell does not expand them):
    sudo apt-get purge 'nvidia*'
    sudo apt remove 'nvidia-*'
    sudo rm -f /etc/apt/sources.list.d/cuda*
    sudo apt-get autoremove && sudo apt-get autoclean
    sudo rm -rf /usr/local/cuda*
    The gcc compiler is required for development with the CUDA toolkit. To check which version of gcc is installed, enter:
    gcc --version
    If it is not present, install it:
    sudo apt-get -y install gcc
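    Install the kernel headers needed by the NVIDIA drivers. The version in the package name must match your running kernel (uname -r prints it); this is the one for my instance: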
    sudo apt-get -y install linux-headers-4.19.0-16-cloud-amd64
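Since the kernel on your instance may differ from mine, the package name can be derived instead of hardcoded. A minimal sketch, assuming the usual Debian naming scheme linux-headers-<kernel release>; headers_package is a helper name I made up.

```python
import platform


def headers_package(kernel_release=None):
    """Return the Debian headers package name for a kernel release.

    Defaults to the running kernel, i.e. what `uname -r` prints.
    """
    release = kernel_release or platform.release()
    return f"linux-headers-{release}"


# Print the install command for the machine this runs on.
print(f"sudo apt-get -y install {headers_package()}")
```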
    Now download and install the latest NVIDIA driver for Debian 10. These are the most up-to-date drivers at the time of writing: https://www.nvidia.com/Download/driverResults.aspx/173142/en-us. If you decide to install more recent drivers (which I recommend), you will probably also need to adjust other steps of this guide accordingly.
    If you want to look for other versions or architectures: https://www.nvidia.com/Download/index.aspx?lang=en-us
    # download drivers
    wget https://us.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run
    # make it executable
    chmod u+x NVIDIA-Linux-x86_64-460.73.01.run
    # install the drivers
    sudo ./NVIDIA-Linux-x86_64-460.73.01.run
    When asked, do not install the 32-bit compatibility packages.
    Check that the drivers have been correctly installed with:
    nvidia-smi
    The output should now list your GPU together with the driver and CUDA versions. If the command cannot find any GPU, something is wrong (check for newer drivers and the like) and continuing with this guide will be pointless.

CUDA 11 Toolkit installation
    Install the NVIDIA CUDA 11 toolkit packages for Debian 10. For other installations (not covered in this article), please refer to this useful NVIDIA page: https://developer.nvidia.com/cuda-downloads
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/7fa2af80.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/ /"
    sudo add-apt-repository contrib
    sudo apt-get update
    sudo apt-get -y install cuda-11-2
    If asked to remove an NVIDIA package, answer yes.
    Check that the drivers are still correctly installed with:
    nvidia-smi
    The output should be like the previous one.
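If you would rather check the driver from a script than read the nvidia-smi table by eye, its CSV query mode is easy to parse. A minimal sketch: the function names and the choice of fields (name, driver_version, memory.total) are mine, and the code returns an empty list when nvidia-smi is missing or fails.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text):
    """Turn the CSV rows printed by the query above into dictionaries."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def query_gpus():
    """Return one dictionary per GPU, or an empty list if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []


gpus = query_gpus()
print(gpus if gpus else "nvidia-smi found no GPU")
```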

spaCy installation
    We will now create a Python virtualenv, install spaCy and check whether spaCy can access the GPU.
    # install the venv package
    sudo apt-get -y install python3-venv
    # create the venv
    python3 -m venv myvenv
    # activate it
    source myvenv/bin/activate
    # upgrade pip
    pip install --upgrade pip
    
    # install spacy
    pip install -U spacy
    # download the trf model
    python -m spacy download en_core_web_trf
    
    # install other pip packages and dependencies
    pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
    # point to the correct cuda folder
    export CUDA_PATH="/usr/local/cuda-11"
    # install spaCy with CUDA and transformers support
    pip install -U 'spacy[cuda113,transformers]'
    
    # and install the correct version of cupy
    # here more info: https://docs.cupy.dev/en/stable/install.html#installing-cupy
    pip install cupy-cuda113
    Test spaCy and CuPy: start Python and run the following commands:
    python
    >>> import spacy
    >>> spacy.require_gpu()
    The output must simply be:
    True
    Another test you can run, still inside the Python console, to be absolutely sure that everything is correctly installed:
    >>> import cupy
    >>> a = cupy.zeros((1,1))
    These commands should give no output at all. If they do, it will probably be a self-explanatory error or exception.
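The two console tests can also be bundled into one small script to rerun after every change to the environment. A minimal sketch with one deliberate difference from the console test: it calls spacy.prefer_gpu(), which returns False when no GPU is usable, instead of spacy.require_gpu(), which raises an error. check_gpu_stack is a name I made up.

```python
def check_gpu_stack():
    """Report whether spaCy and CuPy can reach the GPU, without raising."""
    report = {}
    try:
        import spacy
        report["spacy"] = "GPU available" if spacy.prefer_gpu() else "CPU only"
    except ImportError:
        report["spacy"] = "not installed"
    except Exception as error:  # e.g. a broken CUDA setup
        report["spacy"] = f"error: {error}"
    try:
        import cupy
        cupy.zeros((1, 1))  # allocates on the GPU, fails if CUDA is unusable
        report["cupy"] = f"{cupy.cuda.runtime.getDeviceCount()} CUDA device(s)"
    except ImportError:
        report["cupy"] = "not installed"
    except Exception as error:
        report["cupy"] = f"error: {error}"
    return report


for package, status in check_gpu_stack().items():
    print(f"{package}: {status}")
```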

The end
    You are now ready to use your GPU inside spaCy or any other system that uses CuPy.
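For the NER training itself, spaCy's train command takes the GPU through its --gpu-id option. The sketch below only assembles and prints that command: the file names (config.cfg, train.spacy, dev.spacy) and the output folder are hypothetical placeholders, and I assume your config reads its data locations from paths.train and paths.dev, as the configs generated by spaCy do.

```python
import shlex
import sys


def spacy_train_command(config_path, output_dir, train_path, dev_path, gpu_id=0):
    """Build the `spacy train` call that runs the training on a given GPU."""
    return [
        sys.executable, "-m", "spacy", "train", config_path,
        "--output", output_dir,
        "--paths.train", train_path,
        "--paths.dev", dev_path,
        "--gpu-id", str(gpu_id),  # 0 is the first (here: only) GPU
    ]


# Hypothetical file names: replace them with your own config and corpora.
train_command = spacy_train_command("config.cfg", "./output", "./train.spacy", "./dev.spacy")
print(shlex.join(train_command))
```

Run the printed command from inside the activated virtualenv so that it picks up the spaCy installed above.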
    Feel free (and please do) to reach out to me about any error you find or any question you have.
    This article is also a gist here: https://gist.github.com/DavidGerva/86bba9a23e4376e4303d3ca02a422612

References:
    This guide adapts material I found online to my needs and to "today". I tested it over and over again until I reached this working solution, so it should work "as is".
