Background: NVIDIA GPU branding history
NVIDIA's GPU lineup has evolved its branding to reflect changes in the products and their market segments. The original lineup — GeForce, Titan, Quadro, and Tesla — has been streamlined into three product families: GeForce RTX, RTX PRO, and Data Center GPUs.
In the consumer segment, NVIDIA combined the GeForce and Titan lines under the GeForce RTX brand. Older GeForce cards like the GTX 1080 (Pascal) were aimed at gamers, while Titan cards like the Titan Xp offered more compute power for creators and researchers. With the launch of Turing GPUs (e.g., RTX 2080, RTX 2070), NVIDIA introduced RT and Tensor cores for ray tracing and AI acceleration, blending the use cases between the GeForce and Titan brands. This continued with Ampere (RTX 3080, RTX 3090) and Ada Lovelace (RTX 4090, RTX 4080), making high-end Titan cards largely redundant (and thus the branding was retired starting from the Ampere generation).
For professional workstations, the Quadro line of products (e.g. Quadro P4000, Quadro RTX 5000) covered GPUs for CAD, media, and design professionals. With the Ampere generation, NVIDIA dropped the Quadro name in favor of the RTX A-series cards like the RTX A6000, followed by cards like the RTX 6000 Ada Generation. More recently, these have been unified under the RTX PRO label, emphasizing certified performance for industry applications.
In the data center, HPC, and AI space, NVIDIA's Tesla brand (e.g. Tesla V100, P100) served high-performance computing and machine learning (historical aside: Tesla was also the codename of the first microarchitecture generation where data center GPUs were released). Tesla was retired with the Ampere generation to reduce confusion with Tesla, Inc. and align branding. Now, GPUs like the A100 and H100 are part of the Data Center family, used in systems like DGX and HGX for large-scale AI and HPC workloads.
- GeForce + Titan -> GeForce RTX
- Quadro -> RTX A Series -> RTX PRO
- Tesla -> Data Center
Background: Understanding the NVIDIA software stack
First, an overview of the NVIDIA software stack:
- The NVIDIA GPU drivers consist of two parts:
  - Kernel-space driver (kernel module, nvidia.ko): This is a Linux kernel module that interfaces directly with the GPU hardware.
  - User-space driver libraries, including:
    - libcuda.so: the CUDA (Compute Unified Device Architecture) Driver library
    - libnvidia-ml.so: the NVIDIA Management Library (NVML), which provides telemetry (e.g. temperature, clock speeds) and power-management controls
    - libGLX_nvidia.so, libEGL_nvidia.so, etc.: Linux graphics libraries
- CUDA toolkit: The CUDA toolkit builds on top of the CUDA Driver library and provides additional runtime and development libraries to make it easier to use the general purpose GPU compute (GPGPU) capabilities of the graphics card. Deep learning frameworks like PyTorch use the CUDA toolkit under the hood for GPU-accelerated computation. Example components provided by the CUDA toolkit include:
  - CUDA Runtime (libcudart.so): High-level CUDA API used by most applications (cudaMalloc, cudaMemcpy, etc.)
  - nvcc: NVIDIA CUDA compiler
  - cuda-gdb: CUDA-aware debugger
  - nvprof, nsight: Profiling and performance tools
  - cuBLAS (Basic Linear Algebra Subprograms): Provides GPU-accelerated linear algebra operations.
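If parts of the stack are already installed on your system, you can see each layer with standard Linux tools. This is only a quick sanity-check sketch; none of it is required for the install steps below, and the toolkit commands will only work once a CUDA toolkit is on your PATH (e.g. after Step 3).
# kernel-space driver: the nvidia kernel module
lsmod | grep nvidia
modinfo nvidia | head -n 5
# user-space driver libraries registered with the dynamic linker
ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so|libGLX_nvidia'
# CUDA toolkit components (only present once a toolkit is installed, e.g. after Step 3 below)
nvcc --version
which cuda-gdb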
Instructions
The following instructions have been tested on Ubuntu 22.04 (Jammy) and 24.04 (Noble) as of May 2025.
Even though the stack itself is quite straightforward, installation can be confusing: there are multiple ways to install the NVIDIA stack on Linux, the methods are incompatible with each other, and mixing steps from different tutorials (each with incomplete or unclear instructions) easily results in a non-functional setup.
The most reliable way I have found to install the NVIDIA stack is to install the NVIDIA drivers using your operating system's official repositories (these have been further tested by the OS maintainers), and then use Mamba (in particular, Micromamba) to install the CUDA Toolkit. This way, different projects can use different versions of CUDA, as long as they are still supported by the same NVIDIA driver.
Step 1: Determine which driver version you should install
Check which version of the CUDA toolkit is needed by the software you will be using. Use that version number to determine the minimum version number of the NVIDIA driver you need by checking the CUDA toolkit release notes.
Currently, all versions of CUDA 12 (which, as of 2025, is the toolkit version used by the latest versions of PyTorch) are supported by NVIDIA drivers >=525.60.13.
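If a driver is already installed, you can read its version directly and compare it against that minimum; a quick check, assuming nvidia-smi is available:
# report only the installed driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# alternatively, read it from the loaded kernel module
cat /proc/driver/nvidia/version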
Step 2: Install the NVIDIA driver
NVIDIA provides two separate drivers: a GeForce (and Titan) driver, and a Data Center (Tesla) driver. Based on my understanding, at least on Linux, the GeForce and Data Center drivers are identical; the only difference is in release timing and focus, with GeForce drivers released when a change affects GeForce GPUs, and likewise for the Data Center drivers. GeForce drivers are released on either the New Feature Branch (NFB) or the Production Branch (PB), which is supported for one year. The Data Center drivers additionally have a Long Term Support Branch (LTSB), with a three-year support timeline specifically for Data Center GPUs.
This means that if you are using GeForce card(s), you should install the GeForce variant of the driver. If you are working with workstation (Quadro/RTX PRO) or Data Center (Tesla) cards, you should install the Data Center variant of the driver.
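To check which card(s) you actually have before choosing a variant, the standard PCI tools are enough; a quick sketch (ubuntu-drivers is provided by the ubuntu-drivers-common package):
# list NVIDIA devices on the PCI bus
lspci -nn | grep -i nvidia
# let Ubuntu suggest a driver package for the detected hardware
ubuntu-drivers devices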
We will install the NVIDIA drivers using the official Ubuntu repository. The Ubuntu maintainers do not provide every single version of the driver released by NVIDIA, but they do support the Production Branch (PB) and the Long Term Support Branch (LTSB).
There are at least three different repositories that provide NVIDIA driver packages:
- Official Ubuntu repositories: Maintained by the Ubuntu Core Developers team, it contains the stable releases of both the Data Center and GeForce drivers, including a headless (compute-only) metapackage and a combined (desktop and compute) metapackage. It also includes the CUDA toolkit, but this package is often out of date, so we will not be using it.
- Official NVIDIA repository: Maintained by NVIDIA, it contains the latest releases of (only) the Data Center drivers, with display-only, compute-only, and combined metapackages as well. This repository also contains packages for the CUDA toolkit.
- ppa:graphics-drivers/ppa: Maintained by the Ubuntu Graphics Drivers team, it contains the latest releases of the GeForce drivers. It does not include the CUDA toolkit, nor any headless version of the drivers.
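If you are unsure which of these repositories a given package would be pulled from (and at what version), apt can tell you. The package names below are examples; the versions and repositories shown will depend on your Ubuntu release and which repositories you have enabled.
# show candidate versions and the repository each one comes from
apt-cache policy nvidia-driver-570
apt-cache policy nvidia-driver-535-server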
The Ubuntu repository provides a series of metapackages designed to install only the relevant packages needed for your use case.
- The nvidia-headless metapackages will only install the kernel module and the compute (CUDA) driver, along with DKMS to automatically rebuild the kernel module whenever the Linux kernel is updated.
  - The nvidia-headless-no-dkms metapackage assumes that you will install a prebuilt kernel module (i.e. from the linux-modules-nvidia series of packages) that will provide the nvidia.ko kernel driver, rather than rely on DKMS to automatically build the module whenever the kernel is updated. This variant is only recommended for environments where kernel upgrades are limited. Most users should NOT install the nvidia-headless-no-dkms metapackage.
- The nvidia-driver packages will additionally install the display drivers and libraries (OpenGL, X Server, Wayland, etc.). If you are using the display output of your GPU, install this version.
- The -server variants of the metapackages are built from the Data Center drivers, whereas the packages without -server are built from the GeForce drivers.
- The -open variants of the metapackages use NVIDIA's open-source kernel module instead of the legacy, proprietary modules. These are recommended for anyone using a Turing (2018) generation or later GPU.
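To see which driver metapackages and versions are actually available on your release before picking one, you can search the package index; a quick, optional check:
# list available nvidia-driver and nvidia-headless metapackages
apt-cache search --names-only '^nvidia-driver-[0-9]+'
apt-cache search --names-only '^nvidia-headless-[0-9]+'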
Note that the commands below install the DKMS variant of the NVIDIA kernel module, as it is much more convenient and suitable for most users.
# Pick:
# 535 for Long Term Support Branch
# Note that in version 535, only Data Center GPUs
# are supported by the Open GPU kernels (nvidia.ko).
# Attempts to use Open GPU kernels on GeForce cards
# will not work (see output from `sudo dmesg | grep -i nvidia`)
# 570 for Production Branch
VERSION=570
# GeForce cards, compute drivers only, proprietary NVIDIA kernel module
sudo apt-get install nvidia-headless-$VERSION
# GeForce cards, compute and display drivers, proprietary NVIDIA kernel module
sudo apt-get install nvidia-driver-$VERSION
# GeForce cards, compute drivers only, open NVIDIA kernel module
sudo apt-get install nvidia-headless-$VERSION-open
# GeForce cards, compute and display drivers, open NVIDIA kernel module
sudo apt-get install nvidia-driver-$VERSION-open
# Data Center cards, compute drivers only, proprietary NVIDIA kernel module
sudo apt-get install nvidia-headless-$VERSION-server
# Data Center cards, compute and display drivers, proprietary NVIDIA kernel module
sudo apt-get install nvidia-driver-$VERSION-server
# Data Center cards, compute drivers only, open NVIDIA kernel module
sudo apt-get install nvidia-headless-$VERSION-server-open
# Data Center cards, compute and display drivers, open NVIDIA kernel module
sudo apt-get install nvidia-driver-$VERSION-server-open
# headless metapackages do not install nvidia-utils, which contains
# the nvidia-smi binary; we will install it manually
sudo apt-get install nvidia-utils-$VERSION
# or
sudo apt-get install nvidia-utils-$VERSION-server
If you are using the non-DKMS variant, you must install the NVIDIA kernel module manually with one of the following packages:
# proprietary NVIDIA kernel module (nvidia.ko)
sudo apt-get install linux-modules-nvidia-$VERSION-$(uname -r)
# open NVIDIA kernel module (nvidia.ko)
sudo apt-get install linux-modules-nvidia-$VERSION-open-$(uname -r)
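After installing the driver and rebooting (see the debugging section below for why the reboot matters), a minimal check that the driver is working:
# should list your GPU(s), the driver version, and the maximum supported CUDA version
nvidia-smi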
Step 3: Install the CUDA Toolkit
The CUDA Toolkit can be installed system-wide using your operating system's package manager (e.g. nvidia-cuda-toolkit in Ubuntu) or by using NVIDIA's official CUDA repository. However, older deep learning software may need a CUDA version that is incompatible with your newer deep learning projects, so it is recommended to install the CUDA Toolkit on a per-project basis. We can do this using Mamba (specifically Micromamba).
PyTorch 2.7 (stable) has pre-built wheels compiled with CUDA 12.6, so we will install that particular version of CUDA.
# install micromamba
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
ENV_NAME=pytorch_project # change this to a relevant name
PYTHON_VERSION=3.12 # use a Python version compatible with PyTorch
# create new micromamba environment
micromamba create -n "$ENV_NAME" python="$PYTHON_VERSION"
# install CUDA toolkit into the environment
CUDA_VERSION=12.6
micromamba install -n "$ENV_NAME" conda-forge::cuda-toolkit="$CUDA_VERSION"
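To verify the environment and install PyTorch into it, something like the following should work. This is a sketch under a couple of assumptions: micromamba's shell integration has been set up by the installer (so that micromamba activate works), and you are installing PyTorch from PyPI, whose Linux wheels for 2.7 are built against CUDA 12.6.
# activate the environment and confirm the toolkit version
micromamba activate "$ENV_NAME"
nvcc --version
# install PyTorch and confirm it can see the GPU and which CUDA it was built with
pip install torch
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"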
Debugging NVIDIA driver installations
Note: this section is still a work in progress.
Always reboot your system after installing the NVIDIA driver; many times when nvidia-smi still doesn't appear to work, it's because another driver (nouveau) is already controlling the GPUs. A reboot forces all PCIe devices to be re-enumerated and drivers re-probed.
Other troubleshooting steps that may help:
- Make sure DKMS is working:
sudo dkms status
# you should see something like:
# nvidia/570.133.07, 6.8.0-60-generic, x86_64: installed
- Check that Nouveau is not loaded, and that the NVIDIA kernel module is loaded:
lsmod | grep -i nouveau
# should be empty
lsmod | grep -i nvidia
# should show something like:
# nvidia_uvm 2121728 0
# nvidia_drm 131072 0
# nvidia_modeset 1724416 1 nvidia_drm
# nvidia 11640832 2 nvidia_uvm,nvidia_modeset
# ecc 45056 2 ecdh_generic,nvidia
# video 77824 2 asus_wmi,nvidia_modeset
- Unload the Nouveau module:
sudo modprobe -r nouveau
- (Re)load the NVIDIA kernel module:
sudo modprobe nvidia
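If Nouveau keeps reattaching to the GPU on every boot, the usual fix is to blacklist it. The Ubuntu driver packages typically ship a blacklist file already, so check before adding one manually; the sketch below assumes none is present, and the file name blacklist-nouveau.conf is just a convention.
# check whether a blacklist is already in place
grep -r nouveau /etc/modprobe.d/ /lib/modprobe.d/ 2>/dev/null
# otherwise, blacklist Nouveau manually and rebuild the initramfs
printf "blacklist nouveau\noptions nouveau modeset=0\n" | \
  sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u
sudo reboot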