NVIDIA is warning users to enable the System Level Error-Correcting Code (ECC) mitigation to protect against Rowhammer attacks on graphics processors with GDDR6 memory.
The company is reinforcing the recommendation as new research demonstrates a Rowhammer attack against an NVIDIA A6000 GPU (graphics processing unit).
Rowhammer is a hardware fault that can be triggered through software processes and stems from memory cells being packed too close to each other. The attack was first demonstrated on DRAM cells, but it can also affect GPU memory.
It works by accessing a memory row with enough read-write operations to cause the value of adjacent data bits to flip from one to zero and vice versa, which changes the information held in memory.
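To make the consequence of such a flip concrete, the following is a minimal sketch that simulates a single-bit change in a stored 32-bit floating-point value. It does not perform any hammering; it only inverts a chosen bit in software. The function name `flip_bit` and the example value are illustrative assumptions, not something taken from the research.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` (stored as IEEE 754 float32) with one bit inverted.

    `bit` counts from 0 (least significant mantissa bit) to 31 (sign bit).
    """
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # invert exactly one bit, as a Rowhammer flip would
    # Reinterpret the modified integer as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped

stored = 0.5  # hypothetical value held in GPU memory, e.g. a model weight
print("original:             ", stored)
print("low mantissa bit flip:", flip_bit(stored, 0))   # barely changes
print("high exponent bit flip:", flip_bit(stored, 30))  # changes by many orders of magnitude
```

Where the flip lands matters: a change in a low mantissa bit is nearly invisible, while a change in a high exponent bit turns a small number into an enormous one.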
The effect can be a denial-of-service condition, data corruption, or even privilege escalation.
System Level Error-Correcting Code (ECC) can preserve the integrity of data by adding redundant bits and correcting single-bit errors, which maintains the reliability and accuracy of the data.
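The idea of redundant bits correcting a single-bit error can be illustrated with a toy Hamming(7,4) code. This is only a sketch of the principle: the article does not say which code NVIDIA's hardware uses, and real memory ECC typically works on much wider words. The function names `encode` and `decode` are chosen here for illustration.

```python
def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]  # positions 1..7

def decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1  # simulate a single bit flip in memory
print(decode(stored) == data)
```

A code like this repairs one flipped bit per codeword; two or more flips in the same codeword are beyond what this toy version can correct.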
In workstation and data center GPUs, where VRAM handles large datasets and the precise calculations behind AI workloads, ECC must be enabled to prevent critical errors in their operation.
NVIDIA's security notice notes that researchers at the University of Toronto showed “a potential Rowhammer attack against an NVIDIA A6000 GPU with GDDR6 memory” where System-Level ECC was not enabled.
The academic researchers developed GPUHammer, an attack method for flipping bits in GPU memory.
Although hammering is harder on GDDR6 than on CPU-based DDR4 because of higher latency and faster refresh, the researchers were able to demonstrate that Rowhammer attacks on GPU memory banks are possible.
In addition to the RTX A6000, the GPU maker also recommends enabling System-Level ECC for the following products:
Data Center GPUs:
- Ampere: A100, A40, A30, A16, A10, A2, A800
- Ada: L40S, L40, L4
- Hopper: H100, H200, GH200, H20, H800
- Blackwell: GB200, B200, B100
- Turing: T1000, T600, T400, T4
- Volta: Tesla V100, Tesla V100S
Workstation GPUs:
- Ampere RTX: A6000, A5000, A4500, A4000, A2000, A1000, A400
- Ada RTX: 6000, 5000, 4500, 4000, 4000 SFF, 2000
- Blackwell RTX PRO (latest workstation line)
- Turing RTX: 8000, 6000, 5000, 4000
- Volta: Quadro GV100
Embedded / Industrial:
- Jetson AGX Orin Industrial
- IGX Orin
The GPU maker notes that newer GPUs, such as the Blackwell RTX 50 series (GeForce), the Blackwell data center GB200, B200, and B100, and the Hopper data center H100, H200, H20, and GH200, come with built-in on-die ECC protection, which requires no user intervention.
One way to check whether System-Level ECC is enabled is an out-of-band method that uses the system's BMC (Baseboard Management Controller) and hardware interface software, such as the Redfish API, to check the “ECCModeEnabled” status.
Tools such as NSM Type 3 and NVIDIA SMBPBI can also be used for configuration, although they require access to the NVIDIA Partner Portal.
A second, in-band method also exists, which uses the nvidia-smi command-line utility from the system's CPU to check and enable ECC where supported.
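As a rough sketch of the in-band check, the script below parses the CSV report that `nvidia-smi --query-gpu=index,name,ecc.mode.current,ecc.mode.pending --format=csv,noheader` prints and lists GPUs whose current ECC mode is not "Enabled". The helper names and the sample report are assumptions made for illustration, and the exact status strings may differ between driver versions, so the parsing should be checked against real output before relying on it.

```python
import subprocess

# Query fields as documented by `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]

def parse_ecc_report(text):
    """Turn lines like '0, NVIDIA RTX A6000, Disabled, Enabled' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue  # skip lines that do not match the expected layout
        index, name, current, pending = fields
        gpus.append({"index": int(index), "name": name,
                     "current": current, "pending": pending})
    return gpus

def gpus_without_ecc(gpus):
    """Return GPUs whose current ECC mode is anything other than 'Enabled'."""
    return [gpu for gpu in gpus if gpu["current"] != "Enabled"]

def query_ecc():
    """Run nvidia-smi on this machine (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_ecc_report(result.stdout)

# Hypothetical sample report, used so the sketch runs without a GPU.
SAMPLE = "0, NVIDIA RTX A6000, Disabled, Enabled\n1, NVIDIA A100, Enabled, Enabled\n"
for gpu in gpus_without_ecc(parse_ecc_report(SAMPLE)):
    print(f"GPU {gpu['index']} ({gpu['name']}): ECC is {gpu['current']}, pending {gpu['pending']}")
```

On a machine with the driver installed, `query_ecc()` would replace the sample text. ECC itself is usually requested with `nvidia-smi -e 1`, run with administrator rights. The pending mode generally becomes current only after a GPU reset or reboot.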
Rowhammer represents a real security concern that can cause data corruption or enable attacks in multi-tenant environments such as cloud servers, where vulnerable GPUs may be deployed.
However, the actual risk is context-dependent, and exploiting Rowhammer reliably is complicated, requiring specific conditions, high access rates, and precise control, which makes the attack difficult to execute.