You can run general-purpose GPU computations using our AMD Radeon RX 6600 XT graphics card and the ROCm toolchain. To conserve disk space, please use our existing llama.cpp and Stable Diffusion instances if possible. Also, please don't leave those instances running when not in use, since they keep models loaded in VRAM. In general, be careful when running ML web UIs, especially ones that listen on TCP ports instead of Unix sockets, because some can execute arbitrary code under your account.
Our GPU is technically unsupported by ROCm, but don't worry! Try running the program with the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0, which tricks ROCm into thinking it's one of the supported GPUs.
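For example, you can set the variable for a single command or export it for the whole session (a sketch; rocminfo's path assumes a standard ROCm install, and the guard skips it on machines without ROCm):

```shell
# Set the override for one invocation only:
if [ -x /opt/rocm/bin/rocminfo ]; then
    HSA_OVERRIDE_GFX_VERSION=10.3.0 /opt/rocm/bin/rocminfo | grep -i gfx
fi

# Or export it for the rest of the session if you run many GPU programs:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```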
Compute kernels
To write your own compute kernels in HIP and C++, check out this example repo. You should also read up on the GPU execution model if you're not familiar with GPU programming. The HIP compiler is located at /opt/rocm/bin/hipcc.
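If you just want to see the toolchain work end to end, here's a minimal vector-add sketch (not taken from the example repo; the file name, grid size, and block size are arbitrary choices, and the compile step is skipped on machines without ROCm):

```shell
# Write a minimal HIP kernel to a file
cat > vecadd.hip <<'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>

// Each thread adds one pair of elements
__global__ void vecadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float ha[1024], hb[1024], hc[1024];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    hipMalloc(&da, bytes); hipMalloc(&db, bytes); hipMalloc(&dc, bytes);
    hipMemcpy(da, ha, bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb, bytes, hipMemcpyHostToDevice);

    // 256 threads per block, enough blocks to cover n elements
    vecadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    hipMemcpy(hc, dc, bytes, hipMemcpyDeviceToHost);

    printf("hc[10] = %g\n", hc[10]);   // 10 + 2*10 = 30
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
EOF

# Compile and run only if the ROCm toolchain is installed
if [ -x /opt/rocm/bin/hipcc ]; then
    /opt/rocm/bin/hipcc -o vecadd vecadd.hip
    HSA_OVERRIDE_GFX_VERSION=10.3.0 ./vecadd
fi
```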
PyTorch
You can install PyTorch for ROCm using these instructions. Then, use the environment variable trick mentioned above.
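After installing, a quick way to confirm the GPU is visible (ROCm builds of PyTorch expose the device through the regular torch.cuda API; this sketch just prints a message if PyTorch isn't installed):

```shell
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 - <<'EOF'
try:
    import torch
    print("GPU available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed")
EOF
```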
Text generation
We have a llama.cpp instance at /opt/llama.cpp. To make it answer a single prompt, run HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-cli -ngl 100 -p "Your prompt here". This loads the model into VRAM each time and is not interactive, so it's recommended to run it in server mode with rm ~/.cache/llama.sock; HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-server -ngl 100 --host unix://$HOME/.cache/llama.sock. Then forward that Unix socket to your computer using SSH. You may get better performance for long prompts by adding -fa to enable flash attention.
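Once the server is running, you can talk to it over the socket with curl; /completion and n_predict are part of llama-server's HTTP API, and the guard below simply skips the call if the server isn't up (the hostname after http:// is ignored when --unix-socket is used):

```shell
SOCK="$HOME/.cache/llama.sock"
if [ -S "$SOCK" ]; then
    curl --unix-socket "$SOCK" http://localhost/completion \
         -H "Content-Type: application/json" \
         -d '{"prompt": "Why are Unix sockets nice?", "n_predict": 64}'
fi
```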
Image generation
We also have a Stable Diffusion web UI instance. Go to /opt/stable-diffusion-webui and run the server with ./webui.sh. The server uses the Unix socket ~/.cache/sd.sock, which you can forward over SSH to your computer.
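The forwarding step looks like this from your own machine (a sketch: "gpu-host" and the remote home path are placeholders for your SSH alias and account; OpenSSH has supported Unix-socket forwarding since 6.7):

```shell
# Run this on your own machine, not the GPU server.
if ssh -o BatchMode=yes -o ConnectTimeout=5 gpu-host true 2>/dev/null; then
    rm -f /tmp/sd.sock   # ssh won't overwrite an existing local socket
    # -N: no remote command, just forwarding; runs until you Ctrl-C it
    ssh -N -L /tmp/sd.sock:/home/yourname/.cache/sd.sock gpu-host
else
    echo "gpu-host is not reachable; check your SSH config"
fi
```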
Monitoring GPU usage
Use the command /opt/rocm/bin/rocm-smi or btop (press 5 to show detailed GPU usage information).
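rocm-smi also has one-shot flags if you just want a snapshot rather than a live view (guarded so it no-ops on machines without ROCm):

```shell
if [ -x /opt/rocm/bin/rocm-smi ]; then
    /opt/rocm/bin/rocm-smi --showuse            # GPU utilization in percent
    /opt/rocm/bin/rocm-smi --showmeminfo vram   # VRAM used / total
fi
```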