
You can run general-purpose GPU computations using our AMD Radeon RX 6600 XT graphics card and the ROCm toolchain. To conserve disk space, please use our existing llama.cpp and Stable Diffusion instances when possible. Also, be careful when running ML web UIs, because some of them can execute arbitrary code under your account.

Our GPU is technically unsupported by ROCm, but don't worry! Run your program with the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0, which tricks ROCm into treating the card as a supported gfx1030 GPU.
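For example, instead of prefixing every command with it, you can export the variable once per shell session and it will apply to everything you run afterwards:

```sh
# Apply the override to all commands run later in this shell session.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```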

Compute kernels

To write your own compute kernels in HIP and C++, check out this example repo. You should also read up on the GPU execution model if you're not familiar with GPU programming. The HIP compiler is located at /opt/rocm/bin/hipcc.
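If you want to see the whole workflow end to end before digging into the example repo, here is a minimal sketch (the file name, sizes, and launch configuration are arbitrary choices for illustration, so treat it as a starting point rather than a reference implementation) that writes a vector-add kernel, builds it with hipcc, and runs it with the override from above:

```sh
# Write a tiny HIP vector-add kernel to a (hypothetical) source file.
cat > vector_add.hip << 'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Each GPU thread adds one pair of elements.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    // Allocate device buffers and copy the inputs to the GPU.
    float *da, *db, *dc;
    hipMalloc(&da, bytes);
    hipMalloc(&db, bytes);
    hipMalloc(&dc, bytes);
    hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

    // Launch one thread per element, 256 threads per block.
    vector_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);  // should print 3.000000
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
EOF

# Build with the ROCm compiler and run with the GFX version override.
/opt/rocm/bin/hipcc vector_add.hip -o vector_add
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./vector_add
```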


PyTorch

You can install PyTorch for ROCm using these instructions. Then, use the environment variable trick mentioned above when running your code.
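After installing, a quick way to confirm that PyTorch actually sees the card is a one-liner with the override set (ROCm builds of PyTorch expose the GPU through the usual torch.cuda API):

```sh
# Should print "True" and the GPU name if the ROCm build is working.
HSA_OVERRIDE_GFX_VERSION=10.3.0 python -c 'import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))'
```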

Text generation

We have a llama.cpp instance at /opt/llama.cpp. To answer a single prompt, run HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-cli -ngl 100 -p "Your prompt here" from that directory. This loads the model into VRAM each time and is not interactive, so it's usually better to run it in server mode with HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-server -ngl 100, which starts a server on port 8080 that you can forward to your computer over SSH (see the example below). You may get better performance on long prompts by adding -fa to enable flash attention. Please don't leave the server running for long periods of time, since it keeps the model loaded in VRAM.
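Once the server is up, a minimal sketch of querying it from your own machine looks like this (the USER placeholder and host match the SSH example later on this page, and the /completion request fields follow the llama.cpp HTTP server API, so double-check them against the installed version):

```sh
# On your own machine: forward the server's port 8080 over SSH.
ssh -N -L 8080:localhost:8080 USER@exozy.me &

# Ask the forwarded server for a short completion.
curl http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Your prompt here", "n_predict": 64}'
```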

Image generation

We also have a Stable Diffusion web UI instance.

  1. First connect to exozyme using SSH port forwarding with ssh -L 8000:/home/USER/.cache/sd.sock USER@exozy.me, replacing USER with your username.
  2. Go to /opt/stable-diffusion-webui and run the server with ./webui.sh.
  3. You can now visit the web interface on your local machine at localhost:8000.

Monitor GPU usage

Use /opt/rocm/bin/rocm-smi, or run btop and press 5 to show detailed GPU usage information.
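To keep an eye on usage while a job is running, you can also wrap rocm-smi in watch:

```sh
# Refresh the GPU usage report every second.
watch -n 1 /opt/rocm/bin/rocm-smi
```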