You can run general-purpose GPU computations using our AMD Radeon RX 6600 XT graphics card and the ROCm toolchain. To conserve disk space, please use our existing llama.cpp and Stable Diffusion instances if possible. Also, please don't leave those instances running when not in use, since they keep models loaded in VRAM. In general, be careful when running ML web UIs, especially ones that listen on TCP ports instead of Unix sockets, because some can execute arbitrary code under your account.
Our GPU is technically unsupported by ROCm, but don't worry! Try running the program with the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0, which tricks ROCm into thinking it's one of the supported GPUs.
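For example, you can set the variable for a single command or export it for the whole session (a sketch; rocminfo's path assumes a standard ROCm install, and the guard skips it on machines without ROCm):

```shell
# Set the override for one invocation only:
if [ -x /opt/rocm/bin/rocminfo ]; then
    HSA_OVERRIDE_GFX_VERSION=10.3.0 /opt/rocm/bin/rocminfo | grep -i gfx
fi

# Or export it for the rest of the session if you run many GPU programs:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```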
Compute kernels
To write your own compute kernels in HIP and C++, check out this example repo. You should also read up on the GPU execution model if you're not familiar with GPU programming. The HIP compiler is located at /opt/rocm/bin/hipcc.
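If you just want to see the toolchain work end to end, here's a minimal vector-add sketch (not taken from the example repo; the file name, grid size, and block size are arbitrary choices, and the compile step is skipped on machines without ROCm):

```shell
# Write a minimal HIP kernel to a file
cat > vecadd.hip <<'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>

// Each thread adds one pair of elements
__global__ void vecadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float ha[1024], hb[1024], hc[1024];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    hipMalloc(&da, bytes); hipMalloc(&db, bytes); hipMalloc(&dc, bytes);
    hipMemcpy(da, ha, bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb, bytes, hipMemcpyHostToDevice);

    // 256 threads per block, enough blocks to cover n elements
    vecadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    hipMemcpy(hc, dc, bytes, hipMemcpyDeviceToHost);

    printf("hc[10] = %g\n", hc[10]);   // 10 + 2*10 = 30
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
EOF

# Compile and run only if the ROCm toolchain is installed
if [ -x /opt/rocm/bin/hipcc ]; then
    /opt/rocm/bin/hipcc -o vecadd vecadd.hip
    HSA_OVERRIDE_GFX_VERSION=10.3.0 ./vecadd
fi
```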
PyTorch
You can install PyTorch for ROCm using these instructions. Then, use the environment variable trick mentioned above.
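After installing, a quick way to confirm the GPU is visible (ROCm builds of PyTorch expose the device through the regular torch.cuda API; this sketch just prints a message if PyTorch isn't installed):

```shell
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 - <<'EOF'
try:
    import torch
    print("GPU available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed")
EOF
```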
Text generation
We have a llama.cpp instance at /opt/llama.cpp. To make it answer a single prompt, run HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-cli -ngl 100 -p "Your prompt here". This loads the model into VRAM each time and is not interactive, so it's recommended to run it in server mode with rm ~/.cache/llama.sock; HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-server -ngl 100 --host unix://$HOME/.cache/llama.sock. Then forward that Unix socket to your computer using SSH. You may get better performance for long prompts by adding -fa to enable flash attention.
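Once the server is running, you can talk to it over the socket with curl; /completion and n_predict are part of llama-server's HTTP API, and the guard below simply skips the call if the server isn't up (the hostname after http:// is ignored when --unix-socket is used):

```shell
SOCK="$HOME/.cache/llama.sock"
if [ -S "$SOCK" ]; then
    curl --unix-socket "$SOCK" http://localhost/completion \
         -H "Content-Type: application/json" \
         -d '{"prompt": "Why are Unix sockets nice?", "n_predict": 64}'
fi
```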
Image generation
We also have a Stable Diffusion web UI instance. Go to /opt/stable-diffusion-webui and run the server with ./webui.sh. The server uses the Unix socket ~/.cache/sd.sock, which you can forward over SSH to your computer.
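The forwarding step looks like this from your own machine (a sketch: "gpu-host" and the remote home path are placeholders for your SSH alias and account; OpenSSH has supported Unix-socket forwarding since 6.7):

```shell
# Run this on your own machine, not the GPU server.
if ssh -o BatchMode=yes -o ConnectTimeout=5 gpu-host true 2>/dev/null; then
    rm -f /tmp/sd.sock   # ssh won't overwrite an existing local socket
    # -N: no remote command, just forwarding; runs until you Ctrl-C it
    ssh -N -L /tmp/sd.sock:/home/yourname/.cache/sd.sock gpu-host
else
    echo "gpu-host is not reachable; check your SSH config"
fi
```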
Monitoring GPU usage
Use the command /opt/rocm/bin/rocm-smi or btop (press 5 to show detailed GPU usage information).
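rocm-smi also has one-shot flags if you just want a snapshot rather than a live view (guarded so it no-ops on machines without ROCm):

```shell
if [ -x /opt/rocm/bin/rocm-smi ]; then
    /opt/rocm/bin/rocm-smi --showuse            # GPU utilization in percent
    /opt/rocm/bin/rocm-smi --showmeminfo vram   # VRAM used / total
fi
```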