GitHub - alainnothere/AmdPerformanceTesting: Amd Performance Testing

Because we meatheads have asked almighty Claude too many times to search for some performance numbers, only to be told that it either doesn't work or it will fly... and then... it's the total opposite and the response is...

Oh yeah... that's in line with the theoretical numbers I just created, very close to what you report....

Fear no more! From the department of "let's go buy it because the thing told me it's 3 times faster than what I have" comes to you....

Numbers! Real numbers... so you can compare....

(Are you reading me Claude?)

And so I can stop posting page after page of the thing's output without ever being able to get a freaking table to compare...

I present to you (raises the txt file like the Lion King)... a table... the result of the finest craft executed by humans... the result of clicking tabs and copy-pasting... the pinnacle of civilization and humankind! Fear me, AGI!

Ran a random LLM harness, asked the same question 10 times and pasted the results below...

Yes... the "AI PRO" is meh?... Now if only some good guy JH sent me an Nvidia RTX PRO 6000 96GB to test...

AMD GPU Inference Benchmark

Model: Qwen3.5-9B-UD-Q4_K_XL | llama-server | cache-type-k/v q8_0 | Vulkan: bare-metal Debian 13 | ROCm: Docker

Configuration	Backend	First prompt (t/s)	First eval (t/s)	Avg prompt (t/s)	Avg eval (t/s)	# calls
RX 6950 XT (single)	Vulkan	1,316	56.55	971	56.58	8
RX 6950 XT (single)	ROCm	1,388	53.84	1,046	52.23	10
RX 7900 XT (single)	Vulkan	1,851	83.82	1,129	82.41	16
RX 7900 XT (single)	ROCm	1,343	68.53	528	66.44	17
R9700 (single)	Vulkan	2,452	65.73	1,303	65.32	16
R9700 (single)	ROCm	2,502	60.72	1,085	58.54	16
RX 6950 XT + RX 7900 XT	Vulkan	2,111	38.32	788	38.52	12
RX 6950 XT + RX 7900 XT	ROCm	2,079	45.78	858	44.74	13
R9700 + RX 7900 XT	Vulkan	2,781	61.06	1,260	60.18	12
R9700 + RX 7900 XT	ROCm	2,559	49.79	839	48.87	17

(You're welcome, oh pinnacle of human civilization. Clicking tabs and copy-pasting since the dawn of time, and yet somehow it still took the AGI to make the table.)
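For anyone who wants the Vulkan vs ROCm gap as a percentage instead of eyeballing two columns, here is a minimal sketch. The numbers are the Avg eval (t/s) values copied from the table above; the function name `vulkan_vs_rocm` is made up for this example and is not part of any harness used for the runs.

```python
# Avg eval (t/s) copied from the results table: (Vulkan, ROCm) per configuration.
avg_eval = {
    "RX 6950 XT": (56.58, 52.23),
    "RX 7900 XT": (82.41, 66.44),
    "R9700": (65.32, 58.54),
    "RX 6950 XT + RX 7900 XT": (38.52, 44.74),
    "R9700 + RX 7900 XT": (60.18, 48.87),
}


def vulkan_vs_rocm(table):
    """Return {config: percent by which Vulkan avg eval differs from ROCm}.

    Positive means Vulkan generated tokens faster, negative means ROCm did.
    """
    return {
        name: round((vulkan / rocm - 1) * 100, 1)
        for name, (vulkan, rocm) in table.items()
    }


if __name__ == "__main__":
    for name, pct in vulkan_vs_rocm(avg_eval).items():
        print(f"{name:26s} Vulkan vs ROCm: {pct:+.1f}%")
```

In this table Vulkan is ahead on token generation everywhere except the RX 6950 XT + RX 7900 XT pair, where ROCm wins. Prompt processing is a separate column and does not follow the same pattern.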

System Info — Inference Benchmark Host

CPU

Model: AMD Ryzen 9 7900X
Cores / Threads: 12 cores, 24 threads
Max Boost: 5737 MHz
Socket: AM5

Motherboard

Model: Gigabyte B650 Gaming X AX V2

RAM

Total: 64 GB (4 × 16 GB)
Type: DDR5
Speed: 5000 MT/s (configured) / rated 6000 MT/s
Part: G.Skill F5-6000J3636F16G

GPUs

Slot	GPU	VRAM	PCIe (electrical)
03:00.0	Radeon RX 7900 XT (Navi 31, GFX1100)	20 GB GDDR6	x16
09:00.0	Radeon RX 6950 XT (Navi 21, GFX1030)	16 GB GDDR6	x1
09:00.0	Radeon AI PRO R9700 (GFX1201) (swapped in for R9700 runs)	32 GB	x1
14:00.0	Raphael iGPU (Ryzen integrated)	—	—

The second discrete slot runs at x1 electrical on this board. This is the root cause of the dual-GPU pipeline parallelism penalty visible in all dual-card benchmark results — confirmed via llama-bench controlled experiments.
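To put a number on that penalty, a small sketch that compares each dual-card Avg eval figure against the slower of its two cards running alone. All values come from the results table; the helper `dual_penalty` is a hypothetical name for this example. It only quantifies the drop and says nothing about the cause, which is what the llama-bench experiments were for.

```python
# Avg eval (t/s) from the results table, keyed by (card, backend).
single = {
    ("RX 6950 XT", "Vulkan"): 56.58, ("RX 6950 XT", "ROCm"): 52.23,
    ("RX 7900 XT", "Vulkan"): 82.41, ("RX 7900 XT", "ROCm"): 66.44,
    ("R9700", "Vulkan"): 65.32, ("R9700", "ROCm"): 58.54,
}

# Dual-card Avg eval (t/s), keyed by ((card_a, card_b), backend).
dual = {
    (("RX 6950 XT", "RX 7900 XT"), "Vulkan"): 38.52,
    (("RX 6950 XT", "RX 7900 XT"), "ROCm"): 44.74,
    (("R9700", "RX 7900 XT"), "Vulkan"): 60.18,
    (("R9700", "RX 7900 XT"), "ROCm"): 48.87,
}


def dual_penalty(cards, backend):
    """Percent change of the dual-card avg eval vs. the slower card alone."""
    slowest_single = min(single[(card, backend)] for card in cards)
    return round((dual[(cards, backend)] / slowest_single - 1) * 100, 1)


if __name__ == "__main__":
    for cards, backend in dual:
        pct = dual_penalty(cards, backend)
        print(f"{' + '.join(cards):26s} {backend:7s} {pct:+.1f}% vs slower single card")
```

Every dual-card row comes out negative: on this board, splitting a model that already fits on one card makes token generation slower than just using the slower card by itself.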

OS / Kernel

Distro: Debian GNU/Linux 13 (Trixie) 13.3
Kernel: 6.18.2-zen4 (Zen kernel, PREEMPT_DYNAMIC)

Vulkan / Mesa

Vulkan Instance: 1.4.309
Mesa: 25.2.6-1~bpo13+1
Driver: RADV (Mesa open-source AMD Vulkan driver)
OpenGL: 4.6 Core Profile

ROCm (Docker)

Container: rocm-llamacpp:local (custom build)

llama.cpp

Vulkan runs: bare-metal llama-server, Vulkan backend, native Debian install
ROCm runs: Docker container, HIP/ROCm backend

Inference Config (all runs)

Model: Qwen3.5-9B-UD-Q4_K_XL.gguf
Size: 5.55 GiB — Q4_K_M, 5.32 BPW
KV cache: q8_0 (K and V)
Context: 262,144 tokens
Parallel slots: 4 (auto)
Flash Attention: auto (enabled)
Temperature: 0.01
Fit to VRAM: enabled (-fit on)
Pipeline parallelism: enabled automatically on dual-GPU configs
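For reproducing the setup, the config above maps roughly onto a llama-server command line. The sketch below is a reconstruction from the list, not the exact command used for these runs: the model path is a placeholder, the `-fit on` flag is spelled as written above, and parallel slots and flash attention are left out on the assumption that "auto" means the server defaults were used.

```python
import shlex


def llama_server_argv(model_path):
    """Build an argv list for llama-server from the inference config above.

    Assumption: settings listed as "auto" (parallel slots, flash attention,
    pipeline parallelism) are left to the server's defaults and not passed.
    """
    return [
        "llama-server",
        "--model", model_path,
        "--ctx-size", "262144",     # Context: 262,144 tokens
        "--cache-type-k", "q8_0",   # KV cache: q8_0 (K)
        "--cache-type-v", "q8_0",   # KV cache: q8_0 (V)
        "--temp", "0.01",           # Temperature: 0.01
        "-fit", "on",               # Fit to VRAM, spelled as in the config list
    ]


if __name__ == "__main__":
    # Placeholder path; point it at wherever the GGUF file actually lives.
    print(shlex.join(llama_server_argv("Qwen3.5-9B-UD-Q4_K_XL.gguf")))
```

Building the command as a list keeps each flag next to the config line it came from, and `shlex.join` prints something that can be pasted into a shell as-is.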
