From f8db65e8d760d2bfe7ab8696104885d7d3646c23 Mon Sep 17 00:00:00 2001 From: Donato Capitella Date: Thu, 4 Sep 2025 17:22:18 +0100 Subject: [PATCH] Fixed typos due to copy/paste --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 2703035..87b9703 100644 --- a/README.md +++ b/README.md @@ -23,8 +23,8 @@ This setup is **highly experimental** on ROCm/Strix Halo. Some models work; **ma | `meta-llama/Llama-2-7b-chat-hf` | 7B FP16 | ✅ Works | (recommended) `--dtype float16` | Stable. | | `Qwen/Qwen3-30B-A3B-Instruct-2507` | 30B (A3B) FP16 | ✅ Works | (recommended) `--dtype float16` | | | `Google/Gemma3-27B-Instruct` | 27B FP16 | ✅ Works | (recommended) `--dtype float16` | Slow | -| `Google/Gemma3-12B-Instruct` | 12B FP16 | ✅ Works | (recommended) `--dtype float16` | Slow | -| `Google/Gemma3-4B-Instruct` |4B FP16 | ✅ Works | (recommended) `--dtype float16` | Slow | +| `Google/Gemma3-12B-Instruct` | 12B FP16 | ✅ Works | (recommended) `--dtype float16` | | +| `Google/Gemma3-4B-Instruct` |4B FP16 | ✅ Works | (recommended) `--dtype float16` | | | `Qwen/Qwen3-14B-AWQ` | 14B AWQ | ✅ Works (with flags) | `--quantization awq --dtype float16 --enforce-eager` | On ROCm, eager avoids missing `awq_dequantize` during compile; vLLM auto‑sets `VLLM_USE_TRITON_AWQ`. | | `openai/gpt-oss-20b` | 20B MXFP4 | ❌ Fails | — | `ModuleNotFoundError: triton_kernels.matmul_ogs` (MXFP4 path not available in this image). | | `zai-org/GLM-4.5-Air-FP8` | FP8 | ❌ Fails | — | `ValueError: type fp8e4nv not supported (only 'fp8e5')`. |