Fixed Docker/Podman commands

This commit is contained in:
Donato Capitella
2025-09-04 15:02:00 +01:00
والد fb54a2a9b9
کامیت 8ee405f07e
+46 -6
مشاهده پرونده
@@ -112,9 +112,7 @@ You should receive a JSON response with a `choices[0].message.content` reply.
If you don't want to bother specifying the model name, you can run this which will query the currently deployed model:
```bash
MODEL=$(curl -s http://localhost:8000/v1/models | jq -r '.data[0].id')
curl -X POST http://localhost:8000/v1/chat/completions \
MODEL=$(curl -s http://localhost:8000/v1/models | jq -r '.data[0].id') curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{
\"model\": \"$MODEL\",
@@ -128,10 +126,11 @@ curl -X POST http://localhost:8000/v1/chat/completions \
Prefer this for persistent services. **Always mount a host directory for weights** so they live outside the container. If the model isn't present, vLLM will fetch it from **Hugging Face** into the mapped directory.
**Qwen2.5 7B Instruct**
```bash
podman run \
-d \
--name vllm \
podman run -d --name vllm-qwen2p5-7b \
--ipc=host \
--network host \
--device /dev/kfd \
--device /dev/dri \
@@ -148,6 +147,47 @@ podman run \
> Not using `--network host`? Map a port instead: `-p 8000:8000`.
For other models, you can try:
**Qwen3 30B A3B Instruct (2507)**
```bash
podman run -d --name vllm-qwen3-30b-a3b \
--ipc=host \
--network host \
--device /dev/kfd \
--device /dev/dri \
--group-add video \
--group-add render \
-v ~/vllm-models:/models \
-v ~/.cache/vllm:/root/.cache/vllm \
docker.io/kyuz0/vllm-therock-gfx1151-aotriton:latest \
bash -lc 'source /torch-therock/.venv/bin/activate; \
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 --dtype float16 \
--host 0.0.0.0 --port 8000 --download-dir /models'
```
**Qwen3 14B AWQ** *(requires extra flags on ROCm)*
```bash
podman run -d --name vllm-qwen3-14b-awq \
--ipc=host \
--network host \
--device /dev/kfd \
--device /dev/dri \
--group-add video \
--group-add render \
-v ~/vllm-models:/models \
-v ~/.cache/vllm:/root/.cache/vllm \
docker.io/kyuz0/vllm-therock-gfx1151-aotriton:latest \
bash -lc 'source /torch-therock/.venv/bin/activate; \
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
vllm serve Qwen/Qwen3-14B-AWQ --quantization awq --dtype float16 --enforce-eager \
--host 0.0.0.0 --port 8000 --download-dir /models'
```
---
## 5) Models, dtypes & storage