amd-strix-halo-vllm-toolboxes

Author	SHA1	Message	Date
Donato Capitella	16405e8943	config: Add VLLM_DISABLE_COMPILE_CACHE=1 to environment variables across VLLM scripts.	2026-03-09 14:07:43 +00:00
Donato Capitella	b035bcb482	updated benchmarks including thunderbolt and configuration guides	2026-02-25 10:48:42 +00:00
Donato Capitella	6875f62ccf	improve benchmarks	2026-02-25 09:29:46 +00:00
Donato Capitella	e726d406fa	updated benchmarks, fix start-vllm	2026-02-23 19:39:19 +00:00
Donato Capitella	91b6dbc270	feat: Display environment variables, allow choosing between RoCE/Ethernet, and show RCCL debug information	2026-02-22 20:07:34 +00:00
Donato Capitella	fde8f520d9	feat: Update benchmark results across various models and configurations, increasing `num_requests` from 100 to 200.	2026-02-03 08:31:54 +00:00
Donato Capitella	b03a444c91	feat: Extract benchmark output file path generation into a helper function and add checks to skip runs if results already exist.	2026-02-03 08:28:21 +00:00
Donato Capitella	4d3b046870	feat: Add new benchmark results for various models and configurations, and update documentation UI with filtering for attention and tensor parallelism.	2026-02-02 21:30:17 +00:00
Donato Capitella	1f96c391fb	feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details.	2026-02-02 19:34:33 +00:00
Donato Capitella	1ddcb9a202	feat: Configure ROCm attention via `--attention-backend` CLI argument, disable the Ray dashboard, and make eager mode configurable for cluster benchmarks.	2026-02-02 15:40:16 +00:00
Donato Capitella	9c6d32e326	updating max context results	2026-02-02 11:56:26 +00:00
Donato Capitella	0109e6a19b	feat: Optimize model `max_num_seqs` and global benchmark parameters for Strix Halo, and centralize configurations in `models.py`.	2026-02-02 08:45:13 +00:00
Donato Capitella	6f118ff936	feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory.	2026-02-02 07:35:50 +00:00
Donato Capitella	c587981d73	refactor: Centralize Ray/vLLM cluster management into a new `cluster_manager.py` module and refactor `start_vllm_cluster.py` to use it.	2026-02-01 22:19:34 +00:00
Donato Capitella	128ddade14	fix: improve RDMA stability by configuring NCCL IB timeout and retry count.	2026-02-01 22:04:34 +00:00
Donato Capitella	965cd2c339	feat: Improve Ray node detection, enable cluster-wide vLLM cache clearing, and enforce eager mode for benchmarks.	2026-02-01 21:35:27 +00:00
Donato Capitella	ba503f6e61	feat: centralize model configurations and benchmark settings into a new `models.py` module and update Dockerfile and scripts to use it.	2026-02-01 21:17:15 +00:00
Donato Capitella	a1105a0b96	feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies.	2026-02-01 19:36:07 +00:00
Donato Capitella	e5cc96bf48	feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks.	2026-02-01 15:43:56 +00:00
Donato Capitella	711de530f6	added ROCm/Triton attention comparison	2025-12-20 11:49:03 +00:00
Donato Capitella	5e8b6bb545	updates	2025-12-20 11:37:06 +00:00

21 Commits