amd-strix-halo-vllm-toolboxes

Author	SHA1	Message	Date
Donato Capitella	1af159af81	removing llvm flags as they have no impact on performance	2026-02-24 08:27:57 +00:00
Donato Capitella	e726d406fa	updated benchmarks, fix start-vllm	2026-02-23 19:39:19 +00:00
Donato Capitella	e0fadf426b	force eager mode to make Gemma stable	2026-02-23 18:19:15 +00:00
Donato Capitella	f968cb1f30	most of the time spent by devs is to ensure there is no standard way of passing flags - I have no idea why	2026-02-23 12:08:57 +00:00
Donato Capitella	fedfa3c682	Trying fix for ROCm/llvm loop unrolling bug, to see if performance improves on custom compiled kernels	2026-02-23 11:43:44 +00:00
Donato Capitella	13c5a929a3	feat: refactor vLLM Strix Halo patching into a dedicated script	2026-02-23 10:33:20 +00:00
Donato Capitella	5a7f0cc676	feat: Implement temporary patch for missing C10_CHECK macro import	2026-02-23 09:49:42 +00:00
Donato Capitella	b3fcb0091f	feat: Enhance `find_max_context.py` with Ray cluster support and fix `C10_HIP_CHECK` build error in Dockerfile.	2026-02-23 09:11:30 +00:00
Donato Capitella	91b6dbc270	feat: Display environment variables and allow to choose between RoCE/Ethernet and show RCCL debug information	2026-02-22 20:07:34 +00:00
Donato Capitella	4a5d6c7855	fix broken stuff	2026-02-19 20:29:28 +00:00
Donato Capitella	726cd5ae53	remove clang patch	2026-02-18 15:23:02 +00:00
Donato Capitella	49b85fc1fb	add MiniMax	2026-02-18 15:22:12 +00:00
Donato Capitella	290beffb05	feat: Enhance quantization support for MoE layers with new FP8/INT8 configs and model-specific optimizations across various devices.	2026-02-12 11:10:28 +00:00
Donato Capitella	6754095398	feat: Introduce `measure_bandwidth.sh` script, install `perfquery`, and add the script to the Docker image for RDMA bandwidth monitoring.	2026-02-07 10:40:53 +00:00
Donato Capitella	9cf7eaeab2	fix: Correct 'buy me a coffee' URL in README.	2026-02-06 06:56:26 +00:00
Donato Capitella	c3ecb9bbd5	feat: add project context and support sections to README.	2026-02-05 17:55:30 +00:00
Donato Capitella	afe985afca	added images to RDMA guide	2026-02-03 19:47:42 +00:00
Donato Capitella	a2f2156c11	docs: Add a new section for references and acknowledgements.	2026-02-03 12:08:47 +00:00
Donato Capitella	90c5fe9f83	docs: Standardize Fedora OS version references and update IOMMU kernel parameter from `amd_iommu=off` to `iommu=pt` in documentation.	2026-02-03 08:34:56 +00:00
Donato Capitella	fde8f520d9	feat: Update benchmark results across various models and configurations, increasing `num_requests` from 100 to 200.	2026-02-03 08:31:54 +00:00
Donato Capitella	b03a444c91	feat: Extract benchmark output file path generation into a helper function and add checks to skip runs if results already exist.	2026-02-03 08:28:21 +00:00
Donato Capitella	8ff52abf4e	perf: Increase `max_num_seqs` for bus batch scaling and `OFF_NUM_PROMPTS` for steady-state throughput measurement on Strix Halo.	2026-02-02 22:36:15 +00:00
Donato Capitella	693757f5d9	feat: Add script to automate README benchmark table generation and update max context benchmarks with new models and a kernel parameter change.	2026-02-02 22:32:12 +00:00
Donato Capitella	4d3b046870	feat: Add new benchmark results for various models and configurations, and update documentation UI with filtering for attention and tensor parallelism.	2026-02-02 21:30:17 +00:00
Donato Capitella	a412c6bea3	build: Ignore `__pycache__/` directories.	2026-02-02 19:39:21 +00:00
Donato Capitella	1f96c391fb	feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details.	2026-02-02 19:34:33 +00:00
Donato Capitella	1ddcb9a202	feat: Configure ROCm attention via `--attention-backend` CLI argument, disable the Ray dashboard, and make eager mode configurable for cluster benchmarks.	2026-02-02 15:40:16 +00:00
Donato Capitella	9c6d32e326	updating max context results	2026-02-02 11:56:26 +00:00
Donato Capitella	0109e6a19b	feat: Optimize model `max_num_seqs` and global benchmark parameters for Strix Halo, and centralize configurations in `models.py`.	2026-02-02 08:45:13 +00:00
Donato Capitella	6f118ff936	feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory.	2026-02-02 07:35:50 +00:00
Donato Capitella	c587981d73	refactor: Centralize Ray/vLLM cluster management into a new `cluster_manager.py` module and refactor `start_vllm_cluster.py` to use it.	2026-02-01 22:19:34 +00:00
Donato Capitella	128ddade14	fix: improve RDMA stability by configuring NCCL IB timeout and retry count.	2026-02-01 22:04:34 +00:00
Donato Capitella	b458b287d0	docs: update quickstart to recommend `refresh_toolbox.sh` for toolbox creation and detail its InfiniBand/RDMA detection capabilities.	2026-02-01 21:55:46 +00:00
Donato Capitella	0d8afba093	feat: Add `RAY_DISABLE_METRICS=1` to disable Ray metrics across cluster configurations and scripts.	2026-02-01 21:52:48 +00:00
Donato Capitella	965cd2c339	feat: Improve Ray node detection, enable cluster-wide vLLM cache clearing, and enforce eager mode for benchmarks.	2026-02-01 21:35:27 +00:00
Donato Capitella	ba503f6e61	feat: centralize model configurations and benchmark settings into a new `models.py` module and update Dockerfile and scripts to use it.	2026-02-01 21:17:15 +00:00
Donato Capitella	4b09188776	feat: add `refresh_toolbox.sh` script to automate creation and refresh of the vLLM Podman toolbox.	2026-02-01 20:44:54 +00:00
Donato Capitella	a1105a0b96	feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies.	2026-02-01 19:36:07 +00:00
Donato Capitella	e5cc96bf48	feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks.	2026-02-01 15:43:56 +00:00
Donato Capitella	47bf7daba3	feat: add input to specify RCCL artifact run ID for download in build-and-publish workflow	2026-02-01 14:58:10 +00:00
Donato Capitella	b10aa50745	feat: Modularize Dockerfile dependency and ROCm SDK installations into dedicated scripts and add a GitHub Actions workflow to build and consume a custom RCCL library.	2026-02-01 14:50:37 +00:00
Donato Capitella	a8added616	feat: Introduce custom RCCL library management for gfx1151, including build scripts, Docker integration, and VLLM benchmarks.	2026-02-01 13:23:10 +00:00
Donato Capitella	13caab0634	typos	2026-01-31 14:39:04 +00:00
Donato Capitella	36424706ee	added troubleshooting steps for RDMA	2026-01-31 14:37:46 +00:00
Donato Capitella	8ebd432ac6	adding patch dependency	2026-01-31 12:43:42 +00:00
Donato Capitella	57b592b912	added dependencies for RDMA/way	2026-01-30 14:47:09 +00:00
Donato Capitella	039484a41e	Updated name of card	2025-12-24 08:13:34 +00:00
Donato Capitella	255c167734	fix	2025-12-22 16:40:44 +00:00
Donato Capitella	bc7c8e271b	updated table with host configuration	2025-12-22 16:40:25 +00:00
Donato Capitella	86eac2889b	docs: Update README to specify Fedora 43	2025-12-21 09:55:31 +00:00


76 Commits