amd-strix-halo-vllm-toolboxes

Author	SHA1	Message	Date
Donato Capitella	f968cb1f30	most of the time spent by devs is to ensure there is no standard way of passing flags - I have no idea why	2026-02-23 12:08:57 +00:00
Donato Capitella	fedfa3c682	Trying fix for ROCm/llvm loop unrolling bug, to see if performance improves on custom compiled kernels	2026-02-23 11:43:44 +00:00
Donato Capitella	13c5a929a3	feat: refactor vLLM Strix Halo patching into a dedicated script	2026-02-23 10:33:20 +00:00
Donato Capitella	5a7f0cc676	feat: Implement temporary patch for C10_CHECK macro import missing	2026-02-23 09:49:42 +00:00
Donato Capitella	b3fcb0091f	feat: Enhance `find_max_context.py` with Ray cluster support and fix `C10_HIP_CHECK` build error in Dockerfile.	2026-02-23 09:11:30 +00:00
Donato Capitella	726cd5ae53	remove clang patch	2026-02-18 15:23:02 +00:00
Donato Capitella	290beffb05	feat: Enhance quantization support for MoE layers with new FP8/INT8 configs and model-specific optimizations across various devices.	2026-02-12 11:10:28 +00:00
Donato Capitella	6754095398	feat: Introduce `measure_bandwidth.sh` script, install `perfquery`, and add the script to the Docker image for RDMA bandwidth monitoring.	2026-02-07 10:40:53 +00:00
Donato Capitella	6f118ff936	feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory.	2026-02-02 07:35:50 +00:00
Donato Capitella	c587981d73	refactor: Centralize Ray/vLLM cluster management into a new `cluster_manager.py` module and refactor `start_vllm_cluster.py` to use it.	2026-02-01 22:19:34 +00:00
Donato Capitella	ba503f6e61	feat: centralize model configurations and benchmark settings into a new `models.py` module and update Dockerfile and scripts to use it.	2026-02-01 21:17:15 +00:00
Donato Capitella	a1105a0b96	feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies.	2026-02-01 19:36:07 +00:00
Donato Capitella	e5cc96bf48	feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks.	2026-02-01 15:43:56 +00:00
Donato Capitella	b10aa50745	feat: Modularize Dockerfile dependency and ROCm SDK installations into dedicated scripts and add a GitHub Actions workflow to build and consume a custom RCCL library.	2026-02-01 14:50:37 +00:00
Donato Capitella	a8added616	feat: Introduce custom RCCL library management for gfx1151, including build scripts, Docker integration, and VLLM benchmarks.	2026-02-01 13:23:10 +00:00
Donato Capitella	36424706ee	added troubleshooting steps for RDMA	2026-01-31 14:37:46 +00:00
Donato Capitella	8ebd432ac6	adding patch dependency	2026-01-31 12:43:42 +00:00
Donato Capitella	57b592b912	added dependencies for RDMA/way	2026-01-30 14:47:09 +00:00
Donato Capitella	3b0e736c94	feat: Implement dynamic model discovery from benchmark results, add benchmark notes, and include `dialog` dependency.	2025-12-20 12:31:20 +00:00
Donato Capitella	5e8b6bb545	updates	2025-12-20 11:37:06 +00:00
Donato Capitella	69f869ae41	restore staging	2025-12-19 08:06:51 +00:00
Donato Capitella	2b48cae736	feat: Update Dockerfile with pgrep and PyTorch nightly URL.	2025-12-19 07:45:07 +00:00
Donato Capitella	f91dc685ad	add bits and bytes	2025-12-18 08:56:14 +00:00
Donato Capitella	b8678b08ba	Installing flash_attn, as this is now needed by vLLM	2025-11-30 17:49:29 +00:00
Donato Capitella	30bd06b1bd	more dockerfile AI SLOP	2025-11-30 15:45:48 +00:00
Donato Capitella	c9cc843787	fix	2025-11-30 15:41:01 +00:00
Donato Capitella	52814ef9a2	fixing Dockerfile	2025-11-30 15:37:12 +00:00
Donato Capitella	1fe0b82853	updated Dockerfile	2025-11-30 15:29:02 +00:00
Donato Capitella	74a2e5254a	Updating toolbox and pushing GitHub Action	2025-11-30 14:57:37 +00:00

29 Commits