43 Commits

Author SHA1 Message Date
Donato Capitella cf2fd6ec11 chore: remove fix_block_size.py script and its execution from the Dockerfile. 2026-03-14 13:18:56 +00:00
Donato Capitella b78e8a9d82 fix: Remove vLLM block size validation checks by adding and running a new patching script in the Dockerfile. 2026-03-13 16:29:01 +00:00
Donato Capitella 16405e8943 config: Add VLLM_DISABLE_COMPILE_CACHE=1 to environment variables across VLLM scripts. 2026-03-09 14:07:43 +00:00
Donato Capitella 8de950d9ca feat: Override _get_gcn_arch function to return "gfx1151" and rename the original implementation to _old_get_gcn_arch. 2026-03-09 12:13:27 +00:00
Donato Capitella fb0aef0864 Downgrade Python to 3.12 and remove the --no-deps flag from a pip install command in the Dockerfile. 2026-03-09 11:08:11 +00:00
Donato Capitella 8a20ec27b2 fixing https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes/issues/21 2026-02-26 12:36:03 +00:00
Donato Capitella c27835d99f feat: Introduce v1 API structure, enhance quantization support, and expand model compatibility with various updates and new tests. 2026-02-25 11:50:23 +00:00
Donato Capitella 6875f62ccf improve benchmarks 2026-02-25 09:29:46 +00:00
Donato Capitella e726d406fa updated benchmarks, fix start-vllm 2026-02-23 19:39:19 +00:00
Donato Capitella e0fadf426b force eager mode to make gemma stable 2026-02-23 18:19:15 +00:00
Donato Capitella 13c5a929a3 feat: refactor vLLM Strix Halo patching into a dedicated script 2026-02-23 10:33:20 +00:00
Donato Capitella 91b6dbc270 feat: Display environment variables and allow to choose between RoCE/Ethernet and show RCCL debug information 2026-02-22 20:07:34 +00:00
Donato Capitella 4a5d6c7855 fix broken stuff 2026-02-19 20:29:28 +00:00
Donato Capitella 49b85fc1fb add MiniMax 2026-02-18 15:22:12 +00:00
Donato Capitella 6754095398 feat: Introduce measure_bandwidth.sh script, install perfquery, and add the script to the Docker image for RDMA bandwidth monitoring. 2026-02-07 10:40:53 +00:00
Donato Capitella 8ff52abf4e perf: Increase max_num_seqs for bus batch scaling and OFF_NUM_PROMPTS for steady-state throughput measurement on Strix Halo. 2026-02-02 22:36:15 +00:00
Donato Capitella 693757f5d9 feat: Add script to automate README benchmark table generation and update max context benchmarks with new models and a kernel parameter change. 2026-02-02 22:32:12 +00:00
Donato Capitella 1f96c391fb feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details. 2026-02-02 19:34:33 +00:00
Donato Capitella 1ddcb9a202 feat: Configure ROCm attention via --attention-backend CLI argument, disable the Ray dashboard, and make eager mode configurable for cluster benchmarks. 2026-02-02 15:40:16 +00:00
Donato Capitella 0109e6a19b feat: Optimize model max_num_seqs and global benchmark parameters for Strix Halo, and centralize configurations in models.py. 2026-02-02 08:45:13 +00:00
Donato Capitella 6f118ff936 feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory. 2026-02-02 07:35:50 +00:00
Donato Capitella c587981d73 refactor: Centralize Ray/vLLM cluster management into a new cluster_manager.py module and refactor start_vllm_cluster.py to use it. 2026-02-01 22:19:34 +00:00
Donato Capitella 128ddade14 fix: improve RDMA stability by configuring NCCL IB timeout and retry count. 2026-02-01 22:04:34 +00:00
Donato Capitella 0d8afba093 feat: Add RAY_DISABLE_METRICS=1 to disable Ray metrics across cluster configurations and scripts. 2026-02-01 21:52:48 +00:00
Donato Capitella ba503f6e61 feat: centralize model configurations and benchmark settings into a new models.py module and update Dockerfile and scripts to use it. 2026-02-01 21:17:15 +00:00
Donato Capitella a1105a0b96 feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies. 2026-02-01 19:36:07 +00:00
Donato Capitella e5cc96bf48 feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks. 2026-02-01 15:43:56 +00:00
Donato Capitella b10aa50745 feat: Modularize Dockerfile dependency and ROCm SDK installations into dedicated scripts and add a GitHub Actions workflow to build and consume a custom RCCL library. 2026-02-01 14:50:37 +00:00
Donato Capitella a8added616 feat: Introduce custom RCCL library management for gfx1151, including build scripts, Docker integration, and VLLM benchmarks. 2026-02-01 13:23:10 +00:00
Donato Capitella 039484a41e Updated name of card 2025-12-24 08:13:34 +00:00
Donato Capitella 3b0e736c94 feat: Implement dynamic model discovery from benchmark results, add benchmark notes, and include dialog dependency. 2025-12-20 12:31:20 +00:00
Donato Capitella 5e8b6bb545 updates 2025-12-20 11:37:06 +00:00
Donato Capitella f19932b360 updated envs for better strix halo support on vllm 2025-12-19 08:30:02 +00:00
Donato Capitella b8678b08ba Installing flash_attn, as this is now needed by vLLM 2025-11-30 17:49:29 +00:00
Donato Capitella 74a2e5254a Updating toolbox and pushing GitHub Action 2025-11-30 14:57:37 +00:00
Donato Capitella 7c85688924 fixed missing model provider in model tag 2025-09-04 17:27:38 +01:00
Donato Capitella 7e17fa8660 Added gemma models 2025-09-04 17:20:24 +01:00
Donato Capitella fb54a2a9b9 Fixed missing parameters in start-vllm 2025-09-04 13:58:51 +01:00
Donato Capitella e9460b20ad updated with set of working models 2025-09-04 13:33:53 +01:00
Donato Capitella fc12e2cc63 fixing quant 2025-09-03 23:08:45 +01:00
Donato Capitella 0212638d6a fixes 2025-09-03 22:59:16 +01:00
Donato Capitella 46f4003f79 added start-vllm script 2025-09-03 22:37:26 +01:00
Donato Capitella a1501febb4 first commit 2025-09-03 20:42:44 +01:00