amd-strix-halo-vllm-toolboxes

Commit graph

48a20990d3 Improve compilation support gfx1150 devbadxyz 2026-03-15 13:04:09 +01:00
039363b819 feat: set LD_LIBRARY_PATH to include ROCm library directories. main Donato Capitella 2026-03-14 13:41:09 +00:00
cf2fd6ec11 chore: remove fix_block_size.py script and its execution from the Dockerfile. Donato Capitella 2026-03-14 13:18:56 +00:00
b78e8a9d82 fix: Remove vLLM block size validation checks by adding and running a new patching script in the Dockerfile. Donato Capitella 2026-03-13 16:29:01 +00:00
16405e8943 config: Add VLLM_DISABLE_COMPILE_CACHE=1 to environment variables across VLLM scripts. Donato Capitella 2026-03-09 14:07:43 +00:00
8de950d9ca feat: Override _get_gcn_arch function to return "gfx1151" and rename the original implementation to _old_get_gcn_arch. Donato Capitella 2026-03-09 12:13:27 +00:00
fb0aef0864 Downgrade Python to 3.12 and remove the --no-deps flag from a pip install command in the Dockerfile. Donato Capitella 2026-03-09 11:08:11 +00:00
9997faaa1e build: Add --no-deps flag to local wheel installation. Donato Capitella 2026-03-08 16:31:16 +00:00
8a20ec27b2 fixing https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes/issues/21 Donato Capitella 2026-02-26 12:36:03 +00:00
c27835d99f feat: Introduce v1 API structure, enhance quantization support, and expand model compatibility with various updates and new tests. Donato Capitella 2026-02-25 11:50:23 +00:00
b035bcb482 updated benchmarks including thunderbolt and configuration guides Donato Capitella 2026-02-25 10:48:42 +00:00
6875f62ccf improve benchmarks Donato Capitella 2026-02-25 09:29:46 +00:00
a5a7b8fe04 fix: Ignore settings.json and default 'TP2 (Eth)' checkbox to unchecked in documentation. Donato Capitella 2026-02-24 08:50:18 +00:00
1af159af81 removing llvm flags as they have no impact on performance Donato Capitella 2026-02-24 08:27:57 +00:00
e726d406fa updated benchmarks, fix start-vllm Donato Capitella 2026-02-23 19:39:19 +00:00
e0fadf426b force eager mode to make gemma stable Donato Capitella 2026-02-23 18:19:15 +00:00
f968cb1f30 most of the time spent by devs is to ensure there is no standard way of passing flags - I have no idea why Donato Capitella 2026-02-23 12:08:57 +00:00
fedfa3c682 Trying fix for ROCm/llvm loop unrolling bug, to see if performance improves on custom compiled kernels Donato Capitella 2026-02-23 11:43:44 +00:00
13c5a929a3 feat: refactor vLLM Strix Halo patching into a dedicated script Donato Capitella 2026-02-23 10:33:20 +00:00
5a7f0cc676 feat: Implement temporary patch for C10_CHECK macro import missing Donato Capitella 2026-02-23 09:49:42 +00:00
b3fcb0091f feat: Enhance find_max_context.py with Ray cluster support and fix C10_HIP_CHECK build error in Dockerfile. Donato Capitella 2026-02-23 09:11:30 +00:00
91b6dbc270 feat: Display environment variables and allow to choose between RoCE/Ethernet and show RCCL debug information Donato Capitella 2026-02-22 20:07:34 +00:00
4a5d6c7855 fix broken stuff Donato Capitella 2026-02-19 20:29:28 +00:00
726cd5ae53 remove clang patch Donato Capitella 2026-02-18 15:23:02 +00:00
49b85fc1fb add MiniMax Donato Capitella 2026-02-18 15:22:12 +00:00
290beffb05 feat: Enhance quantization support for MoE layers with new FP8/INT8 configs and model-specific optimizations across various devices. Donato Capitella 2026-02-12 11:10:28 +00:00
6754095398 feat: Introduce measure_bandwidth.sh script, install perfquery, and add the script to the Docker image for RDMA bandwidth monitoring. Donato Capitella 2026-02-07 10:40:53 +00:00
9cf7eaeab2 fix: Correct 'buy me a coffee' URL in README. Donato Capitella 2026-02-06 06:56:26 +00:00
c3ecb9bbd5 feat: add project context and support sections to README. Donato Capitella 2026-02-05 17:55:30 +00:00
afe985afca added images to RDMA guide Donato Capitella 2026-02-03 19:47:42 +00:00
a2f2156c11 docs: Add a new section for references and acknowledgements. backup-before-cleanup Donato Capitella 2026-02-03 12:08:47 +00:00
90c5fe9f83 docs: Standardize Fedora OS version references and update IOMMU kernel parameter from amd_iommu=off to iommu=pt in documentation. Donato Capitella 2026-02-03 08:34:56 +00:00
fde8f520d9 feat: Update benchmark results across various models and configurations, increasing num_requests from 100 to 200. Donato Capitella 2026-02-03 08:31:54 +00:00
b03a444c91 feat: Extract benchmark output file path generation into a helper function and add checks to skip runs if results already exist. Donato Capitella 2026-02-03 08:28:21 +00:00
8ff52abf4e perf: Increase max_num_seqs for bus batch scaling and OFF_NUM_PROMPTS for steady-state throughput measurement on Strix Halo. Donato Capitella 2026-02-02 22:36:15 +00:00
693757f5d9 feat: Add script to automate README benchmark table generation and update max context benchmarks with new models and a kernel parameter change. Donato Capitella 2026-02-02 22:32:12 +00:00
4d3b046870 feat: Add new benchmark results for various models and configurations, and update documentation UI with filtering for attention and tensor parallelism. Donato Capitella 2026-02-02 21:30:17 +00:00
a412c6bea3 build: Ignore __pycache__/ directories. Donato Capitella 2026-02-02 19:39:21 +00:00
1f96c391fb feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details. Donato Capitella 2026-02-02 19:34:33 +00:00
1ddcb9a202 feat: Configure ROCm attention via --attention-backend CLI argument, disable the Ray dashboard, and make eager mode configurable for cluster benchmarks. Donato Capitella 2026-02-02 15:40:16 +00:00
9c6d32e326 updating max context results Donato Capitella 2026-02-02 11:56:26 +00:00
0109e6a19b feat: Optimize model max_num_seqs and global benchmark parameters for Strix Halo, and centralize configurations in models.py. Donato Capitella 2026-02-02 08:45:13 +00:00
6f118ff936 feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory. Donato Capitella 2026-02-02 07:35:50 +00:00
c587981d73 refactor: Centralize Ray/vLLM cluster management into a new cluster_manager.py module and refactor start_vllm_cluster.py to use it. Donato Capitella 2026-02-01 22:19:34 +00:00
128ddade14 fix: improve RDMA stability by configuring NCCL IB timeout and retry count. Donato Capitella 2026-02-01 22:04:34 +00:00
b458b287d0 docs: update quickstart to recommend refresh_toolbox.sh for toolbox creation and detail its InfiniBand/RDMA detection capabilities. Donato Capitella 2026-02-01 21:55:46 +00:00
0d8afba093 feat: Add RAY_DISABLE_METRICS=1 to disable Ray metrics across cluster configurations and scripts. Donato Capitella 2026-02-01 21:52:48 +00:00
965cd2c339 feat: Improve Ray node detection, enable cluster-wide vLLM cache clearing, and enforce eager mode for benchmarks. Donato Capitella 2026-02-01 21:35:27 +00:00
ba503f6e61 feat: centralize model configurations and benchmark settings into a new models.py module and update Dockerfile and scripts to use it. Donato Capitella 2026-02-01 21:17:15 +00:00
4b09188776 feat: add refresh_toolbox.sh script to automate creation and refresh of the vLLM Podman toolbox. Donato Capitella 2026-02-01 20:44:54 +00:00
a1105a0b96 feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies. Donato Capitella 2026-02-01 19:36:07 +00:00
e5cc96bf48 feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks. Donato Capitella 2026-02-01 15:43:56 +00:00
47bf7daba3 feat: add input to specify RCCL artifact run ID for download in build-and-publish workflow Donato Capitella 2026-02-01 14:58:10 +00:00
b10aa50745 feat: Modularize Dockerfile dependency and ROCm SDK installations into dedicated scripts and add a GitHub Actions workflow to build and consume a custom RCCL library. Donato Capitella 2026-02-01 14:50:37 +00:00
a8added616 feat: Introduce custom RCCL library management for gfx1151, including build scripts, Docker integration, and VLLM benchmarks. Donato Capitella 2026-02-01 13:23:10 +00:00
13caab0634 typos Donato Capitella 2026-01-31 14:39:04 +00:00
36424706ee added troubleshooting steps for RDMA Donato Capitella 2026-01-31 14:37:46 +00:00
8ebd432ac6 adding patch dependency Donato Capitella 2026-01-31 12:43:42 +00:00
57b592b912 added dependencies for RDMA/way Donato Capitella 2026-01-30 14:47:09 +00:00
039484a41e Updated name of card Donato Capitella 2025-12-24 08:13:34 +00:00
255c167734 fix Donato Capitella 2025-12-22 16:40:44 +00:00
bc7c8e271b updated table with host configuration Donato Capitella 2025-12-22 16:40:25 +00:00
86eac2889b docs: Update README to specify Fedora 43 Donato Capitella 2025-12-21 09:55:31 +00:00
15f1889c6f fixes Donato Capitella 2025-12-20 12:32:46 +00:00
3b0e736c94 feat: Implement dynamic model discovery from benchmark results, add benchmark notes, and include dialog dependency. Donato Capitella 2025-12-20 12:31:20 +00:00
711de530f6 added ROCm/Triton attention comparison Donato Capitella 2025-12-20 11:49:03 +00:00
5e8b6bb545 updates Donato Capitella 2025-12-20 11:37:06 +00:00
f19932b360 updated envs for better strix halo support on vllm Donato Capitella 2025-12-19 08:30:02 +00:00
69f869ae41 restore staging Donato Capitella 2025-12-19 08:06:51 +00:00
2b48cae736 feat: Update Dockerfile with pgrep and PyTorch nightly URL. Donato Capitella 2025-12-19 07:45:07 +00:00
f91dc685ad add bits and bytes Donato Capitella 2025-12-18 08:56:14 +00:00
b8678b08ba Installing flash_attn, as this is now needed by vLLM Donato Capitella 2025-11-30 17:49:29 +00:00
30bd06b1bd more dockerfile AI SLOP Donato Capitella 2025-11-30 15:45:48 +00:00
c9cc843787 fix Donato Capitella 2025-11-30 15:41:01 +00:00
52814ef9a2 fixing Dockerfile Donato Capitella 2025-11-30 15:37:12 +00:00
1fe0b82853 updated Dockerfile Donato Capitella 2025-11-30 15:29:02 +00:00
74a2e5254a Updating toolbox and pushing GitHub Action Donato Capitella 2025-11-30 14:57:37 +00:00
7c85688924 fixed missing model provider in model tag Donato Capitella 2025-09-04 17:27:38 +01:00
f8db65e8d7 Fixed typos due to copy/paste Donato Capitella 2025-09-04 17:22:18 +01:00
7e17fa8660 Added gemma models Donato Capitella 2025-09-04 17:20:24 +01:00
8ee405f07e Fixed Docker/Podman commands Donato Capitella 2025-09-04 15:02:00 +01:00
fb54a2a9b9 Fixed missing parameters in start-vllm Donato Capitella 2025-09-04 13:58:51 +01:00
e9460b20ad updated with set of working models Donato Capitella 2025-09-04 13:33:53 +01:00
8509fe2d92 another patch for amdsmi Donato Capitella 2025-09-04 07:34:55 +01:00
fc12e2cc63 fixing quant Donato Capitella 2025-09-03 23:08:45 +01:00
0212638d6a fixes Donato Capitella 2025-09-03 22:59:16 +01:00
e17d61916b typo Donato Capitella 2025-09-03 22:42:06 +01:00
46f4003f79 added start-vllm script Donato Capitella 2025-09-03 22:37:26 +01:00
a1501febb4 first commit Donato Capitella 2025-09-03 20:42:44 +01:00
