amd-strix-halo-vllm-toolboxes

Author	SHA1	Message	Date
Donato Capitella	8ff52abf4e	perf: Increase `max_num_seqs` for bus batch scaling and `OFF_NUM_PROMPTS` for steady-state throughput measurement on Strix Halo.	2026-02-02 22:36:15 +00:00
Donato Capitella	693757f5d9	feat: Add script to automate README benchmark table generation and update max context benchmarks with new models and a kernel parameter change.	2026-02-02 22:32:12 +00:00
Donato Capitella	1f96c391fb	feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details.	2026-02-02 19:34:33 +00:00
Donato Capitella	1ddcb9a202	feat: Configure ROCm attention via `--attention-backend` CLI argument, disable the Ray dashboard, and make eager mode configurable for cluster benchmarks.	2026-02-02 15:40:16 +00:00
Donato Capitella	0109e6a19b	feat: Optimize model `max_num_seqs` and global benchmark parameters for Strix Halo, and centralize configurations in `models.py`.	2026-02-02 08:45:13 +00:00
Donato Capitella	6f118ff936	feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory.	2026-02-02 07:35:50 +00:00
Donato Capitella	c587981d73	refactor: Centralize Ray/vLLM cluster management into a new `cluster_manager.py` module and refactor `start_vllm_cluster.py` to use it.	2026-02-01 22:19:34 +00:00
Donato Capitella	128ddade14	fix: improve RDMA stability by configuring NCCL IB timeout and retry count.	2026-02-01 22:04:34 +00:00
Donato Capitella	0d8afba093	feat: Add `RAY_DISABLE_METRICS=1` to disable Ray metrics across cluster configurations and scripts.	2026-02-01 21:52:48 +00:00
Donato Capitella	ba503f6e61	feat: centralize model configurations and benchmark settings into a new `models.py` module and update Dockerfile and scripts to use it.	2026-02-01 21:17:15 +00:00
Donato Capitella	a1105a0b96	feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies.	2026-02-01 19:36:07 +00:00
Donato Capitella	e5cc96bf48	feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks.	2026-02-01 15:43:56 +00:00
Donato Capitella	b10aa50745	feat: Modularize Dockerfile dependency and ROCm SDK installations into dedicated scripts and add a GitHub Actions workflow to build and consume a custom RCCL library.	2026-02-01 14:50:37 +00:00
Donato Capitella	a8added616	feat: Introduce custom RCCL library management for gfx1151, including build scripts, Docker integration, and VLLM benchmarks.	2026-02-01 13:23:10 +00:00
Donato Capitella	039484a41e	Updated name of card	2025-12-24 08:13:34 +00:00
Donato Capitella	3b0e736c94	feat: Implement dynamic model discovery from benchmark results, add benchmark notes, and include `dialog` dependency.	2025-12-20 12:31:20 +00:00
Donato Capitella	5e8b6bb545	updates	2025-12-20 11:37:06 +00:00
Donato Capitella	f19932b360	updated envs for better strix halo support on vllm	2025-12-19 08:30:02 +00:00
Donato Capitella	b8678b08ba	Installing flash_attn, as this is now needed by vLLM	2025-11-30 17:49:29 +00:00
Donato Capitella	74a2e5254a	Updating toolbox and pushing GitHub Action	2025-11-30 14:57:37 +00:00
Donato Capitella	7c85688924	fixed missing model provider in model tag	2025-09-04 17:27:38 +01:00
Donato Capitella	7e17fa8660	Added gemma models	2025-09-04 17:20:24 +01:00
Donato Capitella	fb54a2a9b9	Fixed missing parameters in start-vllm	2025-09-04 13:58:51 +01:00
Donato Capitella	e9460b20ad	updated with set of working models	2025-09-04 13:33:53 +01:00
Donato Capitella	fc12e2cc63	fixing quant	2025-09-03 23:08:45 +01:00
Donato Capitella	0212638d6a	fixes	2025-09-03 22:59:16 +01:00
Donato Capitella	46f4003f79	added start-vllm script	2025-09-03 22:37:26 +01:00
Donato Capitella	a1501febb4	first commit	2025-09-03 20:42:44 +01:00

28 commits