amd-strix-halo-vllm-toolboxes

Author	SHA1	Message	Date
Donato Capitella	b035bcb482	updated benchmarks including thunderbolt and configuration guides	2026-02-25 10:48:42 +00:00
Donato Capitella	a5a7b8fe04	fix: Ignore `settings.json` and default 'TP2 (Eth)' checkbox to unchecked in documentation.	2026-02-24 08:50:18 +00:00
Donato Capitella	e726d406fa	updated benchmarks, fix start-vllm	2026-02-23 19:39:19 +00:00
Donato Capitella	90c5fe9f83	docs: Standardize Fedora OS version references and update IOMMU kernel parameter from `amd_iommu=off` to `iommu=pt` in documentation.	2026-02-03 08:34:56 +00:00
Donato Capitella	fde8f520d9	feat: Update benchmark results across various models and configurations, increasing `num_requests` from 100 to 200.	2026-02-03 08:31:54 +00:00
Donato Capitella	8ff52abf4e	perf: Increase `max_num_seqs` for bus batch scaling and `OFF_NUM_PROMPTS` for steady-state throughput measurement on Strix Halo.	2026-02-02 22:36:15 +00:00
Donato Capitella	4d3b046870	feat: Add new benchmark results for various models and configurations, and update documentation UI with filtering for attention and tensor parallelism.	2026-02-02 21:30:17 +00:00
Donato Capitella	1f96c391fb	feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details.	2026-02-02 19:34:33 +00:00
Donato Capitella	6f118ff936	feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory.	2026-02-02 07:35:50 +00:00
Donato Capitella	a1105a0b96	feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies.	2026-02-01 19:36:07 +00:00
Donato Capitella	711de530f6	added ROCm/Triton attention comparison	2025-12-20 11:49:03 +00:00
Donato Capitella	5e8b6bb545	updates	2025-12-20 11:37:06 +00:00

12 Commits