12 Commits

Author SHA1 Message Date
Donato Capitella b035bcb482 updated benchmarks including thunderbolt and configuration guides 2026-02-25 10:48:42 +00:00
Donato Capitella a5a7b8fe04 fix: Ignore settings.json and default 'TP2 (Eth)' checkbox to unchecked in documentation. 2026-02-24 08:50:18 +00:00
Donato Capitella e726d406fa updated benchmarks, fix start-vllm 2026-02-23 19:39:19 +00:00
Donato Capitella 90c5fe9f83 docs: Standardize Fedora OS version references and update IOMMU kernel parameter from amd_iommu=off to iommu=pt in documentation. 2026-02-03 08:34:56 +00:00
Donato Capitella fde8f520d9 feat: Update benchmark results across various models and configurations, increasing num_requests from 100 to 200. 2026-02-03 08:31:54 +00:00
Donato Capitella 8ff52abf4e perf: Increase max_num_seqs for bus batch scaling and OFF_NUM_PROMPTS for steady-state throughput measurement on Strix Halo. 2026-02-02 22:36:15 +00:00
Donato Capitella 4d3b046870 feat: Add new benchmark results for various models and configurations, and update documentation UI with filtering for attention and tensor parallelism. 2026-02-02 21:30:17 +00:00
Donato Capitella 1f96c391fb feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details. 2026-02-02 19:34:33 +00:00
Donato Capitella 6f118ff936 feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory. 2026-02-02 07:35:50 +00:00
Donato Capitella a1105a0b96 feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies. 2026-02-01 19:36:07 +00:00
Donato Capitella 711de530f6 added ROCm/Triton attention comparison 2025-12-20 11:49:03 +00:00
Donato Capitella 5e8b6bb545 updates 2025-12-20 11:37:06 +00:00