Donato Capitella
|
1af159af81
|
removing llvm flags as they have no impact on performance
|
2026-02-24 08:27:57 +00:00 |
|
Donato Capitella
|
e726d406fa
|
updated benchmarks, fix start-vllm
|
2026-02-23 19:39:19 +00:00 |
|
Donato Capitella
|
e0fadf426b
|
force eager mode to make gemma stable
|
2026-02-23 18:19:15 +00:00 |
|
Donato Capitella
|
f968cb1f30
|
most of the time spent by devs is to ensure there is no standard way of passing flags - I have no idea why
|
2026-02-23 12:08:57 +00:00 |
|
Donato Capitella
|
fedfa3c682
|
Trying fix for ROCm/llvm loop unrolling bug, to see if performance improves on custom compiled kernels
|
2026-02-23 11:43:44 +00:00 |
|
Donato Capitella
|
13c5a929a3
|
feat: refactor vLLM Strix Halo patching into a dedicated script
|
2026-02-23 10:33:20 +00:00 |
|
Donato Capitella
|
5a7f0cc676
|
feat: Implement temporary patch for missing C10_CHECK macro import
|
2026-02-23 09:49:42 +00:00 |
|
Donato Capitella
|
b3fcb0091f
|
feat: Enhance find_max_context.py with Ray cluster support and fix C10_HIP_CHECK build error in Dockerfile.
|
2026-02-23 09:11:30 +00:00 |
|
Donato Capitella
|
91b6dbc270
|
feat: Display environment variables and allow to choose between RoCE/Ethernet and show RCCL debug information
|
2026-02-22 20:07:34 +00:00 |
|
Donato Capitella
|
4a5d6c7855
|
fix broken stuff
|
2026-02-19 20:29:28 +00:00 |
|
Donato Capitella
|
726cd5ae53
|
remove clang patch
|
2026-02-18 15:23:02 +00:00 |
|
Donato Capitella
|
49b85fc1fb
|
add MiniMax
|
2026-02-18 15:22:12 +00:00 |
|
Donato Capitella
|
290beffb05
|
feat: Enhance quantization support for MoE layers with new FP8/INT8 configs and model-specific optimizations across various devices.
|
2026-02-12 11:10:28 +00:00 |
|
Donato Capitella
|
6754095398
|
feat: Introduce measure_bandwidth.sh script, install perfquery, and add the script to the Docker image for RDMA bandwidth monitoring.
|
2026-02-07 10:40:53 +00:00 |
|
Donato Capitella
|
9cf7eaeab2
|
fix: Correct 'buy me a coffee' URL in README.
|
2026-02-06 06:56:26 +00:00 |
|
Donato Capitella
|
c3ecb9bbd5
|
feat: add project context and support sections to README.
|
2026-02-05 17:55:30 +00:00 |
|
Donato Capitella
|
afe985afca
|
added images to RDMA guide
|
2026-02-03 19:47:42 +00:00 |
|
Donato Capitella
|
a2f2156c11
|
docs: Add a new section for references and acknowledgements.
|
2026-02-03 12:08:47 +00:00 |
|
Donato Capitella
|
90c5fe9f83
|
docs: Standardize Fedora OS version references and update IOMMU kernel parameter from amd_iommu=off to iommu=pt in documentation.
|
2026-02-03 08:34:56 +00:00 |
|
Donato Capitella
|
fde8f520d9
|
feat: Update benchmark results across various models and configurations, increasing num_requests from 100 to 200.
|
2026-02-03 08:31:54 +00:00 |
|
Donato Capitella
|
b03a444c91
|
feat: Extract benchmark output file path generation into a helper function and add checks to skip runs if results already exist.
|
2026-02-03 08:28:21 +00:00 |
|
Donato Capitella
|
8ff52abf4e
|
perf: Increase max_num_seqs for bus batch scaling and OFF_NUM_PROMPTS for steady-state throughput measurement on Strix Halo.
|
2026-02-02 22:36:15 +00:00 |
|
Donato Capitella
|
693757f5d9
|
feat: Add script to automate README benchmark table generation and update max context benchmarks with new models and a kernel parameter change.
|
2026-02-02 22:32:12 +00:00 |
|
Donato Capitella
|
4d3b046870
|
feat: Add new benchmark results for various models and configurations, and update documentation UI with filtering for attention and tensor parallelism.
|
2026-02-02 21:30:17 +00:00 |
|
Donato Capitella
|
a412c6bea3
|
build: Ignore __pycache__/ directories.
|
2026-02-02 19:39:21 +00:00 |
|
Donato Capitella
|
1f96c391fb
|
feat: Add comprehensive RDMA cluster setup guide, enforce eager mode in cluster benchmarks, and update documentation with cluster details.
|
2026-02-02 19:34:33 +00:00 |
|
Donato Capitella
|
1ddcb9a202
|
feat: Configure ROCm attention via --attention-backend CLI argument, disable the Ray dashboard, and make eager mode configurable for cluster benchmarks.
|
2026-02-02 15:40:16 +00:00 |
|
Donato Capitella
|
9c6d32e326
|
updating max context results
|
2026-02-02 11:56:26 +00:00 |
|
Donato Capitella
|
0109e6a19b
|
feat: Optimize model max_num_seqs and global benchmark parameters for Strix Halo, and centralize configurations in models.py.
|
2026-02-02 08:45:13 +00:00 |
|
Donato Capitella
|
6f118ff936
|
feat: Update ROCm benchmark result paths, improve cluster node discovery and cache clearing, and refine cluster benchmark result directory.
|
2026-02-02 07:35:50 +00:00 |
|
Donato Capitella
|
c587981d73
|
refactor: Centralize Ray/vLLM cluster management into a new cluster_manager.py module and refactor start_vllm_cluster.py to use it.
|
2026-02-01 22:19:34 +00:00 |
|
Donato Capitella
|
128ddade14
|
fix: improve RDMA stability by configuring NCCL IB timeout and retry count.
|
2026-02-01 22:04:34 +00:00 |
|
Donato Capitella
|
b458b287d0
|
docs: update quickstart to recommend refresh_toolbox.sh for toolbox creation and detail its InfiniBand/RDMA detection capabilities.
|
2026-02-01 21:55:46 +00:00 |
|
Donato Capitella
|
0d8afba093
|
feat: Add RAY_DISABLE_METRICS=1 to disable Ray metrics across cluster configurations and scripts.
|
2026-02-01 21:52:48 +00:00 |
|
Donato Capitella
|
965cd2c339
|
feat: Improve Ray node detection, enable cluster-wide vLLM cache clearing, and enforce eager mode for benchmarks.
|
2026-02-01 21:35:27 +00:00 |
|
Donato Capitella
|
ba503f6e61
|
feat: centralize model configurations and benchmark settings into a new models.py module and update Dockerfile and scripts to use it.
|
2026-02-01 21:17:15 +00:00 |
|
Donato Capitella
|
4b09188776
|
feat: add refresh_toolbox.sh script to automate creation and refresh of the vLLM Podman toolbox.
|
2026-02-01 20:44:54 +00:00 |
|
Donato Capitella
|
a1105a0b96
|
feat: Enhance vLLM benchmarking to compare Triton and ROCm attention, introduce a new script for cluster configuration, and update Dockerfile for new tools and dependencies.
|
2026-02-01 19:36:07 +00:00 |
|
Donato Capitella
|
e5cc96bf48
|
feat: Introduce vLLM cluster benchmarking and setup scripts, and expand the list of models for local benchmarks.
|
2026-02-01 15:43:56 +00:00 |
|
Donato Capitella
|
47bf7daba3
|
feat: add input to specify RCCL artifact run ID for download in build-and-publish workflow
|
2026-02-01 14:58:10 +00:00 |
|
Donato Capitella
|
b10aa50745
|
feat: Modularize Dockerfile dependency and ROCm SDK installations into dedicated scripts and add a GitHub Actions workflow to build and consume a custom RCCL library.
|
2026-02-01 14:50:37 +00:00 |
|
Donato Capitella
|
a8added616
|
feat: Introduce custom RCCL library management for gfx1151, including build scripts, Docker integration, and vLLM benchmarks.
|
2026-02-01 13:23:10 +00:00 |
|
Donato Capitella
|
13caab0634
|
typos
|
2026-01-31 14:39:04 +00:00 |
|
Donato Capitella
|
36424706ee
|
added troubleshooting steps for RDMA
|
2026-01-31 14:37:46 +00:00 |
|
Donato Capitella
|
8ebd432ac6
|
adding patch dependency
|
2026-01-31 12:43:42 +00:00 |
|
Donato Capitella
|
57b592b912
|
added dependencies for RDMA/way
|
2026-01-30 14:47:09 +00:00 |
|
Donato Capitella
|
039484a41e
|
Updated name of card
|
2025-12-24 08:13:34 +00:00 |
|
Donato Capitella
|
255c167734
|
fix
|
2025-12-22 16:40:44 +00:00 |
|
Donato Capitella
|
bc7c8e271b
|
updated table with host configuration
|
2025-12-22 16:40:25 +00:00 |
|
Donato Capitella
|
86eac2889b
|
docs: Update README to specify Fedora 43
|
2025-12-21 09:55:31 +00:00 |
|