rocm-systems

Author	SHA1	Message	Date
Atul Kulkarni	63aa3bb537	Remove legacy Shm and P2p tests (#2089 ) These tests will be replaced by MPI tests. [ROCm/rccl commit: `0d797d1f6c`]	2025-12-05 16:53:28 -06:00
Atul Kulkarni	86a4dd95f6	Remove static to non-static conversion used in tests (#2084 ) * Remove coll_reg tests which are unsupported * removed static to non-static conversion feature [ROCm/rccl commit: `7ec8e73e12`]	2025-12-04 18:03:14 -06:00
Atul Kulkarni	a364ada6e7	Add missing header in alloc.h (#2086 ) [ROCm/rccl commit: `892d258319`]	2025-12-04 11:26:19 -06:00
Atul Kulkarni	0ced7aede8	Fix rccl test suite to use hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic (#2082 ) [ROCm/rccl commit: `cc6e259a02`]	2025-12-04 10:02:06 -06:00
Atul Kulkarni	e4aef19511	Added new unit tests for AllReduce with Bias API (#2036 ) * Added new unit tests for AllReduce with Bias API * Address review comments [ROCm/rccl commit: `7c12b0b76b`]	2025-12-03 17:37:34 -06:00
Wenkai Du	3e650467fa	Use one side stream per process (#2063 ) * Use one side stream per process * Handle multiple GPUs per process * Reset stream when not found * Address review comments * Fix missing mutex initializer [ROCm/rccl commit: `185e78a8f0`]	2025-12-02 10:03:15 -08:00
corey-derochie-amd	8e3f60e080	Add copyright to src/device/symmetric/all_reduce.cuh (#2080 ) [ROCm/rccl commit: `4acd0f64ea`]	2025-11-27 14:29:21 -07:00
isaki001	cf11e2f39f	add back missing proxy-counter updates (#2052 ) [ROCm/rccl commit: `da183596cd`]	2025-11-25 15:22:34 -06:00
Kapil S. Pawar	566671910a	[RcclReplayer] JSON <-> BIN log format conversion tool (#2056 ) * Add replay log format converter * Add Log Sanitizer * Add no timestamp option (nts) to sanitizer [ROCm/rccl commit: `5fd86021a8`]	2025-11-24 11:51:36 -06:00
Nilesh M Negi	8c928e60f9	[AzureCI] Increase timeout of per PR and nightly pipeline to 240 mins (#2074 ) [ROCm/rccl commit: `db52690c2a`]	2025-11-24 10:55:36 -06:00
AbandiGa	d6087d0d62	Fix rcclNetP2pPolicy issue (#2072 ) * fix rcclNetP2pPolicy issue * change the comment to ncclNetIb [ROCm/rccl commit: `b14e32c46e`]	2025-11-21 18:28:10 -06:00
Nilesh M Negi	3026698c40	[AzureCI] Increase UT timeout to 3 hours (#2070 ) [ROCm/rccl commit: `fca0015962`]	2025-11-21 09:18:49 -06:00
Matt Williams	7456dc7d17	Fix ToC in API Library page (#2053 ) * Add intro and remove ToC [ROCm/rccl commit: `3495baa6b2`]	2025-11-20 09:35:15 -05:00
Kapil S. Pawar	3f12f2f735	[AzureCI] Add RcclReplayer to CI (#2048 ) [ROCm/rccl commit: `ab1eb6e70e`]	2025-11-18 12:21:24 -06:00
Kapil S. Pawar	acb0d614a5	Functional Tests for Ext-Profiler Plugin (#2007 ) * Add functional tests for CSV Tuner Plugin * Updated directory structure * Updated and renamed directories * Updated csv conf files * Added tests for ext-profiler * Updated readme * Updated readme [ROCm/rccl commit: `c7f400dbff`]	2025-11-18 11:20:39 -06:00
Arm Patinyasakdikul	f81bb04bff	Added install.sh flag to suppress warnings. (#2054 ) [ROCm/rccl commit: `461e61d10e`]	2025-11-17 00:35:06 -06:00
Pedram Alizadeh	3d2fc04b45	Using hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic (#2037 ) * Using hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic * Switching to hip_bf16.h from ROCm 6.0.0 [ROCm/rccl commit: `fb67e5b467`]	2025-11-13 15:56:18 -05:00
AbandiGa	7f7c8d14f6	Disable Bfloatf16 pipelining for reduction collectives for gfx950 (#2047 ) * disable bf16 reduce_copy pipelining for gfx950 * edit CHANGELOG * Combine unroll and pipeline local arch calculation into single function * fix multi-node error and disbale for gfx950 even if it's not a local build * removed has_gfx950 * disable pipelining for gfx950 in rcclSetPipelining --------- Co-authored-by: Ghadeer Alabandi <galaband@cv350-zts-gtu-h30-08.prov.gtu.zts.cpe.ice.amd.com> Co-authored-by: Ghadeer Alabandi <galaband@cv350-zts-gtu-h30-18.prov.gtu.zts.cpe.ice.amd.com> Co-authored-by: Ghadeer Alabandi <galaband@cv350-zts-gtu-h28a-08.prov.gtu.zts.cpe.ice.amd.com> [ROCm/rccl commit: `277b6e9bac`]	2025-11-13 14:55:09 -06:00
isaki001	9a81823515	Post thread-block size increase tuning (#2042 ) * for multinode gfx950, extend AR LL128 up to 256MB, extend RS LL128 up to 8MB per rank, extend AG LL up to 64KB per rank * dont override direct allgather threshold if set to -1 * restore 2-node AR simple at earlier message sizes than higher multi-node AR * extend range of LL for single-node RS on gfx950 * update algo/proto for multi-node allreduce on gfx942 * set single-node AR on gfx950 to Tree LL for KB message sizes * decrease threshold for single node Tree for gfx950 AR [ROCm/rccl commit: `0d09f86608`]	2025-11-13 14:51:04 -06:00
Bertan Dogancay	48f37be1e3	[Launch] Move cudaEventRecord call to capturing stream only (#2050 ) [ROCm/rccl commit: `83ffc82fa7`]	2025-11-13 08:38:09 -06:00
gilbertlee-amd	22d9a038a2	[GRAPH] Adding support for rail-optimized trees for MI3XX with 4 NICs (#2031 ) [ROCm/rccl commit: `46b032b760`]	2025-11-12 19:34:27 -06:00
nawrinsu	cac8dc67fd	Add tuner config file (2,4,8 nodes) for gfx950 (#2012 ) * Add tuner config file (2,4,8 nodes) for gfx950 * remove alltoall * Added comment regarding allgather direct [ROCm/rccl commit: `c488c5307e`]	2025-11-12 09:16:36 -08:00
Kapil S. Pawar	c4d7680749	Added Functional Tests for CSV Tuner Plugin (#1968 ) * Add functional tests for CSV Tuner Plugin * Updated directory structure * Updated and renamed directories * Updated csv conf files * Updated readme * Updated readme * Updated readme [ROCm/rccl commit: `c8da880dc7`]	2025-11-11 10:11:19 -06:00
Dingming Wu	0d3fba9a22	Adjust nChannels on gfx950 based on ranks and nodes for better bandwidth (#2027 ) [ROCm/rccl commit: `b811645688`]	2025-11-11 09:46:51 -06:00
Gheorghe-Teodor Bercea	3da73a7526	Fix compilation when enabling indirect function calls (#1994 ) Fix compilation when enabling indirect function calls. [ROCm/rccl commit: `1678bb9ae7`]	2025-11-11 09:36:48 -05:00
Mustafa Abduljabbar	b12399898d	Reduce LL threshold for a2a (#2032 ) [ROCm/rccl commit: `52f9526bd6`]	2025-11-10 19:14:23 -05:00
Kapil S. Pawar	6bbc4b5d48	[RcclReplayer] Compile without the need for RCCL to be compiled (#2039 ) [ROCm/rccl commit: `acdafac49f`]	2025-11-10 15:38:48 -06:00
Dingming Wu	23870ceccd	Fail the job if flag HIP_HOST_UNCACHED_MEMORY is not set on MI350x (#2023 ) * Fail the job if compiler flag HIP_HOST_UNCACHED_MEMORY is not turned on on mi350x Place the check after initTransportsRank as the GPU arch info in comm->topo->nodes info is populated after that. * Update src/init.cc to use ERROR instead of WARN Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com> [ROCm/rccl commit: `05f914c997`]	2025-11-10 11:54:35 -06:00
Dingming Wu	c601f9b3f8	Increment opCount for intra-node comms as well (#2024 ) * Enhance logging in NCCL initialization It's convenient to log comms obj and default channels together for debugging * Add opCount to collDevWork and update increment logic Added opCount to collDevWork and incremented it when proxyOpQueue is empty (e.g., for intra-node comms) * Clarify opCount increment logic in enqueue.cc Updated comment to clarify incrementing opCount for intranode communications. * Refactor NCCL_INIT logging format Updated logging format for NCCL_INIT to improve clarity. * Remove duplicate INFO logging in init.cc [ROCm/rccl commit: `b00ee4c83c`]	2025-11-10 11:23:49 -06:00
Bertan Dogancay	b955a7df40	[GEN/BUILD] Refactor generator script and reduce build time for old archs. (#2030 ) [ROCm/rccl commit: `b1e680adc0`]	2025-11-07 15:15:25 -05:00
Bertan Dogancay	524453baea	[Launch] Enable Implicit order launch with serial mode (#2033 ) [ROCm/rccl commit: `a9bb7e9807`]	2025-11-07 13:29:53 -05:00
Avinash	5ca67dc803	Empty kernel test enhancements [tools] (#1999 ) * Initial commit * Improvements-1 * Initial commit for PR * Updates warning, run.sh, decoupled loops * Forcing seq cst for CPU timimg [ROCm/rccl commit: `85baa0d113`]	2025-11-07 12:28:06 -06:00
Ghadeer Ahmed H Alabandi	5b66480595	[NET] Enable capping the number of QPs created for send/recv colls (#1998 ) [ROCm/rccl commit: `45991fadad`]	2025-11-07 00:47:01 +00:00
alex-breslow-amd	bd614458c3	[gfx950] Turn On Single Node One Slice Optimization for gfx950 and MI300A (#2017 ) * Internal benchmarking shows nice single-node performance uplift for MI300A and MI350 [ROCm/rccl commit: `56e0b4e445`]	2025-11-06 12:12:45 -08:00
Arm Patinyasakdikul	25005c1cce	proxy: handle progressOps return code properly. (#2029 ) [ROCm/rccl commit: `d6a53d2022`]	2025-11-04 09:09:50 -06:00
Aravind Ravikumar	4babb01f4d	Add S3 upload support for Perf and test reports by run ID and architecture (#2020 ) * Commits to enable scp report copy * Added Post report upload step * Added extra arg for fetch artifacts * Moved to a specific commit * Add write permissions to s3 * Added comment for TheRock sha commit date --------- Co-authored-by: arravikum <arravikum@amd.com> [ROCm/rccl commit: `07f8f6d6c6`]	2025-11-03 19:09:34 -05:00
nawrinsu	6d22ce9b1a	Fix protocol and channel override when tuner is used (#1985 ) * Fix protocol and channel override when tuner is used * Added comment * Fix README for basic tuner implementation [ROCm/rccl commit: `166268d715`]	2025-11-03 13:56:34 -08:00
Ahmed Khan	acf64be514	Remove duplicate MaxEmptyLinesToKeep from clang-format (#2016 ) [ROCm/rccl commit: `caffd013f6`]	2025-11-02 21:44:27 -06:00
Nilesh M Negi	dd625edf56	Revert "[GEN/BUILD] Refactor generate.py and reduce build time for older archs (#2006 )" (#2021 ) This reverts commit `40f3faead0`. [ROCm/rccl commit: `62ab7a22d7`]	2025-10-31 10:04:12 -05:00
David DeBonis	3e750f0f57	Single-node AllGather and ReduceScatter Optimization (#2019 ) * Single-node performance tuning * Normalizing value to individual rank [ROCm/rccl commit: `63d5846452`]	2025-10-31 08:59:46 -06:00
Arm Patinyasakdikul	54194a17c3	Added ERROR message class to handle fatal error messages. (#2002 ) * Added ERROR message class to handle fatal error messages. New ERROR message class will print the message in all debug level, including none. Change some of the fatal error message to be in ERROR instead of WARN. Added new error handler function to print out more meaningful error message in the future. * Added CHANGELOG entry. * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Change to no longer reuse NONE as ERROR. ERROR is now a separated class. * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `1ce83d5cc0`]	2025-10-30 16:14:20 -05:00
Arm Patinyasakdikul	03e92dc942	Added copyrights for Palamida scan 7.2. (#2018 ) [ROCm/rccl commit: `84fdcab68a`]	2025-10-30 13:33:20 -05:00
isaki001	9bccbcd619	P2p batching hang-fix (#2011 ) * prevent batching when send/recv bytes dont match, restore bit reversal for channel to part mapping, prevent batching beyond 32-nodes * correct computation for channel to part mapping * update changelog * disabling p2p-batching by default [ROCm/rccl commit: `641c0eb51c`]	2025-10-30 13:32:01 -05:00
isaki001	678366f5e2	gx950 multi-node tuning for LL/LL128 (#1953 ) * increased LL threshold for gfx950 AR to 256KB * AG/RS proto threshold update [ROCm/rccl commit: `72996e4d9f`]	2025-10-30 12:08:12 -05:00
Bertan Dogancay	40f3faead0	[GEN/BUILD] Refactor generate.py and reduce build time for older archs (#2006 ) [ROCm/rccl commit: `bed7cdf863`]	2025-10-30 11:45:53 -04:00
Nilesh M Negi	03d37f6305	Fix gfx950 gating conditions to match ROCm 7.0.2 (#2003 ) [ROCm/rccl commit: `8444b3c6e9`]	2025-10-29 23:27:04 -05:00
Mustafa Abduljabbar	eb0b1387b7	[Device] Adjust threadblock size for gfx950 to increase LL64/Simple performance for AR, RS and AG (#1978 ) * Add initial commit to increase tb size to 512 * Fix LL perf issue when subset of NCCL_MAX_NTHREADS is used Adding a constant to barrier_generic logic from using fallback logic when nthreads < NCCL_MAX_NTHREADS and nthreads == blockDim.X * Adjust nthreads for LL * Opt threads for reduce_scatter upper small range * Add macro for single node * Restrict MSCCL to 256 threads to prevent mem access fault * Support pre-MI350 compatibility * Partially refactor threadblock size override * Use const macros instead of numerals * opt out of unused function [ROCm/rccl commit: `12f51ba8bf`]	2025-10-29 23:24:32 -05:00
Bertan Dogancay	4c7afea115	[Tools/Replayer] Fix prohibited calls during capture mode (#1938 ) [ROCm/rccl commit: `b703ffdfa4`]	2025-10-29 12:19:32 -04:00
corey-derochie-amd	c5cdee4fa5	Updated Changelog with 7.1.1 and 7.2.0 stub sections (#2008 ) * Missing ROCm 7.0 & 7.1.0 Changelog entries (#1976) * Update CHANGELOG.md * Update CHANGELOG.md * Apply suggestions from code review Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Added ROCm 7.2.0 section. * Update CHANGELOG.md * Apply suggestion from @corey-derochie-amd --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `561ad2fe05`]	2025-10-28 13:41:22 -06:00
Atul Kulkarni	f2287e8f97	Removed RCCL_EXPOSE_STATIC duplicate definition. (#1988 ) [ROCm/rccl commit: `cc867dbaf2`]	2025-10-28 13:01:48 -05:00

2066 Commits