13 Commits

Author SHA1 Message Date
Arm Patinyasakdikul b14fec8dbc Fix git version fetching logic. (#1981)
[ROCm/rccl commit: 9806f5e9dd]
2025-10-17 09:17:49 -05:00
Arm Patinyasakdikul 99699b10a2 Fix issue where staging/mainline build commit hash doesn't match the actual RCCL commit. (#1910)
[ROCm/rccl commit: f21fbdfc18]
2025-09-11 16:13:21 -05:00
Mustafa Abduljabbar f37f290134 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)
* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 277747c199]
2025-08-26 15:03:54 -04:00
mberenjk c76a4492f1 Added useAcc as a template parameter to address the performance regression (#1856)
* Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>


[ROCm/rccl commit: c61152baa4]
2025-08-14 15:58:54 -05:00
Grant Pinkert 1d68693a2e Fix continuous build hang on extract_metadata.cmake (#1668)
When the `roc-obj-ls` executable fails, it sometimes never returns. Because the `execute_process` command waits until the executable finishes, the build can hang indefinitely, with no error message and no indication that anything is wrong. This commit fixes that by adding timeouts and better error reporting.

[ROCm/rccl commit: 2482d1475f]
2025-06-22 05:54:44 -05:00
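
The hang described in the commit above comes from waiting on a child process with no upper bound. CMake's `execute_process` accepts `TIMEOUT` and `RESULT_VARIABLE` arguments for this purpose; the commit's actual CMake changes are not shown here. As a rough illustration of the same pattern in Python, the sketch below bounds the wait and reports a readable error. The helper name `run_with_timeout` and its return convention are hypothetical, not part of RCCL.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (ok, message) instead of blocking forever.

    Hypothetical helper: illustrates "timeout plus error reporting",
    not the logic in extract_metadata.cmake.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return False, f"timed out after {timeout_s}s: {' '.join(cmd)}"
    except FileNotFoundError:
        return False, f"executable not found: {cmd[0]}"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout
```

A caller can then fail the build step with the returned message rather than waiting indefinitely on a tool that never exits.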
BertanDogancay d045d0ca23 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a6bf9bfc9e]
2025-04-23 20:47:43 -07:00
Wenkai Du e86b217182 Add fault injection of starting warps with random variations (#1593)
* Add fault injection of starting warps with random variations

This is done by inserting random delays after __syncthreads().
The feature can be turned off by FAULT_INJECTION=OFF in cmake.

* Remove manually introduced bug for demo purpose

* Use only one thread per warp for checking wall clock

[ROCm/rccl commit: 90ad586d94]
2025-03-20 16:11:43 -07:00
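
The idea in the commit above, perturbing when each warp resumes after a barrier so that latent ordering bugs surface, can be mimicked on the host. The following is only a Python analogy and not the RCCL device code: `threading.Barrier` stands in for `__syncthreads()`, `time.sleep` stands in for the injected delay, and the name `run_with_jitter` and its parameters are hypothetical.

```python
import random
import threading
import time


def run_with_jitter(num_workers, max_delay_s, seed=0):
    """Release workers from a barrier, then delay each by a random amount.

    Returns (completion_order, delays). Host-side analogy only.
    """
    rng = random.Random(seed)
    # One pre-drawn delay per worker keeps the run reproducible for a given seed.
    delays = [rng.uniform(0.0, max_delay_s) for _ in range(num_workers)]
    barrier = threading.Barrier(num_workers)
    order = []
    lock = threading.Lock()

    def worker(idx):
        barrier.wait()           # stand-in for __syncthreads()
        time.sleep(delays[idx])  # injected random delay after the barrier
        with lock:
            order.append(idx)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order, delays
```

Code that silently depends on all workers resuming in lockstep tends to misbehave under this kind of jitter, which is the kind of bug such injection is meant to expose.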
corey-derochie-amd e95578ef4c removed gfx940 and gfx941 (#1606)
* removed gfx940 and gfx941

* Update "gfx94" to "gfx942" in init.cc

* Updated remaining "gfx94" updates to "gfx942"

* Update filenames and variables from gfx940 to gfx942

---------

Co-authored-by: akolliasAMD <akollias@amd.com>

[ROCm/rccl commit: 6505639cf4]
2025-03-20 09:34:53 -06:00
Pedram Alizadeh acf5822a6c enable building rccl for gfx950 (#1571)
[ROCm/rccl commit: f268553ee4]
2025-02-25 16:13:48 -05:00
BertanDogancay 1b000665df Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 36343be84f]
2025-01-23 12:08:46 -06:00
Bertan Dogancay 974c13cd62 [BUILD] Move code generation to python from CMake (#1360)
* Use generate.py for func generation

* Convert AddUnroll.cmake to bash

[ROCm/rccl commit: 2dd10c8f17]
2024-10-03 10:21:19 -04:00
Wenkai Du 54e4899607 Template unroll for RCCL kernels (#1250)
* Template unroll for RCCL kernels

* Adding unroll template arg during CMake hipification

* Reduce linking parallel jobs to avoid OOM in CI

* Work around issues with UT tests

SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking

* CI: do not use -j 16 when building

* CI: use -j 8 when building

* Only reduce parallel linking job for CI extended

* Restore original jenkins command. Change parallel linking jobs in cmake

* Disable MSCCLPP

---------

Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com>

[ROCm/rccl commit: 89349f2ce4]
2024-07-19 08:15:59 -07:00
Bertan Dogancay bee47d9e91 Add stack size UT (#1081)
* Add stack size UT

[ROCm/rccl commit: dc2d486ba0]
2024-02-12 17:56:15 -07:00