PedramAlizadeh
45872d170f
Changed the name of UnitTests to rccl-UnitTests (wrapper executable included).
2022-12-13 21:45:57 +00:00
Pedram Alizadeh
8250092367
UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662)
2022-12-13 16:05:09 -05:00
Ziyue Yang
adafc0f759
Add MSCCL Support (#658)
...
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
2022-12-12 15:51:04 -08:00
Wenkai Du
b953544a59
Fix typo in detecting Intel platforms (#661)
2022-12-07 13:36:11 -08:00
akolliasAMD
eca623df07
decreased warp size for gfx110x (#655)
2022-12-01 12:19:21 -07:00
gilbertlee-amd
faed69f9fc
Graph unit tests (#656)
...
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du
aebed537a5
Reduce linking time through more parallel jobs (#657)
2022-11-30 16:06:03 -08:00
Wenkai Du
fb9938cffa
Query DMABuf support through HSA runtime API (#654)
2022-11-30 08:53:03 -08:00
Wenkai Du
9594bbee3b
Adjust P2P channels on Intel platform (#653)
2022-11-29 13:57:10 -08:00
akolliasAMD
11862f67de
removed cmake HIP_CLANG_PATCH_LEVEL check (#652)
...
* removed HIP_CLANG_PATCH_LEVEL check
2022-11-29 09:48:59 -07:00
Wenkai Du
67d9327f52
Merge pull request #651 from wenkaidu/nccl_sync
...
Sync up with NCCL
2022-11-28 17:33:59 -08:00
Wenkai Du
bf03a48289
Merge remote-tracking branch 'nccl/master' into HEAD
2022-11-28 09:46:16 -08:00
gilbertlee-amd
36ac8107bd
Update CHANGELOG up to ROCm 5.4 (#649)
...
* Update CHANGELOG for ROCm 5.4.0
2022-11-23 09:40:19 -07:00
Sylvain Jeaugey
614b49f0de
Fix google-fastsocket plugin build
2022-11-22 02:13:13 -08:00
Sylvain Jeaugey
55b1d8ab98
Add documentation for NCCL NET plugins
...
Also repurpose dummy plugin as example, including headers and
compat layers from v6 to v2.
2022-11-22 02:12:53 -08:00
Wenkai Du
57764f8152
Fix incorrect rocm-smi ID conversion (#648)
2022-11-21 19:44:39 -08:00
Wenkai Du
9cb72a3d0f
Fix collective trace timestamp format (#647)
2022-11-21 08:11:12 -08:00
Wenkai Du
cf3c32a626
Fix typo in previous hipify change (#645)
2022-11-15 11:51:47 -08:00
Wenkai Du
b4f6eee9b4
Merge pull request #643 from ROCmSoftwarePlatform/2.15.5
...
Sync up with NCCL 2.15.5
2022-11-15 08:40:59 -08:00
Wenkai Du
562dd87036
Move hipify to cmake stage
...
Add minimal ROCm/HIP version requirements for Graph support
2022-11-14 18:10:45 +00:00
raramakr
ca05b3d8d4
Merge pull request #644 from raramakr/swdev-reorg
...
Correct hsa header path for new directory layout
2022-11-10 10:06:39 -08:00
Wenkai Du
94ad7f6f51
Update tuning table and fix topo_expl
2022-11-07 18:24:24 +00:00
Ranjith Ramakrishnan
b397cb16ea
Correct hsa header path for new directory layout
2022-11-04 09:52:16 -07:00
Wenkai Du
9a077e6947
Merge remote-tracking branch 'nccl/master' into develop
2022-11-03 21:17:42 +00:00
Wenkai Du
effc4b255b
Merge pull request #641 from ROCmSoftwarePlatform/2.14.3
...
Sync up with NCCL 2.14.3
2022-11-02 08:31:12 -07:00
Wenkai Du
72ef100050
Fix P2P scheduling
2022-10-31 08:54:34 -07:00
Sylvain Jeaugey
2f4cb874ba
Merge tag 'v2.15.5-1'
2022-10-25 01:15:22 -07:00
Sylvain Jeaugey
cb111f764a
2.15.5-1
...
Fix crash with CollnetChain on some node topologies
Fix hang when interleaving the capture of different graphs
Fix hang during init in multi-threaded mode
Fix potential data corruption with LL128 protocol on unaligned buffers.
Fix CPU usage during preconnect
Fixes double-free in the error path for ncclCommInitAll
Workaround hang on H100 with Ring/LL128 on 2 GPUs.
2022-10-25 00:55:55 -07:00
Wenkai Du
4f0e223db4
Merge remote-tracking branch 'nccl/master' into develop
2022-10-20 15:41:29 +00:00
Wenkai Du
bc8ef779df
Fix missing initialization due to merge error (#640)
2022-10-19 21:20:11 -07:00
Wenkai Du
fc554a2428
topo_expl: fix compilation error (#639)
2022-10-19 14:19:50 -07:00
gilbertlee-amd
10dbd2a452
Fixing formatting for copyright (#638)
2022-10-19 13:43:21 -06:00
Wenkai Du
9ddf0e0649
Support P2P with invisible devices (#636)
...
* Support P2P with invisible devices
* Update copyright year
2022-10-17 10:24:59 -07:00
Wenkai Du
9916a09818
Merge pull request #634 from yzygitzh/ziyyang/npkit-fix
...
Apply several fixes to NPKit
2022-10-17 08:01:24 -07:00
raramakr
b32f38126d
Merge pull request #635 from raramakr/swdev
...
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd
ebb8b5bf63
Updating files for missing licenses (#637)
2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan
cf4e963aaf
Correct include and library path for new directory layout
...
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
Ziyue Yang
7d6bbc19d4
apply npkit
2022-10-14 01:28:17 +00:00
Sylvain Jeaugey
d128d62238
Merge tag 'v2.15.1-1'
2022-10-07 11:00:26 -07:00
Edgar Gabriel
4972c129e3
Merge pull request #633 from edgargabriel/topic/topo-binary-tree
...
introduce a hw topology aware bintree
2022-10-05 17:06:54 -05:00
John Bachan
2401f4a918
Fixes a double-free in the error path of ncclCommInitAll.
...
Fixes https://github.com/NVIDIA/nccl/issues/726
2022-10-03 17:12:32 -07:00
Edgar Gabriel
e645b02cd8
introduce a hw topology aware bintree
...
for hayabusa architecture.
2022-10-03 15:26:21 +00:00
gilbertlee-amd
bd7d589446
Removing TransferBench from tools (#632)
...
Point to new TransferBench repo
2022-09-30 11:53:32 -06:00
akolliasAMD
ef71550738
Added new gpu targets (#631)
2022-09-29 14:53:55 -06:00
Wenkai Du
a523b37ac7
Another threadfence and flags rework (#629)
2022-09-28 16:49:29 -07:00
Wenkai Du
021932b3c8
Add LL128 tuning (#630)
2022-09-27 09:39:09 -07:00
Sylvain Jeaugey
da8152e57a
2.15.1-1
...
Add support for H100 (sm90).
Make sure NCCL kernels honor user stream priorities.
2022-09-27 02:31:13 -07:00
Sylvain Jeaugey
99c28f2e75
Merge remote-tracking branch 'origin/master'
2022-09-27 02:24:41 -07:00
Cliff Woolley
78313a6d21
Use compatibility shim only with static cudart
...
Closes issue 658
2022-09-27 02:22:48 -07:00
Wen-Heng (Jack) Chung
e8af0716c4
Merge pull request #619 from whchung/exp_reduce_code_size
...
Only use split tree algorithm to reduce kernel code size.
2022-09-26 10:06:27 -05:00