Commit graph

947 commits

Author SHA1 Message Date
PedramAlizadeh 45872d170f Changed the name of UnitTests to rccl-UnitTests (wrapper executable included). 2022-12-13 21:45:57 +00:00
Pedram Alizadeh 8250092367 UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662) 2022-12-13 16:05:09 -05:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
Wenkai Du b953544a59 Fix typo in detecting Intel platforms (#661) 2022-12-07 13:36:11 -08:00
akolliasAMD eca623df07 decreased warp size for gfx110x (#655) 2022-12-01 12:19:21 -07:00
gilbertlee-amd faed69f9fc Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du aebed537a5 Reduce linking time through more parallel jobs (#657) 2022-11-30 16:06:03 -08:00
Wenkai Du fb9938cffa Query DMABuf support through HSA runtime API (#654) 2022-11-30 08:53:03 -08:00
Wenkai Du 9594bbee3b Adjust P2P channels on Intel platform (#653) 2022-11-29 13:57:10 -08:00
akolliasAMD 11862f67de removed cmake HIP_CLANG_PATCH_LEVEL check (#652)
* removed HIP_CLANG_PATCH_LEVEL check
2022-11-29 09:48:59 -07:00
Wenkai Du 67d9327f52 Merge pull request #651 from wenkaidu/nccl_sync
Sync up with NCCL
2022-11-28 17:33:59 -08:00
Wenkai Du bf03a48289 Merge remote-tracking branch 'nccl/master' into HEAD 2022-11-28 09:46:16 -08:00
gilbertlee-amd 36ac8107bd Update CHANGELOG up to ROCm 5.4 (#649)
* Update CHANGELOG for ROCm 5.4.0
2022-11-23 09:40:19 -07:00
Sylvain Jeaugey 614b49f0de Fix google-fastsocket plugin build 2022-11-22 02:13:13 -08:00
Sylvain Jeaugey 55b1d8ab98 Add documentation for NCCL NET plugins
Also repurpose dummy plugin as example, including headers and
compat layers from v6 to v2.
2022-11-22 02:12:53 -08:00
Wenkai Du 57764f8152 Fix incorrect rocm-smi ID conversion (#648) 2022-11-21 19:44:39 -08:00
Wenkai Du 9cb72a3d0f Fix collective trace timestamp format (#647) 2022-11-21 08:11:12 -08:00
Wenkai Du cf3c32a626 Fix typo in previous hipify change (#645) 2022-11-15 11:51:47 -08:00
Wenkai Du b4f6eee9b4 Merge pull request #643 from ROCmSoftwarePlatform/2.15.5
Sync up with NCCL 2.15.5
2022-11-15 08:40:59 -08:00
Wenkai Du 562dd87036 Move hipify to cmake stage
Add minimal ROCm/HIP version requirements for Graph support
2022-11-14 18:10:45 +00:00
raramakr ca05b3d8d4 Merge pull request #644 from raramakr/swdev-reorg
Correct hsa header path for new directory layout
2022-11-10 10:06:39 -08:00
Wenkai Du 94ad7f6f51 Update tuning table and fix topo_expl 2022-11-07 18:24:24 +00:00
Ranjith Ramakrishnan b397cb16ea Correct hsa header path for new directory layout 2022-11-04 09:52:16 -07:00
Wenkai Du 9a077e6947 Merge remote-tracking branch 'nccl/master' into develop 2022-11-03 21:17:42 +00:00
Wenkai Du effc4b255b Merge pull request #641 from ROCmSoftwarePlatform/2.14.3
Sync up with NCCL 2.14.3
2022-11-02 08:31:12 -07:00
Wenkai Du 72ef100050 Fix P2P scheduling 2022-10-31 08:54:34 -07:00
Sylvain Jeaugey 2f4cb874ba Merge tag 'v2.15.5-1' 2022-10-25 01:15:22 -07:00
Sylvain Jeaugey cb111f764a 2.15.5-1
Fix crash with CollnetChain on some node topologies
Fix hang when interleaving the capture of different graphs
Fix hang during init in multi-threaded mode
Fix potential data corruption with LL128 protocol on unaligned buffers.
Fix CPU usage during preconnect
Fixes double-free in the error path for ncclCommInitAll
Workaround hang on H100 with Ring/LL128 on 2 GPUs.
2022-10-25 00:55:55 -07:00
Wenkai Du 4f0e223db4 Merge remote-tracking branch 'nccl/master' into develop 2022-10-20 15:41:29 +00:00
Wenkai Du bc8ef779df Fix missing initialization due to merge error (#640) 2022-10-19 21:20:11 -07:00
Wenkai Du fc554a2428 topo_expl: fix compilation error (#639) 2022-10-19 14:19:50 -07:00
gilbertlee-amd 10dbd2a452 Fixing formatting for copywrite (#638) 2022-10-19 13:43:21 -06:00
Wenkai Du 9ddf0e0649 Support P2P with invisible devices (#636)
* Support P2P with invisible devices

* Update copyright year
2022-10-17 10:24:59 -07:00
Wenkai Du 9916a09818 Merge pull request #634 from yzygitzh/ziyyang/npkit-fix
Apply several fixes to NPKit
2022-10-17 08:01:24 -07:00
raramakr b32f38126d Merge pull request #635 from raramakr/swdev
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd ebb8b5bf63 Updating files for missing licenses (#637) 2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan cf4e963aaf Correct include and library path for new directory layout
Use actual header files and libraries , rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
Ziyue Yang 7d6bbc19d4 apply npkit 2022-10-14 01:28:17 +00:00
Sylvain Jeaugey d128d62238 Merge tag 'v2.15.1-1' 2022-10-07 11:00:26 -07:00
Edgar Gabriel 4972c129e3 Merge pull request #633 from edgargabriel/topic/topo-binary-tree
introduce a hw topology aware bintree
2022-10-05 17:06:54 -05:00
John Bachan 2401f4a918 Fixes a double-free in the error path of ncclCommInitAll.
Fixes https://github.com/NVIDIA/nccl/issues/726
2022-10-03 17:12:32 -07:00
Edgar Gabriel e645b02cd8 introduce a hw topology aware bintree
for hayabusa architecture.
2022-10-03 15:26:21 +00:00
gilbertlee-amd bd7d589446 Removing TransferBench from tools (#632)
Point to new TransferBench repo
2022-09-30 11:53:32 -06:00
akolliasAMD ef71550738 Added new gpu targets (#631) 2022-09-29 14:53:55 -06:00
Wenkai Du a523b37ac7 Another threadfence and flags rework (#629) 2022-09-28 16:49:29 -07:00
Wenkai Du 021932b3c8 Add LL128 tuning (#630) 2022-09-27 09:39:09 -07:00
Sylvain Jeaugey da8152e57a 2.15.1-1
Add support for H100 (sm90).
Make sure NCCL kernel honor user stream priorities.
2022-09-27 02:31:13 -07:00
Sylvain Jeaugey 99c28f2e75 Merge remote-tracking branch 'origin/master' 2022-09-27 02:24:41 -07:00
Cliff Woolley 78313a6d21 Use compatibility shim only with static cudart
Closes issue 658
2022-09-27 02:22:48 -07:00
Wen-Heng (Jack) Chung e8af0716c4 Merge pull request #619 from whchung/exp_reduce_code_size
Only use split tree algorithm to reduce kernel code size.
2022-09-26 10:06:27 -05:00