gilbertlee-amd
80ed608a9d
Multi stream unit test (#693)
* Adding multi-stream support to unit tests
2023-02-23 13:28:50 -07:00
Wenkai Du
d601c4909c
Merge pull request #685 from ROCmSoftwarePlatform/2.16.5
Sync up to NCCL 2.16.5
2023-02-22 10:29:02 -08:00
gilbertlee-amd
f63d3b1978
Adding UnitTest timing summary (UT_SHOW_TIMING) (#692)
2023-02-22 08:57:13 -07:00
akolliasAMD
d119c0886e
UnitTests: made reduceScatter run a smaller amount of tests (#691)
2023-02-21 16:21:24 -07:00
Wenkai Du
86e7b71234
Fix P2P scheduling (#690)
2023-02-21 07:49:54 -08:00
gilbertlee-amd
a640c6983f
Unit test fail check (#689)
* Adding fall-through on unit test failure
* Workaround for hipGraph validity check issue
2023-02-18 08:50:46 -08:00
Wenkai Du
1c166046a2
Add back __syncthreads() in barrier and adjust stack size (#688)
2023-02-18 08:50:31 -08:00
Ziyue Yang
f4bf47f325
NPKit: improve clock calibration and fix GPU clock API (#683)
* Improve clock calibration in NPKit
* Improve gfx macro
* Fix macro
2023-02-17 12:26:57 -07:00
Wenkai Du
aee7b42bb8
Merge remote-tracking branch 'nccl/master' into HEAD
2023-02-14 17:14:13 -08:00
Wenkai Du
f7a456122c
Remove workaround and use indirect function call (#684)
2023-02-14 13:59:48 -08:00
Wenkai Du
9461a43168
Merge pull request #681 from wenkaidu/gfx9
Add HIP event optimization and remove special code for gfx90a
2023-02-13 08:04:59 -08:00
Pedram Alizadeh
f525b8e1e6
Adding -pthread flag into CMakeLists.txt (#682)
Adding -pthread flag for linking issues into CMakeLists.txt
2023-02-10 17:22:30 -05:00
Pedram Alizadeh
e05560ea82
Updated RCCL introduction at docs/source/library.rst (#680)
* Updated RCCL introduction at docs/source/library.rst
* space after period
---------
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-02-10 17:20:54 -05:00
Wenkai Du
39534e8724
Add HIP event optimization and remove special code for gfx90a
2023-02-10 16:46:01 +00:00
gilbertlee-amd
df46645ff8
Switching to relaxed capture for unit tests (#679)
2023-02-08 11:28:58 -07:00
Pedram Alizadeh
0df82bd8a3
Update library.rst (#678)
2023-02-06 16:28:59 -05:00
Wenkai Du
11567e5157
Merge pull request #671 from ROCmSoftwarePlatform/2.16.2
Sync up with NCCL 2.16.2
2023-02-04 06:54:30 -08:00
Wenkai Du
e1cb45ff22
Merge remote-tracking branch 'nccl/master' into HEAD
2023-02-04 01:44:43 +00:00
Pedram Alizadeh
fddb5e6be8
UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#674)
2023-02-03 17:36:30 -05:00
Sylvain Jeaugey
f3d5166783
2.16.5-1
Add support for 400Gbit NDR network adapters (CX7)
Handle EINTR in socket poll() function
Add NCCL_PROGRESS_APPENDOP_FREQ to control op append overhead
Resource cleanup fixes
Fix double free in case of init failure
Fix crash in ncclCommAbort
Revert AMD speed commit
2023-02-02 12:52:47 -08:00
Pedram Alizadeh
fbe52b6caa
removed the wrapper script so that the old name is no longer referenced (#676)
2023-01-31 11:11:02 -05:00
Eiden Yoshida
513bc6912a
CI: decrease precheckin and extended test timeouts (#675)
2023-01-21 19:32:21 -07:00
Rashika Kheria
93840e7476
Fix maximum handle size for NCCL Net v4 API
NCCL Net v4 supports a maximum handle size of 64 bytes, whereas the
ext-net example header files set it for NCCL Net v3. Since the
`aws-ofi-nccl` plugin plans to follow the example header files, fix it
here.
Signed-off-by: Rashika Kheria <rashika@amazon.com>
2023-01-18 13:31:57 +01:00
Wenkai Du
a0dd8e0b84
topo_expl: fix broken build by adding hipify steps (#670)
2023-01-06 07:29:40 -08:00
gilbertlee-amd
3e8ab4e46e
Updating CHANGELOG (#669)
2022-12-19 09:36:57 -07:00
Wenkai Du
2288e9ae80
Switch to hipLaunchHostFunc for HIP graph (#667)
2022-12-15 10:16:46 -08:00
akolliasAMD
24aa8bd802
added a different way for getting device count, by running it in a child process (#665)
2022-12-14 16:10:14 -07:00
Pedram Alizadeh
54a3da04eb
Revert "UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662)" (#666)
This reverts commit 8250092367.
2022-12-14 11:28:40 -05:00
Pedram Alizadeh
1a8cd9791e
Merge pull request #664 from PedramAlizadeh/rccl-UnitTests_name_change
Changed the name of UnitTests to rccl-UnitTests (wrapper executable i…
2022-12-13 17:16:22 -05:00
PedramAlizadeh
45872d170f
Changed the name of UnitTests to rccl-UnitTests (wrapper executable included).
2022-12-13 21:45:57 +00:00
Pedram Alizadeh
8250092367
UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662)
2022-12-13 16:05:09 -05:00
Ziyue Yang
adafc0f759
Add MSCCL Support (#658)
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
2022-12-12 15:51:04 -08:00
Wenkai Du
b953544a59
Fix typo in detecting Intel platforms (#661)
2022-12-07 13:36:11 -08:00
akolliasAMD
eca623df07
decreased warp size for gfx110x (#655)
2022-12-01 12:19:21 -07:00
gilbertlee-amd
faed69f9fc
Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du
aebed537a5
Reduce linking time through more parallel jobs (#657)
2022-11-30 16:06:03 -08:00
Wenkai Du
fb9938cffa
Query DMABuf support through HSA runtime API (#654)
2022-11-30 08:53:03 -08:00
Sylvain Jeaugey
28189e2df8
2.16.2-1
Add support for CUDA 12.0, drop Kepler (sm_35).
Support for H100 features.
Make socket code more robust and protected. Solves #555.
Improve performance on large CUDA graphs, reducing dependencies.
Reduce inter-socket bandwidth on AMD CPUs to favor better paths.
Various fixes to ncclCommAbort.
Make service thread polling resistant to EINTR.
Compile with profiling API by default.
Extend NVTX instrumentation with call arguments.
2022-11-30 02:31:59 -08:00
Wenkai Du
9594bbee3b
Adjust P2P channels on Intel platform (#653)
2022-11-29 13:57:10 -08:00
akolliasAMD
11862f67de
removed cmake HIP_CLANG_PATCH_LEVEL check (#652)
* removed HIP_CLANG_PATCH_LEVEL check
2022-11-29 09:48:59 -07:00
Wenkai Du
67d9327f52
Merge pull request #651 from wenkaidu/nccl_sync
Sync up with NCCL
2022-11-28 17:33:59 -08:00
Wenkai Du
bf03a48289
Merge remote-tracking branch 'nccl/master' into HEAD
2022-11-28 09:46:16 -08:00
gilbertlee-amd
36ac8107bd
Update CHANGELOG up to ROCm 5.4 (#649)
* Update CHANGELOG for ROCm 5.4.0
2022-11-23 09:40:19 -07:00
Sylvain Jeaugey
614b49f0de
Fix google-fastsocket plugin build
2022-11-22 02:13:13 -08:00
Sylvain Jeaugey
55b1d8ab98
Add documentation for NCCL NET plugins
Also repurpose dummy plugin as example, including headers and
compat layers from v6 to v2.
2022-11-22 02:12:53 -08:00
Wenkai Du
57764f8152
Fix incorrect rocm-smi ID conversion (#648)
2022-11-21 19:44:39 -08:00
Wenkai Du
9cb72a3d0f
Fix collective trace timestamp format (#647)
2022-11-21 08:11:12 -08:00
Wenkai Du
cf3c32a626
Fix typo in previous hipify change (#645)
2022-11-15 11:51:47 -08:00
Wenkai Du
b4f6eee9b4
Merge pull request #643 from ROCmSoftwarePlatform/2.15.5
Sync up with NCCL 2.15.5
2022-11-15 08:40:59 -08:00
Wenkai Du
562dd87036
Move hipify to cmake stage
Add minimal ROCm/HIP version requirements for Graph support
2022-11-14 18:10:45 +00:00