Commit Graph

502 Commits

Author SHA1 Message Date
Ziyue Yang 7d6e7bcd7d revert npkit (#748) 2023-05-24 07:41:05 -07:00
Ziyue Yang ed252c30f4 Limit MSCCL reduce unrolling to pow-2 cases to shrink kernel size (#746) 2023-05-19 11:46:36 -07:00
Ziyue Yang 11676267b5 fix min, max and avg (#745) 2023-05-18 11:02:59 -07:00
Wen-Heng (Jack) Chung eba4e9e100 Merge pull request #742 from whchung/skip_done_event_msccl
Allow skipping doneEvent inside MSCCL.
2023-05-18 10:17:20 -05:00
Wenkai Du 403cda6322 Fix merge error (#744) 2023-05-18 08:09:27 -07:00
Wen-Heng (Jack) Chung ca4a1dfd67 Address review feedback and make the flag disabled by default. 2023-05-17 17:50:25 +00:00
Wen-Heng (Jack) Chung 12dba425de Skip doneEvent inside MSCCL by default.
Added an RCCL_MSCCL_ENABLE_DONE_EVENT env var, set to 0 by default.

The env var is to control whether to use doneEvent when invoking MSCCL
kernels.

Skipping doneEvent would cause the firmware to skip L2 cache flush,
resulting in overall performance improvement.
2023-05-17 16:49:42 +00:00
Wenkai Du 4ca7742c61 Revert "Ensure memory copy integrity during transport setup (#731)" (#741)
* Revert "Ensure memory copy integrity during transport setup (#731)"

This reverts commit 36e453c61e.

Add stream synchronization in ncclStrongStreamRelease.

* Use event record and wait
2023-05-16 10:34:47 -07:00
Wenkai Du 8bb3340fcb Skip checking of some settings in Cray OS (#739) 2023-05-09 07:59:56 -07:00
Wenkai Du 897745a266 Remove references to NVLS functions 2023-05-05 07:55:20 -07:00
Wenkai Du 53a1f91857 Merge remote-tracking branch 'nccl/master' into develop 2023-04-25 15:38:32 -07:00
Wenkai Du 36e453c61e Ensure memory copy integrity during transport setup (#731) 2023-04-25 14:41:43 -07:00
Wenkai Du 4b09ffba43 msccl: print stack and memory usage (#723)
* msccl: print stack and memory usage

* Update number of kernels calculation
2023-04-14 14:59:03 -07:00
Kaiming Ouyang 006b6bc7dc Add a comment to shutdown() in ncclSocketClose 2023-04-13 09:13:44 -07:00
Kaiming Ouyang 367e9b61c3 Shutdown socket before close in ncclSocketClose() 2023-04-13 09:11:52 -07:00
Ziyue Yang 7289c05146 MSCCL: Fix memcpy bug (#721) 2023-04-11 14:46:53 -07:00
Ziyue Yang c8e33b1232 fix msccl stream usage (#717) 2023-03-24 10:59:36 -07:00
Wenkai Du b02fd04165 Fix unit test HIP graph error (#712) 2023-03-20 15:34:09 -07:00
Ziyue Yang e3b2342f39 MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing

* Use external scheduler

* Fix cmake error

* Address comments

* Fix thread safe issue

* Make MSCCL lifecycle APIs thread safe

* Make MSCCL internal scheduler aware of topology hint

* Revise error message
2023-03-14 14:34:25 -07:00
Wenkai Du 22b81fbaae Fix XGMI detection (#699)
* Fix XGMI detection

* Increase stack size

* Temporarily disable signal handler in CI

[Process: 17281] Inside handler function signal: Segmentation fault (11)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
2023-03-08 14:08:07 -08:00
Wenkai Du 79a2031951 Warn user on incorrect system settings (#696)
* Warn user on incorrect system settings

* Fix typo

* Add possible impact

* Ignore iommu settings in VM
2023-03-06 08:17:06 -08:00
Sylvain Jeaugey 5d3ab08b69 2.17.1-1
Add new NVLS algorithm for allreduce using NVLink SHARP (intra-node only).
Add new config options: cgaClusterSize, minCTAs, maxCTAs, netName.
Enable LL128 when we use PXN to close rings.
NVTX3 includes update.
Fix crash when one CollNet (SHARP) rail fails to initialize.
2023-03-01 00:39:04 -08:00
Wenkai Du d601c4909c Merge pull request #685 from ROCmSoftwarePlatform/2.16.5
Sync up to NCCL 2.16.5
2023-02-22 10:29:02 -08:00
Wenkai Du 86e7b71234 Fix P2P scheduling (#690) 2023-02-21 07:49:54 -08:00
Wenkai Du 1c166046a2 Add back __syncthreads() in barrier and adjust stack size (#688) 2023-02-18 08:50:31 -08:00
Ziyue Yang f4bf47f325 NPKit: improve clock calibration and fix GPU clock API (#683)
* Improve clock calibration in NPKit

* Improve gfx macro

* Fix macro
2023-02-17 12:26:57 -07:00
Wenkai Du aee7b42bb8 Merge remote-tracking branch 'nccl/master' into HEAD 2023-02-14 17:14:13 -08:00
Wenkai Du f7a456122c Remove workaround and use indirect function call (#684) 2023-02-14 13:59:48 -08:00
Wenkai Du 39534e8724 Add HIP event optimization and remove special code for gfx90a 2023-02-10 16:46:01 +00:00
Wenkai Du e1cb45ff22 Merge remote-tracking branch 'nccl/master' into HEAD 2023-02-04 01:44:43 +00:00
Sylvain Jeaugey f3d5166783 2.16.5-1
Add support for 400Gbit NDR network adapters (CX7)
Handle EINTR in socket poll() function
Add NCCL_PROGRESS_APPENDOP_FREQ to control op append overhead
Resource cleanup fixes
Fix double free in case of init failure
Fix crash in ncclCommAbort
Revert AMD speed commit
2023-02-02 12:52:47 -08:00
Wenkai Du 2288e9ae80 Switch to hipLaunchHostFunc for HIP graph (#667) 2022-12-15 10:16:46 -08:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
Wenkai Du b953544a59 Fix typo in detecting Intel platforms (#661) 2022-12-07 13:36:11 -08:00
akolliasAMD eca623df07 decreased warp size for gfx110x (#655) 2022-12-01 12:19:21 -07:00
Wenkai Du fb9938cffa Query DMABuf support through HSA runtime API (#654) 2022-11-30 08:53:03 -08:00
Sylvain Jeaugey 28189e2df8 2.16.2-1
Add support for CUDA 12.0, drop Kepler (sm_35).
Support for H100 features.
Make socket code more robust and protected. Solves #555.
Improve performance on large CUDA graphs, reducing dependencies.
Reduce inter-socket bandwidth on AMD CPUs to favor better paths.
Various fixes to ncclCommAbort.
Make service thread polling resistant to EINTR.
Compile with profiling API by default.
Extend NVTX instrumentation with call arguments.
2022-11-30 02:31:59 -08:00
Wenkai Du 9594bbee3b Adjust P2P channels on Intel platform (#653) 2022-11-29 13:57:10 -08:00
Wenkai Du 57764f8152 Fix incorrect rocm-smi ID conversion (#648) 2022-11-21 19:44:39 -08:00
Wenkai Du 9cb72a3d0f Fix collective trace timestamp format (#647) 2022-11-21 08:11:12 -08:00
Wenkai Du cf3c32a626 Fix typo in previous hipify change (#645) 2022-11-15 11:51:47 -08:00
Wenkai Du 562dd87036 Move hipify to cmake stage
Add minimal ROCm/HIP version requirements for Graph support
2022-11-14 18:10:45 +00:00
Wenkai Du 94ad7f6f51 Update tuning table and fix topo_expl 2022-11-07 18:24:24 +00:00
Wenkai Du 9a077e6947 Merge remote-tracking branch 'nccl/master' into develop 2022-11-03 21:17:42 +00:00
Wenkai Du 72ef100050 Fix P2P scheduling 2022-10-31 08:54:34 -07:00
Sylvain Jeaugey 2f4cb874ba Merge tag 'v2.15.5-1' 2022-10-25 01:15:22 -07:00
Sylvain Jeaugey cb111f764a 2.15.5-1
Fix crash with CollnetChain on some node topologies
Fix hang when interleaving the capture of different graphs
Fix hang during init in multi-threaded mode
Fix potential data corruption with LL128 protocol on unaligned buffers.
Fix CPU usage during preconnect
Fixes double-free in the error path for ncclCommInitAll
Workaround hang on H100 with Ring/LL128 on 2 GPUs.
2022-10-25 00:55:55 -07:00
Wenkai Du 4f0e223db4 Merge remote-tracking branch 'nccl/master' into develop 2022-10-20 15:41:29 +00:00
Wenkai Du bc8ef779df Fix missing initialization due to merge error (#640) 2022-10-19 21:20:11 -07:00
Wenkai Du 9ddf0e0649 Support P2P with invisible devices (#636)
* Support P2P with invisible devices

* Update copyright year
2022-10-17 10:24:59 -07:00