rocm-systems

Author	SHA1	Message	Date
Wenkai Du	8bb3340fcb	Skip checking of some settings in Cray OS (#739)	2023-05-09 07:59:56 -07:00
Wenkai Du	897745a266	Remove references to NVLS functions	2023-05-05 07:55:20 -07:00
Wenkai Du	53a1f91857	Merge remote-tracking branch 'nccl/master' into develop	2023-04-25 15:38:32 -07:00
Wenkai Du	36e453c61e	Ensure memory copy integrity during transport setup (#731)	2023-04-25 14:41:43 -07:00
Wenkai Du	4b09ffba43	msccl: print stack and memory usage (#723) * msccl: print stack and memory usage * Update number of kernels calculation	2023-04-14 14:59:03 -07:00
Kaiming Ouyang	006b6bc7dc	Add a comment to shutdown() in ncclSocketClose	2023-04-13 09:13:44 -07:00
Kaiming Ouyang	367e9b61c3	Shutdown socket before close in ncclSocketClose()	2023-04-13 09:11:52 -07:00
Ziyue Yang	7289c05146	MSCCL: Fix memcpy bug (#721)	2023-04-11 14:46:53 -07:00
Ziyue Yang	c8e33b1232	fix msccl stream usage (#717)	2023-03-24 10:59:36 -07:00
Wenkai Du	b02fd04165	Fix unit test HIP graph error (#712)	2023-03-20 15:34:09 -07:00
Ziyue Yang	e3b2342f39	MSCCL: Improve executor and integrate scheduler (#694) * MSCCL: improve executor and add scheduler for testing * Use external scheduler * Fix cmake error * Address comments * Fix thread safe issue * Make MSCCL lifecycle APIs thread safe * Make MSCCL internal scheduler aware of topology hint * Revise error message	2023-03-14 14:34:25 -07:00
Wenkai Du	22b81fbaae	Fix XGMI detection (#699) * Fix XGMI detection * Increase stack size * Temporarily disable signal hangler in CI [Process: 17281] Inside handler function signal: Segmentation fault (11) BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0) BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)	2023-03-08 14:08:07 -08:00
Wenkai Du	79a2031951	Warn user on incorrect system settings (#696) * Warn user on incorrect system settings * Fix typo * Add possible impact * Ignore iommu settings in VM	2023-03-06 08:17:06 -08:00
Sylvain Jeaugey	5d3ab08b69	2.17.1-1 Add new NVLS algorithm for allreduce using NVLink SHARP (intra-node only). Add new config options: cgaClusterSize, minCTAs, maxCTAs, netName. Enable LL128 when we use PXN to close rings. NVTX3 includes update. Fix crash when one CollNet (SHARP) rail fails to initialize.	2023-03-01 00:39:04 -08:00
Wenkai Du	d601c4909c	Merge pull request #685 from ROCmSoftwarePlatform/2.16.5 Sync up to NCCL 2.16.5	2023-02-22 10:29:02 -08:00
Wenkai Du	86e7b71234	Fix P2P scheduling (#690)	2023-02-21 07:49:54 -08:00
Wenkai Du	1c166046a2	Add back __syncthreads() in barrier and adjust stack size (#688)	2023-02-18 08:50:31 -08:00
Ziyue Yang	f4bf47f325	NPKit: improve clock calibration and fix GPU clock API (#683) * Improve clock calibration in NPKit * Improve gfx macro * Fix macro	2023-02-17 12:26:57 -07:00
Wenkai Du	aee7b42bb8	Merge remote-tracking branch 'nccl/master' into HEAD	2023-02-14 17:14:13 -08:00
Wenkai Du	f7a456122c	Remove workaround and use indirect function call (#684)	2023-02-14 13:59:48 -08:00
Wenkai Du	39534e8724	Add HIP event optimization and remove special code for gfx90a	2023-02-10 16:46:01 +00:00
Wenkai Du	e1cb45ff22	Merge remote-tracking branch 'nccl/master' into HEAD	2023-02-04 01:44:43 +00:00
Sylvain Jeaugey	f3d5166783	2.16.5-1 Add support for 400Gbit NDR network adapters (CX7) Handle EINTR in socket poll() function Add NCCL_PROGRESS_APPENDOP_FREQ to control op append overhead Resource cleanup fixes Fix double free in case of init failure Fix crash in ncclCommAbort Revert AMD speed commit	2023-02-02 12:52:47 -08:00
Wenkai Du	2288e9ae80	Switch to hipLaunchHostFunc for HIP graph (#667)	2022-12-15 10:16:46 -08:00
Ziyue Yang	adafc0f759	Add MSCCL Support (#658) * Add MSCCL support * Add alignment and message size checking * Fix nRanks checking, in-place and out-of-place tests and group call handling * Fix hipGraph unit test * Change MSCCL init warning to INFO * Revise license info	2022-12-12 15:51:04 -08:00
Wenkai Du	b953544a59	Fix typo in detecting Intel platforms (#661)	2022-12-07 13:36:11 -08:00
akolliasAMD	eca623df07	decreased warp size for gfx110x (#655)	2022-12-01 12:19:21 -07:00
Wenkai Du	fb9938cffa	Query DMABuf support through HSA runtime API (#654)	2022-11-30 08:53:03 -08:00
Sylvain Jeaugey	28189e2df8	2.16.2-1 Add support for CUDA 12.0, drop Kepler (sm_35). Support for H100 features. Make socket code more robust and protected. Solves #555. Improve performance on large CUDA graphs, reducing dependencies. Reduce inter-socket bandwidth on AMD CPUs to favor better paths. Various fixes to ncclCommAbort. Make service thread polling resistant to EINTR. Compile with profiling API by default. Extend NVTX instrumentation with call arguments.	2022-11-30 02:31:59 -08:00
Wenkai Du	9594bbee3b	Adjust P2P channels on Intel platform (#653)	2022-11-29 13:57:10 -08:00
Wenkai Du	57764f8152	Fix incorrect rocm-smi ID conversion (#648)	2022-11-21 19:44:39 -08:00
Wenkai Du	9cb72a3d0f	Fix collective trace timestamp format (#647)	2022-11-21 08:11:12 -08:00
Wenkai Du	cf3c32a626	Fix typo in previous hipify change (#645)	2022-11-15 11:51:47 -08:00
Wenkai Du	562dd87036	Move hipify to cmake stage Add minimal ROCm/HIP version requirements for Graph support	2022-11-14 18:10:45 +00:00
Wenkai Du	94ad7f6f51	Update tuning table and fix topo_expl	2022-11-07 18:24:24 +00:00
Wenkai Du	9a077e6947	Merge remote-tracking branch 'nccl/master' into develop	2022-11-03 21:17:42 +00:00
Wenkai Du	72ef100050	Fix P2P scheduling	2022-10-31 08:54:34 -07:00
Sylvain Jeaugey	2f4cb874ba	Merge tag 'v2.15.5-1'	2022-10-25 01:15:22 -07:00
Sylvain Jeaugey	cb111f764a	2.15.5-1 Fix crash with CollnetChain on some node topologies Fix hang when interleaving the capture of different graphs Fix hang during init in multi-threaded mode Fix potential data corruption with LL128 protocol on unaligned buffers. Fix CPU usage during preconnect Fixes double-free in the error path for ncclCommInitAll Workaround hang on H100 with Ring/LL128 on 2 GPUs.	2022-10-25 00:55:55 -07:00
Wenkai Du	4f0e223db4	Merge remote-tracking branch 'nccl/master' into develop	2022-10-20 15:41:29 +00:00
Wenkai Du	bc8ef779df	Fix missing initialization due to merge error (#640)	2022-10-19 21:20:11 -07:00
Wenkai Du	9ddf0e0649	Support P2P with invisible devices (#636) * Support P2P with invisible devices * Update copyright year	2022-10-17 10:24:59 -07:00
Wenkai Du	9916a09818	Merge pull request #634 from yzygitzh/ziyyang/npkit-fix Apply several fixes to NPKit	2022-10-17 08:01:24 -07:00
gilbertlee-amd	ebb8b5bf63	Updating files for missing licenses (#637)	2022-10-14 13:49:16 -06:00
Ziyue Yang	7d6bbc19d4	apply npkit	2022-10-14 01:28:17 +00:00
Sylvain Jeaugey	d128d62238	Merge tag 'v2.15.1-1'	2022-10-07 11:00:26 -07:00
John Bachan	2401f4a918	Fixes a double-free in the error path of ncclCommInitAll. Fixes https://github.com/NVIDIA/nccl/issues/726	2022-10-03 17:12:32 -07:00
Edgar Gabriel	e645b02cd8	introduce a hw topology aware bintree for hayabusa architecture.	2022-10-03 15:26:21 +00:00
akolliasAMD	ef71550738	Added new gpu targets (#631)	2022-09-29 14:53:55 -06:00
Wenkai Du	a523b37ac7	Another threadfence and flags rework (#629)	2022-09-28 16:49:29 -07:00


494 commits