rocm-systems

Author	SHA1	Message	Date
corey-derochie-amd	6dc47eecd7	Integrated RCCL with MSCCL++ for small message sizes (#1231 )	2024-07-12 15:32:58 -06:00
Rahul Vaidya	c755b9cf93	Improved version reporting in NCCL_DEBUG=VERSION (#1232 ) * Improved version reporting in NCCL_DEBUG=VERSION. Signed-off-by: rahulvaidya20 <ravaidya@amd.com> * Version reporting changes Signed-off-by: rahulvaidya20 <ravaidya@amd.com> * Versioning changes: Initialized char arrays to null and fixed typo. --------- Signed-off-by: rahulvaidya20 <ravaidya@amd.com>	2024-07-12 08:14:29 -05:00
corey-derochie-amd	0c36d571ea	Enable multi-threading for MSCCL (#1203 ) MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.	2024-07-04 09:34:38 -06:00
Wenkai Du	5d7078e383	Fix DMABUF support (#1218 ) * Fix DMABUF support * Reduce log output by moving dmabuf allocation details to TRACE * Enable peer memory GDR support if ib_umem_get_peer is in kernel	2024-06-25 08:00:15 -07:00
Nilesh M Negi	d9661c17e6	Fix min_nchannels bug for gfx94* nranks=4 (#1202 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>	2024-06-07 14:31:28 -05:00
Nusrat Islam	0634c5c8e1	doubling debug buffer size with increased channels	2024-06-03 13:05:05 -05:00
Nusrat Islam	506f16c506	add 256 channels support	2024-06-03 13:03:18 -05:00
ClementLinCF	cab25f919e	Optimize NCHANNELS and MSCCL config for gfx942 80CUs (#1195 ) * Optimize NCHANNELS and MSCCL config for gfx942 80CUs Set appropriately for different NCCL_MIN_NCHANNELS and MSCCL config, potentially improving communication perf on the MI300x 80CUs * Delete tools/msccl-algorithms/allreduce_1step_mccl_8_2_16777216_LL.xml * Change the factor of gfx94 and update msccl config	2024-06-01 07:07:46 -07:00
AtlantaPepsi	67246649ac	prevent segfault from npkit-enabled rccl build Signed-off-by: AtlantaPepsi <timhu102@amd.com>	2024-04-26 10:54:27 -05:00
Wenkai Du	9e0c9b4ed8	Replace __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1154 )	2024-04-25 07:19:18 -07:00
BertanDogancay	e1a835910e	Merge remote-tracking branch 'nccl/master' into develop	2024-04-23 13:34:00 -07:00
gilbertlee-amd	4cb62f999a	Rail optimization for rings (#1140 ) - Modifies the ring creation algorithm to be friendlier to rail-optimized topologies (should not affect classic fabric topologies)	2024-04-15 12:03:57 -06:00
Wenkai Du	137571fa01	Fix buffer overflow when parsing kernel cmdline (#1133 )	2024-04-08 11:12:20 -07:00
Wenkai Du	5976f757dd	Remove hipEventDisableSystemFence (#1122 ) There is no indication that disabling system fence has any latency improvement. Removing it per recommendation from HIP.	2024-03-25 08:01:57 -07:00
Sylvain Jeaugey	48bb7fec79	2.20.5-1 Fix UDS connection failure when using ncclCommSplit. Issue #1185	2024-02-26 02:52:39 -08:00
Wenkai Du	c5ab37211b	Update RCCL/MSCCL work FIFO depth to 256K (#1091 )	2024-02-21 17:15:11 -08:00
Sylvain Jeaugey	b6475625fb	2.20.3-1 Add support for alternating rings, allow for cross-nic rings without cross-rail communication. Add support for user buffer registration for network send/recv. Optimize aggregated operations to better utilize all channels. Add flattening for BCM PCI gen5 switches. Add support for inter-node NVLink communication Add support for port fusion in NET/IB. Add support for ReduceScatter and AllGather using Collnet. Update net API to v8. Fix hang during A2A connection.	2024-02-13 04:22:38 -08:00
BertanDogancay	00fdb1ef51	Clean up	2024-01-31 17:27:15 -08:00
Wenkai Du	1a134b283b	Merge remote-tracking branch 'rccl/develop' into 2.19.4	2024-01-31 11:53:10 -06:00
BertanDogancay	9ff53eeeae	Merge remote-tracking branch 'nccl/master' into develop	2024-01-30 14:43:43 -08:00
Wenkai Du	be8ef4367f	colltrace: fix dropped trace messages (#1059 ) * colltrace: fix dropped trace messages * Remove extra space	2024-01-25 13:31:53 -08:00
BertanDogancay	81ddf9de89	Merge remote-tracking branch 'nccl/v2.19' into develop	2024-01-24 15:25:33 -08:00
Bertan Dogancay	c4dbf8a914	Fix collective trace when rccl is configured (#1056 ) * Fix collective trace when rccl is configured	2024-01-22 09:26:44 -07:00
Bertan Dogancay	28d9b170c9	[DEV] Configure functions in RCCL (#986 ) * configure functions in rccl	2024-01-18 15:07:16 -07:00
Pedram Alizadeh	aa5c84c997	Merge pull request #1022 from PedramAlizadeh/sync_nccl_2.18.6 Sync to nccl 2.18.6	2024-01-09 13:29:29 -05:00
Wenkai Du	f7e39fced2	Doubling buffer size to fix NCCL INFO corruption with increased channels (#1035 )	2024-01-08 08:14:33 -08:00
Wenkai Du	e5bf56c6d8	Increase stack size for gfx906 (#1034 ) Occasionally "Memory access fault by GPU node-8 (Agent handle: 0x23a5640) on address 0x7f461ec00000. Reason: Page not present or supervisor privilege" can be seen from gfx906 CI	2024-01-07 20:25:02 -08:00
PedramAlizadeh	0d515f9388	resolved conflicts, fixed the localNetCount/0 bug	2023-12-18 08:11:34 +00:00
Ziyue Yang	655742a3a6	Fully disable MSCCL when machine is not matched (#1017 ) * Disable MSCCL algorithm meta loading when machine is not matched * fully disable init * fix potential segfault	2023-12-13 08:36:21 -08:00
Nilesh M Negi	bc44e3faa7	Fix gcnArch bug in IFC mix build (#998 ) (#1002 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>	2023-12-04 16:20:22 -06:00
Bertan Dogancay	7c0f49a878	IFC mix build (#998 )	2023-12-02 18:49:52 -07:00
Wenkai Du	bc8661f092	Fix kernel command line warnings (#961 ) * Fix kernel command line warnings * Remove while loop	2023-11-15 18:01:12 -08:00
Sylvain Jeaugey	88d44d777f	2.19.4-1 Split transport connect phase into multiple steps to avoid port exhaustion when connecting alltoall at large scale. Defaults to 128 peers per round. Fix memory leaks on CUDA graph capture. Fix alltoallv crash on self-sendrecv. Make topology detection more deterministic when PCI speeds are not available (fix issue #1020). Properly close shared memory in NVLS resources. Revert proxy detach after 5 seconds. Add option to print progress during transport connect. Add option to set NCCL_DEBUG to INFO on first WARN.	2023-11-13 10:36:12 -08:00
Nilesh M Negi	96ec3ffe2e	SRC/INIT: fix typo for ENABLE_PROFILING (#934 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>	2023-10-26 23:52:46 -05:00
akolliasAMD	28d7fe5629	Dma buf support optin (#905 ) * dmaBufSupport Optin added on every part of the code that should invoke it	2023-10-03 03:17:48 -06:00
Sylvain Jeaugey	8c6c595185	2.19.3-1 H800/H100 fixes and tuning. Re-enable intra-process direct pointer buffer access when CUMEM is enabled.	2023-09-26 05:57:15 -07:00
Sylvain Jeaugey	3435178b6c	Merge remote-tracking branch 'origin/master' into v2.19	2023-09-26 05:55:56 -07:00
Sylvain Jeaugey	f9c3dc251e	2.19.1-1 Add local user buffer registration for NVLink SHARP. Add tuning plugin support. Increase net API to v7 to allow for device-side packet reordering; remove support for v4 plugins. Add support for RoCE ECE. Add support for C2C links. Better detect SHM allocation failures to avoid crash with Bus Error. Fix missing thread unlocks in bootstrap (Fixes #936). Disable network flush by default on H100. Move device code from src/collectives/device to src/device.	2023-09-26 05:50:33 -07:00
Kaiming Ouyang	4365458757	Fix cudaMemcpyAsync bug We are trying to use the copy result of first cudaMemcpyAsync in the second cudaMemcpyAsync without sync in between. This patch fixes it by allocating a CPU side array to cache device side addr so that we can avoid this consecutive cuda mem copy. Fixes #957	2023-09-20 05:51:14 -07:00
Audrey MP	e58ec78d35	Gcn arch name (#886 ) We use CMake to determine if we're compiling against a version of ROCm that supports gcnArchName and handles architecture checking appropriately. It includes a few helper functions as drop ins for the functionality we used gcnArch for before; sometimes to enable flags, and sometimes to set frequencies.	2023-09-12 15:34:40 -04:00
Andy li	e1dc4d5e42	enable hip graph on multi-node (#884 ) * initial checkin * enable msccl when hip graph is on * remove the commented out code of msccl enable check * clean up the code * remove the msccl HighestTransportType check logic	2023-09-11 15:30:04 -07:00
akolliasAMD	d33cd5a233	NCCL_TREES variable and rome model fixes (#856 )	2023-08-21 10:35:37 -06:00
Wenkai Du	d65c0830c6	Detect HIP_UNCACHED_MEMORY support from HIP version (#842 )	2023-08-04 10:17:04 -07:00
Wenkai Du	c8085eb704	Improve collective trace (#835 )	2023-08-03 07:16:12 -07:00
Wenkai Du	a7fcd58a97	Enable gfx94x (#808 ) (#816 ) (cherry picked from commit 94da229a7788d74685d1591a4e75a8341de64f41)	2023-07-21 07:31:27 -07:00
Wenkai Du	abd0615351	Merge remote-tracking branch 'nccl/master' into develop	2023-06-26 22:51:56 +00:00
Sylvain Jeaugey	ea38312273	2.18.3-1 Fix data corruption with Tree/LL128 on systems with 1GPU:1NIC. Fix hang with Collnet on bfloat16 on systems with less than one NIC per GPU. Fix long initialization time. Fix data corruption with Collnet when mixing multi-process and multi-GPU per process. Fix crash when shared memory creation fails. Fix Avg operation with Collnet/Chain. Fix performance of alltoall at scale with more than one NIC per GPU. Fix performance for DGX H800. Fix race condition in connection progress causing a crash. Fix network flush with Collnet. Fix performance of aggregated allGather/reduceScatter operations. Fix PXN operation when CUDA_VISIBLE_DEVICES is set. Fix NVTX3 compilation issues on Debian 10.	2023-06-14 01:29:17 -07:00
Ziyue Yang	7d6e7bcd7d	revert npkit (#748 )	2023-05-24 07:41:05 -07:00
Wenkai Du	8bb3340fcb	Skip checking of some settings in Cray OS (#739 )	2023-05-09 07:59:56 -07:00
Wenkai Du	897745a266	Remove references to NVLS functions	2023-05-05 07:55:20 -07:00

197 commits