rocm-systems

Author	SHA1	Message	Date
Wenkai Du	60ca7484c0	Fix kernel data trace	2021-08-24 14:02:53 -07:00
Wenkai Du	d5f93649ff	Merge remote-tracking branch 'origin/develop' into 2.10.3	2021-08-24 09:49:47 -07:00
Wenkai Du	5c8380ff5b	Implement NIC identification and remapping (#420) * Add 1H16P GPU model * Implement NIC identification and remapping * Revert "Sort IB devices based on device name (#413)" This reverts commit `2d0ed8dff6`. * Fix permute and check order * Correction on IB speed reporting * Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)" This reverts commit `caf5c9992a`.	2021-08-24 09:42:04 -07:00
Wenkai Du	5f15ed6e3e	Add gfx908 VM model (#418)	2021-08-10 08:55:11 -07:00
Wenkai Du	707c687090	Use noinline for kernel functions	2021-08-06 09:15:04 -07:00
Wenkai Du	01d3b20a66	Fix incorrect network proxy received bytes reporting	2021-08-05 17:45:48 -07:00
Wenkai Du	babbd1047b	Merge branch 'develop' into 2.10.3	2021-08-04 09:45:22 -07:00
Wenkai Du	2d0ed8dff6	Sort IB devices based on device name (#413)	2021-08-03 15:32:41 -07:00
Wenkai Du	bf2339f93e	Merge remote-tracking branch 'nccl/master' into 2.10.3	2021-07-30 16:23:14 -07:00
Wenkai Du	3e27227562	XGMI connection is always prioritized over NET regardless of hops (#412)	2021-07-29 11:12:42 -07:00
Wenkai Du	818cdb16a8	Query XGMI links from xml and adjust gfx906 channel usage (#410)	2021-07-27 17:32:41 -07:00
Wenkai Du	135d47d125	topo_expl: fix build after switching to rocm-smi-lib (#405) * topo_expl: fix build after switching to rocm-smi-lib * Use minimal of 4 channels for gfx908	2021-07-27 08:30:08 -07:00
Wenkai Du	dfc62d5fbb	Skipping unnecessary functions in Doxygen by marking as internal (#353) (#406) (cherry picked from commit 1c982d819d9c7fe0310b80f9a25808e54c71137e) Co-authored-by: saadrahim <44449863+saadrahim@users.noreply.github.com>	2021-07-24 11:04:27 -07:00
Lu	bd6dbca8fb	Add more info to RCCL logging for topo-aware optim.	2021-07-22 09:52:39 -07:00
Ke Wen	7e51592129	2.10.3-1 Add support for bfloat16. Add ncclAvg reduction operation. Improve performance for aggregated operations. Improve performance for tree. Improve network error reporting. Add NCCL_NET parameter to force a specific network. Add NCCL_IB_QPS_PER_CONNECTION parameter to split IB traffic onto multiple queue pairs. Fix topology detection error in WSL2. Fix proxy memory elements affinity (improve alltoall performance). Fix graph search on cubemesh topologies. Fix hang in cubemesh during NVB connections.	2021-07-08 14:30:14 -07:00
Wenkai Du	56155ff5b6	Use rocm_smi_lib for getting topology information (#402) * Use rocm_smi_lib for getting topology information * Add rocm-smi-lib dependency to RCCL package	2021-07-08 13:23:11 -07:00
Wenkai Du	fa6d7e9a63	Fixes for NCCL_MAX_NCHANNELS and topo_expl (#398)	2021-06-22 08:41:49 -07:00
Wenkai Du	6dcae8a459	Select sendrecv path based on collective data size (#391) * Select sendrecv path based on collective data size * Add comments on packing and unpacking group field * Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests	2021-06-10 17:51:04 -07:00
Wenkai Du	b815a2800f	Setup collectives threshold for enabling intranet (#387) * Setup collectives threshold for enabling intranet * Use separate operation counters for coll and p2p	2021-06-09 13:24:26 -07:00
Wenkai Du	c2064adcc7	Add support for another Rome model (#385)	2021-06-08 13:58:20 -07:00
Wenkai Du	a3a8c2d56b	Allow intranode use of network connection (#383) * Allow intranode use of network connection * Checking for graph for null pointer	2021-06-08 07:37:59 -07:00
Wenkai Du	961922ea02	Add option to enable multiple SAT in SHARP (#380) * Add option to enable multiple SAT in SHARP * Extend number of NICs to 16	2021-06-03 19:45:18 -07:00
Wenkai Du	e3abf1c2ec	Merge remote-tracking branch 'nccl/master' into develop	2021-05-25 20:52:15 -07:00
Wenkai Du	4c83adb75c	Update Rome models matching (#376)	2021-05-25 10:12:40 -07:00
gilbertlee-amd	8e817ecd6d	Tweak clique channel usage for gfx908 (#374)	2021-05-21 15:36:21 -06:00
Wenkai Du	50da1b48af	Correction on max number of groups (#373)	2021-05-20 08:58:45 -07:00
Wenkai Du	8cde34be51	Use fixed segment size for sendrecv (#369)	2021-05-19 08:25:26 -07:00
gilbertlee-amd	ddceadc313	Tune clique-based AllReduce for device type 908 (#372) * Changing switch-over point for clique-based AllReduce	2021-05-18 15:36:07 -06:00
Wenkai Du	87727383fe	Merge remote-tracking branch 'nccl/master' into 2.9.8	2021-05-17 10:15:16 -07:00
Stanley Tsang	0b2bfdd6d8	Multiprocess unit test various fixes (#367) * Re-enabling mp unit tests * Fixing shared memory leak and other bugs related to shared mem for MP unit tests * Revert 43bfbfc97bf9edbae1f386d461439091618ff8ed * Further tightening up unlinks * Moving test check macros to separate header file * Tightening up shared memory unlinking for clique kernels, add munmap for host barrier for MP unit tests * Updating new MP unit test * Fixing mqueue bug * Fixing memory leak in MP unit tests	2021-05-14 09:38:49 -06:00
Wenkai Du	caf5c9992a	Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361) To skip Infiniband, set RCCL_IB_HCA_SKIP_LINK_LAYER=1. To skip Ethernet, RCCL_IB_HCA_SKIP_LINK_LAYER=2.	2021-05-12 14:14:53 -07:00
Sylvain Jeaugey	3fec2fa5ee	2.9.9-1 Fix crash when setting NCCL_MAX_P2P_NCHANNELS below nchannels. Fix hang during sendrecv dynamic NVB connection establishment on cubemesh topologies. Add environment variable to only use SHARP on communicators beyond a given number of ranks. Add debug subsystem to trace memory allocations. Fix compilation with TRACE=1. (Issue #505)	2021-05-12 11:09:31 -07:00
gilbertlee-amd	e796b1645c	Clique tuning upgrade (#352) (#19) * Enabling clique for any XGMI-connected topology, adding tuning * Updating CHANGELOG for clique tuning * Re-working clique barrier system to work on multi-process / multi-gpu	2021-05-11 08:44:59 -06:00
Sylvain Jeaugey	ca8485b0d0	2.9.8-1 Fix memory leaks. Fix crash in bootstrap error case. Fix Collnet clean-up issue. Make PCI switch vendor/device optional for XML injection. Add support for nvidia-peermem module.	2021-05-10 14:00:03 -07:00
Wenkai Du	a4ea1fed5b	Merge remote-tracking branch 'nccl/master' into develop	2021-05-05 16:01:01 -07:00
gilbertlee-amd	4f8e788a61	Fixing potential race-condition in env var parameter macro (#359)	2021-04-28 12:04:41 -06:00
Wenkai Du	ed237dcaa7	Use better name for kernel collective trace enable (#357) "NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL" enables collectives API trace. Adding "RCCL_KERNEL_COLL_TRACE_ENABLE=1" enables kernel traces.	2021-04-26 08:35:53 -07:00
Wenkai Du	9cc9c3360b	Control collective trace from kernel separately (#356)	2021-04-23 16:36:19 -07:00
Stanley Tsang	70597789d0	Message queue refactor to POSIX implementation and leak fix (#355) * Fixing message queue leak. * Using POSIX implementation of Message Queues * Adding unlink to msgqueue * MsgQueue update * Adding timeout check to msgqueue broadcast; tightening up system checks * Removing unnecessary code * Removing extra argument from print * Adding explicit msg queue close call to all other ranks	2021-04-23 11:33:20 -06:00
Wenkai Du	415c7cd3d1	Tune number of channels for gfx90a (#349)	2021-04-19 15:27:01 -07:00
Wenkai Du	9c718ce6d6	Use correct WARP_SIZE for gfx1030 (#348)	2021-04-14 14:09:52 -07:00
Wenkai Du	a79f74082e	Limit max channels for ring graph on single node Rome (#347) * Limit max channels for ring graph on single node Rome * Partially revert "Use non-temporal access for streaming data (#341)"	2021-04-14 10:14:54 -07:00
Wenkai Du	1fe031402a	Add gfx90a target (#344) * Add gfx90a target * Support gfx90a topology Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>	2021-04-14 09:29:00 -06:00
Sylvain Jeaugey	a46ea10583	2.9.6-1 Add support for CUDA graphs. Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439. Fix bootstrap issue caused by connection reordering. Fix CPU locking block. Improve CollNet algorithm. Improve performance on DGX A100 for communicators with only one GPU per node.	2021-04-12 16:00:46 -07:00
TomSang	87f12cbb86	Add detection of cooperative multi device launch attribute (#345)	2021-04-11 13:29:24 -07:00
Wenkai Du	9dfc2c183e	Use non-temporal access for streaming data (#341) * Use non-temporal access for streaming data * Revert to ulong2 after fixing compiling issue	2021-04-07 17:34:35 -07:00
gilbertlee-amd	caba0a63d2	Fixing clique-topology detection (#342) * Fixing clique-topology detection * Fix to enable multi-process clique-based kernels	2021-04-07 11:29:44 -06:00
Wenkai Du	e26ad2995e	Cleanup number of channels calculation (#340)	2021-04-05 17:51:56 -07:00
Wenkai Du	17491c918e	Fix incorrect net counting (#339) * Fix incorrect net counting * Add comments	2021-04-05 12:21:57 -07:00
Wenkai Du	1d2946ee4b	Rework network port trimming code (#338) * Rework network port trimming code * Move Rome related changes to separate source files	2021-03-31 10:25:59 -07:00

293 commits