rocm-systems

Author	SHA1	Message	Date
Eiden Yoshida	fb267ea333	Move address-sanitizer build above addition of rccl library in CMakeLists (#392)	2021-06-11 14:43:54 -06:00
Wenkai Du	6dcae8a459	Select sendrecv path based on collective data size (#391) * Select sendrecv path based on collective data size * Add comments on packing and unpacking group field * Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests	2021-06-10 17:51:04 -07:00
Stanley Tsang	f6f5e16fe6	Fixing bug with ExtractSubDataset function not fully initializing subdataset (#390)	2021-06-10 14:35:39 -06:00
Eiden Yoshida	eea7b24058	Add address sanitizer build option (#389)	2021-06-10 09:14:54 -06:00
Wenkai Du	b815a2800f	Setup collectives threshold for enabling intranet (#387) * Setup collectives threshold for enabling intranet * Use separate operation counters for coll and p2p	2021-06-09 13:24:26 -07:00
Wenkai Du	c2064adcc7	Add support for another Rome model (#385)	2021-06-08 13:58:20 -07:00
Stanley Tsang	6842429a14	Updating changelog to show install script fix (#384)	2021-06-08 13:00:40 -06:00
Wenkai Du	a3a8c2d56b	Allow intranode use of network connection (#383) * Allow intranode use of network connection * Checking for graph for null pointer	2021-06-08 07:37:59 -07:00
Stanley Tsang	820a53287f	Fixing install script so that invoking -r alone does not trigger rebuild (#382)	2021-06-04 09:46:04 -06:00
Wenkai Du	961922ea02	Add option to enable multiple SAT in SHARP (#380) * Add option to enable multiple SAT in SHARP * Extend number of NICs to 16	2021-06-03 19:45:18 -07:00
gilbertlee-amd	903c84050d	ROCm 4.3 changelog update (#379) * Update CHANGELOG.md (#378) * Updating CHANGELOG.md for ROCm 4.3	2021-06-03 10:56:02 -06:00
Wenkai Du	03ac898825	Merge pull request #377 from wenkaidu/2.9.9 Sync up with NCCL 2.9.9	2021-05-26 11:38:19 -07:00
Wenkai Du	13dc80ee14	topo_expl: update to 2.9.9	2021-05-26 09:24:34 -07:00
Wenkai Du	e3abf1c2ec	Merge remote-tracking branch 'nccl/master' into develop	2021-05-25 20:52:15 -07:00
Stanley Tsang	256403d4f0	Adding support for hipMallocManaged() in unit tests (#375) * Adding HMM support for unit tests * Fixing HMM opt-in check	2021-05-25 17:07:12 -06:00
Wenkai Du	4c83adb75c	Update Rome models matching (#376)	2021-05-25 10:12:40 -07:00
gilbertlee-amd	8e817ecd6d	Tweak clique channel usage for gfx908 (#374)	2021-05-21 15:36:21 -06:00
Wenkai Du	50da1b48af	Correction on max number of groups (#373)	2021-05-20 08:58:45 -07:00
Wenkai Du	8cde34be51	Use fixed segment size for sendrecv (#369)	2021-05-19 08:25:26 -07:00
Wenkai Du	42b080867e	Running only sum for CI quick test (#370)	2021-05-19 08:25:13 -07:00
gilbertlee-amd	ddceadc313	Tune clique-based AllReduce for device type 908 (#372) * Changing switch-over point for clique-based AllReduce	2021-05-18 15:36:07 -06:00
gilbertlee-amd	2daadcc834	Disabling env var caching for all unit tests (#371) * Disabling env var caching for all unit tests	2021-05-18 12:56:30 -06:00
Wenkai Du	2f31289fe6	Merge pull request #368 from ROCmSoftwarePlatform/2.9.8 Merge NCCL 2.9.8	2021-05-18 08:38:59 -07:00
Wenkai Du	87727383fe	Merge remote-tracking branch 'nccl/master' into 2.9.8	2021-05-17 10:15:16 -07:00
Stanley Tsang	0b2bfdd6d8	Multiprocess unit test various fixes (#367) * Re-enabling mp unit tests * Fixing shared memory leak and other bugs related to shared mem for MP unit tests * Revert 43bfbfc97bf9edbae1f386d461439091618ff8ed * Further tightening up unlinks * Moving test check macros to separate header file * Tightening up shared memory unlinking for clique kernels, add munmap for host barrier for MP unit tests * Updating new MP unit test * Fixing mqueue bug * Fixing memory leak in MP unit tests	2021-05-14 09:38:49 -06:00
Wenkai Du	caf5c9992a	Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361) To skip Infiniband, set RCCL_IB_HCA_SKIP_LINK_LAYER=1. To skip Ethernet, RCCL_IB_HCA_SKIP_LINK_LAYER=2.	2021-05-12 14:14:53 -07:00
Sylvain Jeaugey	3fec2fa5ee	2.9.9-1 Fix crash when setting NCCL_MAX_P2P_NCHANNELS below nchannels. Fix hang during sendrecv dynamic NVB connection establishment on cubemesh topologies. Add environment variable to only use SHARP on communicators beyond a given number of ranks. Add debug subsystem to trace memory allocations. Fix compilation with TRACE=1. (Issue #505)	2021-05-12 11:09:31 -07:00
Wenkai Du	abde40197a	Merge pull request #366 from ROCmSoftwarePlatform/2.9.6 Sync up to NCCL 2.9.6	2021-05-11 20:20:42 -07:00
Wenkai Du	330b82df3b	Revert "Sync up to NCCL 2.9.6 (#363)" (#365) This reverts commit `6021329af0`.	2021-05-11 20:18:17 -07:00
Wenkai Du	6021329af0	Sync up to NCCL 2.9.6 (#363) * 2.9.6-1 Add support for CUDA graphs. Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439. Fix bootstrap issue caused by connection reordering. Fix CPU locking block. Improve CollNet algorithm. Improve performance on DGX A100 for communicators with only one GPU per node. * Clique tuning upgrade (#352) (#19) * Enabling clique for any XGMI-connected topology, adding tuning * Updating CHANGELOG for clique tuning * Re-working clique barrier system to work on multi-process / multi-gpu Co-authored-by: Sylvain Jeaugey <sjeaugey@nvidia.com> Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>	2021-05-11 19:40:34 -07:00
gilbertlee-amd	b122dcd991	Update README.md (#364) - Remove outdated HIP Direct call requirements - Remove outdated chrpath requirement - Adding section about HSA_FORCE_FINE_GRAIN_PCIE	2021-05-11 13:41:41 -06:00
Wenkai Du	d0d5d4d921	Merge remote-tracking branch 'rccl/develop' into 2.9.6	2021-05-11 09:14:16 -07:00
gilbertlee-amd	e796b1645c	Clique tuning upgrade (#352) (#19) * Enabling clique for any XGMI-connected topology, adding tuning * Updating CHANGELOG for clique tuning * Re-working clique barrier system to work on multi-process / multi-gpu	2021-05-11 08:44:59 -06:00
Sylvain Jeaugey	ca8485b0d0	2.9.8-1 Fix memory leaks. Fix crash in bootstrap error case. Fix Collnet clean-up issue. Make PCI switch vendor/device optional for XML injection. Add support for nvidia-peermem module.	2021-05-10 14:00:03 -07:00
gilbertlee-amd	9d7232c091	Clique tuning upgrade (#352) * Enabling clique for any XGMI-connected topology, adding tuning * Updating CHANGELOG for clique tuning * Re-working clique barrier system to work on multi-process / multi-gpu	2021-05-06 09:50:07 -06:00
Wenkai Du	a4ea1fed5b	Merge remote-tracking branch 'nccl/master' into develop	2021-05-05 16:01:01 -07:00
gilbertlee-amd	4f8e788a61	Fixing potential race-condition in env var parameter macro (#359)	2021-04-28 12:04:41 -06:00
saadrahim	96782191cf	Expanding CI coverage for 8GPU configurations plus extended tests (#350)	2021-04-27 09:57:00 -06:00
Wenkai Du	ad54a14a5c	Add libdl linking option (#358)	2021-04-26 15:24:58 -07:00
Wenkai Du	ed237dcaa7	Use better name for kernel collective trace enable (#357) "NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL" enables collectives API trace. Adding "RCCL_KERNEL_COLL_TRACE_ENABLE=1" enables kernel traces.	2021-04-26 08:35:53 -07:00
Wenkai Du	9cc9c3360b	Control collective trace from kernel separately (#356)	2021-04-23 16:36:19 -07:00
Stanley Tsang	70597789d0	Message queue refactor to POSIX implementation and leak fix (#355) * Fixing message queue leak. * Using POSIX implementation of Message Queues * Adding unlink to msgqueue * MsgQueue update * Adding timeout check to msgqueue broadcast; tightening up system checks * Removing unnecessary code * Removing extra argument from print * Adding explicit msg queue close call to all other ranks	2021-04-23 11:33:20 -06:00
Wenkai Du	415c7cd3d1	Tune number of channels for gfx90a (#349)	2021-04-19 15:27:01 -07:00
Wenkai Du	9c718ce6d6	Use correct WARP_SIZE for gfx1030 (#348)	2021-04-14 14:09:52 -07:00
Wenkai Du	a79f74082e	Limit max channels for ring graph on single node Rome (#347) * Limit max channels for ring graph on single node Rome * Partially revert "Use non-temporal access for streaming data (#341)"	2021-04-14 10:14:54 -07:00
Wenkai Du	1fe031402a	Add gfx90a target (#344) * Add gfx90a target * Support gfx90a topology Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>	2021-04-14 09:29:00 -06:00
Sylvain Jeaugey	a46ea10583	2.9.6-1 Add support for CUDA graphs. Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439. Fix bootstrap issue caused by connection reordering. Fix CPU locking block. Improve CollNet algorithm. Improve performance on DGX A100 for communicators with only one GPU per node.	2021-04-12 16:00:46 -07:00
Wenkai Du	3f18540f50	Remove link to NUMA lib as it is no longer needed (#346)	2021-04-12 09:53:17 -07:00
TomSang	87f12cbb86	Add detection of cooperative multi device launch attribute (#345)	2021-04-11 13:29:24 -07:00
Wenkai Du	def8b4ca0d	Move RCCL changelog and Copyright out of /usr/share (#343)	2021-04-09 14:08:40 -07:00

633 commits