rocm-systems

Author	SHA1	Message	Date
Stanley Tsang	f152c8d160	Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16 ) * Fixing temp file creation/deletion for Clique kernel mode. * Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs * GroupCall MP UT properly quits when too many devices specified * MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script [ROCm/rccl commit: `d00b7d17bd`]	2021-02-05 16:49:25 -08:00
Wenkai Du	fe8923ebba	Add gfx908 Rome 4 NICs model [ROCm/rccl commit: `6dfdfef98f`]	2021-02-06 00:19:47 +00:00
Gilbert Lee	b954d85935	[TransferBench] Fixing some merge issues [ROCm/rccl commit: `f372c53d52`]	2021-02-05 16:46:20 +00:00
Wenkai Du	ae5779702a	Merge remote-tracking branch 'origin/develop' into 2.8.3 [ROCm/rccl commit: `ab1e7a0318`]	2021-02-04 20:02:34 -05:00
Gilbert Lee	1dfb88f554	[topo_expl] Updating for 2.8.3 [ROCm/rccl commit: `2f541508c5`]	2021-02-04 19:08:42 +00:00
Gilbert Lee	f2d07cb9a6	[ib-test] Update for 2.8.3] [ROCm/rccl commit: `9aac1ed38f`]	2021-02-04 19:05:03 +00:00
Gilbert Lee	1643d05c75	[TransferBench] Updating for 2.8.3 [ROCm/rccl commit: `9ce203dd0a`]	2021-02-04 18:58:25 +00:00
gilbertlee-amd	16d625ca27	Tuning some clique-based kernel parameters (#315 ) [ROCm/rccl commit: `1990ffd76a`]	2021-02-03 20:00:08 -07:00
Wenkai Du	57abf599b2	Enable GPU direct RDMA read from GPU [ROCm/rccl commit: `5f97122442`]	2021-02-03 02:48:30 +00:00
gilbertlee-amd	60c74f63fa	[TransferBench] Restore some previous fixes - memory leak, PCIe address (#314 ) [ROCm/rccl commit: `62e0447e9a`]	2021-02-01 09:48:09 -07:00
Gilbert Lee	6bf9b0d36a	Removing in-place tests from Combined calls (no support for send/recv) [ROCm/rccl commit: `01a998b17c`]	2021-01-28 20:09:03 +00:00
gilbertlee-amd	c981e76efe	Clique kernel support (#295 ) (#15 ) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com> Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com> [ROCm/rccl commit: `3e62ceddc5`]	2021-01-28 09:45:01 -07:00
Wenkai Du	7f9c15b843	Use less unroll for clique kernels (#313 ) [ROCm/rccl commit: `41e47a36e7`]	2021-01-15 17:48:10 -08:00
Stanley Tsang	d7ed44eb9a	Adding multiprocess unit tests (#312 ) Adding multiprocess unit tests for collectives. To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess [ROCm/rccl commit: `d3fa257682`]	2021-01-15 16:34:36 -07:00
Wenkai Du	d4382de267	Improve collective trace [ROCm/rccl commit: `2ddbe6646b`]	2021-01-14 19:28:01 -05:00
Wenkai Du	560224fe9f	gtest: add scatter to combined calls and use loops (#303 ) * gtest: add scatter to combined calls and use loops * gtest: run validation inside loop * gtest: revert small element count to 2520 * gtest: fix memory leak in validation (cherry picked from commit `36935cfbee`) * Fix combined call UT * Fix memory leak * Fix alltoallv test [ROCm/rccl commit: `b33a2cac8b`]	2021-01-14 19:28:01 -05:00
Wenkai Du	2c49121171	Port alltoall[v] [ROCm/rccl commit: `f4d5d3d620`]	2021-01-14 19:28:01 -05:00
Wenkai Du	41bead5a4e	Do not allow GPU as intermediate [ROCm/rccl commit: `105db19a11`]	2021-01-14 19:28:01 -05:00
Wenkai Du	34c6013299	Revert "Changes to topology based on XGMI (#272 )" This reverts commit `0a9adc16f4`. [ROCm/rccl commit: `e055229e56`]	2021-01-14 19:28:01 -05:00
Wenkai Du	adff98765c	Merge remote-tracking branch 'nccl/master' into no-target-id [ROCm/rccl commit: `d469947641`]	2021-01-14 19:27:53 -05:00
Wenkai Du	4ea285c527	Fix Rome PCIe 2 node topology generation (#310 ) [ROCm/rccl commit: `373a108516`]	2020-12-15 17:16:17 -08:00
gilbertlee-amd	c570f09681	[TransferBench] Fixing bug with fine-grained memory allocation (#311 ) * Fixing bug with fine-grained memory [ROCm/rccl commit: `41c35dad48`]	2020-12-15 17:37:31 -07:00
gilbertlee-amd	5155abb250	[TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309 ) * Adding CPU based execution, fixing typos, adding Fine-grained mem * Exposing sampling factor when generating range of data sizes * Refactoring how Links are launched, now once per thread * Documentation updates [ROCm/rccl commit: `ae0c4092c7`]	2020-12-11 10:21:14 -07:00
gilbertlee-amd	9b48f92d72	[TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307 ) * Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing * Adding support for smaller transfers (byte size must be multiple of 4 instead of 128) [ROCm/rccl commit: `b80ae551b1`]	2020-12-04 14:57:13 -07:00
Wenkai Du	9e83df4ad3	Adding backward compatibility for target-id syntax for AMDGPU_TARGETS (#306 ) [ROCm/rccl commit: `882d52ad7e`]	2020-12-04 13:55:56 -08:00
Wenkai Du	b68ff1ebba	Add Rome model and improve search (#305 ) [ROCm/rccl commit: `975b14dffa`]	2020-11-17 14:55:06 -08:00
Sylvain Jeaugey	a8908b34ee	2.8.3-1 Optimization for Tree allreduce on A100. Improve aggregation performance. Use shared buffers for inter-node send/recv. Add NVTX profiling hooks. Accelerate alltoall connections by merging communication for all channels. Add support for one hop communication through NVLink, for faster send/recv communication on cubemesh topologies like DGX-1. Improve alltoall scheduling to better balance intra/inter node communication. Increase send/recv parallelism by 8x, each warp sending or receiving to a different peer. Net: move to v4. Net: make flush operation asynchronous to accelerate alltoall. Net: define maximum number of requests. Fix hang when using LL128 protocol after 2^31 steps. Fix #379 : topology injection failing when using less GPUs than described in the XML. Fix #394 : protocol mismatch causing hangs or crashes when using one GPU per node. [ROCm/rccl commit: `920dbe5b35`]	2020-11-17 11:08:52 -08:00
Wenkai Du	32fdfc93fc	Merge remote-tracking branch 'origin/master' into develop [ROCm/rccl commit: `1943bac646`]	2020-11-16 12:16:53 -05:00
Wenkai Du	f19cbc8e51	Use device's link width and speed if port doesn't report (#304 ) [ROCm/rccl commit: `554729079d`]	2020-11-13 17:58:04 -08:00
Wenkai Du	36935cfbee	gtest: add scatter to combined calls and use loops (#303 ) * gtest: add scatter to combined calls and use loops * gtest: run validation inside loop * gtest: revert small element count to 2520 * gtest: fix memory leak in validation [ROCm/rccl commit: `b0853ccd51`]	2020-11-13 17:57:44 -08:00
Stanley Tsang	f373cd2fdc	Fixing IPC handle leak (#302 ) [ROCm/rccl commit: `2958f7eace`]	2020-11-13 10:32:42 -07:00
gilbertlee-amd	f66d05193a	Adding RCCL_CLIQUE_DEBUG to help debug experimental clique feature (#300 ) [ROCm/rccl commit: `c8d08a7c2f`]	2020-11-13 09:07:11 -07:00
Wenkai Du	62d21047b8	Skip unused peer connection in scatter and gather (#301 ) [ROCm/rccl commit: `4e68229c8b`]	2020-11-12 15:47:34 -08:00
Colin Smith	1349b382cd	Merge pull request #299 from ROCmSoftwarePlatform/develop Enable target id build [ROCm/rccl commit: `377b43470b`]	2020-11-10 15:47:42 -07:00
gilbertlee-amd	a7ef699687	Clique kernel support (#295 ) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com> [ROCm/rccl commit: `41bcfb8878`]	2020-11-10 15:44:10 -07:00
Wenkai Du	273974393e	Use target id of xnack off (#298 ) [ROCm/rccl commit: `1fdb216f87`]	2020-11-10 11:10:48 -08:00
Wenkai Du	a0109b2eae	Use ncclSend/ncclRecv for alltoall type of collectives as default (#297 ) [ROCm/rccl commit: `2e8b3a0857`]	2020-11-09 11:23:17 -08:00
gilbertlee-amd	1af4a5c9fa	Adding a CHANGELOG (#296 ) [ROCm/rccl commit: `bdd8adf1ca`]	2020-11-05 13:38:30 -07:00
Wenkai Du	a4dd1a9548	Improve GPU direct RDMA handling on Rome (#294 ) [ROCm/rccl commit: `709b7e4880`]	2020-11-03 14:29:08 -08:00
Wenkai Du	c0c64d970a	Add more Rome models (#292 ) [ROCm/rccl commit: `dfa3c41ede`]	2020-10-30 21:26:04 -07:00
gilbertlee-amd	2931959e6e	Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290 ) [ROCm/rccl commit: `bfab1d3592`]	2020-10-27 09:00:33 -06:00
Wenkai Du	e7ea1a585e	Fix lintian errors (#287 ) [ROCm/rccl commit: `2ecfc62ec8`]	2020-10-21 16:20:53 -07:00
gilbertlee-amd	a062c80298	[TransferBench] Displaying PCIe Bus ID (#288 ) * Adding PCIe BusID per GPU in topology display [ROCm/rccl commit: `61e1a71d14`]	2020-10-21 16:13:36 -06:00
gilbertlee-amd	0282595de5	TransferBench Typo. Pinned host memory uses C not P (#286 ) [ROCm/rccl commit: `769418c5c7`]	2020-10-21 12:05:38 -06:00
xietingwew	0277094dd2	fix proxyArgs for trace log [ROCm/rccl commit: `084207e685`]	2020-10-21 09:18:40 -07:00
saadrahim	5439649936	Adding sles15, centos7 and centos8 testing (#283 ) [ROCm/rccl commit: `e8177c9ee7`]	2020-10-20 09:39:03 -06:00
Wenkai Du	1aae6b1344	Fix incorrect pointer checking for scatter and gather (#285 ) [ROCm/rccl commit: `dcad0ef7cb`]	2020-10-19 13:27:09 -07:00
gilbertlee-amd	f4b9a0d8e5	Removing unnecessary flags from CI (#278 ) * Removing unnecessary flags from CI * Re-adding HSA_FORCE_FINE_GRAIN_PCIE in CI [ROCm/rccl commit: `9b3f762b68`]	2020-10-19 13:08:24 -06:00
saadrahim	0465dffe6f	Updating copyright for documentation (#282 ) [ROCm/rccl commit: `49aa6d7afe`]	2020-10-19 13:07:15 -06:00
Wenkai Du	d1781365d6	Merge pull request #279 from wenkaidu/nccl_sync Sync up with latest NCCL master branch [ROCm/rccl commit: `a7deecb104`]	2020-10-16 11:21:35 -07:00

546 Commits