rocm-systems

Author	SHA1	Message	Date
Wenkai Du	6dfdfef98f	Add gfx908 Rome 4 NICs model	2021-02-06 00:19:47 +00:00
Gilbert Lee	f372c53d52	[TransferBench] Fixing some merge issues	2021-02-05 16:46:20 +00:00
Wenkai Du	ab1e7a0318	Merge remote-tracking branch 'origin/develop' into 2.8.3	2021-02-04 20:02:34 -05:00
Gilbert Lee	2f541508c5	[topo_expl] Updating for 2.8.3	2021-02-04 19:08:42 +00:00
Gilbert Lee	9aac1ed38f	[ib-test] Update for 2.8.3	2021-02-04 19:05:03 +00:00
Gilbert Lee	9ce203dd0a	[TransferBench] Updating for 2.8.3	2021-02-04 18:58:25 +00:00
gilbertlee-amd	62e0447e9a	[TransferBench] Restore some previous fixes - memory leak, PCIe address (#314)	2021-02-01 09:48:09 -07:00
gilbertlee-amd	3e62ceddc5	Clique kernel support (#295) (#15) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com> Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>	2021-01-28 09:45:01 -07:00
Wenkai Du	2ddbe6646b	Improve collective trace	2021-01-14 19:28:01 -05:00
Wenkai Du	f4d5d3d620	Port alltoall[v]	2021-01-14 19:28:01 -05:00
Wenkai Du	d469947641	Merge remote-tracking branch 'nccl/master' into no-target-id	2021-01-14 19:27:53 -05:00
Wenkai Du	373a108516	Fix Rome PCIe 2 node topology generation (#310)	2020-12-15 17:16:17 -08:00
gilbertlee-amd	41c35dad48	[TransferBench] Fixing bug with fine-grained memory allocation (#311) * Fixing bug with fine-grained memory	2020-12-15 17:37:31 -07:00
gilbertlee-amd	ae0c4092c7	[TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309) * Adding CPU based execution, fixing typos, adding Fine-grained mem * Exposing sampling factor when generating range of data sizes * Refactoring how Links are launched, now once per thread * Documentation updates	2020-12-11 10:21:14 -07:00
gilbertlee-amd	b80ae551b1	[TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307) * Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing * Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)	2020-12-04 14:57:13 -07:00
Wenkai Du	975b14dffa	Add Rome model and improve search (#305)	2020-11-17 14:55:06 -08:00
gilbertlee-amd	41bcfb8878	Clique kernel support (#295) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>	2020-11-10 15:44:10 -07:00
Wenkai Du	dfa3c41ede	Add more Rome models (#292)	2020-10-30 21:26:04 -07:00
gilbertlee-amd	bfab1d3592	Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290)	2020-10-27 09:00:33 -06:00
gilbertlee-amd	61e1a71d14	[TransferBench] Displaying PCIe Bus ID (#288) * Adding PCIe BusID per GPU in topology display	2020-10-21 16:13:36 -06:00
gilbertlee-amd	769418c5c7	TransferBench Typo. Pinned host memory uses C not P (#286)	2020-10-21 12:05:38 -06:00
gilbertlee-amd	84a2541e01	Revert "Initial support for clique-based kernels (#276)" (#280) This reverts commit `2b8184808d`.	2020-10-15 11:30:18 -07:00
Wenkai Du	33babcb5e2	Update Rome single node models (#277)	2020-10-13 13:33:09 -07:00
gilbertlee-amd	2b8184808d	Initial support for clique-based kernels (#276) * Initial support for clique-based kernels	2020-10-13 11:22:04 -06:00
Wenkai Du	ae008fd2db	Rework Rome detection and add multiple network ports models (#274) * Rework Rome detection and add multiple network ports models * Remove unused opCount in p2p transport	2020-10-07 13:37:36 -07:00
Wenkai Du	b871ea3c0c	Add Alltoallv RCCL kernel implementation (#269) * Add alltoallv API and implementation * Extend Rome P2P channel limit to multinode and alltoall kernels * topo_expl: fix compilation and sync up with main * gtest: use RCCL alltoallv API * Code review changes	2020-09-30 16:25:36 -07:00
gilbertlee-amd	ee262819a7	New TransferBench features (#273) * Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars	2020-09-25 12:20:48 -06:00
lijietang	bbe233f8c1	Add rccl bw test script in tools (#255)	2020-09-11 16:59:03 +08:00
Wenkai Du	c5cbece6d0	Increase minimal channels for gfx908 (#259)	2020-08-26 11:40:11 -07:00
Wenkai Du	391bbf3f1e	Add NPS4 support on some models (#256) * Add NPS4 support on some models * Add XML models	2020-08-19 11:03:20 -07:00
gilbertlee-amd	ec9af40fcd	Upgrading various TransferBench features (#257)	2020-08-19 09:47:19 -06:00
Wenkai Du	a51e4071e3	Add another Rome model (#249) * Add another Rome model * Add gfx908 4P3L models and support * Revert "Use cached value for detecting GDR support only once" This reverts commit `67c8e72ce3`. * Skip using ibverb for GPU direct RDMA detection * Fine tune one Rome model	2020-08-17 10:51:02 -07:00
gilbertlee-amd	c985478133	Fixes to make TransferBench compile for hipclang (#254)	2020-08-13 12:25:28 -06:00
Wenkai Du	7e3d8a31cc	Collect gcnArch and hipDeviceArch_t in XML (#252)	2020-08-12 15:48:38 -07:00
Wenkai Du	3c46cb8ad4	Merge pull request #247 from wenkaidu/rome Additional Rome models support	2020-08-07 10:56:12 -07:00
MurtadhaAldallal	390c63cf0d	Update rccl_prim_test.cpp (#246) Adding doublelocalcopy operation and freeing buffer memory at end. DoubleLocalCopy Patch Added	2020-08-07 08:20:14 -07:00
Wenkai Du	09ef75656a	Add more Rome 4P2H models	2020-08-06 18:20:02 +00:00
Wenkai Du	e7a10aa0e4	Topology tuning for 4P2H on Rome (#242) * Topology tuning for 4P2H on Rome * Use ncclTopoIdToIndex	2020-07-27 11:53:57 -07:00
Wenkai Du	8d5fb920b6	ib-test: support multiple channels (#241)	2020-07-27 11:03:12 -07:00
Sourav Chakraborty	2475daafee	add 4 node 8P6L 1 NIC 2nd Hive model	2020-07-22 16:27:15 +00:00
Sourav Chakraborty	db55afb014	simplify model definitions in topo expl	2020-07-22 16:05:53 +00:00
Wenkai Du	d5f90e19b5	Add 8P6L multi-node models (#239)	2020-07-21 14:10:36 -07:00
Wenkai Du	ab787c767e	Change default channels duplication for chordal ring (#233)	2020-07-14 15:16:50 -07:00
Stanley Tsang	9bd4c14603	Adding appropriate references in rccl-prim-test (#227) Adding appropriate references to rccl-prim-test.	2020-07-06 10:15:03 -06:00
Wenkai Du	d3548cc474	topo_expl: each rank needs to have its own memory for graphs (#225)	2020-07-01 15:11:02 -07:00
Wenkai Du	a6be82f5ab	topo_expl: fix broken build (#224)	2020-06-30 11:11:23 -07:00
Wenkai Du	0eb19a563a	Use posix_memalign for network buffer allocation on host memory (#221) * Use posix_memalign for network buffer allocation on host memory * ib-test: add ability to specify run iterations * ib-test: define iterations as multiple of default cycles * Add checking to posix_memalign return value	2020-06-22 13:06:25 -07:00
Wenkai Du	dc739c4e70	ib-test: support host memory allocation through posix_memalign (#220) * ib-test: support host memory allocation through posix_memalign * ib-test: add missing CUDACHECK to hip calls	2020-06-17 16:16:54 -07:00
Wenkai Du	cfa97eccd3	Add IB/RDMA unit test	2020-06-16 18:29:17 +00:00
Wenkai Du	e80e29573c	Add gather, scatter and alltoall collectives Introducing 3 new APIs: ncclResult_t ncclGather(const void* sendbuff, void* recvbuff, size_t sendcount, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream); ncclResult_t ncclScatter(const void* sendbuff, void* recvbuff, size_t recvcount, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream); ncclResult_t ncclAllToAll(const void* sendbuff, void* recvbuff, size_t count, ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream); Only out-of-place operation is supported. The preprocessor symbol RCCL_GATHER_SCATTER=1 indicates API availability. By default the APIs launch the RCCL kernel implementation, which can be disabled with RCCL_ALLTOALL_KERNEL_DISABLE=1; the APIs then use a wrapper around ncclSend and ncclRecv.	2020-06-09 17:44:08 -07:00
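
Note on commit e80e29573c: the message gives the three signatures but not the buffer layouts. The sketch below is a host-side Python model of the data movement, assuming the usual MPI-style convention (per-rank blocks of `count` elements laid out in rank order). It is not RCCL code and does not call the library; the function names `gather`, `scatter` and `all_to_all` are hypothetical, and the layout itself is an assumption rather than something the commit confirms.

```python
# Host-side model of gather / scatter / all-to-all data movement.
# Assumption: MPI-style layout, one block of `count` elements per rank,
# blocks ordered by rank. Plain Python lists stand in for device buffers.

def gather(send_bufs, root):
    """Each rank contributes its whole send buffer; only root receives.

    send_bufs[r] is rank r's send buffer. Returns one receive buffer per
    rank; non-root ranks get None because their recvbuff is not written.
    """
    recv = [None] * len(send_bufs)
    recv[root] = [x for buf in send_bufs for x in buf]
    return recv


def scatter(root_send, nranks, recvcount):
    """Root's buffer is split into nranks blocks of recvcount elements."""
    if len(root_send) != nranks * recvcount:
        raise ValueError("root buffer must hold nranks * recvcount elements")
    return [root_send[r * recvcount:(r + 1) * recvcount] for r in range(nranks)]


def all_to_all(send_bufs, count):
    """Block d of rank s's send buffer lands in block s of rank d's receive buffer."""
    n = len(send_bufs)
    return [
        [x for src in range(n) for x in send_bufs[src][dst * count:(dst + 1) * count]]
        for dst in range(n)
    ]


if __name__ == "__main__":
    # Two ranks, one element per block.
    print(all_to_all([[0, 1], [10, 11]], count=1))
```

Separate input and output lists are used throughout, which mirrors the out-of-place restriction stated in the commit message.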


94 Commits