rocm-systems

Author	SHA1	Message	Date
gilbertlee-amd	61e1a71d14	[TransferBench] Displaying PCIe Bus ID (#288) * Adding PCIe BusID per GPU in topology display	2020-10-21 16:13:36 -06:00
gilbertlee-amd	769418c5c7	TransferBench Typo. Pinned host memory uses C not P (#286)	2020-10-21 12:05:38 -06:00
gilbertlee-amd	84a2541e01	Revert "Initial support for clique-based kernels (#276)" (#280) This reverts commit `2b8184808d`.	2020-10-15 11:30:18 -07:00
Wenkai Du	33babcb5e2	Update Rome single node models (#277)	2020-10-13 13:33:09 -07:00
gilbertlee-amd	2b8184808d	Initial support for clique-based kernels (#276) * Initial support for clique-based kernels	2020-10-13 11:22:04 -06:00
Wenkai Du	ae008fd2db	Rework Rome detection and add multiple network ports models (#274) * Rework Rome detection and add multiple network ports models * Remove unused opCount in p2p transport	2020-10-07 13:37:36 -07:00
Wenkai Du	b871ea3c0c	Add Alltoallv RCCL kernel implementation (#269) * Add alltoallv API and implementation * Extend Rome P2P channel limit to multinode and alltoall kernels * topo_expl: fix compilation and sync up with main * gtest: use RCCL alltoallv API * Code review changes	2020-09-30 16:25:36 -07:00
gilbertlee-amd	ee262819a7	New TransferBench features (#273) * Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars	2020-09-25 12:20:48 -06:00
lijietang	bbe233f8c1	Add rccl bw test script in tools (#255)	2020-09-11 16:59:03 +08:00
Wenkai Du	c5cbece6d0	Increase minimal channels for gfx908 (#259)	2020-08-26 11:40:11 -07:00
Wenkai Du	391bbf3f1e	Add NPS4 support on some models (#256) * Add NPS4 support on some models * Add XML models	2020-08-19 11:03:20 -07:00
gilbertlee-amd	ec9af40fcd	Upgrading various TransferBench features (#257)	2020-08-19 09:47:19 -06:00
Wenkai Du	a51e4071e3	Add another Rome model (#249) * Add another Rome model * Add gfx908 4P3L models and support * Revert "Use cached value for detecting GDR support only once" This reverts commit `67c8e72ce3`. * Skip using ibverb for GPU direct RDMA detection * Fine tune one Rome model	2020-08-17 10:51:02 -07:00
gilbertlee-amd	c985478133	Fixes to make TransferBench compile for hipclang (#254)	2020-08-13 12:25:28 -06:00
Wenkai Du	7e3d8a31cc	Collect gcnArch and hipDeviceArch_t in XML (#252)	2020-08-12 15:48:38 -07:00
Wenkai Du	3c46cb8ad4	Merge pull request #247 from wenkaidu/rome Additional Rome models support	2020-08-07 10:56:12 -07:00
MurtadhaAldallal	390c63cf0d	Update rccl_prim_test.cpp (#246) Adding doublelocalcopy operation and freeing buffer memory at end. DoubleLocalCopy Patch Added	2020-08-07 08:20:14 -07:00
Wenkai Du	09ef75656a	Add more Rome 4P2H models	2020-08-06 18:20:02 +00:00
Wenkai Du	e7a10aa0e4	Topology tuning for 4P2H on Rome (#242) * Topology tuning for 4P2H on Rome * Use ncclTopoIdToIndex	2020-07-27 11:53:57 -07:00
Wenkai Du	8d5fb920b6	ib-test: support multiple channels (#241)	2020-07-27 11:03:12 -07:00
Sourav Chakraborty	2475daafee	add 4 node 8P6L 1 NIC 2nd Hive model	2020-07-22 16:27:15 +00:00
Sourav Chakraborty	db55afb014	simplify model definitions in topo expl	2020-07-22 16:05:53 +00:00
Wenkai Du	d5f90e19b5	Add 8P6L multi-node models (#239)	2020-07-21 14:10:36 -07:00
Wenkai Du	ab787c767e	Change default channels duplication for chordal ring (#233)	2020-07-14 15:16:50 -07:00
Stanley Tsang	9bd4c14603	Adding appropriate references in rccl-prim-test (#227) Adding appropriate references to rccl-prim-test.	2020-07-06 10:15:03 -06:00
Wenkai Du	d3548cc474	topo_expl: each rank needs to have its own memory for graphs (#225)	2020-07-01 15:11:02 -07:00
Wenkai Du	a6be82f5ab	topo_expl: fix broken build (#224)	2020-06-30 11:11:23 -07:00
Wenkai Du	0eb19a563a	Use posix_memalign for network buffer allocation on host memory (#221) * Use posix_memalign for network buffer allocation on host memory * ib-test: add ability to specify run iterations * ib-test: define iterations as multiple of default cycles * Add checking to posix_memalign return value	2020-06-22 13:06:25 -07:00
Wenkai Du	dc739c4e70	ib-test: support host memory allocation through posix_memalign (#220) * ib-test: support host memory allocation through posix_memalign * ib-test: add missing CUDACHECK to hip calls	2020-06-17 16:16:54 -07:00
Wenkai Du	cfa97eccd3	Add IB/RDMA unit test	2020-06-16 18:29:17 +00:00
Wenkai Du	e80e29573c	Add gather, scatter and alltoall collectives Introducing 3 new APIs: ncclResult_t ncclGather(const void* sendbuff, void* recvbuff, size_t sendcount, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream); ncclResult_t ncclScatter(const void* sendbuff, void* recvbuff, size_t recvcount, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream); ncclResult_t ncclAllToAll(const void* sendbuff, void* recvbuff, size_t count, ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream); Only out-of-place operation is supported. Preprocessor symbol RCCL_GATHER_SCATTER=1 indicates API availability. By default the APIs launch the RCCL kernel implementation, which can be disabled by RCCL_ALLTOALL_KERNEL_DISABLE=1; the APIs then use a wrapper around ncclSend and ncclRecv.	2020-06-09 17:44:08 -07:00
Wenkai Du	71ec3e09df	topo_expl: update to 2.7	2020-06-09 17:40:24 -07:00
Wenkai Du	706de76046	Merge pull request #208 from wenkaidu/perf_xgmi Give preference to path with more XGMI connections	2020-05-15 10:07:22 -07:00
Wenkai Du	b3c9852634	Give preference to path with more XGMI connections	2020-05-14 15:33:16 -07:00
Wenkai Du	f1058b6353	rccl-prim-test: add flags when calling hipExtLaunchMultiKernelMultiDevice in hip-clang	2020-05-12 23:54:07 +00:00
Saad Rahim	33c23fdcda	Merge remote-tracking branch 'upstream/master' into develop	2020-04-29 16:12:37 -07:00
Wenkai Du	5743c6b7d2	topo_expl: fix build error	2020-04-27 17:17:05 +00:00
Gilbert Lee	339bf9ff19	Adding option to re-use streams instead of re-creating per topology	2020-04-23 15:53:40 +00:00
Wenkai Du	ef7064ba9b	rccl-prim-test: auto-detect rings in 4P and 8P configurations	2020-04-10 18:17:21 +00:00
Aaron Enye Shi	a95090d981	Fix HIP-Clang build with HSA headers HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.	2020-04-03 17:58:23 -04:00
Wenkai Du	6f54b23503	topo_expl: update to 2.6	2020-04-01 13:37:08 -07:00
Wenkai Du	ebc823e603	rccl-prim-test: add all-to-all benchmark (#185) For gfx908, support simple detection of ring topology. Call ReduceOrCopyMulti directly from kernel. Also simplify code by removing kernel start synchronization option which has no effect on throughput measurements.	2020-03-16 10:00:54 -07:00
Wenkai Du	32388d60a9	topo_expl: add a few more single node models	2020-03-02 11:43:03 -08:00
Wenkai Du	498d5029ad	Add topology visualizer tool	2020-02-26 15:23:34 -08:00
Wenkai Du	934b6de557	topo_expl: use bandwidth numbers defined in graph in CPU models	2020-02-26 14:17:36 -08:00
Wenkai Du	d2adc61bf6	Revise PCI BW numbers on Rome	2020-02-26 13:17:49 -08:00
Wenkai Du	55f8e2dec7	Add topology explorer	2020-02-19 14:42:06 -08:00
Stanley Tsang	20fa04d9b6	Updating copyright notices for 2020.	2020-01-29 15:28:08 -08:00
Wenkai Du	fe6d012eb0	Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup	2020-01-29 15:28:03 -08:00
Wenkai Du	1e55645d97	Misc fixes and improvements for 2.5.6 1. Fix RCCL unit test 2. Add ROME detection and tuning 3. Change default P2P level 4. Fix search algorithm for XGMI 5. Remove explicit channel duplication with implicit by using half of link speed 6. Add collective trace support 7. Correct Intel Skylake CPU detection and bandwidth 8. Fix topo connect function 9. Disable GDR read and remove unreachable code 10. Disable LL128 kernels 11. Add tuning parameters 12. Use original clock64() implementation which returns RTC counter value 13. Print out timestamp of collective trace 14. Do not use struct ncclColl in kernel launch parameter 15. Fix abort handling and add tracing 17. Add __launch_bounds__ to kernel functions 18. Remove unused abortCount 19. Unset default MIN_NRINGS and MIN_NCHANNELS 20. Do not allocate shared memory when not using LL128 kernels 21. Correct time print out in tuning log	2020-01-29 15:27:05 -08:00


75 commits