Commit graph

75 commits

Author SHA1 Message Date
gilbertlee-amd 61e1a71d14 [TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display
2020-10-21 16:13:36 -06:00
gilbertlee-amd 769418c5c7 TransferBench Typo. Pinned host memory uses C not P (#286) 2020-10-21 12:05:38 -06:00
gilbertlee-amd 84a2541e01 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Wenkai Du 33babcb5e2 Update Rome single node models (#277) 2020-10-13 13:33:09 -07:00
gilbertlee-amd 2b8184808d Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du ae008fd2db Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models

* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du b871ea3c0c Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes
2020-09-30 16:25:36 -07:00
gilbertlee-amd ee262819a7 New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
lijietang bbe233f8c1 Add rccl bw test script in tools (#255) 2020-09-11 16:59:03 +08:00
Wenkai Du c5cbece6d0 Increase minimal channels for gfx908 (#259) 2020-08-26 11:40:11 -07:00
Wenkai Du 391bbf3f1e Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd ec9af40fcd Upgrading various TransferBench features (#257) 2020-08-19 09:47:19 -06:00
Wenkai Du a51e4071e3 Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 67c8e72ce3.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd c985478133 Fixes to make TransferBench compile for hipclang (#254) 2020-08-13 12:25:28 -06:00
Wenkai Du 7e3d8a31cc Collect gcnArch and hipDeviceArch_t in XML (#252) 2020-08-12 15:48:38 -07:00
Wenkai Du 3c46cb8ad4 Merge pull request #247 from wenkaidu/rome
Additional Rome models support
2020-08-07 10:56:12 -07:00
MurtadhaAldallal 390c63cf0d Update rccl_prim_test.cpp (#246)
Adding doublelocalcopy operation and freeing buffer memory at end.
DoubleLocalCopy Patch Added
2020-08-07 08:20:14 -07:00
Wenkai Du 09ef75656a Add more Rome 4P2H models 2020-08-06 18:20:02 +00:00
Wenkai Du e7a10aa0e4 Topology tuning for 4P2H on Rome (#242)
* Topology tuning for 4P2H on Rome

* Use ncclTopoIdToIndex
2020-07-27 11:53:57 -07:00
Wenkai Du 8d5fb920b6 ib-test: support multiple channels (#241) 2020-07-27 11:03:12 -07:00
Sourav Chakraborty 2475daafee add 4 node 8P6L 1 NIC 2nd Hive model 2020-07-22 16:27:15 +00:00
Sourav Chakraborty db55afb014 simplify model definitions in topo expl 2020-07-22 16:05:53 +00:00
Wenkai Du d5f90e19b5 Add 8P6L multi-node models (#239) 2020-07-21 14:10:36 -07:00
Wenkai Du ab787c767e Change default channels duplication for chordal ring (#233) 2020-07-14 15:16:50 -07:00
Stanley Tsang 9bd4c14603 Adding appropriate references in rccl-prim-test (#227)
Adding appropriate references to rccl-prim-test.
2020-07-06 10:15:03 -06:00
Wenkai Du d3548cc474 topo_expl: each rank needs to have its own memory for graphs (#225) 2020-07-01 15:11:02 -07:00
Wenkai Du a6be82f5ab topo_expl: fix broken build (#224) 2020-06-30 11:11:23 -07:00
Wenkai Du 0eb19a563a Use posix_memalign for network buffer allocation on host memory (#221)
* Use posix_memalign for network buffer allocation on host memory

* ib-test: add ability to specify run iterations

* ib-test: define iterations as multiple of default cycles

* Add checking to posix_memalign return value
2020-06-22 13:06:25 -07:00
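The entry above mentions allocating host network buffers with posix_memalign and checking its return value. The following is a minimal sketch of that general pattern, not the RCCL code itself: the helper name `alloc_aligned_host_buffer` is hypothetical, and the alignment and size used with it are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdlib.h>  // posix_memalign and free (POSIX, not ISO C++)

// Hypothetical helper (not part of RCCL): allocate `size` bytes whose address
// is a multiple of `alignment`. Returns nullptr on failure.
void* alloc_aligned_host_buffer(std::size_t alignment, std::size_t size) {
  void* ptr = nullptr;
  // posix_memalign reports failure through its return value (for example
  // EINVAL for a bad alignment, ENOMEM when out of memory), so the return
  // value must be checked directly.
  const int rc = posix_memalign(&ptr, alignment, size);
  if (rc != 0) {
    std::fprintf(stderr, "posix_memalign failed: %s\n", std::strerror(rc));
    return nullptr;
  }
  return ptr;  // release with free() when no longer needed
}
```

The alignment must be a power of two and a multiple of `sizeof(void*)`; a caller might pass the page size, for instance `alloc_aligned_host_buffer(4096, 1 << 20)`, and release the buffer with `free`.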
Wenkai Du dc739c4e70 ib-test: support host memory allocation through posix_memalign (#220)
* ib-test: support host memory allocation through posix_memalign

* ib-test: add missing CUDACHECK to hip calls
2020-06-17 16:16:54 -07:00
Wenkai Du cfa97eccd3 Add IB/RDMA unit test 2020-06-16 18:29:17 +00:00
Wenkai Du e80e29573c Add gather, scatter and alltoall collectives
Introducing 3 new APIs:
ncclResult_t  ncclGather(const void* sendbuff, void* recvbuff, size_t sendcount,
    ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream);
ncclResult_t  ncclScatter(const void* sendbuff, void* recvbuff,
    size_t recvcount, ncclDataType_t datatype, int root, ncclComm_t comm,
    hipStream_t stream);
ncclResult_t  ncclAllToAll(const void* sendbuff, void* recvbuff, size_t count,
    ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream);

Only out-of-place operation is supported.
The preprocessor symbol RCCL_GATHER_SCATTER=1 indicates API availability.
By default the APIs launch the RCCL kernel implementation, which can be disabled by
setting RCCL_ALLTOALL_KERNEL_DISABLE=1. The APIs then use a wrapper around ncclSend and ncclRecv.
2020-06-09 17:44:08 -07:00
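The commit message above does not spell out the buffer layout of the all-to-all exchange. Assuming the conventional MPI-style semantics (each rank's send buffer holds one chunk of `count` elements per peer, and chunk j goes to rank j), the data movement can be modelled on the host as in the sketch below. The function `simulate_all_to_all` is a hypothetical illustration that runs without a GPU; it is not an RCCL function and does not call the RCCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of an out-of-place all-to-all (assumed MPI-style layout).
// send[r] is rank r's send buffer with nranks * count elements;
// chunk j of send[r] is destined for rank j.
// In the result, chunk j of recv[r] is the data that rank j sent to rank r.
std::vector<std::vector<int>> simulate_all_to_all(
    const std::vector<std::vector<int>>& send, std::size_t count) {
  const std::size_t nranks = send.size();
  // Separate receive buffers: the operation is out of place.
  std::vector<std::vector<int>> recv(nranks, std::vector<int>(nranks * count));
  for (std::size_t src = 0; src < nranks; ++src) {
    assert(send[src].size() == nranks * count);
    for (std::size_t dst = 0; dst < nranks; ++dst) {
      for (std::size_t i = 0; i < count; ++i) {
        recv[dst][src * count + i] = send[src][dst * count + i];
      }
    }
  }
  return recv;
}
```

With two ranks and `count = 2`, send buffers `{0, 1, 2, 3}` and `{10, 11, 12, 13}` yield receive buffers `{0, 1, 10, 11}` and `{2, 3, 12, 13}`. Under the same assumption, a send/recv-based fallback would have each rank post one send and one receive per peer with these chunk offsets.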
Wenkai Du 71ec3e09df topo_expl: update to 2.7 2020-06-09 17:40:24 -07:00
Wenkai Du 706de76046 Merge pull request #208 from wenkaidu/perf_xgmi
Give preference to path with more XGMI connections
2020-05-15 10:07:22 -07:00
Wenkai Du b3c9852634 Give preference to path with more XGMI connections 2020-05-14 15:33:16 -07:00
Wenkai Du f1058b6353 rccl-prim-test: add flags when calling hipExtLaunchMultiKernelMultiDevice in hip-clang 2020-05-12 23:54:07 +00:00
Saad Rahim 33c23fdcda Merge remote-tracking branch 'upstream/master' into develop 2020-04-29 16:12:37 -07:00
Wenkai Du 5743c6b7d2 topo_expl: fix build error 2020-04-27 17:17:05 +00:00
Gilbert Lee 339bf9ff19 Adding option to re-use streams instead of re-creating per topology 2020-04-23 15:53:40 +00:00
Wenkai Du ef7064ba9b rccl-prim-test: auto-detect rings in 4P and 8P configurations 2020-04-10 18:17:21 +00:00
Aaron Enye Shi a95090d981 Fix HIP-Clang build with HSA headers
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.
2020-04-03 17:58:23 -04:00
Wenkai Du 6f54b23503 topo_expl: update to 2.6 2020-04-01 13:37:08 -07:00
Wenkai Du ebc823e603 rccl-prim-test: add all-to-all benchmark (#185)
For gfx908, support simple detection of ring topology.
Call ReduceOrCopyMulti directly from kernel.
Also simplify code by removing kernel start synchronization option
which has no effect on throughput measurements.
2020-03-16 10:00:54 -07:00
Wenkai Du 32388d60a9 topo_expl: add a few more single node models 2020-03-02 11:43:03 -08:00
Wenkai Du 498d5029ad Add topology visualizer tool 2020-02-26 15:23:34 -08:00
Wenkai Du 934b6de557 topo_expl: use bandwidth numbers defined in graph in CPU models 2020-02-26 14:17:36 -08:00
Wenkai Du d2adc61bf6 Revise PCI BW numbers on Rome 2020-02-26 13:17:49 -08:00
Wenkai Du 55f8e2dec7 Add topology explorer 2020-02-19 14:42:06 -08:00
Stanley Tsang 20fa04d9b6 Updating copyright notices for 2020. 2020-01-29 15:28:08 -08:00
Wenkai Du fe6d012eb0 Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup 2020-01-29 15:28:03 -08:00
Wenkai Du 1e55645d97 Misc fixes and improvements for 2.5.6
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Replace explicit channel duplication with implicit duplication by using half of the link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log
2020-01-29 15:27:05 -08:00