77 Commits

Author SHA1 Message Date
Wenkai Du dfa3c41ede Add more Rome models (#292) 2020-10-30 21:26:04 -07:00
gilbertlee-amd bfab1d3592 Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290) 2020-10-27 09:00:33 -06:00
gilbertlee-amd 61e1a71d14 [TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display
2020-10-21 16:13:36 -06:00
gilbertlee-amd 769418c5c7 TransferBench Typo. Pinned host memory uses C not P (#286) 2020-10-21 12:05:38 -06:00
gilbertlee-amd 84a2541e01 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Wenkai Du 33babcb5e2 Update Rome single node models (#277) 2020-10-13 13:33:09 -07:00
gilbertlee-amd 2b8184808d Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du ae008fd2db Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models

* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du b871ea3c0c Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes
2020-09-30 16:25:36 -07:00
gilbertlee-amd ee262819a7 New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
lijietang bbe233f8c1 Add rccl bw test script in tools (#255) 2020-09-11 16:59:03 +08:00
Wenkai Du c5cbece6d0 Increase minimal channels for gfx908 (#259) 2020-08-26 11:40:11 -07:00
Wenkai Du 391bbf3f1e Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd ec9af40fcd Upgrading various TransferBench features (#257) 2020-08-19 09:47:19 -06:00
Wenkai Du a51e4071e3 Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 67c8e72ce3.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd c985478133 Fixes to make TransferBench compile for hipclang (#254) 2020-08-13 12:25:28 -06:00
Wenkai Du 7e3d8a31cc Collect gcnArch and hipDeviceArch_t in XML (#252) 2020-08-12 15:48:38 -07:00
Wenkai Du 3c46cb8ad4 Merge pull request #247 from wenkaidu/rome
Additional Rome models support
2020-08-07 10:56:12 -07:00
MurtadhaAldallal 390c63cf0d Update rccl_prim_test.cpp (#246)
Adding doublelocalcopy operation and freeing buffer memory at end.
DoubleLocalCopy Patch Added
2020-08-07 08:20:14 -07:00
Wenkai Du 09ef75656a Add more Rome 4P2H models 2020-08-06 18:20:02 +00:00
Wenkai Du e7a10aa0e4 Topology tuning for 4P2H on Rome (#242)
* Topology tuning for 4P2H on Rome

* Use ncclTopoIdToIndex
2020-07-27 11:53:57 -07:00
Wenkai Du 8d5fb920b6 ib-test: support multiple channels (#241) 2020-07-27 11:03:12 -07:00
Sourav Chakraborty 2475daafee add 4 node 8P6L 1 NIC 2nd Hive model 2020-07-22 16:27:15 +00:00
Sourav Chakraborty db55afb014 simplify model definitions in topo expl 2020-07-22 16:05:53 +00:00
Wenkai Du d5f90e19b5 Add 8P6L multi-node models (#239) 2020-07-21 14:10:36 -07:00
Wenkai Du ab787c767e Change default channels duplication for chordal ring (#233) 2020-07-14 15:16:50 -07:00
Stanley Tsang 9bd4c14603 Adding appropriate references in rccl-prim-test (#227)
Adding appropriate references to rccl-prim-test.
2020-07-06 10:15:03 -06:00
Wenkai Du d3548cc474 topo_expl: each rank needs to have its own memory for graphs (#225) 2020-07-01 15:11:02 -07:00
Wenkai Du a6be82f5ab topo_expl: fix broken build (#224) 2020-06-30 11:11:23 -07:00
Wenkai Du 0eb19a563a Use posix_memalign for network buffer allocation on host memory (#221)
* Use posix_memalign for network buffer allocation on host memory

* ib-test: add ability to specify run iterations

* ib-test: define iterations as multiple of default cycles

* Add checking to posix_memalign return value
2020-06-22 13:06:25 -07:00
Wenkai Du dc739c4e70 ib-test: support host memory allocation through posix_memalign (#220)
* ib-test: support host memory allocation through posix_memalign

* ib-test: add missing CUDACHECK to hip calls
2020-06-17 16:16:54 -07:00
Wenkai Du cfa97eccd3 Add IB/RDMA unit test 2020-06-16 18:29:17 +00:00
Wenkai Du e80e29573c Add gather, scatter and alltoall collectives
Introducing 3 new APIs:
ncclResult_t  ncclGather(const void* sendbuff, void* recvbuff, size_t sendcount,
    ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream);
ncclResult_t  ncclScatter(const void* sendbuff, void* recvbuff,
    size_t recvcount, ncclDataType_t datatype, int root, ncclComm_t comm,
    hipStream_t stream);
ncclResult_t  ncclAllToAll(const void* sendbuff, void* recvbuff, size_t count,
    ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream);

Only out-of-place operation is supported.
The preprocessor symbol RCCL_GATHER_SCATTER=1 indicates API availability.
By default the APIs launch the RCCL kernel implementation, which can be disabled with
RCCL_ALLTOALL_KERNEL_DISABLE=1. The APIs then use a wrapper around ncclSend and ncclRecv.
2020-06-09 17:44:08 -07:00
Wenkai Du 71ec3e09df topo_expl: update to 2.7 2020-06-09 17:40:24 -07:00
Wenkai Du 706de76046 Merge pull request #208 from wenkaidu/perf_xgmi
Give preference to path with more XGMI connections
2020-05-15 10:07:22 -07:00
Wenkai Du b3c9852634 Give preference to path with more XGMI connections 2020-05-14 15:33:16 -07:00
Wenkai Du f1058b6353 rccl-prim-test: add flags when calling hipExtLaunchMultiKernelMultiDevice in hip-clang 2020-05-12 23:54:07 +00:00
Saad Rahim 33c23fdcda Merge remote-tracking branch 'upstream/master' into develop 2020-04-29 16:12:37 -07:00
Wenkai Du 5743c6b7d2 topo_expl: fix build error 2020-04-27 17:17:05 +00:00
Gilbert Lee 339bf9ff19 Adding option to re-use streams instead of re-creating per topology 2020-04-23 15:53:40 +00:00
Wenkai Du ef7064ba9b rccl-prim-test: auto-detect rings in 4P and 8P configurations 2020-04-10 18:17:21 +00:00
Aaron Enye Shi a95090d981 Fix HIP-Clang build with HSA headers
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.
2020-04-03 17:58:23 -04:00
Wenkai Du 6f54b23503 topo_expl: update to 2.6 2020-04-01 13:37:08 -07:00
Wenkai Du ebc823e603 rccl-prim-test: add all-to-all benchmark (#185)
For gfx908, support simple detection of ring topology.
Call ReduceOrCopyMulti directly from kernel.
Also simplify code by removing kernel start synchronization option
which has no effect on throughput measurements.
2020-03-16 10:00:54 -07:00
Wenkai Du 32388d60a9 topo_expl: add a few more single node models 2020-03-02 11:43:03 -08:00
Wenkai Du 498d5029ad Add topology visualizer tool 2020-02-26 15:23:34 -08:00
Wenkai Du 934b6de557 topo_expl: use bandwidth numbers defined in graph in CPU models 2020-02-26 14:17:36 -08:00
Wenkai Du d2adc61bf6 Revise PCI BW numbers on Rome 2020-02-26 13:17:49 -08:00
Wenkai Du 55f8e2dec7 Add topology explorer 2020-02-19 14:42:06 -08:00
Stanley Tsang 20fa04d9b6 Updating copyright notices for 2020. 2020-01-29 15:28:08 -08:00
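
Note on commit e80e29573c (gather, scatter and alltoall collectives): the commit message gives the ncclAllToAll signature but not the buffer layout. The host-only sketch below illustrates the conventional all-to-all exchange, assuming each rank's send buffer holds one block of `count` elements per destination rank and each receive buffer holds one block per source rank. That layout is an assumption, not something the commit states. The helper name `simulateAllToAll` is hypothetical and does not call RCCL; it only models the out-of-place data movement on plain vectors.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side helper (not part of RCCL). Models the conventional
// all-to-all exchange: send[s] is rank s's send buffer, holding one block of
// `count` elements for each destination rank.
std::vector<std::vector<int>> simulateAllToAll(
    const std::vector<std::vector<int>>& send, std::size_t count) {
  const std::size_t nranks = send.size();
  for (const auto& buf : send) {
    if (buf.size() != nranks * count) {
      throw std::invalid_argument(
          "each send buffer needs nranks * count elements");
    }
  }
  // Out of place: results are written to separate receive buffers.
  std::vector<std::vector<int>> recv(nranks,
                                     std::vector<int>(nranks * count));
  for (std::size_t dst = 0; dst < nranks; ++dst) {
    for (std::size_t src = 0; src < nranks; ++src) {
      for (std::size_t k = 0; k < count; ++k) {
        // Block `dst` of rank src's send buffer lands in
        // block `src` of rank dst's receive buffer.
        recv[dst][src * count + k] = send[src][dst * count + k];
      }
    }
  }
  return recv;
}
```

Under this assumed layout, two ranks with `count = 2` and send buffers `{0, 1, 2, 3}` and `{10, 11, 12, 13}` end up with receive buffers `{0, 1, 10, 11}` and `{2, 3, 12, 13}`.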