Wenkai Du
dfa3c41ede
Add more Rome models ( #292 )
2020-10-30 21:26:04 -07:00
gilbertlee-amd
bfab1d3592
Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats ( #290 )
2020-10-27 09:00:33 -06:00
gilbertlee-amd
61e1a71d14
[TransferBench] Displaying PCIe Bus ID ( #288 )
...
* Adding PCIe BusID per GPU in topology display
2020-10-21 16:13:36 -06:00
gilbertlee-amd
769418c5c7
TransferBench Typo. Pinned host memory uses C not P ( #286 )
2020-10-21 12:05:38 -06:00
gilbertlee-amd
84a2541e01
Revert "Initial support for clique-based kernels ( #276 )" ( #280 )
...
This reverts commit 2b8184808d .
2020-10-15 11:30:18 -07:00
Wenkai Du
33babcb5e2
Update Rome single node models ( #277 )
2020-10-13 13:33:09 -07:00
gilbertlee-amd
2b8184808d
Initial support for clique-based kernels ( #276 )
...
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du
ae008fd2db
Rework Rome detection and add multiple network ports models ( #274 )
...
* Rework Rome detection and add multiple network ports models
* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du
b871ea3c0c
Add Alltoallv RCCL kernel implementation ( #269 )
...
* Add alltoallv API and implementation
* Extend Rome P2P channel limit to multinode and alltoall kernels
* topo_expl: fix compilation and sync up with main
* gtest: use RCCL alltoallv API
* Code review changes
2020-09-30 16:25:36 -07:00
gilbertlee-amd
ee262819a7
New TransferBench features ( #273 )
...
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
lijietang
bbe233f8c1
Add rccl bw test script in tools ( #255 )
2020-09-11 16:59:03 +08:00
Wenkai Du
c5cbece6d0
Increase minimal channels for gfx908 ( #259 )
2020-08-26 11:40:11 -07:00
Wenkai Du
391bbf3f1e
Add NPS4 support on some models ( #256 )
...
* Add NPS4 support on some models
* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd
ec9af40fcd
Upgrading various TransferBench features ( #257 )
2020-08-19 09:47:19 -06:00
Wenkai Du
a51e4071e3
Add another Rome model ( #249 )
...
* Add another Rome model
* Add gfx908 4P3L models and support
* Revert "Use cached value for detecting GDR support only once"
This reverts commit 67c8e72ce3 .
* Skip using ibverb for GPU direct RDMA detection
* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd
c985478133
Fixes to make TransferBench compile for hipclang ( #254 )
2020-08-13 12:25:28 -06:00
Wenkai Du
7e3d8a31cc
Collect gcnArch and hipDeviceArch_t in XML ( #252 )
2020-08-12 15:48:38 -07:00
Wenkai Du
3c46cb8ad4
Merge pull request #247 from wenkaidu/rome
...
Additional Rome models support
2020-08-07 10:56:12 -07:00
MurtadhaAldallal
390c63cf0d
Update rccl_prim_test.cpp ( #246 )
...
Adding doublelocalcopy operation and freeing buffer memory at end.
DoubleLocalCopy Patch Added
2020-08-07 08:20:14 -07:00
Wenkai Du
09ef75656a
Add more Rome 4P2H models
2020-08-06 18:20:02 +00:00
Wenkai Du
e7a10aa0e4
Topology tuning for 4P2H on Rome ( #242 )
...
* Topology tuning for 4P2H on Rome
* Use ncclTopoIdToIndex
2020-07-27 11:53:57 -07:00
Wenkai Du
8d5fb920b6
ib-test: support multiple channels ( #241 )
2020-07-27 11:03:12 -07:00
Sourav Chakraborty
2475daafee
add 4 node 8P6L 1 NIC 2nd Hive model
2020-07-22 16:27:15 +00:00
Sourav Chakraborty
db55afb014
simplify model definitions in topo expl
2020-07-22 16:05:53 +00:00
Wenkai Du
d5f90e19b5
Add 8P6L multi-node models ( #239 )
2020-07-21 14:10:36 -07:00
Wenkai Du
ab787c767e
Change default channels duplication for chordal ring ( #233 )
2020-07-14 15:16:50 -07:00
Stanley Tsang
9bd4c14603
Adding appropriate references in rccl-prim-test ( #227 )
...
Adding appropriate references to rccl-prim-test.
2020-07-06 10:15:03 -06:00
Wenkai Du
d3548cc474
topo_expl: each rank needs to have its own memory for graphs ( #225 )
2020-07-01 15:11:02 -07:00
Wenkai Du
a6be82f5ab
topo_expl: fix broken build ( #224 )
2020-06-30 11:11:23 -07:00
Wenkai Du
0eb19a563a
Use posix_memalign for network buffer allocation on host memory ( #221 )
...
* Use posix_memalign for network buffer allocation on host memory
* ib-test: add ability to specify run iterations
* ib-test: define iterations as multiple of default cycles
* Add checking to posix_memalign return value
2020-06-22 13:06:25 -07:00
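The entry above switches host-side network buffers to posix_memalign and adds a check on its return value. A minimal sketch of that pattern follows; it is not RCCL's actual code, and the helper names allocAlignedHostBuffer and isAligned are hypothetical. Note that posix_memalign reports failure through its return value (EINVAL or ENOMEM) rather than through errno.

```cpp
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of RCCL): allocate `size` bytes of host memory
// aligned to `alignment`. The alignment must be a power of two and a multiple
// of sizeof(void*). Returns nullptr if posix_memalign reports an error.
void* allocAlignedHostBuffer(std::size_t size, std::size_t alignment) {
  void* ptr = nullptr;
  const int rc = posix_memalign(&ptr, alignment, size);
  if (rc != 0) {
    // rc is EINVAL (bad alignment) or ENOMEM (out of memory).
    return nullptr;
  }
  std::memset(ptr, 0, size);  // start from a known state
  return ptr;
}

// Hypothetical helper: true if `p` is a multiple of `alignment`.
bool isAligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Memory obtained this way is released with free(). The page-sized alignment used in the test below is only an illustrative choice, not a value taken from the commit.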
Wenkai Du
dc739c4e70
ib-test: support host memory allocation through posix_memalign ( #220 )
...
* ib-test: support host memory allocation through posix_memalign
* ib-test: add missing CUDACHECK to hip calls
2020-06-17 16:16:54 -07:00
Wenkai Du
cfa97eccd3
Add IB/RDMA unit test
2020-06-16 18:29:17 +00:00
Wenkai Du
e80e29573c
Add gather, scatter and alltoall collectives
...
Introducing 3 new APIs:
ncclResult_t ncclGather(const void* sendbuff, void* recvbuff, size_t sendcount,
ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream);
ncclResult_t ncclScatter(const void* sendbuff, void* recvbuff,
size_t recvcount, ncclDataType_t datatype, int root, ncclComm_t comm,
hipStream_t stream);
ncclResult_t ncclAllToAll(const void* sendbuff, void* recvbuff, size_t count,
ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream);
Only out-of-place operation is supported.
The preprocessor symbol RCCL_GATHER_SCATTER=1 indicates API availability.
By default the APIs launch the RCCL kernel implementation, which can be disabled by setting
RCCL_ALLTOALL_KERNEL_DISABLE=1; the APIs then fall back to a wrapper around ncclSend and ncclRecv.
2020-06-09 17:44:08 -07:00
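The commit message above gives the three signatures but not the buffer layout. The host-only sketch below models the data movement under the assumption of the conventional rank-ordered layout (block r of a gathered or scattered buffer belongs to rank r, and in all-to-all rank i receives block i from every rank j into block j). It does not call RCCL; the names Buffer, gatherModel, scatterModel and allToAllModel are hypothetical, and each "rank" is simply one vector.

```cpp
#include <cstddef>
#include <vector>

using Buffer = std::vector<int>;  // one rank's buffer (illustrative element type)

// Gather model: the root receives `sendcount` elements from every rank,
// concatenated in rank order. send[r] is rank r's send buffer.
Buffer gatherModel(const std::vector<Buffer>& send, std::size_t sendcount) {
  Buffer recv;
  recv.reserve(send.size() * sendcount);
  for (const Buffer& s : send) {
    recv.insert(recv.end(), s.begin(),
                s.begin() + static_cast<std::ptrdiff_t>(sendcount));
  }
  return recv;
}

// Scatter model: the root's buffer holds nranks blocks of `recvcount`
// elements; rank r receives block r.
std::vector<Buffer> scatterModel(const Buffer& rootSend, std::size_t nranks,
                                 std::size_t recvcount) {
  std::vector<Buffer> recv(nranks);
  for (std::size_t r = 0; r < nranks; ++r) {
    const auto first = rootSend.begin() + static_cast<std::ptrdiff_t>(r * recvcount);
    recv[r].assign(first, first + static_cast<std::ptrdiff_t>(recvcount));
  }
  return recv;
}

// All-to-all model: every rank holds nranks blocks of `count` elements.
// Block i of rank j's send buffer lands in block j of rank i's receive buffer.
std::vector<Buffer> allToAllModel(const std::vector<Buffer>& send,
                                  std::size_t count) {
  const std::size_t n = send.size();
  std::vector<Buffer> recv(n, Buffer(n * count));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < count; ++k)
        recv[i][j * count + k] = send[j][i * count + k];
  return recv;
}
```

The model always writes into freshly allocated receive buffers, which matches the out-of-place restriction stated in the commit. Code that targets the real APIs can test for the RCCL_GATHER_SCATTER preprocessor symbol before calling ncclGather, ncclScatter or ncclAllToAll.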
Wenkai Du
71ec3e09df
topo_expl: update to 2.7
2020-06-09 17:40:24 -07:00
Wenkai Du
706de76046
Merge pull request #208 from wenkaidu/perf_xgmi
...
Give preference to path with more XGMI connections
2020-05-15 10:07:22 -07:00
Wenkai Du
b3c9852634
Give preference to path with more XGMI connections
2020-05-14 15:33:16 -07:00
Wenkai Du
f1058b6353
rccl-prim-test: add flags when calling hipExtLaunchMultiKernelMultiDevice in hip-clang
2020-05-12 23:54:07 +00:00
Saad Rahim
33c23fdcda
Merge remote-tracking branch 'upstream/master' into develop
2020-04-29 16:12:37 -07:00
Wenkai Du
5743c6b7d2
topo_expl: fix build error
2020-04-27 17:17:05 +00:00
Gilbert Lee
339bf9ff19
Adding option to re-use streams instead of re-creating per topology
2020-04-23 15:53:40 +00:00
Wenkai Du
ef7064ba9b
rccl-prim-test: auto-detect rings in 4P and 8P configurations
2020-04-10 18:17:21 +00:00
Aaron Enye Shi
a95090d981
Fix HIP-Clang build with HSA headers
...
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.
2020-04-03 17:58:23 -04:00
Wenkai Du
6f54b23503
topo_expl: update to 2.6
2020-04-01 13:37:08 -07:00
Wenkai Du
ebc823e603
rccl-prim-test: add all-to-all benchmark ( #185 )
...
For gfx908, support simple detection of ring topology.
Call ReduceOrCopyMulti directly from kernel.
Also simplify code by removing kernel start synchronization option
which has no effect on throughput measurements.
2020-03-16 10:00:54 -07:00
Wenkai Du
32388d60a9
topo_expl: add a few more single node models
2020-03-02 11:43:03 -08:00
Wenkai Du
498d5029ad
Add topology visualizer tool
2020-02-26 15:23:34 -08:00
Wenkai Du
934b6de557
topo_expl: use bandwidth numbers defined in graph in CPU models
2020-02-26 14:17:36 -08:00
Wenkai Du
d2adc61bf6
Revise PCI BW numbers on Rome
2020-02-26 13:17:49 -08:00
Wenkai Du
55f8e2dec7
Add topology explorer
2020-02-19 14:42:06 -08:00
Stanley Tsang
20fa04d9b6
Updating copyright notices for 2020.
2020-01-29 15:28:08 -08:00