Wenkai Du
b815a2800f
Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet
* Use separate operation counters for coll and p2p
2021-06-09 13:24:26 -07:00
Wenkai Du
a3a8c2d56b
Allow intranode use of network connection (#383)
* Allow intranode use of network connection
* Checking for graph for null pointer
2021-06-08 07:37:59 -07:00
Wenkai Du
961922ea02
Add option to enable multiple SAT in SHARP (#380)
* Add option to enable multiple SAT in SHARP
* Extend number of NICs to 16
2021-06-03 19:45:18 -07:00
Wenkai Du
13dc80ee14
topo_expl: update to 2.9.9
2021-05-26 09:24:34 -07:00
Wenkai Du
4c83adb75c
Update Rome models matching (#376)
2021-05-25 10:12:40 -07:00
Wenkai Du
a4ea1fed5b
Merge remote-tracking branch 'nccl/master' into develop
2021-05-05 16:01:01 -07:00
Wenkai Du
a79f74082e
Limit max channels for ring graph on single node Rome (#347)
* Limit max channels for ring graph on single node Rome
* Partially revert "Use non-temporal access for streaming data (#341)"
2021-04-14 10:14:54 -07:00
Wenkai Du
1fe031402a
Add gfx90a target (#344)
* Add gfx90a target
* Support gfx90a topology
Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>
2021-04-14 09:29:00 -06:00
Wenkai Du
9dfc2c183e
Use non-temporal access for streaming data (#341)
* Use non-temporal access for streaming data
* Revert to ulong2 after fixing compiling issue
2021-04-07 17:34:35 -07:00
Wenkai Du
e26ad2995e
Cleanup number of channels calculation (#340)
2021-04-05 17:51:56 -07:00
Wenkai Du
17491c918e
Fix incorrect net counting (#339)
* Fix incorrect net counting
* Add comments
2021-04-05 12:21:57 -07:00
Wenkai Du
1d2946ee4b
Rework network port trimming code (#338)
* Rework network port trimming code
* Move Rome related changes to separate source files
2021-03-31 10:25:59 -07:00
Wenkai Du
d87dc7c2e8
collnet: support multiple NICs (#335)
2021-03-25 20:59:32 -07:00
Wenkai Du
1d6244b18d
Enable collnet in RCCL (#333)
* Enable CollNet and use different number of channels
* topo_expl: enable collnet
2021-03-19 12:58:13 -07:00
Wenkai Du
8e180cf087
Revert "Port alltoall[v]" (#325)
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00
Wenkai Du
c018edf0f2
Enable local sendrecv over network if GDR is available on all GPUs (#324)
2021-03-05 19:59:41 -08:00
Wenkai Du
95f178324c
Add support to another Rome model
2021-02-18 02:00:31 +00:00
Wenkai Du
6dfdfef98f
Add gfx908 Rome 4 NICs model
2021-02-06 00:19:47 +00:00
Gilbert Lee
f372c53d52
[TransferBench] Fixing some merge issues
2021-02-05 16:46:20 +00:00
Wenkai Du
ab1e7a0318
Merge remote-tracking branch 'origin/develop' into 2.8.3
2021-02-04 20:02:34 -05:00
Gilbert Lee
2f541508c5
[topo_expl] Updating for 2.8.3
2021-02-04 19:08:42 +00:00
Gilbert Lee
9aac1ed38f
[ib-test] Update for 2.8.3
2021-02-04 19:05:03 +00:00
Gilbert Lee
9ce203dd0a
[TransferBench] Updating for 2.8.3
2021-02-04 18:58:25 +00:00
gilbertlee-amd
62e0447e9a
[TransferBench] Restore some previous fixes - memory leak, PCIe address (#314)
2021-02-01 09:48:09 -07:00
gilbertlee-amd
3e62ceddc5
Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2021-01-28 09:45:01 -07:00
Wenkai Du
2ddbe6646b
Improve collective trace
2021-01-14 19:28:01 -05:00
Wenkai Du
f4d5d3d620
Port alltoall[v]
2021-01-14 19:28:01 -05:00
Wenkai Du
d469947641
Merge remote-tracking branch 'nccl/master' into no-target-id
2021-01-14 19:27:53 -05:00
Wenkai Du
373a108516
Fix Rome PCIe 2 node topology generation (#310)
2020-12-15 17:16:17 -08:00
gilbertlee-amd
41c35dad48
[TransferBench] Fixing bug with fine-grained memory allocation (#311)
* Fixing bug with fine-grained memory
2020-12-15 17:37:31 -07:00
gilbertlee-amd
ae0c4092c7
[TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309)
* Adding CPU based execution, fixing typos, adding Fine-grained mem
* Exposing sampling factor when generating range of data sizes
* Refactoring how Links are launched, now once per thread
* Documentation updates
2020-12-11 10:21:14 -07:00
gilbertlee-amd
b80ae551b1
[TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307)
* Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing
* Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)
2020-12-04 14:57:13 -07:00
Wenkai Du
975b14dffa
Add Rome model and improve search (#305)
2020-11-17 14:55:06 -08:00
gilbertlee-amd
41bcfb8878
Clique kernel support (#295)
* Adding experimental clique-based kernels (opt-in only)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2020-11-10 15:44:10 -07:00
Wenkai Du
dfa3c41ede
Add more Rome models (#292)
2020-10-30 21:26:04 -07:00
gilbertlee-amd
bfab1d3592
Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290)
2020-10-27 09:00:33 -06:00
gilbertlee-amd
61e1a71d14
[TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display
2020-10-21 16:13:36 -06:00
gilbertlee-amd
769418c5c7
TransferBench Typo. Pinned host memory uses C not P (#286)
2020-10-21 12:05:38 -06:00
gilbertlee-amd
84a2541e01
Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Wenkai Du
33babcb5e2
Update Rome single node models (#277)
2020-10-13 13:33:09 -07:00
gilbertlee-amd
2b8184808d
Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du
ae008fd2db
Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models
* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du
b871ea3c0c
Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation
* Extend Rome P2P channel limit to multinode and alltoall kernels
* topo_expl: fix compilation and sync up with main
* gtest: use RCCL alltoallv API
* Code review changes
2020-09-30 16:25:36 -07:00
gilbertlee-amd
ee262819a7
New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
lijietang
bbe233f8c1
Add rccl bw test script in tools (#255)
2020-09-11 16:59:03 +08:00
Wenkai Du
c5cbece6d0
Increase minimal channels for gfx908 (#259)
2020-08-26 11:40:11 -07:00
Wenkai Du
391bbf3f1e
Add NPS4 support on some models (#256)
* Add NPS4 support on some models
* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd
ec9af40fcd
Upgrading various TransferBench features (#257)
2020-08-19 09:47:19 -06:00
Wenkai Du
a51e4071e3
Add another Rome model (#249)
* Add another Rome model
* Add gfx908 4P3L models and support
* Revert "Use cached value for detecting GDR support only once"
This reverts commit 67c8e72ce3.
* Skip using ibverb for GPU direct RDMA detection
* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd
c985478133
Fixes to make TransferBench compile for hipclang (#254)
2020-08-13 12:25:28 -06:00