Commit Graph

111 Commits

Author SHA1 Message Date
Wenkai Du b815a2800f Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet

* Use separate operation counters for coll and p2p
2021-06-09 13:24:26 -07:00
Wenkai Du a3a8c2d56b Allow intranode use of network connection (#383)
* Allow intranode use of network connection

* Checking for graph for null pointer
2021-06-08 07:37:59 -07:00
Wenkai Du 961922ea02 Add option to enable multiple SAT in SHARP (#380)
* Add option to enable multiple SAT in SHARP

* Extend number of NICs to 16
2021-06-03 19:45:18 -07:00
Wenkai Du 13dc80ee14 topo_expl: update to 2.9.9 2021-05-26 09:24:34 -07:00
Wenkai Du 4c83adb75c Update Rome models matching (#376) 2021-05-25 10:12:40 -07:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Wenkai Du a79f74082e Limit max channels for ring graph on single node Rome (#347)
* Limit max channels for ring graph on single node Rome
* Partially revert "Use non-temporal access for streaming data (#341)"
2021-04-14 10:14:54 -07:00
Wenkai Du 1fe031402a Add gfx90a target (#344)
* Add gfx90a target

* Support gfx90a topology

Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>
2021-04-14 09:29:00 -06:00
Wenkai Du 9dfc2c183e Use non-temporal access for streaming data (#341)
* Use non-temporal access for streaming data

* Revert to ulong2 after fixing compiling issue
2021-04-07 17:34:35 -07:00
Wenkai Du e26ad2995e Cleanup number of channels calculation (#340) 2021-04-05 17:51:56 -07:00
Wenkai Du 17491c918e Fix incorrect net counting (#339)
* Fix incorrect net counting

* Add comments
2021-04-05 12:21:57 -07:00
Wenkai Du 1d2946ee4b Rework network port trimming code (#338)
* Rework network port trimming code

* Move Rome related changes to separate source files
2021-03-31 10:25:59 -07:00
Wenkai Du d87dc7c2e8 collnet: support multiple NICs (#335) 2021-03-25 20:59:32 -07:00
Wenkai Du 1d6244b18d Enable collnet in RCCL (#333)
* Enable CollNet and use different number of channels

* topo_expl: enable collnet
2021-03-19 12:58:13 -07:00
Wenkai Du 8e180cf087 Revert "Port alltoall[v]" (#325)
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00
Wenkai Du c018edf0f2 Enable local sendrecv over network if GDR is available on all GPUs (#324) 2021-03-05 19:59:41 -08:00
Wenkai Du 95f178324c Add support to another Rome model 2021-02-18 02:00:31 +00:00
Wenkai Du 6dfdfef98f Add gfx908 Rome 4 NICs model 2021-02-06 00:19:47 +00:00
Gilbert Lee f372c53d52 [TransferBench] Fixing some merge issues 2021-02-05 16:46:20 +00:00
Wenkai Du ab1e7a0318 Merge remote-tracking branch 'origin/develop' into 2.8.3 2021-02-04 20:02:34 -05:00
Gilbert Lee 2f541508c5 [topo_expl] Updating for 2.8.3 2021-02-04 19:08:42 +00:00
Gilbert Lee 9aac1ed38f [ib-test] Update for 2.8.3 2021-02-04 19:05:03 +00:00
Gilbert Lee 9ce203dd0a [TransferBench] Updating for 2.8.3 2021-02-04 18:58:25 +00:00
gilbertlee-amd 62e0447e9a [TransferBench] Restore some previous fixes - memory leak, PCIe address (#314) 2021-02-01 09:48:09 -07:00
gilbertlee-amd 3e62ceddc5 Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2021-01-28 09:45:01 -07:00
Wenkai Du 2ddbe6646b Improve collective trace 2021-01-14 19:28:01 -05:00
Wenkai Du f4d5d3d620 Port alltoall[v] 2021-01-14 19:28:01 -05:00
Wenkai Du d469947641 Merge remote-tracking branch 'nccl/master' into no-target-id 2021-01-14 19:27:53 -05:00
Wenkai Du 373a108516 Fix Rome PCIe 2 node topology generation (#310) 2020-12-15 17:16:17 -08:00
gilbertlee-amd 41c35dad48 [TransferBench] Fixing bug with fine-grained memory allocation (#311)
* Fixing bug with fine-grained memory
2020-12-15 17:37:31 -07:00
gilbertlee-amd ae0c4092c7 [TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309)
* Adding CPU based execution, fixing typos, adding Fine-grained mem
* Exposing sampling factor when generating range of data sizes
* Refactoring how Links are launched, now once per thread
* Documentation updates
2020-12-11 10:21:14 -07:00
gilbertlee-amd b80ae551b1 [TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307)
* Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing
* Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)
2020-12-04 14:57:13 -07:00
Wenkai Du 975b14dffa Add Rome model and improve search (#305) 2020-11-17 14:55:06 -08:00
gilbertlee-amd 41bcfb8878 Clique kernel support (#295)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2020-11-10 15:44:10 -07:00
Wenkai Du dfa3c41ede Add more Rome models (#292) 2020-10-30 21:26:04 -07:00
gilbertlee-amd bfab1d3592 Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290) 2020-10-27 09:00:33 -06:00
gilbertlee-amd 61e1a71d14 [TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display
2020-10-21 16:13:36 -06:00
gilbertlee-amd 769418c5c7 TransferBench Typo. Pinned host memory uses C not P (#286) 2020-10-21 12:05:38 -06:00
gilbertlee-amd 84a2541e01 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Wenkai Du 33babcb5e2 Update Rome single node models (#277) 2020-10-13 13:33:09 -07:00
gilbertlee-amd 2b8184808d Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du ae008fd2db Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models

* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du b871ea3c0c Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes
2020-09-30 16:25:36 -07:00
gilbertlee-amd ee262819a7 New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
lijietang bbe233f8c1 Add rccl bw test script in tools (#255) 2020-09-11 16:59:03 +08:00
Wenkai Du c5cbece6d0 Increase minimal channels for gfx908 (#259) 2020-08-26 11:40:11 -07:00
Wenkai Du 391bbf3f1e Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd ec9af40fcd Upgrading various TransferBench features (#257) 2020-08-19 09:47:19 -06:00
Wenkai Du a51e4071e3 Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 67c8e72ce3.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd c985478133 Fixes to make TransferBench compile for hipclang (#254) 2020-08-13 12:25:28 -06:00