Commit graph

165 commits

Author SHA1 Message Date
Edgar 0336ffdf70 Introduce multi-rank support per device.
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
2022-06-10 14:23:12 +00:00
Wenkai Du ef499c4810 Add another Rome model (#553)
* Add another Rome model

* Add option to force enable intranet on single node

* Limit p2p channels to number of ranks

* Refine p2p channels handling
2022-05-31 11:31:30 -07:00
Wenkai Du c5b77121f0 Update Rome model (#552) 2022-05-26 09:59:23 -07:00
akolliasAMD 98f0809a39 Added creation of new tree and added switch for using treesplit for specific cases (#551) 2022-05-25 18:55:14 -04:00
Wenkai Du 283dc86a73 Refine and add new Rome models (#548) 2022-05-17 08:23:59 -07:00
gilbertlee-amd 685bcea127 [TransferBench] Syncing with TransferBench v1.02 (#541) 2022-04-27 20:43:24 -06:00
Wenkai Du 063da25563 topo_expl: fix build and add tuning support (#539) 2022-04-26 15:40:07 -07:00
Wenkai Du d28e1cb44f Merge remote-tracking branch 'nccl/master' into develop 2022-04-18 11:15:25 -07:00
Wenkai Du 2151c79d14 Add new Rome model (#536) 2022-04-13 11:45:40 -07:00
Wenkai Du ba4c165bf3 Add new Rome model (#535) 2022-04-12 13:27:32 -07:00
gilbertlee-amd def6832287 Transfer bench single stream mode (#531)
- Adding single stream mode
- Removing some unused env vars
- Adding output to CSV mode for p2p benchmark, topology listing modes
2022-04-08 15:20:55 -06:00
Wenkai Du bbe780ca6c Support multiple tuning tables (#522)
* Support multiple tuning tables

* [UnitTests] Skip managed memory testing
2022-03-31 17:09:21 -07:00
gilbertlee-amd 2d558c9abc Adding explicit request for coarse-grained host memory due to changes in HipHostMalloc (#517) 2022-03-25 13:05:07 -06:00
Wenkai Du cd17cf6dce Update Rome model matching and add new models (#516)
* Update Rome model matching and add new models

* Add missing file

* Models update
2022-03-21 10:54:40 -07:00
Ziyue Yang b569c0a1db Add Pivot AllToAll algorithm for Rome model (#503)
* add a2a pivot interface

* remove debug info

* address comments

* fix bug

* remove custom script

* address comments

* fix bug
2022-02-20 21:09:47 -08:00
gilbertlee-amd f3c2cafd9d [TransferBench] Fix for cases with subsets of configured numa nodes (#495) 2022-02-07 12:16:19 -07:00
gilbertlee-amd 84d5fce7dd TransferBench: Adding ability to reindex GPUs based on PCIe address (#494) 2022-02-02 08:51:41 -07:00
Wenkai Du 598c6fdded Update Rome models (#491) 2022-01-14 10:03:30 -08:00
Wenkai Du 369c021992 topo_expl: update for 2.11.4 (#490)
* topo_expl: update for 2.11.4

* topo_expl: revert a few logging changes
2022-01-13 13:33:07 -08:00
gilbertlee-amd 2530a2f084 [TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel (#485) 2022-01-05 16:33:25 -07:00
Wenkai Du 4234a638b5 Merge pull request #482 from ROCmSoftwarePlatform/2.11.4
Sync up with 2.11.4
2022-01-05 09:31:51 -08:00
Wenkai Du f8d0775a6f Add another Rome model (#483) 2022-01-05 09:26:31 -08:00
Wenkai Du 434ecb0e1f Merge remote-tracking branch 'origin/develop' into 2.11.4 2022-01-03 09:54:16 -08:00
gilbertlee-amd 1157c2edfe [TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs (#477) 2021-11-24 18:05:37 -07:00
Wenkai Du 3a919c1f49 Merge remote-tracking branch 'nccl/master' into develop 2021-11-11 14:22:12 -08:00
gilbertlee-amd 1c7ef1b790 [TransferBench] Adding #CUs / RRLW mode to p2p benchmark (#464) 2021-11-08 14:36:04 -07:00
Wenkai Du 0331e39f81 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
Wenkai Du 14a184eb67 Query XGMI link count through rocm_smi_lib API (#442) 2021-10-26 10:30:20 -07:00
gilbertlee-amd 18246fc191 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446) 2021-10-25 11:23:29 -06:00
gilbertlee-amd 550d732d6c TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
gilbertlee-amd f6b7ac693e [TransferBench] Adding comment echoing to help distinguish tests (#438) 2021-10-13 14:56:57 -06:00
gilbertlee-amd 269f07fbc3 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436) 2021-10-12 09:32:54 -06:00
Wenkai Du 2249a1d9d3 Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd aa917c3fc8 [TransferBench] Adding ability to specify suffix for numBytes (#435) 2021-10-08 16:36:19 -06:00
gilbertlee-amd e506d14d18 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du e0053311c0 Add another Rome model (#431) 2021-10-06 08:17:12 -07:00
Wenkai Du 29c729d8b6 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Gilbert Lee 68ec3f84e6 [TransferBench] Update to 2.10.3 2021-08-02 05:53:20 -05:00
Wenkai Du 8ee2b7932a Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-13 15:51:53 -07:00
Wenkai Du a2421f8b4a Merge pull request #423 from wenkaidu/prim-test
rccl-prim-test: support 8p1h and 16p1h testing
2021-09-08 17:01:19 -07:00
Wenkai Du 7558b5e2bf rccl-prim-test: enable 8p1h and 16p1h test 2021-09-08 11:51:26 -05:00
Wenkai Du b22d097524 Revert "rccl-prim-test: add all-to-all benchmark (#185)"
This reverts commit ebc823e603.
2021-09-07 16:41:46 -05:00
gilbertlee-amd 51d64894ff [TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix

* [TransferBench] Fixing up NUMA node detection by filtering pools
2021-09-07 15:28:16 -06:00
Wenkai Du 5c8380ff5b Implement NIC identification and remapping (#420)
* Add 1H16P GPU model

* Implement NIC identification and remapping

* Revert "Sort IB devices based on device name (#413)"

This reverts commit 2d0ed8dff6.

* Fix permute and check order

* Correction on IB speed reporting

* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"

This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du d5f93649ff Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-08-24 09:49:47 -07:00
Wenkai Du 5f15ed6e3e Add gfx908 VM model (#418) 2021-08-10 08:55:11 -07:00
gilbertlee-amd 1ed272e5f0 [TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header (#416) 2021-08-04 10:53:41 -06:00
Wenkai Du bf2339f93e Merge remote-tracking branch 'nccl/master' into 2.10.3 2021-07-30 16:23:14 -07:00
Wenkai Du 818cdb16a8 Query XGMI links from xml and adjust gfx906 channel usage (#410) 2021-07-27 17:32:41 -07:00
Wenkai Du 135d47d125 topo_expl: fix build after switching to rocm-smi-lib (#405)
* topo_expl: fix build after switching to rocm-smi-lib

* Use minimal of 4 channels for gfx908
2021-07-27 08:30:08 -07:00