Граф коммитов

146 Коммитов

Автор SHA1 Сообщение Дата
gilbertlee-amd 2530a2f084 [TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel (#485) 2022-01-05 16:33:25 -07:00
Wenkai Du 4234a638b5 Merge pull request #482 from ROCmSoftwarePlatform/2.11.4
Sync up with 2.11.4
2022-01-05 09:31:51 -08:00
Wenkai Du f8d0775a6f Add another Rome model (#483) 2022-01-05 09:26:31 -08:00
Wenkai Du 434ecb0e1f Merge remote-tracking branch 'origin/develop' into 2.11.4 2022-01-03 09:54:16 -08:00
gilbertlee-amd 1157c2edfe [TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs (#477) 2021-11-24 18:05:37 -07:00
Wenkai Du 3a919c1f49 Merge remote-tracking branch 'nccl/master' into develop 2021-11-11 14:22:12 -08:00
gilbertlee-amd 1c7ef1b790 [TransferBench] Adding #CUs / RRLW mode to p2p benchmark (#464) 2021-11-08 14:36:04 -07:00
Wenkai Du 0331e39f81 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
Wenkai Du 14a184eb67 Query XGMI link count through rocm_smi_lib API (#442) 2021-10-26 10:30:20 -07:00
gilbertlee-amd 18246fc191 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446) 2021-10-25 11:23:29 -06:00
gilbertlee-amd 550d732d6c TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
gilbertlee-amd f6b7ac693e [TransferBench] Adding comment echoing to help distinguish tests (#438) 2021-10-13 14:56:57 -06:00
gilbertlee-amd 269f07fbc3 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436) 2021-10-12 09:32:54 -06:00
Wenkai Du 2249a1d9d3 Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd aa917c3fc8 [TransferBench] Adding ability to specify suffix for numBytes (#435) 2021-10-08 16:36:19 -06:00
gilbertlee-amd e506d14d18 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du e0053311c0 Add another Rome model (#431) 2021-10-06 08:17:12 -07:00
Wenkai Du 29c729d8b6 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Gilbert Lee 68ec3f84e6 [TransferBench] Update to 2.10.3 2021-08-02 05:53:20 -05:00
Wenkai Du 8ee2b7932a Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-13 15:51:53 -07:00
Wenkai Du a2421f8b4a Merge pull request #423 from wenkaidu/prim-test
rccl-prim-test: support 8p1h and 16p1h testing
2021-09-08 17:01:19 -07:00
Wenkai Du 7558b5e2bf rccl-prim-test: enable 8p1h and 16p1h test 2021-09-08 11:51:26 -05:00
Wenkai Du b22d097524 Revert "rccl-prim-test: add all-to-all benchmark (#185)"
This reverts commit ebc823e603.
2021-09-07 16:41:46 -05:00
gilbertlee-amd 51d64894ff [TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix

* [TransferBench] Fixing up NUMA node detection by filtering pools
2021-09-07 15:28:16 -06:00
Wenkai Du 5c8380ff5b Implement NIC identification and remapping (#420)
* Add 1H16P GPU model

* Implement NIC identification and remapping

* Revert "Sort IB devices based on device name (#413)"

This reverts commit 2d0ed8dff6.

* Fix permute and check order

* Correction on IB speed reporting

* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"

This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du d5f93649ff Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-08-24 09:49:47 -07:00
Wenkai Du 5f15ed6e3e Add gfx908 VM model (#418) 2021-08-10 08:55:11 -07:00
gilbertlee-amd 1ed272e5f0 [TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header (#416) 2021-08-04 10:53:41 -06:00
Wenkai Du bf2339f93e Merge remote-tracking branch 'nccl/master' into 2.10.3 2021-07-30 16:23:14 -07:00
Wenkai Du 818cdb16a8 Query XGMI links from xml and adjust gfx906 channel usage (#410) 2021-07-27 17:32:41 -07:00
Wenkai Du 135d47d125 topo_expl: fix build after switching to rocm-smi-lib (#405)
* topo_expl: fix build after switching to rocm-smi-lib

* Use minimal of 4 channels for gfx908
2021-07-27 08:30:08 -07:00
gilbertlee-amd 2b0b608270 [TransferBench] Fixing a typo in TransferBench usage example (#401) 2021-06-22 17:08:57 -06:00
Wenkai Du fa6d7e9a63 Fixes for NCCL_MAX_NCHANNELS and topo_expl (#398) 2021-06-22 08:41:49 -07:00
gilbertlee-amd 720374a767 [TransferBench] Switching from little-endian fill pattern to big-endian (#399) 2021-06-21 14:28:51 -06:00
gilbertlee-amd ff413be933 [TransferBench] Adding ability to specify source data pattern (#394)
* [TransferBench] Adding ability to specify source data pattern
2021-06-15 08:41:57 -06:00
Wenkai Du b815a2800f Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet

* Use separate operation counters for coll and p2p
2021-06-09 13:24:26 -07:00
Wenkai Du a3a8c2d56b Allow intranode use of network connection (#383)
* Allow intranode use of network connection

* Checking for graph for null pointer
2021-06-08 07:37:59 -07:00
Wenkai Du 961922ea02 Add option to enable multiple SAT in SHARP (#380)
* Add option to enable multiple SAT in SHARP

* Extend number of NICs to 16
2021-06-03 19:45:18 -07:00
Wenkai Du 13dc80ee14 topo_expl: update to 2.9.9 2021-05-26 09:24:34 -07:00
Wenkai Du 4c83adb75c Update Rome models matching (#376) 2021-05-25 10:12:40 -07:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Wenkai Du a79f74082e Limit max channels for ring graph on single node Rome (#347)
* Limit max channels for ring graph on single node Rome
* Partially revert "Use non-temporal access for streaming data (#341)"
2021-04-14 10:14:54 -07:00
Wenkai Du 1fe031402a Add gfx90a target (#344)
* Add gfx90a target

* Support gfx90a topology

Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>
2021-04-14 09:29:00 -06:00
Wenkai Du 9dfc2c183e Use non-temporal access for streaming data (#341)
* Use non-temporal access for streaming data

* Revert to ulong2 after fixing compiling issue
2021-04-07 17:34:35 -07:00
Wenkai Du e26ad2995e Cleanup number of channels calculation (#340) 2021-04-05 17:51:56 -07:00
Wenkai Du 17491c918e Fix incorrect net counting (#339)
* Fix incorrect net counting

* Add comments
2021-04-05 12:21:57 -07:00
Wenkai Du 1d2946ee4b Rework network port trimming code (#338)
* Rework network port trimming code

* Move Rome related changes to separate source files
2021-03-31 10:25:59 -07:00
Wenkai Du d87dc7c2e8 collnet: support multiple NICs (#335) 2021-03-25 20:59:32 -07:00
Wenkai Du 1d6244b18d Enable collnet in RCCL (#333)
* Enable CollNet and use different number of channels

* topo_expl: enable collnet
2021-03-19 12:58:13 -07:00
Wenkai Du 8e180cf087 Revert "Port alltoall[v]" (#325)
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00