gilbertlee-amd
2530a2f084
[TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel (#485)
2022-01-05 16:33:25 -07:00
Wenkai Du
4234a638b5
Merge pull request #482 from ROCmSoftwarePlatform/2.11.4
Sync up with 2.11.4
2022-01-05 09:31:51 -08:00
Wenkai Du
f8d0775a6f
Add another Rome model (#483)
2022-01-05 09:26:31 -08:00
Wenkai Du
434ecb0e1f
Merge remote-tracking branch 'origin/develop' into 2.11.4
2022-01-03 09:54:16 -08:00
gilbertlee-amd
1157c2edfe
[TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs (#477)
2021-11-24 18:05:37 -07:00
Wenkai Du
3a919c1f49
Merge remote-tracking branch 'nccl/master' into develop
2021-11-11 14:22:12 -08:00
gilbertlee-amd
1c7ef1b790
[TransferBench] Adding #CUs / RRLW mode to p2p benchmark (#464)
2021-11-08 14:36:04 -07:00
Wenkai Du
0331e39f81
Update Rome model matching (#461)
* Update Rome model matching
* Add another Rome model
* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
Wenkai Du
14a184eb67
Query XGMI link count through rocm_smi_lib API (#442)
2021-10-26 10:30:20 -07:00
gilbertlee-amd
18246fc191
[TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446)
2021-10-25 11:23:29 -06:00
gilbertlee-amd
550d732d6c
TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
gilbertlee-amd
f6b7ac693e
[TransferBench] Adding comment echoing to help distinguish tests (#438)
2021-10-13 14:56:57 -06:00
gilbertlee-amd
269f07fbc3
[TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436)
2021-10-12 09:32:54 -06:00
Wenkai Du
2249a1d9d3
Add more Rome models (#434)
* Add more Rome models
* Update models and tuning
* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd
aa917c3fc8
[TransferBench] Adding ability to specify suffix for numBytes (#435)
2021-10-08 16:36:19 -06:00
gilbertlee-amd
e506d14d18
[TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du
e0053311c0
Add another Rome model (#431)
2021-10-06 08:17:12 -07:00
Wenkai Du
29c729d8b6
Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI
* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Gilbert Lee
68ec3f84e6
[TransferBench] Update to 2.10.3
2021-08-02 05:53:20 -05:00
Wenkai Du
8ee2b7932a
Merge remote-tracking branch 'origin/develop' into 2.10.3
2021-09-13 15:51:53 -07:00
Wenkai Du
a2421f8b4a
Merge pull request #423 from wenkaidu/prim-test
rccl-prim-test: support 8p1h and 16p1h testing
2021-09-08 17:01:19 -07:00
Wenkai Du
7558b5e2bf
rccl-prim-test: enable 8p1h and 16p1h test
2021-09-08 11:51:26 -05:00
Wenkai Du
b22d097524
Revert "rccl-prim-test: add all-to-all benchmark (#185)"
This reverts commit ebc823e603.
2021-09-07 16:41:46 -05:00
gilbertlee-amd
51d64894ff
[TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix
* [TransferBench] Fixing up NUMA node detection by filtering pools
2021-09-07 15:28:16 -06:00
Wenkai Du
5c8380ff5b
Implement NIC identification and remapping (#420)
* Add 1H16P GPU model
* Implement NIC identification and remapping
* Revert "Sort IB devices based on device name (#413)"
This reverts commit 2d0ed8dff6.
* Fix permute and check order
* Correction on IB speed reporting
* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"
This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du
d5f93649ff
Merge remote-tracking branch 'origin/develop' into 2.10.3
2021-08-24 09:49:47 -07:00
Wenkai Du
5f15ed6e3e
Add gfx908 VM model (#418)
2021-08-10 08:55:11 -07:00
gilbertlee-amd
1ed272e5f0
[TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header (#416)
2021-08-04 10:53:41 -06:00
Wenkai Du
bf2339f93e
Merge remote-tracking branch 'nccl/master' into 2.10.3
2021-07-30 16:23:14 -07:00
Wenkai Du
818cdb16a8
Query XGMI links from xml and adjust gfx906 channel usage (#410)
2021-07-27 17:32:41 -07:00
Wenkai Du
135d47d125
topo_expl: fix build after switching to rocm-smi-lib (#405)
* topo_expl: fix build after switching to rocm-smi-lib
* Use minimal of 4 channels for gfx908
2021-07-27 08:30:08 -07:00
gilbertlee-amd
2b0b608270
[TransferBench] Fixing a typo in TransferBench usage example (#401)
2021-06-22 17:08:57 -06:00
Wenkai Du
fa6d7e9a63
Fixes for NCCL_MAX_NCHANNELS and topo_expl (#398)
2021-06-22 08:41:49 -07:00
gilbertlee-amd
720374a767
[TransferBench] Switching from little-endian fill pattern to big-endian (#399)
2021-06-21 14:28:51 -06:00
gilbertlee-amd
ff413be933
[TransferBench] Adding ability to specify source data pattern (#394)
* [TransferBench] Adding ability to specify source data pattern
2021-06-15 08:41:57 -06:00
Wenkai Du
b815a2800f
Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet
* Use separate operation counters for coll and p2p
2021-06-09 13:24:26 -07:00
Wenkai Du
a3a8c2d56b
Allow intranode use of network connection (#383)
* Allow intranode use of network connection
* Checking for graph for null pointer
2021-06-08 07:37:59 -07:00
Wenkai Du
961922ea02
Add option to enable multiple SAT in SHARP (#380)
* Add option to enable multiple SAT in SHARP
* Extend number of NICs to 16
2021-06-03 19:45:18 -07:00
Wenkai Du
13dc80ee14
topo_expl: update to 2.9.9
2021-05-26 09:24:34 -07:00
Wenkai Du
4c83adb75c
Update Rome models matching (#376)
2021-05-25 10:12:40 -07:00
Wenkai Du
a4ea1fed5b
Merge remote-tracking branch 'nccl/master' into develop
2021-05-05 16:01:01 -07:00
Wenkai Du
a79f74082e
Limit max channels for ring graph on single node Rome (#347)
* Limit max channels for ring graph on single node Rome
* Partially revert "Use non-temporal access for streaming data (#341)"
2021-04-14 10:14:54 -07:00
Wenkai Du
1fe031402a
Add gfx90a target (#344)
* Add gfx90a target
* Support gfx90a topology
Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>
2021-04-14 09:29:00 -06:00
Wenkai Du
9dfc2c183e
Use non-temporal access for streaming data (#341)
* Use non-temporal access for streaming data
* Revert to ulong2 after fixing compiling issue
2021-04-07 17:34:35 -07:00
Wenkai Du
e26ad2995e
Cleanup number of channels calculation (#340)
2021-04-05 17:51:56 -07:00
Wenkai Du
17491c918e
Fix incorrect net counting (#339)
* Fix incorrect net counting
* Add comments
2021-04-05 12:21:57 -07:00
Wenkai Du
1d2946ee4b
Rework network port trimming code (#338)
* Rework network port trimming code
* Move Rome related changes to separate source files
2021-03-31 10:25:59 -07:00
Wenkai Du
d87dc7c2e8
collnet: support multiple NICs (#335)
2021-03-25 20:59:32 -07:00
Wenkai Du
1d6244b18d
Enable collnet in RCCL (#333)
* Enable CollNet and use different number of channels
* topo_expl: enable collnet
2021-03-19 12:58:13 -07:00
Wenkai Du
8e180cf087
Revert "Port alltoall[v]" (#325)
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00