18 Commits

Author SHA1 Message Date
Arm Patinyasakdikul 4d71cae249 [topo-expl] update header file location. (#1769)
[ROCm/rccl commit: 35024ca1cb]
2025-06-27 15:29:37 -05:00
Mustafa Abduljabbar ab4a3eb0c1 Fix topo explorer's compatibility with NCCL 2.24 (#1671)
* Fix build issues

* Fix failure to find path remote rank

[ROCm/rccl commit: f3f3336468]
2025-05-05 15:26:29 -04:00
Mustafa Abduljabbar 0a81478bd9 Fix topo explorer's nccl 2.23 compatibility (#1623)
* Fix compiler issues due to broken compatibility 

* Fix segfault and pass rank instead of busid and add a pointer to cover a new algorithm

[ROCm/rccl commit: aace4e27f8]
2025-04-02 09:47:29 -04:00
Benjamin Kitor fe806d5427 Add Topologies for 16-GPU gfx942 SuperNode (#1417)
* Add Topologies for 16-GPU gfx942 SuperNode

- Add GigaIO topologies to tools/topo_expl for dev and testing
- Add GigaIO Columba 16 GPU romeModel and adjust topology
  matching algorithm in rome_models for 16 GPU system
- Fix bug which failed to match Rome Model when using subsets
  of system resources (i.e. ROCR_VISIBLE_DEVICES is set)
- Fixes for topo_expl

* Fix bug w/ 1H16P

[ROCm/rccl commit: a05329bd0d]
2024-12-03 13:12:03 -08:00
BertanDogancay 9059445acb Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 84081064a0]
2024-10-02 09:31:25 -05:00
Wenkai Du f98715baea Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: abd0615351]
2023-06-26 22:51:56 +00:00
Wenkai Du 36e5e02e46 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 4f0e223db4]
2022-10-20 15:41:29 +00:00
Wenkai Du 7874a99c75 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a79d9e3586]
2022-09-09 16:05:38 +00:00
Wenkai Du 67e7e6507e Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: d28e1cb44f]
2022-04-18 11:15:25 -07:00
Wenkai Du 5bebcb0015 Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet

* Use separate operation counters for coll and p2p

[ROCm/rccl commit: b815a2800f]
2021-06-09 13:24:26 -07:00
Wenkai Du c8a432dc25 Allow intranode use of network connection (#383)
* Allow intranode use of network connection

* Checking for graph for null pointer

[ROCm/rccl commit: a3a8c2d56b]
2021-06-08 07:37:59 -07:00
Wenkai Du a76bebf8b6 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a4ea1fed5b]
2021-05-05 16:01:01 -07:00
Wenkai Du 287ed0f18a Enable collnet in RCCL (#333)
* Enable CollNet and use different number of channels

* topo_expl: enable collnet

[ROCm/rccl commit: 1d6244b18d]
2021-03-19 12:58:13 -07:00
Wenkai Du adff98765c Merge remote-tracking branch 'nccl/master' into no-target-id
[ROCm/rccl commit: d469947641]
2021-01-14 19:27:53 -05:00
Wenkai Du 69eb70ce43 tpol_expl: update to 2.7
[ROCm/rccl commit: 71ec3e09df]
2020-06-09 17:40:24 -07:00
Wenkai Du 779ee97ada topo_expl: fix build error
[ROCm/rccl commit: 5743c6b7d2]
2020-04-27 17:17:05 +00:00
Wenkai Du 8852e54181 topo_expl: update to 2.6
[ROCm/rccl commit: 6f54b23503]
2020-04-01 13:37:08 -07:00
Wenkai Du 00f421ccbd Add topology explorer
[ROCm/rccl commit: 55f8e2dec7]
2020-02-19 14:42:06 -08:00