Commit graph

1568 Commits

Author SHA1 Message Date
Shilei Tian 8e9fcf111a Check -parallel-jobs before use (#1451)
`-parallel-jobs` is not always available, for example in upstream LLVM.
2024-12-11 11:40:49 -06:00
Hujingbo ad4c36dc34 increase p2p channels for Intel platform (#1448)
Co-authored-by: hujingbo <hujingbo@kuaishou.com>
2024-12-10 07:33:37 -08:00
Jeffrey Novotny 9aa5b9f02e Refactor how to docs and formatting fixes (#1444) 2024-12-10 08:47:24 -05:00
Jeffrey Novotny 6d34fb7632 Add RCCL debugging guide (#1420)
* Add RCCL debugging guide

* Changes from external review

* More edits from internal review

* Additional edits

* Minor correction

* More changes after external review

* Integrate index and ToC changes with incoming merge changes

* Integrate feedback from management review

* Minor edits from the internal review
2024-12-06 13:25:58 -05:00
Nusrat Islam 42b6831a39 ext-src: tune TP=8 case on MI308 CPX mode (#1446)
Tune the number of blocks for hierarchical mscclpp allreduce.
2024-12-06 08:16:39 -06:00
Benjamin Kitor a05329bd0d Add Topologies for 16-GPU gfx942 SuperNode (#1417)
* Add Topologies for 16-GPU gfx942 SuperNode

- Add GigaIO topologies to tools/topo_expl for dev and testing
- Add GigaIO Columba 16 GPU romeModel and adjust topology
  matching algorithm in rome_models for 16 GPU system
- Fix bug which failed to match Rome Model when using subsets
  of system resources (i.e. ROCR_VISIBLE_DEVICES is set)
- Fixes for topo_expl

* Fix bug w/ 1H16P
2024-12-03 13:12:03 -08:00
Jeffrey Novotny 28594b26b3 Modify cmake instruction in build from source (#1445) 2024-12-03 11:26:02 -05:00
dependabot[bot] 1f789d6836 Bump rocm-docs-core from 1.8.3 to 1.9.2 in /docs/sphinx (#1441)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.3 to 1.9.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.3...v1.9.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 15:21:43 -07:00
Jeffrey Novotny bf7c130631 Refactor RCCL install guide into several pages (#1427)
* Refactor RCCL install guide into several pages

* Changes from code review and new docker guide

* Add missing entries to ToC

* Minor fixes

* Fix help strings

* Edits after review and remove extra white space
2024-11-27 15:34:26 -05:00
Jeffrey Novotny e42f10a361 Update rccl changelog for 6.3.1 (#1433)
* Update rccl changelog for 6.3.1

* Fix version number

* Correct RCCL release version

* Added details to 6.3.0 changelog

---------

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2024-11-26 08:46:37 -05:00
gilbertlee-amd 000575867c Adding RCCL_MODEL_REVERSAL_DISABLE env var to disable model reversal (#1431)
* Adding RCCL_MODEL_REVERSAL_DISABLE env var to disable model reversal
2024-11-25 11:24:54 -07:00
Bertan Dogancay dfe4a3ed81 Fix typo in ncclGetKernelIndex macro (#1424) 2024-11-18 10:40:05 -05:00
corey-derochie-amd 4336a0f3a3 Added latest users to CODEOWNERS. (#1422) 2024-11-14 16:55:18 -07:00
Bertan Dogancay cb175fb0b3 Template generic kernel for unroll factor (#1419)
* Template generic kernel for unroll factor
2024-11-12 18:27:29 -05:00
Jeffrey Novotny 2d07f18696 Refactor landing page and move some info to What is RCCL (#1415) 2024-11-12 13:15:27 -05:00
akolliasAMD 2284101624 removing unused gfx targets (#1411) 2024-11-06 08:50:08 -07:00
darren-amd 52d5f4cde2 Merge pull request #1406 from ROCm/darren-amd-remove-computeColl-declaration
remove undefined computeColl declaration
2024-11-06 10:43:35 -05:00
gilbertlee-amd cb1027de97 Updating RCCL Replayer README (#1408) 2024-11-05 08:06:11 -07:00
darren-amd ebf0417e90 remove undefined computeColl declaration 2024-11-04 13:42:01 -05:00
saurabhAMD 69b2b712ab GPU allocation for CPX Unit Tests using PCI bus id (#1403)
* mapping devices wrt pci

* Gpu allocation by using pci mapping

* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.

* Removing redundant testBed instance calling
2024-11-04 10:51:00 -06:00
corey-derochie-amd 1c45962273 Hide or fix all build warnings (#1331)
* Changing C-strings to be const.

* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.

* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.

* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.

* Fixed VLA in rccl UT.
2024-11-04 09:46:42 -07:00
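The two source-level changes described in the commit above (variable-length arrays replaced by `std::vector`, and function-local `#define` replaced by `constexpr int`) can be illustrated with a minimal sketch. The function `scaled_sum` and the constant `kScale` are hypothetical names chosen for illustration; they are not taken from the RCCL sources.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not RCCL code.
// Before the kind of change described above, a function like this might have
// used `int buf[n];` (a VLA, which C++ compilers accept only as an extension)
// and `#define SCALE 2` inside the function body (a macro that ignores scope).
int scaled_sum(int n) {
  assert(n >= 0);
  constexpr int kScale = 2;  // visible only inside this function
  std::vector<int> buf(static_cast<std::size_t>(n));  // runtime-sized, standard C++
  std::iota(buf.begin(), buf.end(), 0);  // fill with 0, 1, ..., n-1
  return kScale * std::accumulate(buf.begin(), buf.end(), 0);
}
```

Calling `scaled_sum(4)` sums 0 through 3 and doubles the result. Because `kScale` is an ordinary scoped constant, another function can define its own `kScale` without a macro redefinition warning.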
Abhishek Kulkarni 6178556853 GDR enablement logic fix for kernel 6.4.0+ (#1378) 2024-11-03 01:20:07 -05:00
Bertan Dogancay 984f1e4343 Increase MAX_STACK_SIZE for UT (#1398) 2024-11-01 13:07:45 -04:00
corey-derochie-amd 6db2644766 Set minimum ROCm version for MSCCLPP to 6.2 (#1401)
* Added ROCm version check around setting `ENABLE_MSCCLPP` flag.
2024-10-30 16:48:54 -06:00
Avinash d6006f0425 Memory leak fixes in hostside functions (#1388)
memory leak fixes for parseRome4P2H and ncclTopoAddGPU
2024-10-30 14:25:56 -05:00
Tim fd9924cfe7 Adjustment for UT Sendrecv (#1400)
Enabled UT sendrecv to the same rank and refactored the UBR call
2024-10-30 15:13:53 -04:00
Nusrat Islam 0fb3b5eba9 ext-src: Improved allreduce performance in cpx mode for MI308 (#1393)
To get the improved performance for TP=4, the user needs to use
RCCL_MSCCL_FORCE_ENABLE=1 and MSCCLPP_READ_ALLRED=1. For TP=8, the
user should use MSCCLPP_HIERARCHICAL_ALLRED=1.
2024-10-30 08:30:15 -05:00
corey-derochie-amd ea20af698e Remove MSCCL switch case fall-through by adding break statement. (#1342) 2024-10-29 15:47:59 -06:00
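As a generic illustration of the fall-through problem named in the commit above, the sketch below shows a `switch` in which each case ends with `break`. The `Op` enum and the `run` function are hypothetical; the commit message does not show the actual MSCCL switch.

```cpp
#include <cassert>
#include <string>

// Hypothetical op codes, for illustration only.
enum class Op { Copy, Reduce };

// Records which step ran for a given op code.
std::string run(Op op) {
  std::string trace;
  switch (op) {
    case Op::Copy:
      trace += "copy";
      break;  // without this break, control would continue into Op::Reduce
    case Op::Reduce:
      trace += "reduce";
      break;
  }
  assert(!trace.empty());  // each op code above records exactly one step
  return trace;
}
```

If the first `break` were missing, `run(Op::Copy)` would return `"copyreduce"` instead of `"copy"`, which is the class of bug an added `break` statement removes.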
corey-derochie-amd 8ac63e7e70 6.2 final documentation fixes updated for 6.3 (#1252) (#1399)
* Update CHANGELOG.md

* Update NOTICES.txt

* [DOCS] Note on using less than 8 MI300 GPUs

* Update README.md

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
2024-10-29 15:23:45 -06:00
gilbertlee-amd 0cbce2a757 Adding support for odd nodes for model_87 (#1309) 2024-10-24 08:38:12 -06:00
corey-derochie-amd 6ed513e1b9 Update CHANGELOG to match release branches 6.2 and 6.3 (#1391)
* [CHANGELOG] Add Known issues for ROCm 6.2.1

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Updated 6.2.1 known issues to match the content in develop.

* Updated CHANGELOG for ROCm 6.3 release. (#1380)

* Updated CHANGELOG for ROCm 6.3 release.

* Update CHANGELOG to new format.

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2024-10-23 13:49:40 -06:00
Arm Patinyasakdikul 29f87c7191 Increased maximum number of XML nodes to support CPX mode. (#1386) 2024-10-23 11:15:11 -05:00
Wenkai Du e0780ba4d4 Fix topology discovery in container with subset of GPUs (#1384)
* Fix topology discovery in container with subset of GPUs

* Move links counting out of loop
2024-10-22 13:50:23 -07:00
Bertan Dogancay cfecce790f [Replayer] Add validation (#1387)
* Add validation to rccl_replayer
2024-10-22 10:41:08 -04:00
dependabot[bot] 4685d3c546 Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#1385)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-21 10:05:58 -06:00
Bertan Dogancay 373f113524 Dynamically select unroll factor to build for when targeting local arch (#1371)
* Dynamically select unroll factor to build for when targeting local arch only
2024-10-21 10:53:11 -04:00
Wenkai Du 7c077db307 Increase CQ size to 3*MAX_REQUESTS (#1374)
* Increase CQ size to 3*MAX_REQUESTS

Suggested by Rukhsana Ansari <rukhsana.ansari@broadcom.com>

* Reword comments based on feedback from Rukhsana
2024-10-18 11:01:03 -07:00
akolliasAMD af5678641d added atomic acquire for gfx12 on prims_simple (#1382) 2024-10-18 11:26:38 -06:00
Jeffrey Novotny 4822fd47ca Add missing metadata information (#1381) 2024-10-16 13:26:12 -04:00
Sean Karlage bdf9544c81 static: Enable true rccl static library build (#1379)
* static: Enable true rccl static library build

RCCL uses `-fgpu-rdc` to compile, which requires a specialized link command to produce a true static library.

When "linking" with `amdclang++`, you need to use `--emit-static-lib` and `--hip-link` to get a static library with all GPU code generated. Subsequent links with binaries do not need any special flags to generate GPU code.

Building a static library:
```
$ cmake -DROCM_PATH=$ROCM_PATH -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=off -DCMAKE_POSITION_INDEPENDENT_CODE=on -DAMDGPU_TARGETS=gfx942 -DCMAKE_CXX_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang .. 2>&1 | tee -a /tmp/build.txt
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- Checking for ROCm support for GPU targets: gfx942
-- Compiling for gfx942
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- ROCM_PATH found: /opt/rocm
-- Compiling with amdclang++
-- HIP compiler:     clang
-- HIP runtime:      rocclr
-- amdclang++ executable: /opt/rocm/llvm/bin/amdclang++
-- amdclang++ version:    18.0.0git
-- hipconfig executable: /opt/rocm/bin/hipconfig
-- amdclang++ HIP version:    6.2.41133
-- ROCm version: 6.2.0
...
$ make -j 32
[  0%] Updating git_version.cpp if necessary
-- Updating git_version.cpp
[  0%] Built target git_version_check
[  0%] Hipifying src/transport/shm.cc -> /home/skarlage/local/rccl/build/hipify/src/transport/shm.cc
[  0%] Hipifying src/bootstrap.cc -> /home/skarlage/local/rccl/build/hipify/src/bootstrap.cc
[  0%] Hipifying src/channel.cc -> /home/skarlage/local/rccl/build/hipify/src/channel.cc
[  1%] Hipifying src/device/all_reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_reduce.h
[  1%] Hipifying src/device/broadcast.h -> /home/skarlage/local/rccl/build/hipify/src/device/broadcast.h
[  1%] Hipifying src/device/all_gather.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_gather.h
[  1%] Hipifying src/device/common.cu -> /home/skarlage/local/rccl/build/hipify/src/device/common.cu.cpp
[  1%] Hipifying src/debug.cc -> /home/skarlage/local/rccl/build/hipify/src/debug.cc
[  1%] Hipifying src/device/alltoall_pivot.h -> /home/skarlage/local/rccl/build/hipify/src/device/alltoall_pivot.h
[  1%] Hipifying src/device/network/unpack/unpack.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack.h
[  4%] Hipifying src/collectives.cc -> /home/skarlage/local/rccl/build/hipify/src/collectives.cc
[  4%] Hipifying src/device/msccl_kernel_impl.h -> /home/skarlage/local/rccl/build/hipify/src/device/msccl_kernel_impl.h
[  4%] Hipifying src/device/network/unpack/unpack_defs.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack_defs.h
[  4%] Hipifying src/device/op128.h -> /home/skarlage/local/rccl/build/hipify/src/device/op128.h
[  4%] Hipifying src/device/onerank.cu -> /home/skarlage/local/rccl/build/hipify/src/device/onerank.cu.cpp
[  4%] Hipifying src/device/common.h -> /home/skarlage/local/rccl/build/hipify/src/device/common.h
[  6%] Hipifying src/device/prims_ll.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll.h
[  6%] Hipifying src/device/primitives.h -> /home/skarlage/local/rccl/build/hipify/src/device/primitives.h
[  6%] Hipifying src/device/prims_ll128.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll128.h
[  6%] Hipifying src/device/reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce.h
[  7%] Hipifying src/device/common_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/common_kernel.h
[  7%] Hipifying src/device/reduce_scatter.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_scatter.h
[  7%] Hipifying src/device/sendrecv.h -> /home/skarlage/local/rccl/build/hipify/src/device/sendrecv.h
[  7%] Hipifying src/device/prims_simple.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_simple.h
[  7%] Hipifying src/enqueue.cc -> /home/skarlage/local/rccl/build/hipify/src/enqueue.cc
[  7%] Hipifying src/device/reduce_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_kernel.h
[  7%] Hipifying src/graph/connect.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/connect.cc
[  7%] Hipifying src/graph/rings.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.h
[  8%] Hipifying src/graph/rings.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.cc
[  8%] Hipifying src/graph/rome_models.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.cc
[  8%] Hipifying src/graph/rome_models.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.h
[  8%] Hipifying src/graph/paths.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/paths.cc
[  9%] Hipifying src/graph/search.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/search.cc
[  9%] Hipifying src/graph/topo.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/topo.cc
...
[100%] Linking CXX static library librccl.a
Elapsed time: 270 s. (time), 0.00046 s. (clock)
Elapsed time: 0 s. (time), 0.000342 s. (clock)
[100%] Built target rccl
```
The static RCCL library exists:
```
$ file librccl.a
librccl.a: current ar archive
```

* Fix up tests Cmake for static builds

We also need to fix up the tests CMakeLists.txt to:
* Remove the unused `BUILD_STATIC` option
* Use `SHARED_LIBS` as a definition of whether we're building static or
  not.
2024-10-16 06:58:50 -07:00
Wenkai Du c8d3543d3f Add back missing net flush (#1376) 2024-10-15 08:12:26 -07:00
Wenkai Du 62d10fdc25 msccl: disable 1-shot xmls (#1375)
MSCCL 1-shot xmls may cause different output values on different ranks.
Disabling them for now to avoid undefined behavior in applications.
2024-10-14 15:10:53 -07:00
Wenkai Du a680e329e6 Temporarily disable MSCCL all gather XMLs due to UT failure (#1373) 2024-10-12 08:43:16 -07:00
Wenkai Du 821d2e1f30 Allow zero byte sendrecv in alltoallv (#1349)
* Allow zero byte sendrecv in alltoallv

* Fix previous merge error
2024-10-11 10:40:32 -07:00
Wenkai Du 5c367a21d0 Improve model matching for GPUs with alltoall XGMI connection (#1372) 2024-10-11 09:53:14 -07:00
Arm Patinyasakdikul 133ea201cf Increase default number of channels for MI300A in multi-node scenario. (#1366)
This commit raises the default number of channels for MI300A from 8 up to 24.
This helps bring multi-node performance up to the expected level.
2024-10-11 11:37:48 -05:00
Wenkai Du b55b6be0cb Fix crash when PXN is enabled on some platforms (#1369) 2024-10-11 09:02:59 -07:00
Nusrat Islam 6160603d4c ext-src: Fix compiler warnings for MSCCLPP integration (#1368) 2024-10-10 08:20:02 -05:00
Nilesh M Negi 364a6c2130 [BUILD] Simplify CMake args for building MSCCLPP (#1363)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-10-09 23:52:04 -05:00
Nilesh M Negi 41a2c02773 [BUILD] Require use of Python3 interpreter (#1367)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-10-09 22:36:50 -05:00