Commit graph

59 commits

Author SHA1 Message Date
randyh62 4e2eeafdf6 Update README.md (#1321)
update note formatting
2024-09-05 14:23:36 -07:00
randyh62 391c7ea070 what-is-rccl (#1312)
* what-is-rccl

* create Installation instreuctions from README

* update README link

* Add using-nccl

* Add note about docs

* correct doc path

* sources to source

* correct docs link
2024-09-05 06:54:48 -07:00
corey-derochie-amd e056fe8f7e Disable MSCCL for the non-multi-process case by default (#1307)
* Added `RCCL_MSCCL_ENABLE_SINGLE_PROCESS` runtime flag to return to the original MSCCL enablement behaviour except when explicitly enabling for multi-thread.

* Added documentation for the new `RCCL_MSCCL_ENABLE_SINGLE_PROCESS` runtime env var.
2024-09-04 11:11:50 -06:00
Nilesh M Negi a2474846f5 [README] Tips on using less than 8 MI300 GPUs (#1270)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-08-06 11:12:09 -05:00
Nilesh M Negi 4f31ab85ea [BUILD] Update gfxTargets for ASAN build (#1242)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-08-06 10:53:51 -05:00
corey-derochie-amd 6dc47eecd7 Integrated RCCL with MSCCL++ for small message sizes (#1231) 2024-07-12 15:32:58 -06:00
Nilesh M Negi 5aaf7121d9 [BUILD] Update install.sh for RCCL build (#1191)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-05-31 17:58:34 -05:00
corey-derochie-amd 503a472a25 Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. (#1125) 2024-03-25 16:29:13 -06:00
Bertan Dogancay b617aecc31 Implement ROCTX (#1094)
* Implement roctx
2024-02-27 15:46:15 -07:00
Bertan Dogancay 28d9b170c9 [DEV] Configure functions in RCCL (#986)
* configure functions in rccl
2024-01-18 15:07:16 -07:00
searlmc1 15fa77bb57 Update README.md (#955)
Remove references to HCC, which was removed from ROCm ~2yrs ago
2023-11-15 18:01:45 -08:00
Bertan Dogancay 3807c203fc Update install.sh --fast and README (#924) 2023-10-19 16:35:10 -06:00
gilbertlee-amd bb55848450 Limiting # parallel jobs in install script to 16 by default, and new -j/--jobs flag (#785) 2023-06-22 14:30:44 -06:00
Sam Wu c3f47853bd Update Read the Docs, documentation, and dependabot (#772)
* update documentation

add version number to documentation

rename .sphinx/.doxygen to sphinx/doxygen

enable htmlzip, pdf, epub formats when publishing on Read the Docs

* add noCI label for dependabot PRs

since RTD CI is separate from math lib CI

* update rocm-docs-core to v0.13.4

* update README with link to rocm.docs.amd.com
2023-06-07 15:31:58 -06:00
gilbertlee-amd 777d8747a5 Refactoring CMakeFiles (#755) 2023-05-25 16:08:54 -06:00
akolliasAMD 9fe5a349f1 added npkit_enable on CI tests (#698) 2023-04-05 08:05:23 -06:00
Sam Wu 8c56e6c892 Fix Docs static analysis (#708)
* remove ref to old script in static analysis

* update README with doc build instructions

update gitignore with doc artifacts
2023-03-16 13:12:43 -06:00
Saad Rahim 6e48e518d9 Standard template implementation (#703) 2023-03-13 11:00:57 -06:00
PedramAlizadeh 45872d170f Changed the name of UnitTests to rccl-UnitTests (wrapper executable included). 2022-12-13 21:45:57 +00:00
akolliasAMD 5950942738 updated readme to reflect the newer tests 2022-07-13 16:08:28 +00:00
Wenkai Du 314da5a485 README.md: add CMAKE_PREFIX_PATH to build steps (#581) 2022-07-12 11:32:07 -07:00
Ziyue Yang 6e93fafdc3 Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit

* fix bug

* add npkit in readme
2022-06-20 14:30:19 -07:00
gilbertlee-amd 700b473211 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
gilbertlee-amd b122dcd991 Update README.md (#364)
- Remove outdated HIP Direct call requirements
- Remove outdated chrpath requirement
- Adding section about HSA_FORCE_FINE_GRAIN_PCIE
2021-05-11 13:41:41 -06:00
Cory Bloor 8aea5edb29 Add Jenkins docs build (#18)
* Fix typo in copyright

* Minor README improvements

- Prevent underscores from being interpreted as italics in test name format.
- Switch URL to HTTPS.

* Update docs scripts config

- Allow run_doc.sh and run_doxygen.sh to be called from any directory.

* Add docs build to Jenkins
2021-02-18 16:37:37 -07:00
Wenkai Du c985358e11 Merge remote-tracking branch 'nccl/master' into 2.8.3 2021-02-15 18:44:47 -05:00
Sylvain Jeaugey 911d61f214 2.8.4-1
Fix hang in corner cases of alltoallv using point to point send/recv.
Harmonize error messages.
Fix missing NVTX section in the license.
Update README.
2021-02-09 15:36:48 -08:00
Stanley Tsang 8c90aefb6d Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos (#265)
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos

* Removing potentially unneccessary dependencies from install script
2020-09-10 17:27:22 -06:00
Stanley Tsang c5d4d9eb76 Adding static library building option. (#244)
* Adding static library building option.

* Disabling running tests for static build

* Removing static packaging in CI

Co-authored-by: Saad Rahim <saad.rahim@amd.com>
2020-08-06 11:19:43 -06:00
Stanley Tsang 8d21adb5e3 Documentation updates for NCCL 2.7.0 (#219)
* Making hip-clang the default compiler; documentation update

* Adding back --hip-clang to install.sh as a silent option for CI

* Documentation updates for NCCL 2.7

* Restoring deleted line in install script
2020-06-16 16:48:11 -06:00
Stanley Tsang dc403e0ca2 Making hip-clang the default compiler; documentation update (#216)
* Making hip-clang the default compiler; documentation update

* Adding back --hip-clang to install.sh as a silent option for CI
2020-06-04 11:58:27 -06:00
Stanley Tsang b59b9d328b Updating README and readthedocs documentation. 2020-05-12 20:11:49 +00:00
Stanley Tsang 20fa04d9b6 Updating copyright notices for 2020. 2020-01-29 15:28:08 -08:00
Wenkai Du 00a910c2da Change manual build instructions to fit most common usage 2019-11-26 12:40:26 -08:00
Stanley Tsang 329a62a01f Fixing install script to actually install library when requested (#88)
* Fixing install script to actually install library when requested.  Cleaning up unused code.

Removing unused arguments from install script.

Fixing weird whitespacing

* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib

* Updates and corrections to README and install script
2019-06-25 17:25:21 -06:00
Saad Rahim 02ef2d27e6 Fixing whitespace 2019-05-24 14:49:12 -07:00
Saad Rahim fac7ef9370 Adding link to readthedocs 2019-05-24 14:48:24 -07:00
Rajat Chopra 6d8b2421bc Update debian dependencies in README (#228)
'fakeroot' is needed for building deb packages
2019-05-22 21:19:36 -07:00
saadrahim 42c3e4b93d Updating readme for 2.5 release (#67) 2019-05-22 15:31:12 -06:00
Aaron Enye Shi 6e8f40eb22 Update README to note install rocm-cmake (#68) 2019-05-22 15:29:59 -06:00
Stanley Tsang 0d6a5a3d25 Update README.md
Adding mention of requirement for chrpath for unit tests.
2019-05-17 11:42:14 -06:00
Gilbert Lee 55a4b22ad7 Updating RCCL based on NCCL 2.3.7
- Contains modifications to support AMD hardware
- Adds unit tests
2019-05-16 16:16:18 +00:00
David Addison f40ce73e89 NCCL 2.4.6-1
Added detection of IBM/Power NVLink bridge device.
Add NUMA support to PCI distance calculations.
Added NCCL_IGNORE_CPU_AFFINITY env var.
Fix memory leaks; GithubIssue#180
Compiler warning fix; GithubIssue#178
Replace non-standard variable length arrays. GithubIssue#171
Fix Tree+Shared Memory crash. GithubPR#185
Fix LL cleanup hang during long running DL jobs.
Fix NCCL_RINGS environment variable handling.
Added extra checks to catch repeat calls to ncclCommDestroy() GithubIssue#191
Improve bootstrap socket connection reliability at scale.
Fix hostname hashing issue. GithubIssue#187
Code cleanup to rename all non device files from *.cu to *.cc
2019-04-05 13:05:45 -07:00
Ke Wen 21d9a877be Add official builds download link 2018-11-08 11:22:28 -08:00
Sylvain Jeaugey f7d31919d7 Add instructions to install packaging toolchain
Address #143 and #150 : debuild not installed.
2018-11-05 11:42:33 -08:00
Obihörnchen 3202d6b393 Fix nccl-tests all_reduce_perf path
It's `all_reduce_perf` not `allreduce_perf`
2018-10-14 00:53:17 -07:00
Sylvain Jeaugey f93fe9bfd9 2.3.5-5
Add support for inter-node communication using sockets and InfiniBand/RoCE.
Improve latency.
Add support for aggregation.
Improve LL/regular tuning.
Remove tests as those are now at github.com/nvidia/nccl-tests .
2018-09-25 14:12:01 -07:00
Sylvain Jeaugey 03d856977e Update README to link to NCCL2 2017-08-04 09:44:37 -07:00
Sylvain Jeaugey 4a33f66e27 Update README to link to NCCL2 part 3 2017-08-04 09:44:09 -07:00
Sylvain Jeaugey d66fb63679 Update README to link to NCCL2 #2 2017-08-04 09:43:29 -07:00