Commit Graph

105 Commits

Author SHA1 Message Date
Bertan Dogancay d52b6c0d24 add DMA_BUF support (#763)
* add DMA_BUF support

* remove unused libraries in src/init.cc

* change NCCL_ALL to NCCL_INIT

* remove extra pointer functions in transport/net.cc
2023-06-01 12:46:42 -06:00
gilbertlee-amd c62aebe882 Removing init_nvtx.cc from source list (#762) 2023-05-31 14:44:55 -06:00
gilbertlee-amd 777d8747a5 Refactoring CMakeFiles (#755) 2023-05-25 16:08:54 -06:00
Wenkai Du 53a1f91857 Merge remote-tracking branch 'nccl/master' into develop 2023-04-25 15:38:32 -07:00
Ziyue Yang e3b2342f39 MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing

* Use external scheduler

* Fix cmake error

* Address comments

* Fix thread safe issue

* Make MSCCL lifecycle APIs thread safe

* Make MSCCL internal scheduler aware of topology hint

* Revise error message
2023-03-14 14:34:25 -07:00
Wenkai Du f7a456122c Remove workaround and use indirect function call (#684) 2023-02-14 13:59:48 -08:00
Wenkai Du 9461a43168 Merge pull request #681 from wenkaidu/gfx9
Add HIP event optimization and remove special code for gfx90a
2023-02-13 08:04:59 -08:00
Pedram Alizadeh f525b8e1e6 Adding -pthread flag into CMakeLists.txt (#682)
Adding -pthread flag for linking issues into CMakeLists.txt
2023-02-10 17:22:30 -05:00
Wenkai Du 39534e8724 Add HIP event optimization and remove special code for gfx90a 2023-02-10 16:46:01 +00:00
Wenkai Du e1cb45ff22 Merge remote-tracking branch 'nccl/master' into HEAD 2023-02-04 01:44:43 +00:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
Wenkai Du aebed537a5 Reduce linking time through more parallel jobs (#657) 2022-11-30 16:06:03 -08:00
Wenkai Du fb9938cffa Query DMABuf support through HSA runtime API (#654) 2022-11-30 08:53:03 -08:00
akolliasAMD 11862f67de removed cmake HIP_CLANG_PATCH_LEVEL check (#652)
* removed HIP_CLANG_PATCH_LEVEL check
2022-11-29 09:48:59 -07:00
Wenkai Du 562dd87036 Move hipify to cmake stage
Add minimal ROCm/HIP version requirements for Graph support
2022-11-14 18:10:45 +00:00
Ranjith Ramakrishnan cf4e963aaf Correct include and library path for new directory layout
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
akolliasAMD ef71550738 Added new gpu targets (#631) 2022-09-29 14:53:55 -06:00
Wenkai Du 9e6c87a2bf Define ncclShmem as global shared (#618)
* Use global defined shared memory

* Add --hipcc-func-supp to compile option

* Force inline some device functions

* Add back threadfence
2022-09-20 09:00:20 -07:00
Wenkai Du a79d9e3586 Merge remote-tracking branch 'nccl/master' into develop 2022-09-09 16:05:38 +00:00
gilbertlee-amd 9ed9cd0e31 Fixing CMake to avoid unnecessary git_version relinking (#594) 2022-08-04 18:03:59 -06:00
Wenkai Du e2cb95a390 Add nccl_net.h to librccl-dev package (#589) 2022-07-29 13:39:49 -07:00
Wenkai Du 2e65881a79 Revert "Adding the missing roc:: namespace (#570)" (#576)
This reverts commit d5bea2cfaa.
2022-07-06 10:07:35 -07:00
gilbertlee-amd a89a9966aa Adding git hash info to version output line (#572) 2022-06-28 16:42:51 -06:00
Dmitry Mikushin d5bea2cfaa Adding the missing roc:: namespace (#570)
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that rccl is treated as a symbolic "-lrccl" by the linker (and fails to link
due to a missing library search path), while roc::rccl is a target name, which can resolve to an absolute
library path.

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

* Adding a changelog entry

* minor updates to wording

* missing period

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2022-06-27 11:44:43 -06:00
Ziyue Yang 6e93fafdc3 Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit

* fix bug

* add npkit in readme
2022-06-20 14:30:19 -07:00
arvindcheru a1fe1adf1c [CMake] GNU Install Dir Enhancements (#557)
* sd321110 (GNUInstall Dir) enhancements
2022-06-10 18:51:51 -04:00
gilbertlee-amd 700b473211 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Edgar 4c4a7cb696 fix cmake logic to handle old and new include dirs
Starting from rocm 5.2 there is a reorganization of the
include directories. This PR allows compiling
rccl on both the old and the new directory layouts.
This solution is using find_package() for identifying correct
settings for rocm_smi starting from rocm-5.2, and the original (manual)
settings for all previous releases.

Tested with rocm-5.2, 5.1.1, 5.0.2, and 4.5.2.
2022-04-28 14:33:46 -04:00
Wenkai Du d28e1cb44f Merge remote-tracking branch 'nccl/master' into develop 2022-04-18 11:15:25 -07:00
nunnikri b83efe9c5c Installing rccl.h wrapper to /opt/rocm-xxx/include path (#532)
* Fixing the broken library soft link

* Installing rccl.h wrapper to /opt/rocm-xxx/include path.

This missing wrapper was causing compilation errors with PyTorch. Fixing it.
2022-04-09 07:55:39 -07:00
nunnikri acfb0210ea Fixing the broken library soft link (#529) 2022-04-07 15:19:33 -07:00
Liam Wrubleski a8f1e61f48 Packages for test and benchmark executables on all supported OSes using CPack. (#512) 2022-03-21 15:04:14 -06:00
akolliasAMD 65ea3d80db Added alltoallv test and optional args variable on collective args (#514)
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
Nirmal Unnikrishnan 115461cc04 File reorganization with backward compatibility
Updated the header file location and export path
2022-03-10 01:28:41 +00:00
Nirmal Unnikrishnan 676a4737c1 File reorganization as per the new defined standard
The header files will be in /opt/rocm-xxx/include/rccl
Libraries and cmake will be in /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
Ziyue Yang b569c0a1db Add Pivot AllToAll algorithm for Rome model (#503)
* add a2a pivot interface

* remove debug info

* address comments

* fix bug

* remove custom script

* address comments

* fix bug
2022-02-20 21:09:47 -08:00
Wenkai Du 3a919c1f49 Merge remote-tracking branch 'nccl/master' into develop 2021-11-11 14:22:12 -08:00
Wenkai Du 29170a8b5f Support different protocols and algorithms in all reduce only build (#455)
* Support different protocols and algorithms in all reduce only build

* Restore deleted line in error
2021-11-02 08:39:08 -07:00
Wenkai Du 4643a17f83 Check rocm_smi64Config.h on older ROCm build (#452) 2021-10-28 07:26:28 -07:00
Wenkai Du ec36c4c326 Enable timing profiling mode (#447) 2021-10-27 08:21:48 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Liam Wrubleski 97d9cf40e7 Setup runtime and development packages (#407)
* changes to enable devel package

* Update rocm-cmake version & build
2021-07-26 15:06:17 -06:00
Wenkai Du 56155ff5b6 Use rocm_smi_lib for getting topology information (#402)
* Use rocm_smi_lib for getting topology information

* Add rocm-smi-lib dependency to RCCL package
2021-07-08 13:23:11 -07:00
Eiden Yoshida 5c3e7d8b67 Fix static builds (#393) 2021-06-23 09:19:48 -06:00
Wenkai Du e75bc53e06 Deduct ROCM_PATH from CXX unless specified (#400) 2021-06-22 13:29:08 -07:00
Wenkai Du 59d2867b01 Remove hard coded /opt/rocm from cmake (#396) 2021-06-21 08:29:23 -07:00
Eiden Yoshida fb267ea333 Move address-sanitizer build above addition of rccl library in CMakeLists (#392) 2021-06-11 14:43:54 -06:00
Eiden Yoshida eea7b24058 Add address sanitizer build option (#389) 2021-06-10 09:14:54 -06:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Wenkai Du ad54a14a5c Add libdl linking option (#358) 2021-04-26 15:24:58 -07:00