* Fixing message queue leak.
* Using POSIX implementation of Message Queues
* Adding unlink to msgqueue
* MsgQueue update
* Adding timeout check to msgqueue broadcast; tightening up system checks
* Removing unnecessary code
* Removing extra argument from print
* Adding explicit msg queue close call to all other ranks
[ROCm/rccl commit: 70597789d0]
* Limit max channels for ring graph on single node Rome
* Partially revert "Use non-temporal access for streaming data (#341)"
[ROCm/rccl commit: a79f74082e]
* Fix typo in copyright
* Minor README improvements
- Prevent underscores from being interpreted as italics in test name format.
- Switch URL to HTTPS.
* Update docs scripts config
- Allow run_doc.sh and run_doxygen.sh to be called from any directory.
* Add docs build to Jenkins
[ROCm/rccl commit: 8aea5edb29]
* Update install.sh
install.sh hard-codes paths such as /opt/rocm/bin/hipcc for rocm_path and default_path=/opt/rocm.
This works only with a standalone ROCm install. Anyone with a side-by-side install will hit the error below.
Could we use ROCM_PATH=$ROCM_PATH instead of "default_path" as the variable name, and
ROCM_BIN_PATH=$ROCM_PATH/bin in place of rocm_path?
That way ROCM_PATH can be exported as an environment variable as needed before running the script.
I have tried this locally and it works. ROCM_PATH is the variable name we commonly use.
If you are OK with this, I can submit a PR for it.
Error when the driver is installed side-by-side:
# ./install.sh -dtr 2>&1 | tee /dockerx/6519_rccl-test.log
CMake Error at /usr/share/cmake/Modules/CMakeDetermineCXXCompiler.cmake:48 (message):
Could not find compiler set in environment variable CXX:
/opt/rocm/bin/hipcc.
Call Stack (most recent call first):
CMakeLists.txt:12 (project)
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "/root/driver/rccl/build/release/CMakeFiles/CMakeOutput.log".
* Update install.sh
Removed ROCM_PATH=$ROCM_PATH
* Update install.sh
Set default value if external value is not supplied.
[ROCm/rccl commit: e9f7908592]
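The default-value behavior described in the install.sh commits above can be sketched as follows. This is a minimal illustration, not the actual contents of install.sh: the variable names ROCM_PATH and ROCM_BIN_PATH come from the commit message, while the use of CXX to hold the compiler path is an assumption based on the CMake error shown in the log.

```shell
#!/bin/sh
# Hypothetical sketch; not the real install.sh.
# Use ROCM_PATH from the environment if it is set and non-empty,
# otherwise fall back to the standalone install location /opt/rocm.
ROCM_PATH="${ROCM_PATH:-/opt/rocm}"

# Derive the bin directory from ROCM_PATH instead of hard-coding it.
ROCM_BIN_PATH="${ROCM_PATH}/bin"

# Assumption: the compiler is handed to CMake through CXX.
CXX="${ROCM_BIN_PATH}/hipcc"

echo "ROCM_PATH=${ROCM_PATH}"
echo "CXX=${CXX}"
```

With this pattern, a side-by-side install would be selected by exporting ROCM_PATH before running the script, and an unset ROCM_PATH keeps the previous /opt/rocm behavior.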
Fix hang in corner cases of alltoallv using point to point send/recv.
Harmonize error messages.
Fix missing NVTX section in the license.
Update README.
[ROCm/rccl commit: 911d61f214]
* Fixing temp file creation/deletion for Clique kernel mode.
* Refactoring of MP unit tests; includes bug fixes and general support for any number of GPUs
* GroupCall MP UT properly quits when too many devices are specified
* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script
[ROCm/rccl commit: d00b7d17bd]