* addressing hip_fp8 support compatibility issue
* skipping mulsum and avg test for fp8, using hip_fp8 for product
* syncing with nccl-tests
* removing the fp8 filter for pre-Hopper GPUs and resolving the merge conflict
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: 4b2b635766]
* [BUILD] Add options to install script for compiler and GPU targets
* Fix GPU_TARGETS field and add option for custom ROCm path
* Check for ROCM_PATH
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: 41b383a0d4]
`pthread.h` is included in `src/common.h`, but the pthread library is not
linked, so the build fails with unresolved symbols at link time.
[ROCm/rccl-tests commit: 5b27b961b2]
Previously, the logger recorded the number of bytes a node was expected to receive.
This differed from the stdout logging, where the reported message size is the total size of the message.
Signed-off-by: Grant Pinkert <gpinkert@amd.com>
[ROCm/rccl-tests commit: f611dbd49a]
* skipping the prod test for FP8 types in reduce and reduce-scatter
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: 5e838ad9df]
`NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators.
It is overridden by `NCCL_TESTS_SPLIT_MASK` when both are set.
Examples:
NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72.
You can also use: "%" "&" "|" "/" for short.
Extra spaces in the middle will be automatically ignored.
Not case sensitive.
The following are all equivalent:
NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
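The parsing rules described above can be sketched roughly as follows. This is an illustrative, standalone sketch and not the code from this commit: the names `SplitOp`, `SplitSpec`, `parseSplit` and `splitColor` are hypothetical, and the handling of malformed input (throwing exceptions) is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical types for illustration; not taken from nccl-tests.
enum class SplitOp { And, Or, Mod, Div };

struct SplitSpec {
  SplitOp op;
  std::uint64_t value;
};

// Parse strings such as "AND 0x7", "mod 72", "/72" or "&0b111".
SplitSpec parseSplit(const std::string& raw) {
  // Drop all whitespace and upper-case the rest, so that spacing and
  // letter case do not matter.
  std::string s;
  for (unsigned char c : raw) {
    if (!std::isspace(c)) s.push_back(static_cast<char>(std::toupper(c)));
  }

  struct Name { const char* text; SplitOp op; };
  static const Name names[] = {
    {"AND", SplitOp::And}, {"OR", SplitOp::Or},
    {"MOD", SplitOp::Mod}, {"DIV", SplitOp::Div},
    {"&", SplitOp::And},   {"|", SplitOp::Or},
    {"%", SplitOp::Mod},   {"/", SplitOp::Div},
  };

  for (const Name& n : names) {
    const std::string prefix(n.text);
    if (s.compare(0, prefix.size(), prefix) != 0) continue;

    // The operand may be decimal, hexadecimal (0x...) or binary (0b...).
    std::string digits = s.substr(prefix.size());
    int base = 10;
    if (digits.rfind("0X", 0) == 0) { base = 16; digits.erase(0, 2); }
    else if (digits.rfind("0B", 0) == 0) { base = 2; digits.erase(0, 2); }

    std::size_t used = 0;
    const std::uint64_t value = std::stoull(digits, &used, base);
    if (used != digits.size()) throw std::invalid_argument("trailing characters");
    if (value == 0 && (n.op == SplitOp::Mod || n.op == SplitOp::Div)) {
      throw std::invalid_argument("MOD/DIV operand must be non-zero");
    }
    return {n.op, value};
  }
  throw std::invalid_argument("unknown split operator");
}

// Compute the communicator color for a given rank.
std::uint64_t splitColor(const SplitSpec& spec, std::uint64_t rank) {
  switch (spec.op) {
    case SplitOp::And: return rank & spec.value;
    case SplitOp::Or:  return rank | spec.value;
    case SplitOp::Mod: assert(spec.value != 0); return rank % spec.value;
    case SplitOp::Div: assert(spec.value != 0); return rank / spec.value;
  }
  return 0;  // unreachable for valid SplitOp values
}
```

With this sketch, `splitColor(parseSplit("MOD 72"), 100)` yields 28 and `splitColor(parseSplit("DIV 72"), 100)` yields 1, matching the `rank % 72` and `rank / 72` formulas in the examples.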
[ROCm/rccl-tests commit: a89cf07fe8]
- Set cron to weekly
- Remove unused properties
- Try rccl install as sudo
- Clear existing rccl repo
- Run install with sudo and env vars
- Fix path
- Add rccl to path
- Attempt to fix build and install of rccl during compile stage.
- Remove existing clone from workspace
- Fix path when installing rccl
- Fix path for install rccl-tests
- Install rccl local only
- Set RCCL_DIR
- Build rccl and rccl-tests with cmake
- Add extra env vars
- Use installer instead of cmake for rccl
- Update .jenkins/common.groovy
- Get librccl.so from rccl/build/release
- Switching job command to build rccl and rccl-tests using install.sh because those work properly together.
[ROCm/rccl-tests commit: 5c41a915c8]
Ensure that ncclstringtotype iterates only over data types known to
nccl-tests (as indicated by test_typenum), not over a potentially larger
set of all NCCL types.
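The idea can be illustrated with a small standalone sketch. It does not reproduce the actual nccl-tests code: `kLibraryNumTypes`, `kTestTypeNum`, `kTestTypeNames` and `stringToTypeIndex` are hypothetical stand-ins, and the type names in the table are made up for the example.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for illustration only. Assume the library defines more data
// types than the tests have table entries for.
constexpr int kLibraryNumTypes = 6;  // hypothetical count of all library types
constexpr int kTestTypeNum = 4;      // hypothetical count of types known to the tests
const char* const kTestTypeNames[kTestTypeNum] = {"int8", "int32", "float", "double"};

// Return the index of `name` in the test table, or -1 if the tests do not know it.
int stringToTypeIndex(const char* name) {
  static_assert(kTestTypeNum <= kLibraryNumTypes,
                "the test table is assumed to be a subset of the library types");
  assert(name != nullptr);
  // Bound the loop by the size of the test table. Looping up to
  // kLibraryNumTypes would read past the end of kTestTypeNames.
  for (int t = 0; t < kTestTypeNum; ++t) {
    if (std::strcmp(name, kTestTypeNames[t]) == 0) return t;
  }
  return -1;
}
```

Bounding the loop by the table's own size keeps the lookup safe even when the library later adds types that the tests have no entry for.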
[ROCm/rccl-tests commit: 34d6d53910]