Commit Graph

261 Commits

Author SHA1 Message Date
mberenjk db5ab33461 Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 9076091602]
2025-05-16 09:14:46 -05:00
Rahul Vaidya fa5259894c Ensure backward compatibility for fp8 datatypes (#126)
* Ensure backward compatibility for fp8 datatypes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Update code comments

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl-tests commit: 0abe3c80bb]
2025-05-15 13:56:40 -05:00
mberenjk ed6ebb12a7 Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#109)
* addressing hip_fp8 support compatibility issue

* skipping mulsum and avg test for fp8, using hip_fp8 for product

* syncing with nccl-tests

removing the fp8 filter for pre-Hopper GPUs and resolving the merge conflict

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 4b2b635766]
2025-05-14 15:30:07 -05:00
Wenkai Du fe47d3dd77 Automatically set in-place option from out-of-place (#123)
[ROCm/rccl-tests commit: cac33a8c2f]
2025-05-09 16:48:42 -05:00
Nilesh M Negi e3b9d785cc [BUILD] Add options to install script for compiler and GPU targets (#121)
* [BUILD] Add options to install script for compiler and GPU targets
* Fix GPU_TARGETS field and add option for custom ROCm path
* Check for ROCM_PATH

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 41b383a0d4]
2025-05-07 13:19:10 -05:00
Marius Brehler b0b615091e Link Threads::Threads (#119)
`pthread.h` is included in `src/common.h`, but the pthread library is not
properly linked, so the build fails with unresolved symbols at link
time.

[ROCm/rccl-tests commit: 5b27b961b2]
2025-04-29 16:18:51 -05:00
Nilesh M Negi 6d2ec88eec [BUILD] Fix rccl-tests version string for packaging (#117)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: c96deb13cd]
2025-04-29 08:51:43 -05:00
Rahul Vaidya 10c31fb05f Fix build issues caused by 2.24.3 sync (#118)
[ROCm/rccl-tests commit: a4fd8f4667]
2025-04-28 10:22:38 -05:00
Grant Pinkert 3f962f5d58 Fix message size logging (#115)
Previously, the logger recorded the number of bytes a node was expected to receive.
This differed from the stdout output, where the reported message size is the total size of a message.

Signed-off-by: Grant Pinkert <gpinkert@amd.com>

[ROCm/rccl-tests commit: f611dbd49a]
2025-04-25 11:05:21 -05:00
Nilesh M Negi ba1adc3316 Merge pull request #116 from nileshnegi/sync/nccl-tests/02-28-2025
[SYNC] NCCL-Tests v2.14.1

[ROCm/rccl-tests commit: 83d38d91b6]
2025-04-21 19:53:35 -05:00
nileshnegi 8d887aad0d Merge remote-tracking branch 'nccl-tests/master' into develop
[ROCm/rccl-tests commit: 5625599dda]
2025-04-21 19:46:10 -05:00
mberenjk f3f3158a7e skipping the prod test for FP8 types in reduce and reduce-scatter (#111)
* skipping the prod test for FP8 types in reduce and reduce-scatter
---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 5e838ad9df]
2025-04-15 09:38:33 -05:00
Alex Breslow 9da345dadf Add instructions to README regarding benchmarking on ROCm versions before 6.4.x with HSA_NO_SCRATCH_RECLAIM=1 (#114)
[ROCm/rccl-tests commit: 284ff2ac84]
2025-04-08 09:59:57 -07:00
David Addison d516392fac Add PCI domain and device ID for GPU device BDF display
[ROCm/rccl-tests commit: b4300cc79d]
2025-02-28 13:25:51 -08:00
Sylvain Jeaugey b740da9a31 Add NCCL_TESTS_SPLIT documentation in the README
[ROCm/rccl-tests commit: 903918fc54]
2025-02-06 14:10:07 +01:00
Junyu Ma e2a9cbb362 Perftests: Introduce NCCL_TESTS_SPLIT env
`NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators.

It is overridden by `NCCL_TESTS_SPLIT_MASK`.

Examples:

NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7"  # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72"   # color = rank % 72.  One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72"   # color = rank / 72.  Intra NVLink domain on NVL72.

You can also use "%", "&", "|" and "/" as shorthand operators.
Extra spaces in the middle are ignored.
Parsing is not case sensitive.

The following are all equivalent:

NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"


[ROCm/rccl-tests commit: a89cf07fe8]
2025-02-04 15:18:09 -08:00
David Addison 6f2e0f8a21 Update CUDA gencodes
Add support for Blackwell sm100 and sm120 from CUDA 12.8

Add support for Hopper sm90 from CUDA 12.0


[ROCm/rccl-tests commit: cb6a46fdd6]
2025-01-25 17:32:16 -08:00
Nilesh M Negi 590c2b0187 [GIT] Add CODEOWNERS and PR Template (#102)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 448c4c7269]
2025-01-16 17:05:48 -07:00
David Sidler 8135eefce7 Use find_package for MPI (#92)
* Use find_package for MPI
* Minor fixes

[ROCm/rccl-tests commit: 46152785f0]
2025-01-14 11:49:20 -06:00
David Sidler 93977b8866 Add option to output results to a file (#93)
* Use find_package for MPI
* Add functionality to output results to file
* fix compilation
* report num gpus
* Revert "Use find_package for MPI"
This reverts commit c8fa253724ef4d0beac0d9c72f968062fbc6908e.
* Change inplace key
* remove dependency on json library
* Print "ranks, ranksPerNode, gpusPerRank"
* Add "nodes" field
---------

Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 959cc19920]
2025-01-13 17:28:29 -06:00
saurabhAMD a2a3515e33 Updating to use hipDeviceMallocUncached (#95)
Use hipDeviceMallocUncached instead of hipDeviceMallocFinegrained on newer ROCm versions.

[ROCm/rccl-tests commit: fc9917e0da]
2025-01-11 23:25:24 -06:00
Mustafa Abduljabbar 7fbad10633 Memset to fix inflated performance when GPU is reset (#94)
* Memset to fix inflated performance when GPU is reset
* use hipMemset for both memsets

[ROCm/rccl-tests commit: f2a48983ae]
2025-01-11 23:25:24 -06:00
Sam Wu 7ecf5d7fda [CI] Clone rccl and build from tip of develop (#99)
- Set cron to weekly
- Remove unused properties
- Try rccl install as sudo
- Clear existing rccl repo
- Run install with sudo and env vars
- Fix path
- Add rccl to path
- Attempt to fix build and install of rccl during compile stage.
- Remove existing clone from workspace
- Fix path when installing rccl
- Fix path for installing rccl-tests
- Install rccl local only
- Set RCCL_DIR
- Build rccl and rccl-tests with cmake
- Add extra env vars
- Use installer instead of cmake for rccl
- Update .jenkins/common.groovy
- Get librccl.so from rccl/build/release
- Switching job command to build rccl and rccl-tests using install.sh because those work properly together.


[ROCm/rccl-tests commit: 5c41a915c8]
2025-01-11 23:23:06 -06:00
Sam Wu 6328a42ab0 Remove precheckin steps from staticanalysis (#101)
[ROCm/rccl-tests commit: df26b32687]
2025-01-10 16:41:43 -07:00
Tim 549f58f2a1 hot fixing ncclMemFree for mscclpp (#100)
[ROCm/rccl-tests commit: f7a5df7fc4]
2025-01-09 12:03:52 -05:00
mberenjk efa2d204b2 removing FP8 product from allReduce test cases (#97)
* removing FP8 product from allReduce test cases

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>


[ROCm/rccl-tests commit: 77ae744c18]
2025-01-06 14:05:38 -06:00
John Bachan 69b9a05e71 Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to multiples of 16 bytes.
This ensures that, with non-power-of-two rank counts, buffer addresses are aligned suitably for performance.


[ROCm/rccl-tests commit: 29f4114f02]
2024-12-18 11:20:28 -08:00
Sylvain Jeaugey 637acfba05 Merge pull request #259 from NVIDIA/fix-ncclstringtotype
Future-proof ncclstringtotype

[ROCm/rccl-tests commit: 8dfeab9eb9]
2024-10-24 10:28:02 -07:00
Kamil Iskra 0a2f08311b Future-proof ncclstringtotype
Ensure that ncclstringtotype iterates only over data types known to
nccl-tests (as indicated by test_typenum), not over a potentially larger
set of all NCCL types.


[ROCm/rccl-tests commit: 34d6d53910]
2024-10-24 09:21:37 -07:00
Tim c5ab7dc5b5 Scaling tests to #ngpus (#81)
* scaling tests to #ngpus

Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>

* switching to rocminfo

---------

Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>

[ROCm/rccl-tests commit: ae3e6357cb]
2024-09-10 19:05:22 -04:00
Tim ee4dd140bf Merge pull request #86 from AtlantaPepsi/UBR_merge
Registered Buffer option from nccl-tests merged

[ROCm/rccl-tests commit: 52aee698fa]
2024-07-31 11:05:49 -04:00
AtlantaPepsi e67844cc67 Fixing typo in readme
Signed-off-by: AtlantaPepsi <timhu102@amd.com>


[ROCm/rccl-tests commit: 71355df959]
2024-07-31 14:59:47 +00:00
AtlantaPepsi ccf92dcea2 Merge -R option for memory allocation
Signed-off-by: AtlantaPepsi <timhu102@amd.com>


[ROCm/rccl-tests commit: afd5ca10ae]
2024-07-31 14:57:20 +00:00
David Addison 58349a42b2 Merge pull request #226 from netgroup/master
improve parsing of stepbytes (increment size) argument

[ROCm/rccl-tests commit: 9d26b8422b]
2024-07-30 14:58:54 -07:00
David Addison 98b958afbd Added some missing command line options to README.md
Also updated single and multi-node examples.


[ROCm/rccl-tests commit: 0d86b5a6e7]
2024-07-30 14:50:45 -07:00
David Addison cf3ffb2f5f Added -N,--run_cycles option
[ROCm/rccl-tests commit: d2d40cc824]
2024-07-25 22:00:23 -07:00
David Addison 95419a2f47 Merge pull request #240 from OrenLeung/patch-1
doc: add all2all factor

[ROCm/rccl-tests commit: 3a3f790efd]
2024-07-25 22:00:06 -07:00
Oren 5061074d09 doc: add all2all factor
[ROCm/rccl-tests commit: c6eb15875f]
2024-07-24 22:55:00 -04:00
Nilesh M Negi 987a1a8a67 [CI] Add static analysis CI (#85)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: e635e9c9be]
2024-07-23 22:21:26 -05:00
Rahul Vaidya 77524bffd0 Fix --root all issue. (#83)
Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

[ROCm/rccl-tests commit: c5cae38bb8]
2024-06-14 11:46:08 -05:00
Stefano Salsano a0e06f2133 improve parsing of stepbytes (increment size) argument
[ROCm/rccl-tests commit: 746549b28d]
2024-06-14 11:28:55 +02:00
Kaiming Ouyang 1922bd71cb Change ncclCommRegister size to maxBytes in serial comm init
[ROCm/rccl-tests commit: d028efcf35]
2024-06-06 06:54:48 -07:00
saurabhAMD bdaf71070a Merge pull request #79 from saurabhAMD/rotating_tensor
Rotating tensor -R (default:off)

[ROCm/rccl-tests commit: 073d56f6e2]
2024-06-04 17:51:26 -05:00
saurabhAMD 351a047627 Rotating tensor -R (default:off)
[ROCm/rccl-tests commit: 36a2c372ac]
2024-06-04 11:35:39 -05:00
Giuseppe Congiu 411a7912f8 Add -R option to register user buffers
[ROCm/rccl-tests commit: a1efb427e7]
2024-06-03 01:04:58 -07:00
saurabhAMD 78b5325328 Merge pull request #76 from saurabhAMD/develop
Enable cache flush after every -F iteration. Default: 0 (no cache flush)

[ROCm/rccl-tests commit: 6378438c27]
2024-05-13 10:41:31 -05:00
saurabhAMD 5700751b7e updating cache flush on functionality
[ROCm/rccl-tests commit: 74c4177f58]
2024-05-10 08:46:13 -07:00
saurabhAMD ce8e61cc3b Enable cache flush after every -F iteration. Default: 0 (no cache flush)
[ROCm/rccl-tests commit: 699478dadf]
2024-05-07 11:32:30 -05:00
saurabhAMD e1a3c5a6dc Cache flush
[ROCm/rccl-tests commit: 3c0728e8eb]
2024-05-07 11:09:32 -05:00
Wenkai Du baf9242e07 Fix incorrect device ordinal with limited device visibility (#74)
[ROCm/rccl-tests commit: 16dfeaf89b]
2024-05-02 11:14:57 -07:00