Commit Graph

260 Commits

Author SHA1 Message Date
Rahul Vaidya 0abe3c80bb Ensure backward compatibility for fp8 datatypes (#126)
* Ensure backward compatibility for fp8 datatypes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Update code comments

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>
2025-05-15 13:56:40 -05:00
mberenjk 4b2b635766 Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues.(#109)
* addressing hip_fp8 support compatibility issue

* skipping mulsum and avg test for fp8, using hip_fp8 for product

* syncing with nccl-tests

removing the fp8 filter for pre-hopper gpus and resolving the merge conflict

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-14 15:30:07 -05:00
Wenkai Du cac33a8c2f Automatically set in-place option from out-of-place (#123) 2025-05-09 16:48:42 -05:00
Nilesh M Negi 41b383a0d4 [BUILD] Add options to install script for compiler and GPU targets (#121)
* [BUILD] Add options to install script for compiler and GPU targets
* Fix GPU_TARGETS field and add option for custom ROCm path
* Check for ROCM_PATH

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-05-07 13:19:10 -05:00
Marius Brehler 5b27b961b2 Link Threads::Threads (#119)
`pthread.h` is included in `src/common.h` but lib is not properly
linked, resulting in the build failing with unresolved symbols when
trying to link.
2025-04-29 16:18:51 -05:00
Nilesh M Negi c96deb13cd [BUILD] Fix rccl-tests version string for packaging (#117)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-04-29 08:51:43 -05:00
Rahul Vaidya a4fd8f4667 Fix build issues caused by 2.24.3 sync (#118) 2025-04-28 10:22:38 -05:00
Grant Pinkert f611dbd49a Fix message size logging (#115)
Previously, the logger recorded the number of bytes a node was expected to receive.
This differs from the stdout logging, where the reported message size is the total size of a message.

Signed-off-by: Grant Pinkert <gpinkert@amd.com>
2025-04-25 11:05:21 -05:00
Nilesh M Negi 83d38d91b6 Merge pull request #116 from nileshnegi/sync/nccl-tests/02-28-2025
[SYNC] NCCL-Tests v2.14.1
2025-04-21 19:53:35 -05:00
nileshnegi 5625599dda Merge remote-tracking branch 'nccl-tests/master' into develop 2025-04-21 19:46:10 -05:00
mberenjk 5e838ad9df skipping the prod test for FP8 types in reduce and reduce-scatter (#111)
* skipping the prod test for FP8 types in reduce and reduce-scatter
---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-04-15 09:38:33 -05:00
Alex Breslow 284ff2ac84 Add instructions to README regarding benchmarking on pre ROCm 6.4.x versions with HSA_NO_SCRATCH_RECLAIM=1 (#114) 2025-04-08 09:59:57 -07:00
David Addison b4300cc79d Add PCI domain and device ID for GPU device BDF display 2025-02-28 13:25:51 -08:00
Sylvain Jeaugey 903918fc54 Add NCCL_TESTS_SPLIT documentation in the README 2025-02-06 14:10:07 +01:00
Junyu Ma a89cf07fe8 Perftests: Introduce NCCL_TESTS_SPLIT env
`NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators.

It is overridden by `NCCL_TESTS_SPLIT_MASK`.

Examples:

NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7"  # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72"   # color = rank % 72.  One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72"   # color = rank / 72.  Intra NVLink domain on NVL72.

You can also use: "%" "&" "|" "/" for short.
Extra spaces in the middle will be automatically ignored.
Not case sensitive.

The following are all equivalent:

NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
2025-02-04 15:18:09 -08:00
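
The color computation described in the `NCCL_TESTS_SPLIT` commit above can be sketched in Python as follows. This is an illustrative sketch only, not the nccl-tests parser: the function name `split_color` is hypothetical, the spelled-out `OR` keyword is an assumption inferred from the `|` shorthand, and support for the `0b` binary prefix is assumed from the commit's own example.

```python
import re

# Hypothetical sketch of the NCCL_TESTS_SPLIT semantics described above.
# It is NOT the actual nccl-tests implementation.
_OPS = {
    "AND": lambda rank, v: rank & v,
    "&": lambda rank, v: rank & v,
    "OR": lambda rank, v: rank | v,   # "OR" keyword is an assumption
    "|": lambda rank, v: rank | v,
    "MOD": lambda rank, v: rank % v,
    "%": lambda rank, v: rank % v,
    "DIV": lambda rank, v: rank // v,
    "/": lambda rank, v: rank // v,
}

_SPEC = re.compile(r"(AND|OR|MOD|DIV|[&|%/])(0X[0-9A-F]+|0B[01]+|[0-9]+)")


def split_color(rank: int, spec: str) -> int:
    """Return the communicator color for `rank` given a split spec."""
    # Spaces are ignored and matching is case-insensitive.
    normalized = "".join(spec.split()).upper()
    match = _SPEC.fullmatch(normalized)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op, literal = match.groups()
    if literal.startswith("0X"):
        value = int(literal[2:], 16)
    elif literal.startswith("0B"):
        value = int(literal[2:], 2)
    else:
        value = int(literal, 10)
    return _OPS[op](rank, value)
```

For example, `split_color(100, "MOD 72")` and `split_color(100, "DIV 72")` give the colors for rank 100 under the two NVL72 settings listed in the commit message.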
David Addison cb6a46fdd6 Update CUDA gencodes
Add support for Blackwell sm100 and sm120 from CUDA 12.8

Add support for Hopper sm90 from CUDA 12.0
2025-01-25 17:32:16 -08:00
Nilesh M Negi 448c4c7269 [GIT] Add CODEOWNERS and PR Template (#102)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-01-16 17:05:48 -07:00
David Sidler 46152785f0 Use find_package for MPI (#92)
* Use find_package for MPI
* Minor fixes
2025-01-14 11:49:20 -06:00
David Sidler 959cc19920 Add option to output results to a file (#93)
* Use find_package for MPI
* Add functionality to output results to file
* fix compilation
* report num gpus
* Revert "Use find_package for MPI"
This reverts commit c8fa253724ef4d0beac0d9c72f968062fbc6908e.
* Change inplace key
* remove dependency on json library
* Print "ranks, ranksPerNode, gpusPerRank"
* Add "nodes" field
---------

Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
2025-01-13 17:28:29 -06:00
saurabhAMD fc9917e0da Updating to use hipDeviceMallocUncached (#95)
Use hipDeviceMallocUncached instead of hipDeviceMallocFinegrained on newer ROCm versions.
2025-01-11 23:25:24 -06:00
Mustafa Abduljabbar f2a48983ae Memset to fix inflated performance when GPU is reset (#94)
* Memset to fix inflated performance when GPU is reset
* use hipMemset for both memsets
2025-01-11 23:25:24 -06:00
Sam Wu 5c41a915c8 [CI] Clone rccl and build from tip of develop (#99)
- Set cron to weekly
- Remove unused properties
- Try rccl install as sudo
- Clear existing rccl repo
- Run install with sudo and env vars
- Fix path
- Add rccl to path
- Attempt to fix build and install of rccl during compile stage.
- Remove existing clone from workspace
- Fix path when install rccl
- Fix path for install rccl-tests
- Install rccl local only
- Set RCCL_DIR
- Build rccl and rccl-tests with cmake
- Add extra env vars
- Use installer instead of cmake for rccl
- Update .jenkins/common.groovy
- Get librccl.so from rccl/build/release
- Switching job command to build rccl and rccl-tests using install.sh because those work properly together.
2025-01-11 23:23:06 -06:00
Sam Wu df26b32687 Remove precheckin steps from staticanalysis (#101) 2025-01-10 16:41:43 -07:00
Tim f7a5df7fc4 hot fixing ncclMemFree for mscclpp (#100) 2025-01-09 12:03:52 -05:00
mberenjk 77ae744c18 removing FP8 product from allReduce test cases (#97)
* removing FP8 product from allReduce test cases

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-01-06 14:05:38 -06:00
John Bachan 29f4114f02 Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes.
This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.
2024-12-18 11:20:28 -08:00
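
The trimming described in the commit above can be illustrated with a short Python sketch. It assumes sizes are counted in bytes and that each rank's chunk starts at `rank * chunk`; the helper name `per_rank_bytes` is hypothetical, and the actual patch may do the rounding differently (for instance in units of elements).

```python
def per_rank_bytes(total_bytes: int, nranks: int, align: int = 16) -> int:
    """Split a buffer across ranks, rounding each chunk down to `align` bytes.

    Hypothetical helper for illustration; not taken from the patch.
    """
    chunk = total_bytes // nranks
    # Drop the remainder so every rank's offset (rank * chunk) stays aligned.
    return chunk - (chunk % align)


# With 3 ranks (not a power of two), every chunk offset is a multiple of 16.
chunk = per_rank_bytes(1000, 3)
offsets = [rank * chunk for rank in range(3)]
```

Without the trim, 1000 bytes over 3 ranks would give 333-byte chunks, so ranks 1 and 2 would start at addresses that are not 16-byte aligned.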
Sylvain Jeaugey 8dfeab9eb9 Merge pull request #259 from NVIDIA/fix-ncclstringtotype
Future-proof ncclstringtotype
2024-10-24 10:28:02 -07:00
Kamil Iskra 34d6d53910 Future-proof ncclstringtotype
Ensure that ncclstringtotype iterates only over data types known to
nccl-tests (as indicated by test_typenum), not over a potentially larger
set of all NCCL types.
2024-10-24 09:21:37 -07:00
Tim ae3e6357cb Scaling tests to #ngpus (#81)
* scaling tests to #ngpus

Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>

* switching to rocminfo

---------

Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>
2024-09-10 19:05:22 -04:00
Tim 52aee698fa Merge pull request #86 from AtlantaPepsi/UBR_merge
Registered Buffer option from nccl-tests merged
2024-07-31 11:05:49 -04:00
AtlantaPepsi 71355df959 Fixing typo in readme
Signed-off-by: AtlantaPepsi <timhu102@amd.com>
2024-07-31 14:59:47 +00:00
AtlantaPepsi afd5ca10ae Merge -R option for memory allocation
Signed-off-by: AtlantaPepsi <timhu102@amd.com>
2024-07-31 14:57:20 +00:00
David Addison 9d26b8422b Merge pull request #226 from netgroup/master
improve parsing of stepbytes (increment size) argument
2024-07-30 14:58:54 -07:00
David Addison 0d86b5a6e7 Added some missing command line options to README.md
Also updated single and multi-node examples.
2024-07-30 14:50:45 -07:00
David Addison d2d40cc824 Added -N,--run_cycles option 2024-07-25 22:00:23 -07:00
David Addison 3a3f790efd Merge pull request #240 from OrenLeung/patch-1
doc: add all2all factor
2024-07-25 22:00:06 -07:00
Oren c6eb15875f doc: add all2all factor 2024-07-24 22:55:00 -04:00
Nilesh M Negi e635e9c9be [CI] Add static analysis CI (#85)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-07-23 22:21:26 -05:00
Rahul Vaidya c5cae38bb8 Fix --root all issue. (#83)
Signed-off-by: rahulvaidya20 <ravaidya@amd.com>
2024-06-14 11:46:08 -05:00
Stefano Salsano 746549b28d improve parsing of stepbytes (increment size) argument 2024-06-14 11:28:55 +02:00
Kaiming Ouyang d028efcf35 Change ncclCommRegister size to maxBytes in serial comm init 2024-06-06 06:54:48 -07:00
saurabhAMD 073d56f6e2 Merge pull request #79 from saurabhAMD/rotating_tensor
Rotating tensor -R (default:off)
2024-06-04 17:51:26 -05:00
saurabhAMD 36a2c372ac Rotating tensor -R (default:off) 2024-06-04 11:35:39 -05:00
Giuseppe Congiu a1efb427e7 Add -R option to register user buffers 2024-06-03 01:04:58 -07:00
saurabhAMD 6378438c27 Merge pull request #76 from saurabhAMD/develop
Enable cache flush after every -F iteration. Default : 0 (No cache flush)
2024-05-13 10:41:31 -05:00
saurabhAMD 74c4177f58 updating cache flush on functionality 2024-05-10 08:46:13 -07:00
saurabhAMD 699478dadf Enable cache flush after every -F iteration. Default : 0 (No cache flush) 2024-05-07 11:32:30 -05:00
saurabhAMD 3c0728e8eb Cache flush 2024-05-07 11:09:32 -05:00
Wenkai Du 16dfeaf89b Fix incorrect device ordinal with limited device visibility (#74) 2024-05-02 11:14:57 -07:00
corey-derochie-amd 605fa143ad Merge pull request #75 from corey-derochie-amd/develop
Wrapped the warmup iters in captures when doing graph mode to do a proper warmup.
2024-05-02 09:18:52 -06:00