Commit Graph

199 commits

Author SHA1 Message Date
David Sidler 8135eefce7 Use find_package for MPI (#92)
* Use find_package for MPI
* Minor fixes

[ROCm/rccl-tests commit: 46152785f0]
2025-01-14 11:49:20 -06:00
David Sidler 93977b8866 Add option to output results to a file (#93)
* Use find_package for MPI
* Add functionality to output results to file
* fix compilation
* report num gpus
* Revert "Use find_package for MPI"
This reverts commit c8fa253724ef4d0beac0d9c72f968062fbc6908e.
* Change inplace key
* remove dependency on json library
* Print "ranks, ranksPerNode, gpusPerRank"
* Add "nodes" field
---------

Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 959cc19920]
2025-01-13 17:28:29 -06:00
saurabhAMD a2a3515e33 Updating to use hipDeviceMallocUncached (#95)
Use hipDeviceMallocUncached instead of hipDeviceMallocFinegrained on newer ROCm versions.

[ROCm/rccl-tests commit: fc9917e0da]
2025-01-11 23:25:24 -06:00
Mustafa Abduljabbar 7fbad10633 Memset to fix inflated performance when GPU is reset (#94)
* Memset to fix inflated performance when GPU is reset
* use hipMemset for both memsets

[ROCm/rccl-tests commit: f2a48983ae]
2025-01-11 23:25:24 -06:00
Sam Wu 7ecf5d7fda [CI] Clone rccl and build from tip of develop (#99)
- Set cron to weekly
- Remove unused properties
- Try rccl install as sudo
- Clear existing rccl repo
- Run install with sudo and env vars
- Fix path
- Add rccl to path
- Attempt to fix build and install of rccl during compile stage.
- Remove existing clone from workspace
- Fix path when install rccl
- Fix path for install rccl-tests
- Install rccl local only
- Set RCCL_DIR
- Build rccl and rccl-tests with cmake
- Add extra env vars
- Use installer instead of cmake for rccl
- Update .jenkins/common.groovy
- Get librccl.so from rccl/build/release
- Switching job command to build rccl and rccl-tests using install.sh because those work properly together.


[ROCm/rccl-tests commit: 5c41a915c8]
2025-01-11 23:23:06 -06:00
Sam Wu 6328a42ab0 Remove precheckin steps from staticanalysis (#101)
[ROCm/rccl-tests commit: df26b32687]
2025-01-10 16:41:43 -07:00
Tim 549f58f2a1 hot fixing ncclMemFree for mscclpp (#100)
[ROCm/rccl-tests commit: f7a5df7fc4]
2025-01-09 12:03:52 -05:00
mberenjk efa2d204b2 removing FP8 product from allReduce test cases (#97)
* removing FP8 product from allReduce test cases

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>


[ROCm/rccl-tests commit: 77ae744c18]
2025-01-06 14:05:38 -06:00
Tim c5ab7dc5b5 Scaling tests to #ngpus (#81)
* scaling tests to #ngpus

Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>

* switching to rocminfo

---------

Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>

[ROCm/rccl-tests commit: ae3e6357cb]
2024-09-10 19:05:22 -04:00
AtlantaPepsi e67844cc67 Fixing typo in readme
Signed-off-by: AtlantaPepsi <timhu102@amd.com>


[ROCm/rccl-tests commit: 71355df959]
2024-07-31 14:59:47 +00:00
AtlantaPepsi ccf92dcea2 Merge -R option for memory allocation
Signed-off-by: AtlantaPepsi <timhu102@amd.com>


[ROCm/rccl-tests commit: afd5ca10ae]
2024-07-31 14:57:20 +00:00
Nilesh M Negi 987a1a8a67 [CI] Add static analysis CI (#85)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: e635e9c9be]
2024-07-23 22:21:26 -05:00
Rahul Vaidya 77524bffd0 Fix --root all issue. (#83)
Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

[ROCm/rccl-tests commit: c5cae38bb8]
2024-06-14 11:46:08 -05:00
saurabhAMD 351a047627 Rotating tensor -R (default:off)
[ROCm/rccl-tests commit: 36a2c372ac]
2024-06-04 11:35:39 -05:00
Giuseppe Congiu 411a7912f8 Add -R option to register user buffers
[ROCm/rccl-tests commit: a1efb427e7]
2024-06-03 01:04:58 -07:00
saurabhAMD 5700751b7e updating cache flush on functionality
[ROCm/rccl-tests commit: 74c4177f58]
2024-05-10 08:46:13 -07:00
saurabhAMD ce8e61cc3b Enable cache flush after every -F iteration. Default : 0 (No cache flush)
[ROCm/rccl-tests commit: 699478dadf]
2024-05-07 11:32:30 -05:00
saurabhAMD e1a3c5a6dc Cache flush
[ROCm/rccl-tests commit: 3c0728e8eb]
2024-05-07 11:09:32 -05:00
Wenkai Du baf9242e07 Fix incorrect device ordinal with limited device visibility (#74)
[ROCm/rccl-tests commit: 16dfeaf89b]
2024-05-02 11:14:57 -07:00
corey-derochie-amd fe151f517b Fixed spelling
[ROCm/rccl-tests commit: f74c04b686]
2024-05-02 09:18:25 -06:00
Corey Derochie 85bdda3812 Wrapped the warmup iters in captures when doing graph mode to do a proper warmup.
[ROCm/rccl-tests commit: 0c762d210c]
2024-05-01 20:41:12 -05:00
mberenjk 9cfb0745c0 replacing rccl_bfloat16 with hip_bfloat16 (#70)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: eb65dadfc5]
2024-04-23 17:00:20 -05:00
Nilesh M Negi af102613e4 [DOCS] Update README for performance-oriented runs (#73)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: e8650b1844]
2024-04-23 14:30:06 -05:00
Nilesh M Negi 9d50d2c185 Amend use of CUSTOM_RCCL_LIB to avoid build error (#71)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 990f88cbaa]
2024-04-12 12:01:32 -05:00
mberenjk ca4ba933a3 adding git version to rccl-tests (#69)
Co-authored-by: mberenjk <mberenjk@amd.com>

[ROCm/rccl-tests commit: 3f7f7859bf]
2024-03-28 14:03:59 -05:00
akolliasAMD f8c62f64c3 Revert "adding git version to rccl-test (#66)"
This reverts commit 82c71f1838.


[ROCm/rccl-tests commit: 91609be0ef]
2024-03-22 10:21:37 -06:00
mberenjk 82c71f1838 adding git version to rccl-test (#66)
* adding git version to rccl-test

---------

Co-authored-by: mberenjk <mberenjk@banff-cyxtera-s74-2.ctr.dcgpu>

[ROCm/rccl-tests commit: a31679775c]
2024-03-20 10:04:12 -05:00
Andy li aaf1e27af2 update the fp8 header file name (#65)
* update the fp8 header name

[ROCm/rccl-tests commit: e447c17382]
2024-03-08 10:02:40 -08:00
Andy li c128f0422d Enable fp8 support (#63)
* initial checkin

* rename the fp8 datatype name

* update based on cr comments

* resolve the build issue

* resolve fp8 compatibility issue

* fix minor bug and catch up to reflect latest develop branch change

* add fp8 + operator support

* update fp8 header file

* resolve merge issue from develop branch

[ROCm/rccl-tests commit: 21e59fb283]
2024-03-07 16:54:41 -08:00
Bertan Dogancay efbfad7fe5 Revert __nv_bfloat16 back to hip_bfloat16 (#64)
[ROCm/rccl-tests commit: 7a7a5969d0]
2024-03-06 11:11:44 -07:00
Bertan Dogancay 882a96f5cb Add hipify steps prior to build (#62)
* Add hipify steps prior to build

[ROCm/rccl-tests commit: 88cf7dbf45]
2024-03-05 09:47:18 -07:00
Wenkai Du b49f6da1ec Merge remote-tracking branch 'nccl-tests/master' into HEAD
[ROCm/rccl-tests commit: 621dde544d]
2024-03-01 18:34:44 +00:00
Wenkai Du ff97af6529 Fix typo in rank assignment (#59)
[ROCm/rccl-tests commit: 7715a0cf1f]
2024-02-15 12:04:38 -08:00
David Addison 5d52f0285c Added missing MPI_Comm_free() call before MPI_Finalize()
[ROCm/rccl-tests commit: c6afef0b6f]
2024-02-05 08:53:54 -08:00
akolliasAMD 73f1f3cb3a Merge branch 'develop'
[ROCm/rccl-tests commit: 56a5bb0486]
2024-01-30 12:50:38 -05:00
Nusrat Islam 26b1b0b822 Add option to disable out-of-place
[ROCm/rccl-tests commit: a2bec5d2f6]
2024-01-04 16:43:50 -06:00
Nilesh M Negi cffd823582 Update default GPUs and build for AMDGPU_TARGETS (#55)
* Update default GPUs and build for AMDGPU_TARGETS
* Make GPU_TARGETS a cache variable
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: b1f86ea6eb]
2023-12-06 17:24:37 -06:00
Lauren Wrubleski 5070d67f9a Offload arch linking (#54)
* Update CMakeLists.txt

* Update CMakeLists.txt

* Link rccl_common object against hip::device

Previously, the tests were compiled with `--amdgpu-target` to build for multiple architectures. Because rccl_common was not compiled for those architectures, this didn't work. Linking it against hip::device automatically links against all architectures in `AMDGPU_TARGETS`, and the same holds for the test executables.

[ROCm/rccl-tests commit: e1a816b869]
2023-12-05 19:20:46 -06:00
Bertan Dogancay c07676388b Fixing hipcc location for develop CI (#52)
[ROCm/rccl-tests commit: 8bfb67faf3]
2023-10-19 13:29:42 -06:00
Wenkai Du ccad358bc9 Warm up both out-of-place and in-place collectives (#51)
[ROCm/rccl-tests commit: 5ee7a08994]
2023-10-16 12:13:50 -07:00
David Addison 459b52158f Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue
[ROCm/rccl-tests commit: 1292b25553]
2023-10-12 16:53:32 -07:00
gilbertlee-amd fa4dd9dbf0 Fixing hipcc location for CI (#47)
[ROCm/rccl-tests commit: 46375b1c52]
2023-09-22 14:38:31 -06:00
David Addison e1f13fac90 Make the -c option be a datacheck iteration count parameter
Default is 1


[ROCm/rccl-tests commit: 6c46206a47]
2023-09-13 14:03:38 -07:00
arvindcheru 0232bf4300 Update Makefile - HIPCC Path Updated to latest (#46)
[ROCm/rccl-tests commit: c1ec0c8aaf]
2023-08-04 19:42:33 -04:00
arvindcheru 2e12f0cfce Update Makefile - HIPCC Path Updated to latest (#45)
[ROCm/rccl-tests commit: a6593375bc]
2023-08-04 19:33:39 -04:00
Edgar Gabriel aaae644862 search SLES install paths for MPI
[ROCm/rccl-tests commit: efdd4ad40b]
2023-07-25 19:29:13 +00:00
Edgar Gabriel 84e9a3bbfc revamp cmake MPI detection
We honor user-requested MPI installations specified via MPI_PATH first,
and afterwards check for MPICH and Open MPI in the default
Ubuntu and RHEL installation directories.


[ROCm/rccl-tests commit: 8fc00ec32e]
2023-07-25 19:28:35 +00:00
Edgar Gabriel 3ff4890b6f auto-detect and enable MPI
[ROCm/rccl-tests commit: c96ff57ac7]
2023-07-25 19:27:13 +00:00
Edgar Gabriel 1b21a6fdd8 search SLES install paths for MPI
[ROCm/rccl-tests commit: 6048078be2]
2023-07-24 12:02:44 -07:00
Wenkai Du 431f21ca11 Remove hardcoded number of GPUs limit for alltoallv (#41)
[ROCm/rccl-tests commit: fcd0888d53]
2023-06-18 18:07:29 -07:00