David Sidler
8135eefce7
Use find_package for MPI ( #92 )
...
* Use find_package for MPI
* Minor fixes
[ROCm/rccl-tests commit: 46152785f0 ]
2025-01-14 11:49:20 -06:00
David Sidler
93977b8866
Add option to output results to a file ( #93 )
...
* Use find_package for MPI
* Add functionality to output results to file
* fix compilation
* report num gpus
* Revert "Use find_package for MPI"
This reverts commit c8fa253724ef4d0beac0d9c72f968062fbc6908e.
* Change inplace key
* remove dependency on json library
* Print "ranks, ranksPerNode, gpusPerRank"
* Add "nodes" field
---------
Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: 959cc19920 ]
2025-01-13 17:28:29 -06:00
saurabhAMD
a2a3515e33
Updating to use hipDeviceMallocUncached ( #95 )
...
Use hipDeviceMallocUncached instead of hipDeviceMallocFinegrained on newer ROCm versions.
[ROCm/rccl-tests commit: fc9917e0da ]
2025-01-11 23:25:24 -06:00
Mustafa Abduljabbar
7fbad10633
Memset to fix inflated performance when GPU is reset ( #94 )
...
* Memset to fix inflated performance when GPU is reset
* use hipMemset for both memsets
[ROCm/rccl-tests commit: f2a48983ae ]
2025-01-11 23:25:24 -06:00
Sam Wu
7ecf5d7fda
[CI] Clone rccl and build from tip of develop ( #99 )
...
- Set cron to weekly
- Remove unused properties
- Try rccl install as sudo
- Clear existing rccl repo
- Run install with sudo and env vars
- Fix path
- Add rccl to path
- Attempt to fix build and install of rccl during compile stage.
- Remove existing clone from workspace
- Fix path when install rccl
- Fix path for install rccl-tests
- Install rccl local only
- Set RCCL_DIR
- Build rccl and rccl-tests with cmake
- Add extra env vars
- Use installer instead of cmake for rccl
- Update .jenkins/common.groovy
- Get librccl.so from rccl/build/release
- Switching job command to build rccl and rccl-tests using install.sh because those work properly together.
[ROCm/rccl-tests commit: 5c41a915c8 ]
2025-01-11 23:23:06 -06:00
Sam Wu
6328a42ab0
Remove precheckin steps from staticanalysis ( #101 )
...
[ROCm/rccl-tests commit: df26b32687 ]
2025-01-10 16:41:43 -07:00
Tim
549f58f2a1
hot fixing ncclMemFree for mscclpp ( #100 )
...
[ROCm/rccl-tests commit: f7a5df7fc4 ]
2025-01-09 12:03:52 -05:00
mberenjk
efa2d204b2
removing FP8 product from allReduce test cases ( #97 )
...
* removing FP8 product from allReduce test cases
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: 77ae744c18 ]
2025-01-06 14:05:38 -06:00
Tim
c5ab7dc5b5
Scaling tests to #ngpus ( #81 )
...
* scaling tests to #ngpus
Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>
* switching to rocminfo
---------
Signed-off-by: AtlantaPepsi <hyj1999110@gmail.com>
[ROCm/rccl-tests commit: ae3e6357cb ]
2024-09-10 19:05:22 -04:00
AtlantaPepsi
e67844cc67
Fixing typo in readme
...
Signed-off-by: AtlantaPepsi <timhu102@amd.com>
[ROCm/rccl-tests commit: 71355df959 ]
2024-07-31 14:59:47 +00:00
AtlantaPepsi
ccf92dcea2
Merge -R option for memory allocation
...
Signed-off-by: AtlantaPepsi <timhu102@amd.com>
[ROCm/rccl-tests commit: afd5ca10ae ]
2024-07-31 14:57:20 +00:00
Nilesh M Negi
987a1a8a67
[CI] Add static analysis CI ( #85 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: e635e9c9be ]
2024-07-23 22:21:26 -05:00
Rahul Vaidya
77524bffd0
Fix --root all issue. ( #83 )
...
Signed-off-by: rahulvaidya20 <ravaidya@amd.com>
[ROCm/rccl-tests commit: c5cae38bb8 ]
2024-06-14 11:46:08 -05:00
saurabhAMD
351a047627
Rotating tensor -R (default:off)
...
[ROCm/rccl-tests commit: 36a2c372ac ]
2024-06-04 11:35:39 -05:00
Giuseppe Congiu
411a7912f8
Add -R option to register user buffers
...
[ROCm/rccl-tests commit: a1efb427e7 ]
2024-06-03 01:04:58 -07:00
saurabhAMD
5700751b7e
updating cache flush on functionality
...
[ROCm/rccl-tests commit: 74c4177f58 ]
2024-05-10 08:46:13 -07:00
saurabhAMD
ce8e61cc3b
Enable cache flush after every -F iteration. Default: 0 (No cache flush)
...
[ROCm/rccl-tests commit: 699478dadf ]
2024-05-07 11:32:30 -05:00
saurabhAMD
e1a3c5a6dc
Cache flush
...
[ROCm/rccl-tests commit: 3c0728e8eb ]
2024-05-07 11:09:32 -05:00
Wenkai Du
baf9242e07
Fix incorrect device ordinal with limited device visibility ( #74 )
...
[ROCm/rccl-tests commit: 16dfeaf89b ]
2024-05-02 11:14:57 -07:00
corey-derochie-amd
fe151f517b
Fixed spelling
...
[ROCm/rccl-tests commit: f74c04b686 ]
2024-05-02 09:18:25 -06:00
Corey Derochie
85bdda3812
Wrapped the warmup iters in captures when doing graph mode to do a proper warmup.
...
[ROCm/rccl-tests commit: 0c762d210c ]
2024-05-01 20:41:12 -05:00
mberenjk
9cfb0745c0
replacing rccl_bfloat16 with hip_bfloat16 ( #70 )
...
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: eb65dadfc5 ]
2024-04-23 17:00:20 -05:00
Nilesh M Negi
af102613e4
[DOCS] Update README for performance-oriented runs ( #73 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: e8650b1844 ]
2024-04-23 14:30:06 -05:00
Nilesh M Negi
9d50d2c185
Amend use of CUSTOM_RCCL_LIB to avoid build error ( #71 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: 990f88cbaa ]
2024-04-12 12:01:32 -05:00
mberenjk
ca4ba933a3
adding git version to rccl-tests ( #69 )
...
Co-authored-by: mberenjk <mberenjk@amd.com>
[ROCm/rccl-tests commit: 3f7f7859bf ]
2024-03-28 14:03:59 -05:00
akolliasAMD
f8c62f64c3
Revert "adding git version to rccl-test ( #66 )"
...
This reverts commit 82c71f1838.
[ROCm/rccl-tests commit: 91609be0ef ]
2024-03-22 10:21:37 -06:00
mberenjk
82c71f1838
adding git version to rccl-test ( #66 )
...
* adding git version to rccl-test
---------
Co-authored-by: mberenjk <mberenjk@banff-cyxtera-s74-2.ctr.dcgpu>
[ROCm/rccl-tests commit: a31679775c ]
2024-03-20 10:04:12 -05:00
Andy li
aaf1e27af2
update the fp8 header file name ( #65 )
...
* update the fp8 header name
[ROCm/rccl-tests commit: e447c17382 ]
2024-03-08 10:02:40 -08:00
Andy li
c128f0422d
Enable fp8 support ( #63 )
...
* initial checkin
* rename the fp8 datatype name
* update based on cr comments
* resolve the build issue
* resolve fp8 compatibility issue
* fix minor bug and catch up to reflect latest develop branch change
* add fp8 + operator support
* update fp8 header file
* resolve merge issue from develop branch
[ROCm/rccl-tests commit: 21e59fb283 ]
2024-03-07 16:54:41 -08:00
Bertan Dogancay
efbfad7fe5
Revert __nv_bfloat16 back to hip_bfloat16 ( #64 )
...
[ROCm/rccl-tests commit: 7a7a5969d0 ]
2024-03-06 11:11:44 -07:00
Bertan Dogancay
882a96f5cb
Add hipify steps prior to build ( #62 )
...
* Add hipify steps prior to build
[ROCm/rccl-tests commit: 88cf7dbf45 ]
2024-03-05 09:47:18 -07:00
Wenkai Du
b49f6da1ec
Merge remote-tracking branch 'nccl-tests/master' into HEAD
...
[ROCm/rccl-tests commit: 621dde544d ]
2024-03-01 18:34:44 +00:00
Wenkai Du
ff97af6529
Fix typo in rank assignment ( #59 )
...
[ROCm/rccl-tests commit: 7715a0cf1f ]
2024-02-15 12:04:38 -08:00
David Addison
5d52f0285c
Added missing MPI_Comm_free() call before MPI_Finalize()
...
[ROCm/rccl-tests commit: c6afef0b6f ]
2024-02-05 08:53:54 -08:00
akolliasAMD
73f1f3cb3a
Merge branch 'develop'
...
[ROCm/rccl-tests commit: 56a5bb0486 ]
2024-01-30 12:50:38 -05:00
Nusrat Islam
26b1b0b822
Add option to disable out-of-place
...
[ROCm/rccl-tests commit: a2bec5d2f6 ]
2024-01-04 16:43:50 -06:00
Nilesh M Negi
cffd823582
Update default GPUs and build for AMDGPU_TARGETS ( #55 )
...
* Update default GPUs and build for AMDGPU_TARGETS
* Make GPU_TARGETS a cache variable
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: b1f86ea6eb ]
2023-12-06 17:24:37 -06:00
Lauren Wrubleski
5070d67f9a
Offload arch linking ( #54 )
...
* Update CMakeLists.txt
* Update CMakeLists.txt
* Link rccl_common object against hip::device
Previously the tests were compiled with `--amdgpu-target` to compile for multiple architectures. As rccl_common was not compiled against those architectures, this didn't work. Linking it against hip::device automatically links against all architectures in `AMDGPU_TARGETS`, and the same holds for the test executables.
[ROCm/rccl-tests commit: e1a816b869 ]
2023-12-05 19:20:46 -06:00
Bertan Dogancay
c07676388b
Fixing hipcc location for develop CI ( #52 )
...
[ROCm/rccl-tests commit: 8bfb67faf3 ]
2023-10-19 13:29:42 -06:00
Wenkai Du
ccad358bc9
Warm up both out-of-place and in-place collectives ( #51 )
...
[ROCm/rccl-tests commit: 5ee7a08994 ]
2023-10-16 12:13:50 -07:00
David Addison
459b52158f
Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue
...
[ROCm/rccl-tests commit: 1292b25553 ]
2023-10-12 16:53:32 -07:00
gilbertlee-amd
fa4dd9dbf0
Fixing hipcc location for CI ( #47 )
...
[ROCm/rccl-tests commit: 46375b1c52 ]
2023-09-22 14:38:31 -06:00
David Addison
e1f13fac90
Make the -c option be a datacheck iteration count parameter
...
Default is 1
[ROCm/rccl-tests commit: 6c46206a47 ]
2023-09-13 14:03:38 -07:00
arvindcheru
0232bf4300
Update Makefile - HIPCC Path Updated to latest ( #46 )
...
[ROCm/rccl-tests commit: c1ec0c8aaf ]
2023-08-04 19:42:33 -04:00
arvindcheru
2e12f0cfce
Update Makefile - HIPCC Path Updated to latest ( #45 )
...
[ROCm/rccl-tests commit: a6593375bc ]
2023-08-04 19:33:39 -04:00
Edgar Gabriel
aaae644862
search SLES install paths for MPI
...
[ROCm/rccl-tests commit: efdd4ad40b ]
2023-07-25 19:29:13 +00:00
Edgar Gabriel
84e9a3bbfc
revamp cmake MPI detection
...
We honor user-requested MPI installations using MPI_PATH first,
and afterwards check for MPICH and Open MPI in the default
Ubuntu and RHEL installation directories.
[ROCm/rccl-tests commit: 8fc00ec32e ]
2023-07-25 19:28:35 +00:00
Edgar Gabriel
3ff4890b6f
auto-detect and enable MPI
...
[ROCm/rccl-tests commit: c96ff57ac7 ]
2023-07-25 19:27:13 +00:00
Edgar Gabriel
1b21a6fdd8
search SLES install paths for MPI
...
[ROCm/rccl-tests commit: 6048078be2 ]
2023-07-24 12:02:44 -07:00
Wenkai Du
431f21ca11
Remove hardcoded number of GPUs limit for alltoallv ( #41 )
...
[ROCm/rccl-tests commit: fcd0888d53 ]
2023-06-18 18:07:29 -07:00