yangxingwu
cba8bfd093
makefile: remove extra space
...
[ROCm/rccl-tests commit: 52ea1b2148 ]
2023-06-06 09:47:50 +00:00
Sylvain Jeaugey
5244aae891
Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling
...
fix handling of variable NVCC.
[ROCm/rccl-tests commit: e98ef24bc0 ]
2023-03-27 11:14:10 +02:00
alan.souza
4fd5ceeba8
fix handling of variable NVCC. Permit overriding the variable using environment variables
...
[ROCm/rccl-tests commit: 7ccda3c97b ]
2023-03-25 16:56:16 -03:00
David Addison
f5512c6a2e
Merge pull request #134 from flx42/patch-1
...
Update README.md to fix -i default increment value.
[ROCm/rccl-tests commit: e76e36e9a9 ]
2023-03-23 09:53:15 -07:00
Felix Abecassis
b3db782c3f
Update README.md
...
[ROCm/rccl-tests commit: 17d0a42d5a ]
2023-03-23 09:05:41 -07:00
Sylvain Jeaugey
b70cac2b33
Update README.md
...
Improve MPI example to avoid confusion of number of processes / total number of GPUs.
https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369
[ROCm/rccl-tests commit: 2cbb968101 ]
2023-01-03 08:47:43 +01:00
David Addison
129a1b4b78
Add boot_id to the hostname hash due to collisions on Azure
...
Fixes #60
[ROCm/rccl-tests commit: 0b4c4cb99f ]
2022-12-12 01:16:46 -08:00
Jithin Jose
5ba670d551
Use DJB2a hash algorithm in getHostHash()
...
[ROCm/rccl-tests commit: 0aeba157db ]
2022-12-12 01:16:38 -08:00
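For readers unfamiliar with the hash named in the commit above, the following is a minimal sketch of the generic DJB2a string hash (seed 5381, then multiply by 33 and XOR in each byte). The function name `djb2a_hash` and the 64-bit width are assumptions made for illustration; this is not the project's `getHostHash()` source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic DJB2a: h starts at 5381; for each byte, h = (h * 33) XOR byte.
 * Illustrative only: the name and the 64-bit accumulator are assumptions. */
uint64_t djb2a_hash(const char *s) {
    assert(s != NULL);
    uint64_t h = 5381;
    for (; *s != '\0'; ++s) {
        /* (h << 5) + h is h * 33; unsigned overflow wraps, which is well defined */
        h = ((h << 5) + h) ^ (uint64_t)(unsigned char)*s;
    }
    return h;
}
```

The XOR step is what distinguishes DJB2a from the original DJB2, which adds the byte instead.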
David Addison
6313530fcc
Call cudaFreeHost() on wrongPerGpu not cudaFree()
...
[ROCm/rccl-tests commit: 24fcf64ed1 ]
2022-11-22 11:18:37 -08:00
David Addison
04b5c40b1c
Add fflush(stdout) before perf output
...
[ROCm/rccl-tests commit: 3bd2bd292b ]
2022-11-22 11:16:47 -08:00
Sylvain Jeaugey
c0e3f4d443
Fix build on RHEL7 with GCC 4.8
...
Add -std=c++11 to CXXFLAGS.
Fixes #116.
[ROCm/rccl-tests commit: 365b92a1ea ]
2022-10-12 01:24:14 -07:00
Sylvain Jeaugey
fdaa88710b
Update NCCL tests
...
[ROCm/rccl-tests commit: d313d20a26 ]
2022-09-23 01:13:29 -07:00
David Addison
35ee4ec3eb
Fix preprocessor version check for ncclGetLastError()
...
ncclGetLastError() was added in NCCL 2.13.0
[ROCm/rccl-tests commit: 749573f2d6 ]
2022-09-07 16:10:41 -07:00
David Addison
a43863e1a7
Fix an issue with the last commit when data checking is disabled
...
[ROCm/rccl-tests commit: afa4c56b6a ]
2022-09-07 11:23:49 -07:00
David Addison
59ed17798f
Display N/A for error count in AlltoAll in-place test
...
AlltoAll does not support in-place buffers
[ROCm/rccl-tests commit: a0a14911ee ]
2022-09-06 13:17:15 -07:00
John Bachan
70b6c0f5e5
Changed top-level Makefile behavior so that BUILDDIR is interpreted
...
as relative to top-level directory. This is done by abspath'ing it before
passing it to subdirectory Makefiles.
The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".
The new behavior is simpler: if BUILDDIR is not set then it defaults
to "build", and the directory holding the final build is always at just
"${BUILDDIR}" in the top level.
[ROCm/rccl-tests commit: bc5f7cfb0a ]
2022-08-23 10:08:49 -07:00
John Bachan
b5d746b58e
Resync with NCCL 2.13
...
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield() with sched_yield().
* Bugfix to the cpu-side barrier/allreduce implementations.
[ROCm/rccl-tests commit: 51af5572bf ]
2022-08-22 17:51:06 -07:00
David Addison
22ebb430a6
Merge pull request #96 from NVIDIA/nersc-linkage-fix
...
Add option to statically link cudart
[ROCm/rccl-tests commit: 8274cb47b6 ]
2022-05-26 16:54:44 -07:00
David Addison
dd8563b279
Add option to statically link cudart
...
Build with CUDARTLIB=cudart_static to remove dynamic linkage
Also removed unused curand and nvToolsExt dependencies
BUG 95
[ROCm/rccl-tests commit: de3ddbe261 ]
2021-11-10 10:02:41 -08:00
David Addison
ad9aac78df
Add MPI_IBM build option
...
[ROCm/rccl-tests commit: 7130fa6096 ]
2021-10-25 16:30:57 -07:00
David Addison
56ff821802
Resync with NCCL 2.11
...
New operator: mulsum
New test: gather
[ROCm/rccl-tests commit: f773748b46 ]
2021-09-17 09:02:45 -07:00
David Addison
f81f5baaed
Add CUDA graph support only for CUDA 11.3 and later builds
...
Fixes #90
[ROCm/rccl-tests commit: 1f8f541686 ]
2021-07-13 10:47:47 -07:00
David Addison
6719794fc8
Removed MPI_SUPPORT conditional compilation of average flag
...
[ROCm/rccl-tests commit: b9f90d12a9 ]
2021-07-12 11:43:57 -07:00
David Addison
d3061dc2a9
Fix issues with MPI_Allreduce and multi-threaded tests
...
[ROCm/rccl-tests commit: 547e119d35 ]
2021-07-08 16:42:40 -07:00
David Addison
ea6eec9e80
Updated with new command line arguments
...
[ROCm/rccl-tests commit: 11cff17a04 ]
2021-07-06 16:27:45 -07:00
David Addison
230983c84e
Merge branch 'bfloat16'
...
[ROCm/rccl-tests commit: f476f4a17a ]
2021-07-06 10:20:32 -07:00
David Addison
a23cffe28a
Added new option to report average iteration time
...
[ROCm/rccl-tests commit: 1dfc76eccc ]
2021-06-30 19:36:07 -07:00
David Addison
1044cd1f32
Resync with changes in gitlab-master code
...
[ROCm/rccl-tests commit: 1ae8cdc315 ]
2021-06-30 13:16:04 -07:00
David Addison
efaaf56199
Merge pull request #88 from nzmsv/master
...
Cleanup argument error handling and messages
[ROCm/rccl-tests commit: 44df0bf010 ]
2021-06-30 12:35:47 -07:00
David Addison
d30e35f150
Added new tests: scatter, sendrecv, hypercube
...
[ROCm/rccl-tests commit: 9dae3d3a37 ]
2021-06-28 16:49:10 -07:00
David Addison
e73e5a239b
Added support for CUDA graph capture/replay (-G)
...
[ROCm/rccl-tests commit: e55ad3796d ]
2021-06-28 14:19:45 -07:00
David Addison
20b63cf465
Fixed formatting for bfloat16 support
...
[ROCm/rccl-tests commit: 526eacadf7 ]
2021-06-28 10:12:34 -07:00
David Addison
a41268e26e
Add support for ncclAvg operation
...
[ROCm/rccl-tests commit: cde7e769c1 ]
2021-06-28 09:41:58 -07:00
Greg Inozemtsev
45c28c6c36
Cleanup argument error handling and messages
...
Add error checking for minbytes and maxbytes arguments
Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.
[ROCm/rccl-tests commit: c4de829d91 ]
2021-06-04 21:47:40 +00:00
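The size-argument handling described in the commit above (unit suffixes accepted in either case, malformed input rejected) could look roughly like the sketch below. The helper name `parse_size`, the binary units, and the `-1` error convention are assumptions for illustration, not the project's actual parser.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse "64", "8k", "128M" or "1g" into a byte count.
 * Returns -1 on malformed, negative, or out-of-range input. */
long long parse_size(const char *text) {
    char *end = NULL;
    errno = 0;
    long long value = strtoll(text, &end, 10);
    if (end == text || errno == ERANGE || value < 0) return -1;

    long long unit = 1;
    switch (toupper((unsigned char)*end)) {  /* accept both 'k' and 'K', etc. */
        case 'G': unit = 1LL << 30; ++end; break;
        case 'M': unit = 1LL << 20; ++end; break;
        case 'K': unit = 1LL << 10; ++end; break;
        case '\0': break;                    /* plain byte count, no suffix */
        default: return -1;                  /* unknown suffix */
    }
    if (*end != '\0') return -1;             /* trailing characters after the suffix */
    if (value > LLONG_MAX / unit) return -1; /* result would overflow */
    return value * unit;
}
```

A caller would typically print the error and usage text to stderr when the helper returns -1, in line with the commit description.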
Sylvain Jeaugey
05f0ab10e6
Update PERFORMANCE.md
...
[ROCm/rccl-tests commit: e12c35d84b ]
2021-05-27 09:12:52 -07:00
David Addison
882c60210b
Add support for new datatype: bfloat16
...
[ROCm/rccl-tests commit: e37545e491 ]
2021-03-15 17:13:35 -07:00
David Addison
a74716696b
Merge pull request #67 from NVIDIA/big_buffers
...
Do not allocate memory for expected buffer if checking disabled
[ROCm/rccl-tests commit: 0b30de583f ]
2021-02-04 09:24:09 -08:00
David Addison
c62bde3272
Do not allocate memory for expected buffer if checking disabled
...
This allows the tests to be run with larger buffers
[ROCm/rccl-tests commit: 7677f3f608 ]
2021-01-20 17:08:40 -08:00
David Addison
281348cba9
Merge pull request #64 from NVIDIA/hosthash_boot_id
...
Add boot_id to the hostname hash due to collisions on Azure
[ROCm/rccl-tests commit: 2f9bba9f20 ]
2021-01-11 10:02:20 -08:00
David Addison
819d6ce228
Add boot_id to the hostname hash due to collisions on Azure
...
Fixes #60
[ROCm/rccl-tests commit: ae1ce98e69 ]
2021-01-04 11:38:45 -08:00
Sylvain Jeaugey
5a9f62c2b7
Merge pull request #61 from jithinjosepkl/master
...
Use DJB2a hash algorithm in getHostHash()
[ROCm/rccl-tests commit: 464f038106 ]
2020-12-18 10:39:43 -08:00
Jithin Jose
f770d161f3
Use DJB2a hash algorithm in getHostHash()
...
[ROCm/rccl-tests commit: da67a81c8e ]
2020-12-18 10:12:54 -08:00
Sylvain Jeaugey
f35cba73c8
Merge pull request #48 from NVIDIA/fix-makefile-typo
...
Fix typo in src/Makefile
[ROCm/rccl-tests commit: bd0755c95c ]
2020-06-24 14:52:55 -07:00
Luke Yeager
8b83a414c5
Fix typo in src/Makefile
...
[ROCm/rccl-tests commit: afdaf59b3b ]
2020-06-24 14:39:22 -07:00
Sylvain Jeaugey
0624d2cede
Add gencode for CUDA11
...
[ROCm/rccl-tests commit: b2603a2e85 ]
2020-06-23 18:16:46 -07:00
Sylvain Jeaugey
12d86bd58f
Change all_gather/reduce_scatter algbw to match the documentation.
...
Fix #45 : All_gather and reduce_scatter algorithm bandwidth was
computed as time/count*(nranks-1) which is not consistent with the
way we compute it for other collectives.
This change makes algbw higher; busbw is unchanged.
[ROCm/rccl-tests commit: ec1b5e22e6 ]
2020-06-19 10:42:19 -07:00
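As a rough illustration of the commit above, the sketch below assumes the convention that algorithm bandwidth is total data size divided by time, and that bus bandwidth for all_gather and reduce_scatter scales it by (nranks-1)/nranks. The struct and function names are hypothetical, and the units (GB/s, seconds) are assumptions.

```c
#include <stddef.h>

/* Hypothetical result type: both values in GB/s. */
typedef struct {
    double algbw;
    double busbw;
} bandwidth_t;

/* count is the per-rank element count; the total size gathered is
 * count * type_size * nranks bytes (assumed convention). */
bandwidth_t all_gather_bandwidth(size_t count, size_t type_size,
                                 int nranks, double seconds) {
    double total_bytes = (double)count * (double)type_size * (double)nranks;
    bandwidth_t bw;
    bw.algbw = total_bytes / 1.0e9 / seconds;
    bw.busbw = bw.algbw * (double)(nranks - 1) / (double)nranks;
    return bw;
}
```

Under this assumption, basing algbw on the total size rather than the per-rank size raises algbw while busbw keeps the same value, which matches the note in the commit message.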
Sylvain Jeaugey
fcaaf2c4a1
Fix #47 : compilation error on NCCL<2.7
...
Return an error when trying to run alltoall test when compiled
against NCCL<2.7.
[ROCm/rccl-tests commit: 07ac716c1a ]
2020-06-18 15:02:51 -07:00
Sylvain Jeaugey
cf70df2498
Merge pull request #46 from NVIDIA/p2p
...
Add alltoall perf test
[ROCm/rccl-tests commit: a7b304dde5 ]
2020-06-17 10:45:29 -07:00
Luke Yeager
3a6293b748
Fix some memory leaks
...
[ROCm/rccl-tests commit: af4fa0f4cf ]
2020-06-17 10:44:32 -07:00
Sylvain Jeaugey
0dfae3da28
Remove sm_30
...
[ROCm/rccl-tests commit: 7a833631b2 ]
2020-06-15 08:54:21 -07:00