Edgar Gabriel
a80fbba12b
Merge pull request #23 from edgargabriel/pr/link-fix
...
add the rccl/lib directory to the link path
2022-10-31 15:54:55 -05:00
Edgar Gabriel
9c9746739a
add the rccl/lib directory to the link path
2022-10-31 19:01:22 +00:00
Edgar Gabriel
fb0d339c1b
Merge pull request #22 from edgargabriel/pr/compile-fix
...
fix a missing endif statement
2022-10-25 12:19:25 -05:00
Edgar Gabriel
8a754f15ad
fix a missing endif statement
...
error introduced by the web merge-resolution tool :-(
2022-10-25 16:31:57 +00:00
Edgar Gabriel
84e8be8e65
Merge pull request #21 from ROCmSoftwarePlatform/topic/v2.13.4-sync
...
Topic/v2.13.4 sync
2022-10-21 17:17:27 -05:00
Edgar Gabriel
4d7cd871c1
Merge branch 'develop' into topic/v2.13.4-sync
2022-10-21 17:12:45 -05:00
Wenkai Du
9a89c300b6
Allow more precise measurements of single operation (#20)
2022-10-21 22:07:41 +00:00
Edgar Gabriel
641e93e99c
make rccl-test compile again.
...
all files compile now.
mpi tests also pass
2022-10-21 22:07:33 +00:00
Edgar Gabriel
3ae371cce7
Merge remote-tracking branch 'nccl-tests/master' into topic/v2.13.4-sync
2022-10-14 16:02:54 -05:00
Wenkai Du
d22281cb3f
Allow more precise measurements of single operation (#20)
2022-10-12 17:28:04 -07:00
akolliasAMD
3fbd3280ce
removed hypercube from Makefile (#19)
2022-09-29 15:36:39 -06:00
Sylvain Jeaugey
d313d20a26
Update NCCL tests
2022-09-23 01:13:29 -07:00
David Addison
749573f2d6
Fix preprocessor version check for ncclGetLastError()
...
ncclGetLastError() was added in NCCL 2.13.0
2022-09-07 16:10:41 -07:00
David Addison
afa4c56b6a
Fix an issue with the last commit when data checking is disabled
2022-09-07 11:23:49 -07:00
David Addison
a0a14911ee
Display N/A for error count in AlltoAll in-place test
...
AlltoAll does not support in-place buffers
2022-09-06 13:17:15 -07:00
John Bachan
bc5f7cfb0a
Changed top-level Makefile behavior so that BUILDDIR is interpreted
...
as relative to top-level directory. This is done by abspath'ing it before
passing it to subdirectory Makefiles.
The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".
The new behavior is simpler: if BUILDDIR is not set then it defaults
to "build", and the directory holding the final build is always at just
"${BUILDDIR}" in the top level.
2022-08-23 10:08:49 -07:00
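The BUILDDIR change described in the commit above can be illustrated with a minimal Python sketch. It is not the Makefile logic itself. The function names `old_build_dir` and `new_build_dir` and the `/repo` path are hypothetical, and the handling of an absolute BUILDDIR is an assumption based on how make's `abspath` treats absolute paths.

```python
import posixpath


def old_build_dir(top, builddir=None):
    """Old behavior as described in the commit message (hypothetical helper)."""
    if builddir is None:
        # BUILDDIR unset: build dir is "build" in the top-level directory.
        return posixpath.join(top, "build")
    # BUILDDIR set: build dir ends up under src/.
    return posixpath.join(top, "src", builddir)


def new_build_dir(top, builddir=None):
    """New behavior: default to "build", always resolved against the top level."""
    # Assumption: an absolute BUILDDIR is kept as-is, as make's abspath would do.
    return posixpath.normpath(posixpath.join(top, builddir or "build"))


if __name__ == "__main__":
    print(old_build_dir("/repo"), new_build_dir("/repo"))
    print(old_build_dir("/repo", "out"), new_build_dir("/repo", "out"))
```

With BUILDDIR unset both versions give the same directory. The two differ only when the user sets BUILDDIR, which is the case the commit simplifies.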
John Bachan
51af5572bf
Resync with NCCL 2.13
...
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield -> sched_yield()
* Bugfix to the cpu-side barrier/allreduce implementations.
2022-08-22 17:51:06 -07:00
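The switch from reporting max deviation to reporting a count of wrong elements, mentioned in the commit above, can be sketched in Python as follows. This is only an illustration of the two metrics, not the test suite's actual kernels. The helper names are hypothetical, and exact equality is assumed as the correctness criterion because the commit describes verification as bit-precise.

```python
def max_deviation(expected, actual):
    """Older-style metric: largest absolute difference between the arrays."""
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


def wrong_element_count(expected, actual):
    """Newer-style metric: how many elements differ from the expected value."""
    # Assumption: results are compared for exact equality.
    return sum(1 for e, a in zip(expected, actual) if e != a)


if __name__ == "__main__":
    expected = [1.0, 2.0, 3.0, 4.0]
    actual = [1.0, 2.5, 3.0, 0.0]
    print(max_deviation(expected, actual))
    print(wrong_element_count(expected, actual))
```

A count shows how widespread the corruption is, whereas a maximum deviation only describes the single worst element.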
Wenkai Du
45ec598ac4
Fix typo from previous merge
2022-08-12 14:42:17 +00:00
gilbertlee-amd
f6f3c44a7a
Enabling hipGraph codepath for future support (#18)
2022-08-09 16:45:27 -06:00
Wenkai Du
9025051bbb
Fix missing error checking for AllocateBuffs due to merge (#17)
2022-08-09 11:04:38 -07:00
Liam Wrubleski
d704668bf7
Add CMake files to build & package (#15)
...
* Add CMake files to build & package
* Change build technique on CI
* Correct CI build command
2022-08-09 11:17:07 -06:00
Eiden Yoshida
2af4f6bc3a
Allow gpu config override in CI (#14)
2022-07-28 09:19:16 -06:00
akolliasAMD
9925195afc
updated alltoallV test to not have any zero values (#12)
...
updated alltoallV test to not have any zero values between ranks
2022-07-21 10:28:53 -06:00
Edgar Gabriel
2a18737dc6
Merge pull request #11 from edgargabriel/ci-fix
...
update pytest before running CI
2022-06-13 09:52:40 -05:00
Edgar
67544e2c34
update pytest before running CI
...
There seems to be an incompatibility between the python installation
used in the CI and pytest. Update pytest before running CI.
2022-06-13 10:20:33 -04:00
Edgar Gabriel
937ea1926e
Merge pull request #10 from edgargabriel/multi-rank
...
Multi rank support
2022-06-10 14:03:33 -05:00
Edgar
0500f2f132
implementation of multi-rank support in rccl-tests.
2022-06-10 14:54:10 -04:00
Edgar
5cd2374edb
create branch up-to-date with rccl-test
2022-06-10 12:41:56 -04:00
amdkila
3d6f70659a
Check for error code in install script (#2)
2022-06-10 12:37:53 -04:00
David Addison
8274cb47b6
Merge pull request #96 from NVIDIA/nersc-linkage-fix
...
Add option to statically link cudart
2022-05-26 16:54:44 -07:00
Wenkai Du
6156759a40
Print GPU's full PCI bus ID
2022-04-06 16:46:17 +00:00
Wenkai Du
47238336d9
Update include path for custom RCCL build
2022-03-31 13:18:02 -04:00
Ziyue Yang
698524e42e
move to a2a api (#9)
2022-02-18 08:31:40 -08:00
Wenkai Du
602b745ff4
Add missing hipStreamDestroy at test exit
2021-11-16 07:50:18 -08:00
David Addison
de3ddbe261
Add option to statically link cudart
...
Build with CUDARTLIB=cudart_static to remove dynamic linkage
Also removed unused curand and nvToolsExt dependencies
BUG 95
2021-11-10 10:02:41 -08:00
David Addison
7130fa6096
Add MPI_IBM build option
2021-10-25 16:30:57 -07:00
Wenkai Du
8b35847d36
Use rccl_bfloat16 class
2021-09-23 16:39:11 -07:00
Wenkai Du
dc1ad4853d
Fix divide by zero error
2021-09-22 08:43:01 -07:00
Wenkai Du
213abee002
Merge remote-tracking branch 'nccl/master' into develop
2021-09-20 14:01:22 -07:00
David Addison
f773748b46
Resync with NCCL 2.11
...
New operator: mulsum
New test: gather
2021-09-17 09:02:45 -07:00
Wenkai Du
cc34c54509
Use ROCM_PATH instead of ROCM_HOME
2021-07-21 14:19:48 -07:00
Wenkai Du
2d9be62621
Merge remote-tracking branch 'nccl/master'
2021-07-15 13:54:43 -07:00
David Addison
1f8f541686
Add CUDA graph support only for CUDA 11.3 and later builds
...
Fixes #90
2021-07-13 10:47:47 -07:00
Wenkai Du
9f8ddadcdf
Merge remote-tracking branch 'nccl/master' into develop
2021-07-13 08:11:44 -07:00
David Addison
b9f90d12a9
Removed MPI_SUPPORT conditional compilation of average flag
2021-07-12 11:43:57 -07:00
David Addison
547e119d35
Fix issues with MPI_Allreduce and multi-threaded tests
2021-07-08 16:42:40 -07:00
David Addison
11cff17a04
Updated with new command line arguments
2021-07-06 16:27:45 -07:00
David Addison
f476f4a17a
Merge branch 'bfloat16'
2021-07-06 10:20:32 -07:00
David Addison
1dfc76eccc
Added new option to report average iteration time
2021-06-30 19:36:07 -07:00
David Addison
1ae8cdc315
Resync with changes in gitlab-master code
2021-06-30 13:16:04 -07:00