as relative to top-level directory. This is done by abspath'ing it before
passing it to subdirectory Makefiles.
The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".
The new behavior is simpler: if BUILDDIR is not set, it defaults
to "build", and the directory holding the final build is always at just
"${BUILDDIR}" in the top level.
[ROCm/rccl-tests commit: bc5f7cfb0a]
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield() with sched_yield().
* Bugfix to the cpu-side barrier/allreduce implementations.
[ROCm/rccl-tests commit: 51af5572bf]
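
Two of the items above can be sketched in a few lines of C. This is an illustration only, not the code from the commit: the helper names count_wrong_elements and is_power_of_two are hypothetical, and the bitwise comparison is an assumption about what "bit-precise" verification implies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the commit): report corruption as a count
 * of mismatching elements rather than as a maximum deviation. */
size_t count_wrong_elements(const float *actual, const float *expected, size_t n) {
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++) {
        /* Compare raw bytes so the check is bit-precise: 0.0f and -0.0f
         * count as different, identical NaN bit patterns count as equal. */
        if (memcmp(&actual[i], &expected[i], sizeof(float)) != 0) {
            wrong++;
        }
    }
    return wrong;
}

/* Hypothetical helper: a hypercube exchange pairs ranks by flipping one bit
 * of the rank index, so it is only attempted when nranks is a power of two. */
int is_power_of_two(int nranks) {
    return nranks > 0 && (nranks & (nranks - 1)) == 0;
}
```

A caller would skip the hypercube test when is_power_of_two(nranks) returns 0, and would print the value returned by count_wrong_elements as the error figure.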
Build with CUDARTLIB=cudart_static to remove dynamic linkage
Also removed unused curand and nvToolsExt dependencies
BUG 95
[ROCm/rccl-tests commit: de3ddbe261]
Add error checking for minbytes and maxbytes arguments
Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.
[ROCm/rccl-tests commit: c4de829d91]
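
A minimal sketch of size parsing that accepts lowercase suffixes and rejects malformed input is shown below. The function name parse_size, the K/M/G suffix set, and the binary (1024-based) multipliers are assumptions made for illustration; the commit's actual parser may differ.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parser (not the commit's implementation).
 * Returns 0 and stores the byte count in *out on success, -1 on error. */
int parse_size(const char *s, unsigned long long *out) {
    /* Require a leading digit so that signs and empty strings are rejected. */
    if (s == NULL || !isdigit((unsigned char)s[0])) return -1;

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(s, &end, 10);
    if (errno != 0 || end == s) return -1;

    /* Upper-case the suffix so that "4k" and "4K" are treated the same. */
    unsigned long long mult = 1;
    switch (toupper((unsigned char)*end)) {
        case '\0': break;                         /* plain byte count */
        case 'K': mult = 1ULL << 10; end++; break;
        case 'M': mult = 1ULL << 20; end++; break;
        case 'G': mult = 1ULL << 30; end++; break;
        default: return -1;                       /* unknown suffix */
    }
    if (*end != '\0') return -1;                  /* trailing characters */
    if (value > ULLONG_MAX / mult) return -1;     /* multiplication overflow */

    *out = value * mult;
    return 0;
}
```

On a -1 return, the caller would print the error and the usage text on stderr and exit, in line with the behavior described above.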
Fix #45: All_gather and reduce_scatter algorithm bandwidth was
computed as time/count*(nranks-1) which is not consistent with the
way we compute it for other collectives.
This change makes algbw higher; busbw is unchanged.
[ROCm/rccl-tests commit: ec1b5e22e6]
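
As a rough sketch of the relationship between the two figures, assuming the convention described in the nccl-tests performance notes (algorithm bandwidth is total data size over time, and bus bandwidth for all_gather and reduce_scatter scales it by (nranks-1)/nranks). The function names are hypothetical and the exact expressions in the commit are not reproduced here.

```c
/* Assumed convention: total_bytes is the full buffer size across all ranks,
 * sec is the elapsed time of one operation in seconds. */
double alg_bw_gbps(double total_bytes, double sec) {
    return total_bytes / sec / 1.0e9;
}

/* Bus bandwidth for all_gather / reduce_scatter under the same assumption:
 * each rank already holds 1/nranks of the data, so only (nranks-1)/nranks
 * of the total has to cross the interconnect. */
double gather_scatter_bus_bw_gbps(double total_bytes, double sec, int nranks) {
    return alg_bw_gbps(total_bytes, sec) * (double)(nranks - 1) / (double)nranks;
}
```

With 8e9 bytes moved in one second on 4 ranks, this gives an algbw of 8 GB/s and a busbw of 6 GB/s. A per-rank formula scaled by (nranks-1) would have reported the smaller figure for both, which is consistent with algbw rising while busbw stays the same.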
In some cases, the MPI library is not in $(MPI_HOME)/lib but
in $(MPI_HOME)/lib64, for example on Red Hat-like Linux systems
(CentOS, Amazon Linux) where MPI is installed by yum or rpm.
Under such circumstances, the current Makefile fails.
This patch addresses the issue by adding -L$(MPI_HOME)/lib64 to
NVLDFLAGS in src/Makefile.
Signed-off-by: Wei Zhang <wzam@amazon.com>
[ROCm/rccl-tests commit: 0f173234bb]
Major rework to merge most of the changes from the NCCL internal
tests into the public ones
Added "-m <agg_iters>" operation aggregation option.
Data integrity checking is now much more performant at scale.
Startup times at scale are improved.
Test latency units are now displayed in usec.
[ROCm/rccl-tests commit: cbe7f65400]