Commit Graph

129 Commits

Author SHA1 Message Date
Edgar Gabriel efdd4ad40b search SLES install paths for MPI 2023-07-25 19:29:13 +00:00
Edgar Gabriel 8fc00ec32e revamp cmake MPI detection
we honor user requested MPI installations using MPI_PATH first,
and check afterwards for MPICH and Open MPI in the default
Ubuntu and RHEL installation directories.
2023-07-25 19:28:35 +00:00
Edgar Gabriel c96ff57ac7 auto-detect and enable MPI 2023-07-25 19:27:13 +00:00
Pedram Alizadeh d16d1fb16b fixing the error message for mpirun when number of requested GPUs exceeds the limits (#37) 2023-04-27 14:06:17 -04:00
Pedram Alizadeh e856fa720f Revert "fixing the error message for mpirun when number of requested GPUs exceeds the limits (#33)" (#36)
This reverts commit e146460810.
2023-04-25 13:44:43 -04:00
Pedram Alizadeh e146460810 fixing the error message for mpirun when number of requested GPUs exceeds the limits (#33) 2023-04-03 11:37:13 -04:00
Pedram Alizadeh 255750b094 Adding -pthread flag for linking issues into CMakeLists.txt and src/Makefile (#31) 2023-03-02 11:05:25 -05:00
akolliasAMD 3fbd3280ce removed hypercube from Makefile (#19) 2022-09-29 15:36:39 -06:00
Wenkai Du 45ec598ac4 Fix typo from previous merge 2022-08-12 14:42:17 +00:00
gilbertlee-amd f6f3c44a7a Enabling hipGraph codepath for future support (#18) 2022-08-09 16:45:27 -06:00
Wenkai Du 9025051bbb Fix missing error checking for AllocateBuffs due to merge (#17) 2022-08-09 11:04:38 -07:00
Liam Wrubleski d704668bf7 Add CMake files to build & package (#15)
* Add CMake files to build & package

* Change build technique on CI

* Correct CI build command
2022-08-09 11:17:07 -06:00
Eiden Yoshida 2af4f6bc3a Allow gpu config override in CI (#14) 2022-07-28 09:19:16 -06:00
akolliasAMD 9925195afc updated alltoallV test to not have any zero values (#12)
updated alltoallV test to not have any zero values between ranks
2022-07-21 10:28:53 -06:00
Edgar Gabriel 2a18737dc6 Merge pull request #11 from edgargabriel/ci-fix
update pytest before running CI
2022-06-13 09:52:40 -05:00
Edgar 67544e2c34 update pytest before running CI
There seems to be an incompatibility between the Python installation
used in the CI and pytest. Update pytest before running CI.
2022-06-13 10:20:33 -04:00
Edgar Gabriel 937ea1926e Merge pull request #10 from edgargabriel/multi-rank
Multi rank support
2022-06-10 14:03:33 -05:00
Edgar 0500f2f132 implementation of multi-rank support in rccl-tests. 2022-06-10 14:54:10 -04:00
Edgar 5cd2374edb create branch up-to-date with rccl-test 2022-06-10 12:41:56 -04:00
amdkila 3d6f70659a Check for error code in install script (#2) 2022-06-10 12:37:53 -04:00
Wenkai Du 6156759a40 Print GPU's full PCI bus ID 2022-04-06 16:46:17 +00:00
Wenkai Du 47238336d9 Update include path for custom RCCL build 2022-03-31 13:18:02 -04:00
Ziyue Yang 698524e42e move to a2a api (#9) 2022-02-18 08:31:40 -08:00
Wenkai Du 602b745ff4 Add missing hipStreamDestroy at test exit 2021-11-16 07:50:18 -08:00
Wenkai Du 8b35847d36 Use rccl_bfloat16 class 2021-09-23 16:39:11 -07:00
Wenkai Du dc1ad4853d Fix divide by zero error 2021-09-22 08:43:01 -07:00
Wenkai Du 213abee002 Merge remote-tracking branch 'nccl/master' into develop 2021-09-20 14:01:22 -07:00
David Addison f773748b46 Resync with NCCL 2.11
New operator: mulsum
New test: gather
2021-09-17 09:02:45 -07:00
Wenkai Du cc34c54509 Use ROCM_PATH instead of ROCM_HOME 2021-07-21 14:19:48 -07:00
Wenkai Du 2d9be62621 Merge remote-tracking branch 'nccl/master' 2021-07-15 13:54:43 -07:00
David Addison 1f8f541686 Add CUDA graph support only for CUDA 11.3 and later builds
Fixes #90
2021-07-13 10:47:47 -07:00
Wenkai Du 9f8ddadcdf Merge remote-tracking branch 'nccl/master' into develop 2021-07-13 08:11:44 -07:00
David Addison b9f90d12a9 Removed MPI_SUPPORT conditional compilation of average flag 2021-07-12 11:43:57 -07:00
David Addison 547e119d35 Fix issues with MPI_Allreduce and multi-threaded tests 2021-07-08 16:42:40 -07:00
David Addison 11cff17a04 Updated with new command line arguments 2021-07-06 16:27:45 -07:00
David Addison f476f4a17a Merge branch 'bfloat16' 2021-07-06 10:20:32 -07:00
David Addison 1dfc76eccc Added new option to report average iteration time 2021-06-30 19:36:07 -07:00
David Addison 1ae8cdc315 Resync with changes in gitlab-master code 2021-06-30 13:16:04 -07:00
David Addison 44df0bf010 Merge pull request #88 from nzmsv/master
Cleanup argument error handling and messages
2021-06-30 12:35:47 -07:00
David Addison 9dae3d3a37 Added new tests: scatter, sendrecv, hypercube 2021-06-28 16:49:10 -07:00
David Addison e55ad3796d Added support for CUDA graph capture/replay (-G) 2021-06-28 14:19:45 -07:00
David Addison 526eacadf7 Fixed formatting for bfloat16 support 2021-06-28 10:12:34 -07:00
David Addison cde7e769c1 Add support for ncclAvg operation 2021-06-28 09:41:58 -07:00
Greg Inozemtsev c4de829d91 Cleanup argument error handling and messages
Add error checking for minbytes and maxbytes arguments

Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.
2021-06-04 21:47:40 +00:00
Sylvain Jeaugey e12c35d84b Update PERFORMANCE.md 2021-05-27 09:12:52 -07:00
Wenkai Du 0fccaec26f Update mpich include path 2021-04-16 18:25:54 -04:00
Stanley Tsang a065ec606d Merge pull request #8 from stanleytsang-amd/less_mem_types_unittests
Disabling host and fine memory types for CI testing
2021-03-16 17:00:44 -06:00
Stanley Tsang 5373e3c630 Disabling host and fine memory types for CI testing 2021-03-16 20:38:13 +00:00
David Addison e37545e491 Add support for new datatype: bfloat16 2021-03-15 17:13:35 -07:00
David Addison 0b30de583f Merge pull request #67 from NVIDIA/big_buffers
Do not allocate memory for expected buffer if checking disabled
2021-02-04 09:24:09 -08:00