rocm-systems

Author	SHA1	Message	Date
Wenkai Du	c985358e11	Merge remote-tracking branch 'nccl/master' into 2.8.3	2021-02-15 18:44:47 -05:00
Wenkai Du	3a1aebd742	Merge remote-tracking branch 'rccl/develop' into 2.8.3	2021-02-15 13:17:38 -05:00
Wenkai Du	bf8eb40705	Move HDP flush to CPU	2021-02-12 18:06:19 +00:00
pramenku	e9f7908592	Update install.sh (#317 ) * Update install.sh Install.sh having hard code like /opt/rocm/bin/hipcc for rocm_path and default_path=/opt/rocm This will work only when we have standalone rocm installed. If anyone has installed, side-by-side, they will face below error. Can we keep like ROCM_PATH=$ROCM_PATH instead of “default_path” as variable name and ROCM_BIN_PATH=$ROCM_PATH/bin ,rocm_path can be replaced with ROCM_BIN_PATH. This way, we will have option to export ROCM_PATH as env variable as per need and use the script. I have also tried locally, it’s working. ROCM_PATH is common variable name, we are having. If you are ok, I can also submit the PR for the same. Error when side-by-side install is done for driver. # ./install.sh -dtr 2>&1 | tee /dockerx/6519_rccl-test.log CMake Error at /usr/share/cmake/Modules/CMakeDetermineCXXCompiler.cmake:48 (message): Could not find compiler set in environment variable CXX: /opt/rocm/bin/hipcc. Call Stack (most recent call first): CMakeLists.txt:12 (project) CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage -- Configuring incomplete, errors occurred! See also "/root/driver/rccl/build/release/CMakeFiles/CMakeOutput.log". * Update install.sh Removed ROCM_PATH=$ROCM_PATH * Update install.sh Set default value if external value is not supplied.	2021-02-12 08:44:30 -08:00
Stanley Tsang	6b7b312fb9	Fixed temp file creation/deletion with clique mode (#316 )	2021-02-12 08:44:10 -08:00
Sylvain Jeaugey	911d61f214	2.8.4-1 Fix hang in corner cases of alltoallv using point to point send/recv. Harmonize error messages. Fix missing NVTX section in the license. Update README.	2021-02-09 15:36:48 -08:00
Gilbert Lee	f1a9ce3fa5	Using GTEST_SKIP() to skip unit tests that have insufficient devices. Skipping out earlier	2021-02-09 03:54:04 +00:00
Wenkai Du	9cc3b56166	Fix GDRDMA read and remove unused files	2021-02-09 01:34:39 +00:00
Stanley Tsang	d00b7d17bd	Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16 ) * Fixing temp file creation/deletion for Clique kernel mode. * Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs * GroupCall MP UT properly quits when too many devices specified * MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script	2021-02-05 16:49:25 -08:00
Wenkai Du	6dfdfef98f	Add gfx908 Rome 4 NICs model	2021-02-06 00:19:47 +00:00
Gilbert Lee	f372c53d52	[TransferBench] Fixing some merge issues	2021-02-05 16:46:20 +00:00
Wenkai Du	ab1e7a0318	Merge remote-tracking branch 'origin/develop' into 2.8.3	2021-02-04 20:02:34 -05:00
Gilbert Lee	2f541508c5	[topo_expl] Updating for 2.8.3	2021-02-04 19:08:42 +00:00
Gilbert Lee	9aac1ed38f	[ib-test] Update for 2.8.3	2021-02-04 19:05:03 +00:00
Gilbert Lee	9ce203dd0a	[TransferBench] Updating for 2.8.3	2021-02-04 18:58:25 +00:00
gilbertlee-amd	1990ffd76a	Tuning some clique-based kernel parameters (#315 )	2021-02-03 20:00:08 -07:00
Wenkai Du	5f97122442	Enable GPU direct RDMA read from GPU	2021-02-03 02:48:30 +00:00
gilbertlee-amd	62e0447e9a	[TransferBench] Restore some previous fixes - memory leak, PCIe address (#314 )	2021-02-01 09:48:09 -07:00
Gilbert Lee	01a998b17c	Removing in-place tests from Combined calls (no support for send/recv)	2021-01-28 20:09:03 +00:00
gilbertlee-amd	3e62ceddc5	Clique kernel support (#295 ) (#15 ) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com> Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>	2021-01-28 09:45:01 -07:00
Wenkai Du	41e47a36e7	Use less unroll for clique kernels (#313 )	2021-01-15 17:48:10 -08:00
Stanley Tsang	d3fa257682	Adding multiprocess unit tests (#312 ) Adding multiprocess unit tests for collectives. To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess	2021-01-15 16:34:36 -07:00
Wenkai Du	2ddbe6646b	Improve collective trace	2021-01-14 19:28:01 -05:00
Wenkai Du	b33a2cac8b	gtest: add scatter to combined calls and use loops (#303 ) * gtest: add scatter to combined calls and use loops * gtest: run validation inside loop * gtest: revert small element count to 2520 * gtest: fix memory leak in validation (cherry picked from commit `b0853ccd51`) * Fix combined call UT * Fix memory leak * Fix alltoallv test	2021-01-14 19:28:01 -05:00
Wenkai Du	f4d5d3d620	Port alltoall[v]	2021-01-14 19:28:01 -05:00
Wenkai Du	105db19a11	Do not allow GPU as intermediate	2021-01-14 19:28:01 -05:00
Wenkai Du	e055229e56	Revert "Changes to topology based on XGMI (#272 )" This reverts commit `01bd2573db`.	2021-01-14 19:28:01 -05:00
Wenkai Du	d469947641	Merge remote-tracking branch 'nccl/master' into no-target-id	2021-01-14 19:27:53 -05:00
Jonas Zhou	3996562690	x86: Add CPU detection for Zhaoxin processors Signed-off-by: Jonas Zhou <JonasZhou@zhaoxin.com>	2020-12-17 11:15:18 -08:00
Wenkai Du	373a108516	Fix Rome PCIe 2 node topology generation (#310 )	2020-12-15 17:16:17 -08:00
gilbertlee-amd	41c35dad48	[TransferBench] Fixing bug with fine-grained memory allocation (#311 ) * Fixing bug with fine-grained memory	2020-12-15 17:37:31 -07:00
gilbertlee-amd	ae0c4092c7	[TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309 ) * Adding CPU based execution, fixing typos, adding Fine-grained mem * Exposing sampling factor when generating range of data sizes * Refactoring how Links are launched, now once per thread * Documentation updates	2020-12-11 10:21:14 -07:00
gilbertlee-amd	b80ae551b1	[TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307 ) * Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing * Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)	2020-12-04 14:57:13 -07:00
Wenkai Du	882d52ad7e	Adding backward compatibility for target-id syntax for AMDGPU_TARGETS (#306 )	2020-12-04 13:55:56 -08:00
Wenkai Du	975b14dffa	Add Rome model and improve search (#305 )	2020-11-17 14:55:06 -08:00
Sylvain Jeaugey	920dbe5b35	2.8.3-1 Optimization for Tree allreduce on A100. Improve aggregation performance. Use shared buffers for inter-node send/recv. Add NVTX profiling hooks. Accelerate alltoall connections by merging communication for all channels. Add support for one hop communication through NVLink, for faster send/recv communication on cubemesh topologies like DGX-1. Improve alltoall scheduling to better balance intra/inter node communication. Increase send/recv parallelism by 8x, each warp sending or receiving to a different peer. Net: move to v4. Net: make flush operation asynchronous to accelerate alltoall. Net: define maximum number of requests. Fix hang when using LL128 protocol after 2^31 steps. Fix #379 : topology injection failing when using less GPUs than described in the XML. Fix #394 : protocol mismatch causing hangs or crashes when using one GPU per node.	2020-11-17 11:08:52 -08:00
Wenkai Du	1943bac646	Merge remote-tracking branch 'origin/master' into develop	2020-11-16 12:16:53 -05:00
Wenkai Du	554729079d	Use device's link width and speed if port doesn't report (#304 )	2020-11-13 17:58:04 -08:00
Wenkai Du	b0853ccd51	gtest: add scatter to combined calls and use loops (#303 ) * gtest: add scatter to combined calls and use loops * gtest: run validation inside loop * gtest: revert small element count to 2520 * gtest: fix memory leak in validation	2020-11-13 17:57:44 -08:00
Stanley Tsang	2958f7eace	Fixing IPC handle leak (#302 )	2020-11-13 10:32:42 -07:00
gilbertlee-amd	c8d08a7c2f	Adding RCCL_CLIQUE_DEBUG to help debug experimental clique feature (#300 )	2020-11-13 09:07:11 -07:00
Wenkai Du	4e68229c8b	Skip unused peer connection in scatter and gather (#301 )	2020-11-12 15:47:34 -08:00
Colin Smith	377b43470b	Merge pull request #299 from ROCmSoftwarePlatform/develop Enable target id build	2020-11-10 15:47:42 -07:00
gilbertlee-amd	41bcfb8878	Clique kernel support (#295 ) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>	2020-11-10 15:44:10 -07:00
Wenkai Du	1fdb216f87	Use target id of xnack off (#298 )	2020-11-10 11:10:48 -08:00
Wenkai Du	2e8b3a0857	Use ncclSend/ncclRecv for alltoall type of collectives as default (#297 )	2020-11-09 11:23:17 -08:00
gilbertlee-amd	bdd8adf1ca	Adding a CHANGELOG (#296 )	2020-11-05 13:38:30 -07:00
Wenkai Du	709b7e4880	Improve GPU direct RDMA handling on Rome (#294 )	2020-11-03 14:29:08 -08:00
Wenkai Du	dfa3c41ede	Add more Rome models (#292 )	2020-10-30 21:26:04 -07:00
gilbertlee-amd	bfab1d3592	Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290 )	2020-10-27 09:00:33 -06:00


555 Commits