rocm-systems

Author	SHA1	Message	Date
Wenkai Du	bf8eb40705	Move HDP flush to CPU	2021-02-12 18:06:19 +00:00
Wenkai Du	9cc3b56166	Fix GDRDMA read and remove unused files	2021-02-09 01:34:39 +00:00
Stanley Tsang	d00b7d17bd	Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16 ) * Fixing temp file creation/deletion for Clique kernel mode. * Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs * GroupCall MP UT properly quits when too many devices specified * MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script	2021-02-05 16:49:25 -08:00
Wenkai Du	ab1e7a0318	Merge remote-tracking branch 'origin/develop' into 2.8.3	2021-02-04 20:02:34 -05:00
gilbertlee-amd	1990ffd76a	Tuning some clique-based kernel parameters (#315 )	2021-02-03 20:00:08 -07:00
Wenkai Du	5f97122442	Enable GPU direct RDMA read from GPU	2021-02-03 02:48:30 +00:00
gilbertlee-amd	3e62ceddc5	Clique kernel support (#295 ) (#15 ) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com> Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>	2021-01-28 09:45:01 -07:00
Wenkai Du	41e47a36e7	Use less unroll for clique kernels (#313 )	2021-01-15 17:48:10 -08:00
Wenkai Du	2ddbe6646b	Improve collective trace	2021-01-14 19:28:01 -05:00
Wenkai Du	f4d5d3d620	Port alltoall[v]	2021-01-14 19:28:01 -05:00
Wenkai Du	105db19a11	Do not allow GPU as intermediate	2021-01-14 19:28:01 -05:00
Wenkai Du	e055229e56	Revert "Changes to topology based on XGMI (#272 )" This reverts commit `01bd2573db`.	2021-01-14 19:28:01 -05:00
Wenkai Du	d469947641	Merge remote-tracking branch 'nccl/master' into no-target-id	2021-01-14 19:27:53 -05:00
Wenkai Du	373a108516	Fix Rome PCIe 2 node topology generation (#310 )	2020-12-15 17:16:17 -08:00
Wenkai Du	975b14dffa	Add Rome model and improve search (#305 )	2020-11-17 14:55:06 -08:00
Sylvain Jeaugey	920dbe5b35	2.8.3-1 Optimization for Tree allreduce on A100. Improve aggregation performance. Use shared buffers for inter-node send/recv. Add NVTX profiling hooks. Accelerate alltoall connections by merging communication for all channels. Add support for one hop communication through NVLink, for faster send/recv communication on cubemesh topologies like DGX-1. Improve alltoall scheduling to better balance intra/inter node communication. Increase send/recv parallelism by 8x, each warp sending or receiving to a different peer. Net: move to v4. Net: make flush operation asynchronous to accelerate alltoall. Net: define maximum number of requests. Fix hang when using LL128 protocol after 2^31 steps. Fix #379 : topology injection failing when using less GPUs than described in the XML. Fix #394 : protocol mismatch causing hangs or crashes when using one GPU per node.	2020-11-17 11:08:52 -08:00
Wenkai Du	554729079d	Use device's link width and speed if port doesn't report (#304 )	2020-11-13 17:58:04 -08:00
Stanley Tsang	2958f7eace	Fixing IPC handle leak (#302 )	2020-11-13 10:32:42 -07:00
gilbertlee-amd	c8d08a7c2f	Adding RCCL_CLIQUE_DEBUG to help debug experimental clique feature (#300 )	2020-11-13 09:07:11 -07:00
Wenkai Du	4e68229c8b	Skip unused peer connection in scatter and gather (#301 )	2020-11-12 15:47:34 -08:00
gilbertlee-amd	41bcfb8878	Clique kernel support (#295 ) * Adding experimental clique-based kernels (opt-in only) Co-authored-by: Stanley Tsang <stanley.tsang@amd.com> Co-authored-by: Gilbert Lee <gilbert.lee@amd.com> Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>	2020-11-10 15:44:10 -07:00
Wenkai Du	2e8b3a0857	Use ncclSend/ncclRecv for alltoall type of collectives as default (#297 )	2020-11-09 11:23:17 -08:00
Wenkai Du	709b7e4880	Improve GPU direct RDMA handling on Rome (#294 )	2020-11-03 14:29:08 -08:00
Wenkai Du	dfa3c41ede	Add more Rome models (#292 )	2020-10-30 21:26:04 -07:00
xietingwew	084207e685	fix proxyArgs for trace log	2020-10-21 09:18:40 -07:00
Wenkai Du	dcad0ef7cb	Fix incorrect pointer checking for scatter and gather (#285 )	2020-10-19 13:27:09 -07:00
Wenkai Du	c835d8263a	Merge remote-tracking branch 'nccl/master' into nccl_sync	2020-10-15 18:42:38 -04:00
gilbertlee-amd	84a2541e01	Revert "Initial support for clique-based kernels (#276 )" (#280 ) This reverts commit `2b8184808d`.	2020-10-15 11:30:18 -07:00
Sylvain Jeaugey	0e14394c5f	Fix affinity move	2020-10-13 16:58:05 -07:00
Sylvain Jeaugey	c6dbdb0084	Make sure proxy threads inherit the CPU affinity.	2020-10-13 16:37:52 -07:00
Wenkai Du	33babcb5e2	Update Rome single node models (#277 )	2020-10-13 13:33:09 -07:00
gilbertlee-amd	2b8184808d	Initial support for clique-based kernels (#276 ) * Initial support for clique-based kernels	2020-10-13 11:22:04 -06:00
Wenkai Du	ae008fd2db	Rework Rome detection and add multiple network ports models (#274 ) * Rework Rome detection and add multiple network ports models * Remove unused opCount in p2p transport	2020-10-07 13:37:36 -07:00
Wenkai Du	b871ea3c0c	Add Alltoallv RCCL kernel implementation (#269 ) * Add alltoallv API and implementation * Extend Rome P2P channel limit to multinode and alltoall kernels * topo_expl: fix compilation and sync up with main * gtest: use RCCL alltoallv API * Code review changes	2020-09-30 16:25:36 -07:00
Stanley Tsang	acca2ae20a	Updating inline asm to not require explicit L1 cache invalidation (#270 )	2020-09-25 13:46:26 -06:00
gilbertlee-amd	01bd2573db	Changes to topology based on XGMI (#272 ) * Alterations to topology search to improve XGMI-enabled nodes	2020-09-25 12:20:09 -06:00
Wenkai Du	44fcde7835	Ensure all ranks on same send/receive or alltoall kernel path (#271 )	2020-09-24 08:25:04 -07:00
Wenkai Du	d871fceb54	Change network plugin name to librccl-net.so (#266 )	2020-09-18 13:23:30 -07:00
Wenkai Du	42955f5f4f	Limit P2P channels on Rome	2020-09-17 17:20:32 -07:00
Wenkai Du	60819dcf8d	Merge pull request #262 from wenkaidu/alignment Make data alignment requirements matching ISA manual	2020-09-08 10:40:42 -07:00
Wenkai Du	e2042ccf8a	Fix broken profiling build (#263 )	2020-09-02 15:39:52 -07:00
Wenkai Du	4751992231	Make data alignment requirements matching ISA manual From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf 8.1.7. Alignment For Dword or larger reads or writes, the two LSBs of the byte-address are ignored, thus forcing Dword alignment.	2020-09-01 21:21:58 +00:00
Wenkai Du	4180e6409e	Fix incorrect threads split in sendrecv (#261 )	2020-08-31 17:33:22 -07:00
Wenkai Du	c5cbece6d0	Increase minimal channels for gfx908 (#259 )	2020-08-26 11:40:11 -07:00
Wenkai Du	b0919dc46c	Only use software barrier for synchronization (#258 )	2020-08-25 13:16:34 -07:00
Wenkai Du	391bbf3f1e	Add NPS4 support on some models (#256 ) * Add NPS4 support on some models * Add XML models	2020-08-19 11:03:20 -07:00
Wenkai Du	a51e4071e3	Add another Rome model (#249 ) * Add another Rome model * Add gfx908 4P3L models and support * Revert "Use cached value for detecting GDR support only once" This reverts commit `67c8e72ce3`. * Skip using ibverb for GPU direct RDMA detection * Fine tune one Rome model	2020-08-17 10:51:02 -07:00
Wenkai Du	7e3d8a31cc	Collect gcnArch and hipDeviceArch_t in XML (#252 )	2020-08-12 15:48:38 -07:00
Wenkai Du	066223333d	Merge pull request #248 from wenkaidu/2.7.8 2.7.8	2020-08-11 08:20:37 -07:00
Wenkai Du	7e3f841fab	Merge remote-tracking branch 'nccl/master' into 2.7.8	2020-08-10 16:11:00 +00:00


226 commits