rocm-systems

Author	SHA1	Message	Date
Wenkai Du	e0053311c0	Add another Rome model (#431)	2021-10-06 08:17:12 -07:00
Wenkai Du	5c8380ff5b	Implement NIC identification and remapping (#420) * Add 1H16P GPU model * Implement NIC identification and remapping * Revert "Sort IB devices based on device name (#413)" This reverts commit `2d0ed8dff6`. * Fix permute and check order * Correction on IB speed reporting * Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)" This reverts commit `caf5c9992a`.	2021-08-24 09:42:04 -07:00
Wenkai Du	5f15ed6e3e	Add gfx908 VM model (#418)	2021-08-10 08:55:11 -07:00
Wenkai Du	135d47d125	topo_expl: fix build after switching to rocm-smi-lib (#405) * topo_expl: fix build after switching to rocm-smi-lib * Use minimal of 4 channels for gfx908	2021-07-27 08:30:08 -07:00
Wenkai Du	961922ea02	Add option to enable multiple SAT in SHARP (#380) * Add option to enable multiple SAT in SHARP * Extend number of NICs to 16	2021-06-03 19:45:18 -07:00
Wenkai Du	4c83adb75c	Update Rome models matching (#376)	2021-05-25 10:12:40 -07:00
Wenkai Du	a4ea1fed5b	Merge remote-tracking branch 'nccl/master' into develop	2021-05-05 16:01:01 -07:00
Wenkai Du	1fe031402a	Add gfx90a target (#344) * Add gfx90a target * Support gfx90a topology Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>	2021-04-14 09:29:00 -06:00
Wenkai Du	d87dc7c2e8	collnet: support multiple NICs (#335)	2021-03-25 20:59:32 -07:00
Wenkai Du	1d6244b18d	Enable collnet in RCCL (#333) * Enable CollNet and use different number of channels * topo_expl: enable collnet	2021-03-19 12:58:13 -07:00
Wenkai Du	6dfdfef98f	Add gfx908 Rome 4 NICs model	2021-02-06 00:19:47 +00:00
Wenkai Du	ab1e7a0318	Merge remote-tracking branch 'origin/develop' into 2.8.3	2021-02-04 20:02:34 -05:00
Wenkai Du	d469947641	Merge remote-tracking branch 'nccl/master' into no-target-id	2021-01-14 19:27:53 -05:00
Wenkai Du	373a108516	Fix Rome PCIe 2 node topology generation (#310)	2020-12-15 17:16:17 -08:00
Wenkai Du	975b14dffa	Add Rome model and improve search (#305)	2020-11-17 14:55:06 -08:00
Wenkai Du	dfa3c41ede	Add more Rome models (#292)	2020-10-30 21:26:04 -07:00
Wenkai Du	33babcb5e2	Update Rome single node models (#277)	2020-10-13 13:33:09 -07:00
Wenkai Du	ae008fd2db	Rework Rome detection and add multiple network ports models (#274) * Rework Rome detection and add multiple network ports models * Remove unused opCount in p2p transport	2020-10-07 13:37:36 -07:00
Wenkai Du	c5cbece6d0	Increase minimal channels for gfx908 (#259)	2020-08-26 11:40:11 -07:00
Wenkai Du	391bbf3f1e	Add NPS4 support on some models (#256) * Add NPS4 support on some models * Add XML models	2020-08-19 11:03:20 -07:00
Wenkai Du	a51e4071e3	Add another Rome model (#249) * Add another Rome model * Add gfx908 4P3L models and support * Revert "Use cached value for detecting GDR support only once" This reverts commit `67c8e72ce3`. * Skip using ibverb for GPU direct RDMA detection * Fine tune one Rome model	2020-08-17 10:51:02 -07:00
Wenkai Du	09ef75656a	Add more Rome 4P2H models	2020-08-06 18:20:02 +00:00
Wenkai Du	e7a10aa0e4	Topology tuning for 4P2H on Rome (#242) * Topology tuning for 4P2H on Rome * Use ncclTopoIdToIndex	2020-07-27 11:53:57 -07:00
Sourav Chakraborty	2475daafee	add 4 node 8P6L 1 NIC 2nd Hive model	2020-07-22 16:27:15 +00:00
Sourav Chakraborty	db55afb014	simplify model definitions in topo expl	2020-07-22 16:05:53 +00:00
Wenkai Du	d5f90e19b5	Add 8P6L multi-node models (#239)	2020-07-21 14:10:36 -07:00
Wenkai Du	d3548cc474	topo_expl: each rank needs to have its own memory for graphs (#225)	2020-07-01 15:11:02 -07:00
Wenkai Du	e80e29573c	Add gather, scatter and alltoall collectives Introducing 3 new APIs: ncclResult_t ncclGather(const void* sendbuff, void* recvbuff, size_t sendcount, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream); ncclResult_t ncclScatter(const void* sendbuff, void* recvbuff, size_t recvcount, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream); ncclResult_t ncclAllToAll(const void* sendbuff, void* recvbuff, size_t count, ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream); Only out-of-place operation is supported. Preprocessor symbol RCCL_GATHER_SCATTER=1 indicates API availability. By default the APIs launch the RCCL kernel implementation, which can be disabled by RCCL_ALLTOALL_KERNEL_DISABLE=1. The APIs then use a wrapper around ncclSend and ncclRecv.	2020-06-09 17:44:08 -07:00
Wenkai Du	b3c9852634	Give preference to path with more XGMI connections	2020-05-14 15:33:16 -07:00
Wenkai Du	6f54b23503	topo_expl: update to 2.6	2020-04-01 13:37:08 -07:00
Wenkai Du	32388d60a9	topo_expl: add a few more single node models	2020-03-02 11:43:03 -08:00
Wenkai Du	934b6de557	topo_expl: use bandwidth numbers defined in graph in CPU models	2020-02-26 14:17:36 -08:00
Wenkai Du	d2adc61bf6	Revise PCI BW numbers on Rome	2020-02-26 13:17:49 -08:00
Wenkai Du	55f8e2dec7	Add topology explorer	2020-02-19 14:42:06 -08:00

34 commits