rocm-systems

Author	SHA1	Message	Date
Bertan Dogancay	8a442faa12	Nvtx support (#1076 ) * NVTX support	2024-02-08 14:08:24 -07:00
Wenkai Du	5257c753c5	msccl: use relaxed atomics on scratch buffer (#1075 )	2024-02-08 12:09:56 -08:00
Wenkai Du	704c9ef0d1	Doubling P2P channels per peer on single node gfx94x only (#1074 )	2024-02-07 14:05:57 -08:00
Wenkai Du	1d989f6524	Doubling P2P channels per peer on single node only (#1069 )	2024-02-02 12:41:00 -08:00
Bertan Dogancay	01b359027b	Include common.h in enqueue.cc instead (#1067 )	2024-01-30 08:24:22 -08:00
Wenkai Du	f7550d83b8	msccl: ensure memory coherence after data receive (#1062 )	2024-01-30 08:22:50 -08:00
Pedram Alizadeh	ccfb35fa6d	modifying the tuning table to improve the performance of allreduce for 8MB and 16MB for single-node MI300X (#1063 )	2024-01-26 09:05:53 -05:00
Wenkai Du	be8ef4367f	colltrace: fix dropped trace messages (#1059 ) * colltrace: fix dropped trace messages * Remove extra space	2024-01-25 13:31:53 -08:00
Wenkai Du	ffde530af5	Increase P2P channels per peer (#1060 )	2024-01-25 11:21:58 -08:00
Wenkai Du	7987015a19	Revert "msccl: build same number of kernels as in ROCm 5.7" (#1058 ) This reverts commit f960174d03be7e5174baa83b256526d388a38842.	2024-01-24 08:43:50 -08:00
Bertan Dogancay	5564d65e71	Use binary search for direct function calls (#1057 ) * Use binary search for direct function calls * fix scratch mem issue on MI300	2024-01-22 17:37:56 -07:00
Bertan Dogancay	c4dbf8a914	Fix collective trace when rccl is configured (#1056 ) * Fix collective trace when rccl is configured	2024-01-22 09:26:44 -07:00
Wenkai Du	7e25d5bc55	Use new HIP graph API compatible with CUDA 11030 (#991 ) * Use new HIP graph API compatible with CUDA 11030 * Update dependency to ROCm 6.1 * Fix single stream use case	2024-01-21 19:00:50 -08:00
Nilesh M Negi	8b97a20943	COLLECTIVES: Switch to unroll 2 for MI300 (#1051 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>	2024-01-19 12:16:05 -06:00
Bertan Dogancay	28d9b170c9	[DEV] Configure functions in RCCL (#986 ) * configure functions in rccl	2024-01-18 15:07:16 -07:00
Wenkai Du	3325f96c56	Only use full MAXCHANNELS for gfx94x (#1050 )	2024-01-17 09:00:49 -08:00
Pedram Alizadeh	b08124c85d	adding rccl tuning parameters for MI300X gfx942 with 8 GPUs single and multi-node (#1047 )	2024-01-16 13:44:32 -05:00
Wenkai Du	261707d90a	Add option to force enable network transport on single node (#1046 )	2024-01-16 07:54:18 -08:00
PedramAlizadeh	767fde8210	Revert "2.18.5-1" This reverts commit `559b70f86c`.	2024-01-12 16:54:19 +00:00
Bertan Dogancay	cf248d9402	Addressing the compiler warning (#988 )	2024-01-10 14:59:40 -07:00
Hossein Pourreza	735178c1fe	cover more gpu/nic mapping cases (#1037 )	2024-01-10 08:01:37 -08:00
Wenkai Du	5851ae5974	Re-enable L128 on gfx90a if compiler supports it (#1036 )	2024-01-10 08:01:11 -08:00
Nilesh M Negi	249e9f7f65	Un-escaped character causes error with address sanitizer builds (#992 ) Signed-off-by: Nilesh M Negi <Nilesh.Negi@amd.com> Co-authored-by: Jenkins <jenkins-compute@amd.com>	2024-01-09 13:28:32 -06:00
Pedram Alizadeh	aa5c84c997	Merge pull request #1022 from PedramAlizadeh/sync_nccl_2.18.6 Sync to nccl 2.18.6	2024-01-09 13:29:29 -05:00
Wenkai Du	d9871d171b	msccl: use custom reduce function (#1033 )	2024-01-08 14:53:12 -08:00
Wenkai Du	f7e39fced2	Doubling buffer size to fix NCCL INFO corruption with increased channels (#1035 )	2024-01-08 08:14:33 -08:00
Wenkai Du	e5bf56c6d8	Increase stack size for gfx906 (#1034 ) Occasionally "Memory access fault by GPU node-8 (Agent handle: 0x23a5640) on address 0x7f461ec00000. Reason: Page not present or supervisor privilege" can be seen from gfx906 CI	2024-01-07 20:25:02 -08:00
Ziyue Yang	70bbeb4773	Fix MSCCL multi-node (#1032 ) 1) Move needsProxy initialization before mscclSetupConnections since the latter will revise it later. 2) Remove mscclAvailable check in net.cc since it's no more required and caused non-shared buffer allocated for MSCCL which is not expected.	2024-01-05 17:03:43 -08:00
Wenkai Du	abf265a911	Rework barriers and adjust scope of atomics (#1019 )	2024-01-04 08:18:48 -08:00
Ziyue Yang	0a53077c9c	Improve MSCCL algorithms (#1023 )	2024-01-03 14:51:34 -08:00
akolliasAMD	f4858e14b2	rearranged how the min and max functions are part of msccl (#1025 ) * rearranged how the min and max functions are part of msccl * added more coverage on in place graph tests	2023-12-21 08:58:33 -07:00
PedramAlizadeh	0d515f9388	resolved conflicts, fixed the localNetCount/0 bug	2023-12-18 08:11:34 +00:00
Ziyue Yang	655742a3a6	Fully disable MSCCL when machine is not matched (#1017 ) * Disable MSCCL algorithm meta loading when machine is not matched * fully disable init * fix potential segfault	2023-12-13 08:36:21 -08:00
Wenkai Du	53d807a5b9	msccl: disable on multi-node (#1018 )	2023-12-13 07:41:40 -08:00
Wenkai Du	81602814a7	msccl: fix data corruption with MTYPE_RW (#1014 )	2023-12-11 20:33:15 -08:00
Wenkai Du	7965c8b53c	Fix memory fence and use non-temporal store (#1007 ) * Fix memory fence and use non-temporal store * Use amdgcn builtin instead of inline asm * Move threadfence location * Revert changes to gfx90a * Rework gfx90a change * Apply changes to gfx94x	2023-12-09 12:16:08 -08:00
Ziyue Yang	c002f20029	Fix MSCCL scratch allocation (#1010 )	2023-12-08 17:47:10 -06:00
Wen-Heng (Jack) Chung	baadda4bd8	Relax workgroup barrier implementation for MSCCL send/recv ops. (#997 ) * Trim logic. * Revert "Trim logic." This reverts commit 8f2dba6c764108acf2bf5428366b9f41d4d206b9. * Introduce MSCCL template parameters to send / recv. * Address review feedbacks.	2023-12-08 17:46:53 -06:00
Wenkai Du	12c08fc52a	msccl: build same number of kernels as in ROCm 5.7 (#1005 ) Removed fullOps kernels from build	2023-12-07 13:36:04 -06:00
Wen-Heng (Jack) Chung	293f0fb752	Use a map to host scratch buffers (#1004 ) * Use a map to host scratch buffers * Address review feedbacks. Deliberately keep mscclSetupScratch function.	2023-12-05 13:15:28 -06:00
Nilesh M Negi	bc44e3faa7	Fix gcnArch bug in IFC mix build (#998 ) (#1002 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>	2023-12-04 16:20:22 -06:00
Bertan Dogancay	7c0f49a878	IFC mix build (#998 )	2023-12-02 18:49:52 -07:00
Wenkai Du	4ba65d1d6a	Increase max channels to 64 (#993 )	2023-12-01 16:01:11 -08:00
pradeep-ramanna	0b53f79196	Fix GPU to NIC mapping for peertopeer (#994 )	2023-12-01 08:00:17 -08:00
Ziyue Yang	e44e112a17	Fix mscclAlgoHandle not initialized issue (#995 )	2023-12-01 07:58:01 -08:00
Ziyue Yang	4bb0b4a380	Move MSCCL algorithm loading to initialization to workaround HIP graph conflict (#982 ) * MSCCL: pre-specify channels and pre-load algorithms * add mutex * fix bug * clean include * disable all-gathers temporarily	2023-11-30 09:47:20 -08:00
akolliasAMD	56ce9ef05f	recreated pr 914 to work with current develop branch (#979 )	2023-11-28 16:33:47 -07:00
Wenkai Du	50b2dd9fd7	Add special handling of gfx940 (#976 ) * Add special handling of gfx940 * Update ring base	2023-11-22 15:07:36 -08:00
Wenkai Du	569d3f7d59	msccl: allocate scratch as ext-scope fine-grained (#968 )	2023-11-16 09:57:25 -06:00
Wenkai Du	bc8661f092	Fix kernel command line warnings (#961 ) * Fix kernel command line warnings * Remove while loop	2023-11-15 18:01:12 -08:00

640 commits