Commit Graph

640 Commits

Author SHA1 Message Date
Bertan Dogancay 8a442faa12 Nvtx support (#1076)
* NVTX support
2024-02-08 14:08:24 -07:00
Wenkai Du 5257c753c5 msccl: use relaxed atomics on scratch buffer (#1075) 2024-02-08 12:09:56 -08:00
Wenkai Du 704c9ef0d1 Doubling P2P channels per peer on single node gfx94x only (#1074) 2024-02-07 14:05:57 -08:00
Wenkai Du 1d989f6524 Doubling P2P channels per peer on single node only (#1069) 2024-02-02 12:41:00 -08:00
Bertan Dogancay 01b359027b Include common.h in enqueue.cc instead (#1067) 2024-01-30 08:24:22 -08:00
Wenkai Du f7550d83b8 msccl: ensure memory coherence after data receive (#1062) 2024-01-30 08:22:50 -08:00
Pedram Alizadeh ccfb35fa6d modifying the tuning table to improve the performance of allreduce for 8MB and 16MB for single-node MI300X (#1063) 2024-01-26 09:05:53 -05:00
Wenkai Du be8ef4367f colltrace: fix dropped trace messages (#1059)
* colltrace: fix dropped trace messages

* Remove extra space
2024-01-25 13:31:53 -08:00
Wenkai Du ffde530af5 Increase P2P channels per peer (#1060) 2024-01-25 11:21:58 -08:00
Wenkai Du 7987015a19 Revert "msccl: build same number of kernels as in ROCm 5.7" (#1058)
This reverts commit f960174d03be7e5174baa83b256526d388a38842.
2024-01-24 08:43:50 -08:00
Bertan Dogancay 5564d65e71 Use binary search for direct function calls (#1057)
* Use binary search for direct function calls

* fix scratch mem issue on MI300
2024-01-22 17:37:56 -07:00
Bertan Dogancay c4dbf8a914 Fix collective trace when rccl is configured (#1056)
* Fix collective trace when rccl is configured
2024-01-22 09:26:44 -07:00
Wenkai Du 7e25d5bc55 Use new HIP graph API compatible with CUDA 11030 (#991)
* Use new HIP graph API compatible with CUDA 11030

* Update dependency to ROCm 6.1

* Fix single stream use case
2024-01-21 19:00:50 -08:00
Nilesh M Negi 8b97a20943 COLLECTIVES: Switch to unroll 2 for MI300 (#1051)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-01-19 12:16:05 -06:00
Bertan Dogancay 28d9b170c9 [DEV] Configure functions in RCCL (#986)
* configure functions in rccl
2024-01-18 15:07:16 -07:00
Wenkai Du 3325f96c56 Only use full MAXCHANNELS for gfx94x (#1050) 2024-01-17 09:00:49 -08:00
Pedram Alizadeh b08124c85d adding rccl tuning parameters for MI300X gfx942 with 8 GPUs single and multi-node (#1047) 2024-01-16 13:44:32 -05:00
Wenkai Du 261707d90a Add option to force enable network transport on single node (#1046) 2024-01-16 07:54:18 -08:00
PedramAlizadeh 767fde8210 Revert "2.18.5-1"
This reverts commit 559b70f86c.
2024-01-12 16:54:19 +00:00
Bertan Dogancay cf248d9402 Addressing the compiler warning (#988) 2024-01-10 14:59:40 -07:00
Hossein Pourreza 735178c1fe cover more gpu/nic mapping cases (#1037) 2024-01-10 08:01:37 -08:00
Wenkai Du 5851ae5974 Re-enable L128 on gfx90a if compiler supports it (#1036) 2024-01-10 08:01:11 -08:00
Nilesh M Negi 249e9f7f65 Un-escaped character causes error with address sanitizer builds (#992)
Signed-off-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Jenkins <jenkins-compute@amd.com>
2024-01-09 13:28:32 -06:00
Pedram Alizadeh aa5c84c997 Merge pull request #1022 from PedramAlizadeh/sync_nccl_2.18.6
Sync to nccl 2.18.6
2024-01-09 13:29:29 -05:00
Wenkai Du d9871d171b msccl: use custom reduce function (#1033) 2024-01-08 14:53:12 -08:00
Wenkai Du f7e39fced2 Doubling buffer size to fix NCCL INFO corruption with increased channels (#1035) 2024-01-08 08:14:33 -08:00
Wenkai Du e5bf56c6d8 Increase stack size for gfx906 (#1034)
Occasionally "Memory access fault by GPU node-8 (Agent handle: 0x23a5640) on address 0x7f461ec00000. Reason: Page not present or supervisor privilege" can be seen from gfx906 CI
2024-01-07 20:25:02 -08:00
Ziyue Yang 70bbeb4773 Fix MSCCL multi-node (#1032)
1) Move needsProxy initialization before mscclSetupConnections since the latter
will revise it later.
2) Remove mscclAvailable check in net.cc since it is no longer required and caused
a non-shared buffer to be allocated for MSCCL, which is not expected.
2024-01-05 17:03:43 -08:00
Wenkai Du abf265a911 Rework barriers and adjust scope of atomics (#1019) 2024-01-04 08:18:48 -08:00
Ziyue Yang 0a53077c9c Improve MSCCL algorithms (#1023) 2024-01-03 14:51:34 -08:00
akolliasAMD f4858e14b2 rearranged how the min and max functions are part of msccl (#1025)
* rearranged how the min and max functions are part of msccl

* added more coverage on in place graph tests
2023-12-21 08:58:33 -07:00
PedramAlizadeh 0d515f9388 resolved conflicts, fixed the localNetCount/0 bug 2023-12-18 08:11:34 +00:00
Ziyue Yang 655742a3a6 Fully disable MSCCL when machine is not matched (#1017)
* Disable MSCCL algorithm meta loading when machine is not matched

* fully disable init

* fix potential segfault
2023-12-13 08:36:21 -08:00
Wenkai Du 53d807a5b9 msccl: disable on multi-node (#1018) 2023-12-13 07:41:40 -08:00
Wenkai Du 81602814a7 msccl: fix data corruption with MTYPE_RW (#1014) 2023-12-11 20:33:15 -08:00
Wenkai Du 7965c8b53c Fix memory fence and use non-temporal store (#1007)
* Fix memory fence and use non-temporal store

* Use amdgcn builtin instead of inline asm

* Move threadfence location

* Revert changes to gfx90a

* Rework gfx90a change

* Apply changes to gfx94x
2023-12-09 12:16:08 -08:00
Ziyue Yang c002f20029 Fix MSCCL scratch allocation (#1010) 2023-12-08 17:47:10 -06:00
Wen-Heng (Jack) Chung baadda4bd8 Relax workgroup barrier implementation for MSCCL send/recv ops. (#997)
* Trim logic.

* Revert "Trim logic."

This reverts commit 8f2dba6c764108acf2bf5428366b9f41d4d206b9.

* Introduce MSCCL template parameters to send / recv.

* Address review feedback.
2023-12-08 17:46:53 -06:00
Wenkai Du 12c08fc52a msccl: build same number of kernels as in ROCm 5.7 (#1005)
Removed fullOps kernels from build
2023-12-07 13:36:04 -06:00
Wen-Heng (Jack) Chung 293f0fb752 Use a map to host scratch buffers (#1004)
* Use a map to host scratch buffers

* Address review feedback. Deliberately keep mscclSetupScratch function.
2023-12-05 13:15:28 -06:00
Nilesh M Negi bc44e3faa7 Fix gcnArch bug in IFC mix build (#998) (#1002)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2023-12-04 16:20:22 -06:00
Bertan Dogancay 7c0f49a878 IFC mix build (#998) 2023-12-02 18:49:52 -07:00
Wenkai Du 4ba65d1d6a Increase max channels to 64 (#993) 2023-12-01 16:01:11 -08:00
pradeep-ramanna 0b53f79196 Fix GPU to NIC mapping for peertopeer (#994) 2023-12-01 08:00:17 -08:00
Ziyue Yang e44e112a17 Fix mscclAlgoHandle not initialized issue (#995) 2023-12-01 07:58:01 -08:00
Ziyue Yang 4bb0b4a380 Move MSCCL algorithm loading to initialization to workaround HIP graph conflict (#982)
* MSCCL: pre-specify channels and pre-load algorithms

* add mutex

* fix bug

* clean include

* disable all-gathers temporarily
2023-11-30 09:47:20 -08:00
akolliasAMD 56ce9ef05f recreated pr 914 to work with current develop branch (#979) 2023-11-28 16:33:47 -07:00
Wenkai Du 50b2dd9fd7 Add special handling of gfx940 (#976)
* Add special handling of gfx940

* Update ring base
2023-11-22 15:07:36 -08:00
Wenkai Du 569d3f7d59 msccl: allocate scratch as ext-scope fine-grained (#968) 2023-11-16 09:57:25 -06:00
Wenkai Du bc8661f092 Fix kernel command line warnings (#961)
* Fix kernel command line warnings

* Remove while loop
2023-11-15 18:01:12 -08:00