Commit Graph

608 Commits

Author SHA1 Message Date
Wenkai Du abf265a911 Rework barriers and adjust scope of atomics (#1019) 2024-01-04 08:18:48 -08:00
Ziyue Yang 0a53077c9c Improve MSCCL algorithms (#1023) 2024-01-03 14:51:34 -08:00
akolliasAMD f4858e14b2 rearranged how the min and max functions are part of msccl (#1025)
* rearranged how the min and max functions are part of msccl

* added more coverage on in place graph tests
2023-12-21 08:58:33 -07:00
Ziyue Yang 655742a3a6 Fully disable MSCCL when machine is not matched (#1017)
* Disable MSCCL algorithm meta loading when machine is not matched

* fully disable init

* fix potential segfault
2023-12-13 08:36:21 -08:00
Wenkai Du 53d807a5b9 msccl: disable on multi-node (#1018) 2023-12-13 07:41:40 -08:00
Wenkai Du 81602814a7 msccl: fix data corruption with MTYPE_RW (#1014) 2023-12-11 20:33:15 -08:00
Wenkai Du 7965c8b53c Fix memory fence and use non-temporal store (#1007)
* Fix memory fence and use non-temporal store

* Use amdgcn builtin instead of inline asm

* Move threadfence location

* Revert changes to gfx90a

* Rework gfx90a change

* Apply changes to gfx94x
2023-12-09 12:16:08 -08:00
Ziyue Yang c002f20029 Fix MSCCL scratch allocation (#1010) 2023-12-08 17:47:10 -06:00
Wen-Heng (Jack) Chung baadda4bd8 Relax workgroup barrier implementation for MSCCL send/recv ops. (#997)
* Trim logic.

* Revert "Trim logic."

This reverts commit 8f2dba6c764108acf2bf5428366b9f41d4d206b9.

* Introduce MSCCL template parameters to send / recv.

* Address review feedbacks.
2023-12-08 17:46:53 -06:00
Wenkai Du 12c08fc52a msccl: build same number of kernels as in ROCm 5.7 (#1005)
Removed fullOps kernels from build
2023-12-07 13:36:04 -06:00
Wen-Heng (Jack) Chung 293f0fb752 Use a map to host scratch buffers (#1004)
* Use a map to host scratch buffers

* Address review feedbacks. Deliberately keep mscclSetupScratch function.
2023-12-05 13:15:28 -06:00
Nilesh M Negi bc44e3faa7 Fix gcnArch bug in IFC mix build (#998) (#1002)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2023-12-04 16:20:22 -06:00
Bertan Dogancay 7c0f49a878 IFC mix build (#998) 2023-12-02 18:49:52 -07:00
Wenkai Du 4ba65d1d6a Increase max channles to 64 (#993) 2023-12-01 16:01:11 -08:00
pradeep-ramanna 0b53f79196 Fix GPU to NIC mapping for peertopeer (#994) 2023-12-01 08:00:17 -08:00
Ziyue Yang e44e112a17 Fix mscclAlgoHandle not initialized issue (#995) 2023-12-01 07:58:01 -08:00
Ziyue Yang 4bb0b4a380 Move MSCCL algorithm loading to initialization to workaround HIP graph conflict (#982)
* MSCCL: pre-specify channels and pre-load algorithms

* add mutex

* fix bug

* clean include

* disable all-gathers temporarily
2023-11-30 09:47:20 -08:00
akolliasAMD 56ce9ef05f recreated pr 914 to work with current develop branch (#979) 2023-11-28 16:33:47 -07:00
Wenkai Du 50b2dd9fd7 Add special handling of gfx940 (#976)
* Add special handling of gfx940

* Update ring base
2023-11-22 15:07:36 -08:00
Wenkai Du 569d3f7d59 msccl: allocate scratch as ext-scope fine-grained (#968) 2023-11-16 09:57:25 -06:00
Wenkai Du bc8661f092 Fix kernel command line warnings (#961)
* Fix kernel command line warnings

* Remove while loop
2023-11-15 18:01:12 -08:00
Ziyue Yang 7fc891bc8d Fix MSCCL work FIFO allocation with HIP graph enabled (#967) 2023-11-15 16:43:28 -08:00
Bertan Dogancay 198f14923b Check to support older ROCm versions (#963) 2023-11-15 12:36:31 -07:00
Ziyue Yang 7ae95db5b8 Optimize MSCCL all-gather algorithms for gfx942 (#964) 2023-11-15 08:18:59 -08:00
Ziyue Yang df128879a6 Optimize MSCCL reduce primitive switching for gfx942 (#962)
* Optimize reduce primitive switching for gfx942

* address comment
2023-11-15 08:18:44 -08:00
Wenkai Du 5a800e00cd msccl: enable basic collective trace (#959)
To avoid increasing number of kernels, colltrace is only enabled with
RCCL_MSCCL_FORCE_FULLOPS=1
2023-11-08 20:14:28 -08:00
Wen-Heng (Jack) Chung efc42d9045 Use send instead of sendWithBarrier. (#727) 2023-11-07 13:47:24 -06:00
Nusrat Islam 022735d208 Merge pull request #950 from nusislam/msccl-red2
msccl: remove cases from numReduction switch statement
2023-11-04 02:48:03 -05:00
Wenkai Du dbcba2923b Use parallel init of LDS and adjust P2P channels for gfx94x (#943)
* Use parallel init of LDS and adjust P2P channels for gfx94x

* Move another init to parallel

* Fix NCCL_NCHANNELS_PER_PEER setting
2023-11-03 16:06:49 -07:00
Nusrat Islam f545b94d4b msccl: remove cases from numReduction switch statement 2023-11-03 16:56:51 -05:00
Wenkai Du bb84345943 msccl: use 32-bit LDS access and add RCCL_MSCCL_FORCE_FULLOPS (#953) 2023-11-03 10:38:02 -07:00
akolliasAMD 988efe605a MSCCL stream fix (#948) 2023-11-03 09:10:52 -06:00
Wenkai Du f484ff17b9 msccl: add templated kernel (#945)
* msccl: add templated kernel

* Use defines to improve code readability

* Fix kernel indexing and review feedback
2023-11-02 17:21:53 -07:00
Nusrat Islam 6b80a0d0d4 msccl: remove dereference of reduce args
It can be removed because the msccl kernel will never execute this code
according to the current msccl setup.
2023-11-02 13:20:00 -05:00
Wenkai Du a7400218a2 msccl: use atomic to set dependency flags (#941) 2023-10-31 14:46:57 -07:00
Wenkai Du a497722894 NPkit: misc fixes for MSCCL (#936)
* msccl: add xcc_id to timestamp sync

* NPKit: add timestamp for rrc operator

* NPKit: add timestamp for MSCCL init
2023-10-30 10:00:12 -07:00
Nilesh M Negi 1e5ca6820b Fix gcnArchName bug in topology dump (#937)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2023-10-28 12:30:36 -05:00
Ziyue Yang 4c117e5335 Fix MSCCL work FIFO out-of-bound issue (#935) 2023-10-27 11:24:52 -07:00
Nilesh M Negi 96ec3ffe2e SRC/INIT: fix typo for ENABLE_PROFILING (#934)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2023-10-26 23:52:46 -05:00
Nilesh M Negi f22df90e5c remove gcnArch support (#920)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2023-10-26 12:09:15 -05:00
Wenkai Du fb0eccb57b msccl: reduce debug output when using NCCL_DEBUG=INFO (#932) 2023-10-25 08:05:19 -07:00
Wenkai Du c4e65fd382 Add missing gfx942 support (#927) 2023-10-23 12:04:37 -07:00
Wenkai Du dbb5611a3a Remove LDS based software barriers from MSCCL (#923) 2023-10-19 16:39:41 -05:00
Wenkai Du 4278a9918b Update rome models (#922) 2023-10-18 17:28:01 -07:00
Wenkai Du 39812ce757 NPKit: add xcc_id field (#918) 2023-10-13 15:24:59 -07:00
Wenkai Du 1b80d041cb Fix incorrect arch name parsing (#916) 2023-10-13 10:01:11 -07:00
Wenkai Du 6d0b5c1e89 Port init_once fix from NCCL (#915) 2023-10-13 08:01:12 -07:00
Wen-Heng (Jack) Chung 7ee5c1c28b Change MSCCL kernel signature to allow kernel arguments be preloaded via SGPR (#911)
* Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895)

Co-authored-by: Pedram Alizadeh <pmohamma@banff-pla-r27-05.pla.dcgpu>

* Only build gfx941

* demo

* fine tune malloc

* Fix merge errors

* Fix merge errors

* Disable parallel build

* Adopt --amdgpu-kernarg-preload-count

* Revert "Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895)"

This reverts commit f5e252dddf02a41b4d1bc512f306f45f97166304.

* Revert CMake changes.

* NPKIT changes.

* Remove some license declarations.

* Address code review feedbacks on msccl_kernel_impl.h

* Update CMakeLists.txt

* Add CMake logic to check the existence of --amdgpu-kernarg-preload-count

* Fix NPKIT trace logic.

---------

Co-authored-by: Pedram Alizadeh <pmohamma@amd.com>
Co-authored-by: Pedram Alizadeh <pmohamma@banff-pla-r27-05.pla.dcgpu>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
2023-10-12 20:17:08 -05:00
Bertan Dogancay a6ff4618c7 Revert "Remove 2H4P condition from P2P channels adjustment (#890)" (#904)
This reverts commit 16dd05a58a.
2023-10-04 09:46:11 -06:00
akolliasAMD 28d7fe5629 Dma buf support optin (#905)
* dmaBufSupport Optin added on every part of the code that should invoke it
2023-10-03 03:17:48 -06:00