Update Barrier_All and Sync_All APIs (#72)

* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs

[ROCm/rocshmem commit: c652f58cef]
Avinash Kethineedi
2025-04-02 11:58:55 -05:00
committed by GitHub
parent 0cde5f53dc
commit 426bbf525b
22 changed files with 508 additions and 53 deletions
@@ -81,6 +81,16 @@ declare -A TEST_NUMBERS=(
["wgsignalfetch"]="56"
["wavesignalfetch"]="57"
["teambarrier"]="58"
["defaultctxget"]="59"
["defaultctxgetnbi"]="60"
["defaultctxput"]="61"
["defaultctxputnbi"]="62"
["defaultctxp"]="63"
["defaultctxg"]="64"
["wavebarrierall"]="65"
["wgbarrierall"]="66"
["wavesyncall"]="67"
["wgsyncall"]="68"
)
ExecTest() {
@@ -303,11 +313,38 @@ TestColl() {
# | Name | Ranks | Workgroups | Threads | Max Message Size #
##############################################################################
ExecTest "barrierall" 2 1 1
ExecTest "barrierall" 2 16 64
ExecTest "barrierall" 2 32 256
ExecTest "barrierall" 2 64 1024
ExecTest "wavebarrierall" 2 1 1
ExecTest "wavebarrierall" 2 16 64
ExecTest "wavebarrierall" 2 32 256
ExecTest "wavebarrierall" 2 64 1024
ExecTest "wgbarrierall" 2 1 1
ExecTest "wgbarrierall" 2 16 64
ExecTest "wgbarrierall" 2 32 256
ExecTest "wgbarrierall" 2 64 1024
ExecTest "teambarrier" 2 1 1
ExecTest "sync" 2 1 1
ExecTest "syncall" 2 1 1
ExecTest "syncall" 2 16 64
ExecTest "syncall" 2 32 256
ExecTest "syncall" 2 64 1024
ExecTest "wavesyncall" 2 1 1
ExecTest "wavesyncall" 2 16 64
ExecTest "wavesyncall" 2 32 256
ExecTest "wavesyncall" 2 64 1024
ExecTest "wgsyncall" 2 1 1
ExecTest "wgsyncall" 2 16 64
ExecTest "wgsyncall" 2 32 256
ExecTest "wgsyncall" 2 64 1024
ExecTest "alltoall" 2 1 1 512
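The barrier_all and sync_all entries above repeat the same four workgroup/thread configurations on 2 ranks for each test name. As a rough sketch only (not part of this commit), the same matrix could be generated with a loop. The `ExecTest` stub below is a hypothetical stand-in that just echoes its arguments, and `run_matrix` is an illustrative name that does not appear in the original script.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the driver's ExecTest: it only echoes its arguments.
ExecTest() {
  echo "ExecTest $*"
}

# Illustrative helper (not in the original script): run one test name over
# the four "workgroups threads" configurations used in the diff, on 2 ranks.
run_matrix() {
  local name="$1"
  local cfg
  for cfg in "1 1" "16 64" "32 256" "64 1024"; do
    # $cfg is intentionally unquoted so "16 64" splits into two arguments.
    ExecTest "$name" 2 $cfg
  done
}

# The six test names that use the four-configuration matrix in the hunk above.
for t in barrierall wavebarrierall wgbarrierall syncall wavesyncall wgsyncall; do
  run_matrix "$t"
done
```

The explicit per-line form used in the diff is easier to grep and to disable one case at a time, so this loop is a possible alternative rather than a suggested change.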