22 Commits

Author SHA1 Message Date
Aurelien Bouteiller ede2adfe49 new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.

[ROCm/rocshmem commit: cca7872bcf]
2026-01-16 10:40:48 -05:00
Anatolii Rozanov f98c72d627 Add host API for *_on_stream operations (#340)
* Add functional test for barrier_all_on_stream

* Add rocshmem_barrier_all_on_stream support for GDA and RO backends

Implements rocshmem_barrier_all_on_stream operation for
GPU Direct Access and Reverse Offload backends.

Previously, rocshmem_barrier_all_on_stream was only supported for IPC backend.

* Add functional test for rocshmem_broadcastmem_on_stream

* Add host-side rocshmem_broadcastmem_on_stream API

Implement stream-based broadcast collective operation

- Add rocshmem_broadcastmem_on_stream host API and kernel implementation
- Add functional test TeamBroadcastmemOnStreamTester with multi-stream
  support and correctness verification
- Use per-workgroup contexts to avoid contention across parallel streams

API:
rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream)

* Add functional test for rocshmem_getmem_on_stream

* Add host-side rocshmem_getmem_on_stream API

Implement stream-based point-to-point RMA get operation

- Add rocshmem_getmem_on_stream host API and kernel implementation
- Support for asynchronous getmem operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group collective getmem for efficient memory transfer

API:
rocshmem_getmem_on_stream(dest, source, nelems, pe, stream)

(AI Assist)

* Add host-side rocshmem_putmem_on_stream API

- Add rocshmem_putmem_on_stream for asynchronous remote writes
- Support for concurrent RMA operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group device collective operation

API:
rocshmem_putmem_on_stream(dest, source, bytes, pe, stream)

(AI Assist)

* Add functional test for rocshmem_putmem_on_stream

* Add host-side rocshmem_putmem_signal_on_stream API

Enables asynchronous putmem operations with signaling on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_putmem_signal_kernel
- Host interface putmem_signal_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Public API

Function signature:
void rocshmem_putmem_signal_on_stream(void *dest, const void *source,
                                      size_t bytes, uint64_t *sig_addr,
                                      uint64_t signal, int sig_op,
                                      int pe, hipStream_t stream);

* Add functional test for rocshmem_putmem_signal_on_stream

* Add host-side rocshmem_signal_wait_until_on_stream API

Enables asynchronous signal wait operations on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_signal_wait_until_kernel
- Host interface signal_wait_until_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Native uint64_t support in wait_until API (generated from P2P_SYNC.py)

Function signature:
void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp,
                                          uint64_t cmp_value,
                                          hipStream_t stream);

(AI Assist)

* Add functional test for rocshmem_signal_wait_until_on_stream

* Add documentation for stream API functions

This commit adds API documentation for the following host-side
stream functions:

- rocshmem_barrier_all_on_stream (collective routines)
- rocshmem_broadcastmem_on_stream (collective routines)
- rocshmem_getmem_on_stream (RMA operations)
- rocshmem_putmem_on_stream (RMA operations)
- rocshmem_putmem_signal_on_stream (signaling operations)
- rocshmem_signal_wait_until_on_stream (point-to-point sync)

The documentation includes function signatures, parameter descriptions,
and detailed explanations of asynchronous behavior and stream handling.

(AI Assist)

* Rename "bytes" -> "nelems"

* Add "_TEST_" to the variables used in tests

* Remove incorrect hipStreamDefault usage

hipStreamDefault is a stream-creation flag, not the default stream.

If stream == nullptr, pass it through to the kernel launch unchanged; the kernel then runs on the default stream.

[ROCm/rocshmem commit: d0c8380650]
2025-12-09 08:55:46 -06:00
Anatolii Rozanov 4b04b540bf Add host API for alltoallmem_on_stream collective operation (#333)
* Add host-side rocshmem_alltoallmem_on_stream function

Function signature:
  rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                 const void *source, size_t size,
                                 hipStream_t stream)

- The function launches rocshmem_alltoallmem_kernel which calls
device-side alltoall<char> workgroup collective through default context.
- Uses dynamic block size determination via occupancy API.
- Implemented for all backends.

* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends

When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.

* Add functional test for team_alltoallmem_on_stream

This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.

* Add documentation for rocshmem_alltoallmem_on_stream

This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:

[ROCm/rocshmem commit: 5577feb70d]
2025-12-03 08:40:24 -05:00
Allen Hubbe fa7841f0d4 functional_tests: n, nskip, nloop, nlarge options (#297)
To make the functional tests more useful for benchmarking, allow user to
specify the number of loops and related parameters via command options.

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>

[ROCm/rocshmem commit: ed91c8cce2]
2025-10-30 11:54:49 -04:00
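The commit above does not spell out what each option means. As a minimal sketch only, assuming `nskip` counts warm-up iterations that are executed but left out of the timing and `nloop` counts the timed iterations, a benchmark loop might look like the following. The helper name `avg_us_per_iteration` is hypothetical and is not taken from the test suite.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: average microseconds per timed iteration.
// Assumption: the first nskip iterations are warm-up and are excluded
// from the measurement; the following nloop iterations are timed.
double avg_us_per_iteration(const std::function<void()>& op, int nskip,
                            int nloop) {
  using clock = std::chrono::steady_clock;
  clock::time_point start = clock::now();
  for (int i = 0; i < nskip + nloop; ++i) {
    if (i == nskip) start = clock::now();  // restart the clock after warm-up
    op();
  }
  const clock::time_point end = clock::now();
  if (nloop <= 0) return 0.0;  // nothing was timed
  const std::chrono::duration<double, std::micro> elapsed = end - start;
  return elapsed.count() / nloop;
}
```

Calling `avg_us_per_iteration(op, 2, 5)` runs `op` seven times and averages over the last five.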
Aurelien Bouteiller bdb30e2984 Tests/syncall (#291)
* SyncAll test case would run Sync

* Despecialized name for argument reader

* Rename sync-test to team-sync-test as it uses teams

* Another stab at probing NUM_GPUS

[ROCm/rocshmem commit: 054bc33dc4]
2025-10-23 13:40:41 -04:00
Avinash Kethineedi e31b4d42e5 Update atomic functional tests (#262)
* feat: implement function to return number of blocks in grid.

* test: update atomics functional tests
  - Standard atomic tests: `atomic_add`, `atomic_inc`, `fetch_atomic_add`, `fetch_atomic_inc`, and `fetch_compare_and_swap`
  - Bitwise atomic tests: `atomic_and`, `atomic_or`, `atomic_xor`, `fetch_atomic_and`, `fetch_atomic_or`, and `fetch_atomic_xor`
  - Extended atomic tests: `atomic_fetch`, `atomic_set`, and `atomic_swap`

* Added two different address modes for atomics.
* Added all supported data types for atomics tests.


[ROCm/rocshmem commit: 0a4f8a83b9]
2025-10-06 10:50:50 -05:00
Yiltan 4f955324ac Fix g/p tests (#266)
[ROCm/rocshmem commit: 6bb46887e8]
2025-09-29 14:27:25 -04:00
Edgar Gabriel 56eb68bc4a Add extended team tests (#207)
Create teams in the functional test that are not a duplicate of
ROCSHMEM_TEAM_WORLD. This commit contains only infra-tests to make sure
that n_pes and my_pe on the new teams are indeed correct.

[ROCm/rocshmem commit: e95360961d]
2025-08-01 08:50:14 -05:00
Avinash Kethineedi 2a7416d016 Implement rocshmem_ptr in IPC conduit (#197)
* Implement `rocshmem_ptr` in IPC conduit

* tests: add functional test for `rocshmem_ptr`
  - Add safety check for pointer access and condition check before printing results for `rocshmem_ptr` test
  - Use `rocshmem_put` to store `rocshmem_ptr` availability for data validation

[ROCm/rocshmem commit: 526105d315]
2025-07-28 12:01:02 -05:00
Avinash Kethineedi c4de6833f6 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`

[ROCm/rocshmem commit: f6ef19f5a9]
2025-04-15 15:37:53 -05:00
Avinash Kethineedi 9bd2b04899 Update Barrier and Sync APIs (#73)
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
 - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
 - Added support into both IPC and RO conduits
 - Added functional tests to cover all `barrier` APIs
 - Removed collective operations on default context

* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
  - Implemented `sync` APIs for thread, wavefront, and workgroup scopes
  - Added support into both IPC and RO conduits
  - Added functional tests to cover all `sync` APIs

* update naming convention for context-based `barrier` APIs

[ROCm/rocshmem commit: dc61bca066]
2025-04-08 11:25:31 -05:00
Avinash Kethineedi 426bbf525b Update Barrier_All and Sync_All APIs (#72)
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support into both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs

[ROCm/rocshmem commit: c652f58cef]
2025-04-02 11:58:55 -05:00
Edgar Gabriel 1ee9b72449 add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests

[ROCm/rocshmem commit: bcbc42e78f]
2025-03-24 11:23:03 -05:00
Avinash Kethineedi e16bb62767 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context

[ROCm/rocshmem commit: aa3121a967]
2025-03-18 14:39:57 -05:00
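The timing change in the commit above (capture the earliest start time, then derive latency) can be sketched on the host side as follows. This is an illustration under assumptions, not the suite's code: the `BlockTiming` struct, the use of the latest end time, and the `ticks_per_us` conversion factor are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-work-group timestamps, in device clock ticks.
struct BlockTiming {
  std::uint64_t start;
  std::uint64_t end;
};

// Elapsed ticks from the earliest start to the latest end across all
// work-groups. Using the latest end is an assumption made for this sketch.
std::uint64_t elapsed_ticks(const std::vector<BlockTiming>& timings) {
  if (timings.empty()) return 0;
  std::uint64_t first_start = timings.front().start;
  std::uint64_t last_end = timings.front().end;
  for (const BlockTiming& t : timings) {
    first_start = std::min(first_start, t.start);
    last_end = std::max(last_end, t.end);
  }
  return last_end - first_start;
}

// Average latency per loop iteration in microseconds.
double latency_us(std::uint64_t ticks, double ticks_per_us,
                  std::uint64_t loops) {
  if (loops == 0 || ticks_per_us <= 0.0) return 0.0;
  return static_cast<double>(ticks) / ticks_per_us /
         static_cast<double>(loops);
}
```

Taking the minimum start keeps a late-launching work-group from shortening the measured interval.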
Avinash Kethineedi 65b4ff4c41 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect

[ROCm/rocshmem commit: 57d60aa727]
2025-02-19 10:31:53 -06:00
Brandon Potter 913ce47ef1 Use new naming scheme
[ROCm/rocshmem commit: fd8dbc7fb6]
2024-11-25 14:25:29 -06:00
avinashkethineedi daae6f4d60 Remove active-set-based broadcast test from the functional tests suite
[ROCm/rocshmem commit: 9a524046fe]
2024-10-29 16:18:46 +00:00
avinashkethineedi 5869709dac Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface


[ROCm/rocshmem commit: abec29bd6a]
2024-10-28 22:10:18 +00:00
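For context on the log_stride to stride change above: OpenSHMEM active sets describe members as a start PE, a log2 stride, and a size, while team-based interfaces take the stride directly. A minimal sketch of the two enumerations follows; the function names are hypothetical and do not come from rocSHMEM.

```cpp
#include <vector>

// Active-set style: member i is pe_start + i * 2^log_stride.
std::vector<int> pes_from_log_stride(int pe_start, int log_stride, int size) {
  std::vector<int> pes;
  for (int i = 0; i < size; ++i) {
    pes.push_back(pe_start + i * (1 << log_stride));
  }
  return pes;
}

// Team style: member i is pe_start + i * stride. The stride no longer
// has to be a power of two.
std::vector<int> pes_from_stride(int pe_start, int stride, int size) {
  std::vector<int> pes;
  for (int i = 0; i < size; ++i) {
    pes.push_back(pe_start + i * stride);
  }
  return pes;
}
```

With `pe_start = 1` and `size = 3`, a `log_stride` of 2 and a `stride` of 4 both select PEs 1, 5, and 9.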
avinashkethineedi 10eb11c1d5 Use C++ iota function to reset buffers and use its values for verification
* Update functional test script to include new tests


[ROCm/rocshmem commit: 18a1bdd0ac]
2024-10-15 20:23:25 +00:00
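The iota-based reset and verification named in the commit above can be illustrated with a small host-side sketch. The helper names and the `std::uint64_t` element type are assumptions for this example; the actual tests may use other types and check the received data differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Fill the buffer with 0, 1, 2, ... so that every element holds a
// distinct, predictable value.
void reset_buffer(std::vector<std::uint64_t>& buf) {
  std::iota(buf.begin(), buf.end(), std::uint64_t{0});
}

// Return the index of the first element that differs from the iota
// pattern, or buf.size() if the whole buffer matches.
std::size_t first_mismatch(const std::vector<std::uint64_t>& buf) {
  for (std::size_t i = 0; i < buf.size(); ++i) {
    if (buf[i] != static_cast<std::uint64_t>(i)) return i;
  }
  return buf.size();
}
```

Because each position has a unique expected value, a misplaced or dropped element shows up at a specific index rather than being masked by a constant fill pattern.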
avinashkethineedi 9532e084fc Add tiled version of puts and gets at wavefront level to the functional test suite
* Implemented tiled version of put*_wave and get*_wave functions
* Maintain single kernel that supports both tiled and untiled versions
* Disable IPC in the default RO build script


[ROCm/rocshmem commit: b6d31ac7ef]
2024-09-07 16:06:36 -07:00
avinashkethineedi 3d26792831 Add tiled version of puts and gets at the workgroup level to the functional test suite
[ROCm/rocshmem commit: d226922733]
2024-09-07 15:58:14 -07:00
Brandon Potter ad4ab69c19 Transfer files from RAD repository
[ROCm/rocshmem commit: ea8f264a11]
2024-07-01 09:57:08 -05:00