Commit graph

36 Commits

Author SHA1 Message Date
Anatolii Rozanov 5577feb70d Add host API for alltoallmem_on_stream collective operation (#333)
* Add host-side rocshmem_alltoallmem_on_stream function

Function signature:
  rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                 const void *source, size_t size,
                                 hipStream_t stream)

- The function launches rocshmem_alltoallmem_kernel which calls
device-side alltoall<char> workgroup collective through default context.
- Uses dynamic block size determination via occupancy API.
- Implemented for all backends.

* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends

When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.

* Add functional test for team_alltoallmem_on_stream

This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.

* Add documentation for rocshmem_alltoallmem_on_stream

This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:
2025-12-03 08:40:24 -05:00
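
The data movement behind the all-to-all collective named in the commit above can be sketched on the host, with no GPU or rocSHMEM dependency. This is not the rocSHMEM implementation: `simulate_alltoallmem`, `Buffer`, and the vector-of-buffers layout are hypothetical stand-ins. The sketch assumes the usual OpenSHMEM convention that each PE exchanges `size` bytes with every peer, and that block j of PE i's source lands in block i of PE j's destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only model: sources[i] and dests[i] stand in for the
// symmetric source and destination buffers of PE i.
using Buffer = std::vector<char>;

// Copies block j of PE i's source into block i of PE j's destination.
// `size` is the number of bytes exchanged between each pair of PEs.
void simulate_alltoallmem(std::vector<Buffer>& dests,
                          const std::vector<Buffer>& sources,
                          std::size_t size) {
  const std::size_t n_pes = sources.size();
  assert(dests.size() == n_pes);
  for (std::size_t i = 0; i < n_pes; ++i) {
    assert(sources[i].size() >= n_pes * size);
    assert(dests[i].size() >= n_pes * size);
  }
  for (std::size_t i = 0; i < n_pes; ++i) {    // sending PE
    for (std::size_t j = 0; j < n_pes; ++j) {  // receiving PE
      std::memcpy(dests[j].data() + i * size,
                  sources[i].data() + j * size, size);
    }
  }
}
```

With two PEs and one byte per peer, sources `{a, b}` and `{c, d}` become destinations `{a, c}` and `{b, d}`, which is a transpose of the block matrix.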
Yiltan 1347d5d628 [GDA] Alltoall optimization - single warp (#319)
* Remove testing of data types
As the collective is templated, we are just testing if sizeof(T) works

* Added single-threaded variants

* Applied thread puts optimization to barrier

* Apply single threaded optimization to alltoall

* This optimization only works on bnxt, so place a switch to protect it

* Handle the edge case where the thread count is smaller than the number of PEs
2025-11-19 14:25:29 -05:00
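
One common way to cover the edge case mentioned in the commit above, where fewer threads than PEs are available, is a strided loop in which each thread services several target PEs. The sketch below is host-side C++ and only illustrates that general pattern; it is an assumption, not the code from this commit, and `pes_for_thread` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Returns the PEs that thread `tid` would service when `n_threads` threads
// have to cover `n_pes` PEs. Thread tid handles tid, tid + n_threads, ...
std::vector<int> pes_for_thread(int tid, int n_threads, int n_pes) {
  assert(n_threads > 0 && tid >= 0 && tid < n_threads);
  std::vector<int> pes;
  for (int pe = tid; pe < n_pes; pe += n_threads) {
    pes.push_back(pe);
  }
  return pes;
}
```

When there are at least as many threads as PEs, each thread gets at most one PE and the surplus threads get none, so the same loop covers both cases.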
Aurelien Bouteiller 8c175315f2 Add backend type query method, use it to disable 32bit amo testers on gda (#307)
* Add backend type query method, use it to disable 32bit amo testers on
gda

* The infrateam testers work
2025-11-05 10:24:07 -05:00
Aurelien Bouteiller 054bc33dc4 Tests/syncall (#291)
* SyncAll test case would run Sync

* Despecialized name for argument reader

* Rename sync-test to team-sync-test as it uses teams

* Another stab at probing NUM_GPUS
2025-10-23 13:40:41 -04:00
Edgar Gabriel e4c427a736 Remove MPI compile-time dependency (#264)
* use dlsym for MPI functions

To allow compiling without MPI support, convert the use of MPI functions and symbols to a dlopen/dlsym-based mechanism. It turns out this cannot be done in an entirely vendor-neutral way; slightly different solutions might be required for Open MPI, MPICH, and the new MPI ABI.

* checkpoint

more work to be done.

* checkpoint 2

* checkpoint 3

* checkpoint 4

examples compile and link correctly

* checkpoint 5 (I think)

* checkpoint 6

* dyld-mpi: adapt GDA

* dyldmpi: tests that depend on MPI need to link with it themselves

* do not ../mpi_instance.h

* dyldmpi: make the symetricHeapTestFixture compile

* dyldmpi: Change cmakery, compiles and runs gda w/o external MPI

* Make it also compile in external MPI mode

* dyldmpi: ipc unit tests compile but do not link

* dyldmpi: new approach, if external mpi required, link with mpi,
otherwise use ompi5 abi

* C-style comments in cmakelist..

* dyldmpi: examples: do not fail compiling if MPI not found at build time,
instead do not compile the MPI required examples

* more updates to CMake logic

* convert RO backend

and a few other cleanups

* update some unit tests

to work with the dlopen MPI environment correctly.

---------

Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
2025-10-01 08:06:56 -05:00
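
The dlopen/dlsym mechanism described in the commit above can be illustrated with a generic runtime symbol lookup. This is a minimal sketch under stated assumptions, not the rocSHMEM code: `find_symbol` and `call_strlen_dynamically` are hypothetical helpers, and `strlen` stands in for an MPI function so that the example does not need MPI installed. It assumes a dynamically linked POSIX program; older glibc versions may also need `-ldl` at link time.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>

// Hypothetical helper: looks up `name` among the libraries already loaded
// into this process. Returns nullptr when the symbol is not available,
// which lets the caller fall back instead of failing at link time.
void* find_symbol(const char* name) {
  void* handle = dlopen(nullptr, RTLD_NOW);  // main program and its dependencies
  if (handle == nullptr) {
    return nullptr;
  }
  void* sym = dlsym(handle, name);
  dlclose(handle);  // drops our reference; the libraries stay loaded
  return sym;
}

using strlen_fn = std::size_t (*)(const char*);

// Calls strlen through a pointer resolved at run time, the same way an
// MPI entry point could be called without a compile-time dependency.
std::size_t call_strlen_dynamically(const char* s) {
  auto fn = reinterpret_cast<strlen_fn>(find_symbol("strlen"));
  return fn != nullptr ? fn(s) : 0;
}
```

A real MPI loader would call `dlopen` on a specific library and, as the commit notes, handle vendor differences; the lookup-then-check-for-null shape stays the same.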
Edgar Gabriel e95360961d Add extended team tests (#207)
Create teams in the functional test that are not a duplicate of
ROCSHMEM_TEAM_WORLD. This commit contains only infra-tests to make sure
that n_pes and my_pe on the new teams are indeed correct.
2025-08-01 08:50:14 -05:00
Avinash Kethineedi 526105d315 Implement rocshmem_ptr in IPC conduit (#197)
* Implement `rocshmem_ptr` in IPC conduit

* tests: add functional test for `rocshmem_ptr`
  - Add safety check for pointer access and condition check before printing results for `rocshmem_ptr` test
  - Use `rocshmem_put` to store `rocshmem_ptr` availability for data validation
2025-07-28 12:01:02 -05:00
Avinash Kethineedi 7a5c6f86d7 functional_tests: use size_t for size variable (#190)
Changed the data type of `size` to `size_t` in all functional tests to ensure
consistency with rocSHMEM APIs.
2025-07-03 13:26:54 -05:00
Edgar Gabriel 6ea5edc951 Introduce support for executing the IPC conduit without MPI (#153)
* relax MPI dependency from code

This commit (series) removes the strict dependency on MPI in code base.
rocSHMEM will still be compiled with MPI, but the goal is to make the
code work even if MPI_Init_thread has not been invoked, at least for
certain, well-defined scenarios. Hence, the goal is not to remove every
mention of MPI from rocSHMEM, but to ensure correct execution of the
ipc conduit even if the library has been initialized using other means.

Details:
 - add non-MPI version of remote_heap and WindowInfo classes
 - host interfaces work on WindowInfoMPI, they will not work with the
   non-MPI code path. Since it is unclear whether we plan to support the
   host interfaces at all, this is probably not a major limitation.

* update symmetric_heap structures and backend

* first cut on initialization

and enabling non-MPI initialization of the IPCBackend

* add non-MPI hostInterface methods

at the moment, only barrier_all and sync_all are explicitly supported.

* add non-mpi version of ipc_policy

and a number of smaller fixes required in other files.
A small init/finalize test already passes now with the branch.

* add non-mpi team_split_strided code

* minor fixes for non-MPI use-case

* disable symmetric-heap-window-info test

disable this test for now just to make the compilation pass. Will have
to rework it.

* make no-mpi great again

after rebasing on top of the MPI singleton changes.

* enable running functional tests with uuid init

to run the functional tests using rocshmem_init_attr and the uuid
mechanism requires
a) a PMIx installation on the system
b) setting the environment variable ROCSHMEM_TEST_UUID=1

* fix multi-team creation bug

fix a bug occurring when creating many teams, which was the result of
incorrectly applying two indices in our own implementation of Allreduce.

* make unit tests pass again

* reverse offload was impacted by code change

fix the RO conduit to cope with the non-MPI path introduced for the IPC
conduit.

* update to cmake logic to find pmix

* Update src/memory/window_info.hpp

Co-authored-by: Yiltan <ytemucin@amd.com>

* Update CMakeLists.txt

Co-authored-by: Yiltan <ytemucin@amd.com>

* document ROCSHMEM_UNIQUEID_NO_MPI

* rename env. variable to UNIQUEID_WITH_MPI

* update host.cpp to use USE_HDP_FLUSH macro

instead of the deprecated USE_COHERENT_HEAP.

* add note for running example with RO conduit

add a note clarifying that running init_attr_test from the example
directory requires setting an additional environment variable with the
RO conduit.

* Find PMIx in more cases; only apply PMIx build options to the test that
needs it; abort if OMPI_COMM_WORLD_LOCAL_RANK is not set

---------

Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
2025-06-21 13:23:11 -05:00
Yiltan c81722c339 Check RMA functional test data in GPU kernel (#91) 2025-04-28 16:06:05 -04:00
Avinash Kethineedi f6ef19f5a9 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`
2025-04-15 15:37:53 -05:00
Avinash Kethineedi dc61bca066 Update Barrier and Sync APIs (#73)
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
 - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
 - Added support into both IPC and RO conduits
 - Added functional tests to cover all `barrier` APIs
 - Removed collective operations on default context

* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
  - Implemented `sync` APIs for thread, wavefront, and workgroup scopes
  - Added support into both IPC and RO conduits
  - Added functional tests to cover all `sync` APIs

* update naming convention for context-based `barrier` APIs
2025-04-08 11:25:31 -05:00
Avinash Kethineedi c652f58cef Update Barrier_All and Sync_All APIs (#72)
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support into both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs
2025-04-02 11:58:55 -05:00
Avinash Kethineedi 867519e1d0 Implement default RO context (#64)
* Allocate default context buffers and initialize queue for management

- Allocated the status flag, g return, and atomic return buffers for
  the default context.
- Initialized `AtomicWFQueueProxy` instances to manage these buffers
  efficiently for concurrent access.

* Update `BlockHandle` with default context buffers

* Add default context flag and update buffer retrieval functions

- Added a flag to distinguish the default context from other contexts.
- Modified return buffer functions and `get_status_flag` function to accommodate
  the default context

* Add default context primitive tests

-  get, put, get_nbi, put_nbi, g, and p APIs.
2025-03-25 18:51:54 -05:00
Edgar Gabriel bcbc42e78f add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests
2025-03-24 11:23:03 -05:00
Avinash Kethineedi aa3121a967 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context
2025-03-18 14:39:57 -05:00
Avinash Kethineedi 57d60aa727 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect
2025-02-19 10:31:53 -06:00
avinashkethineedi c155636da4 Update bandwidth and latency calculations
- Refined bandwidth and latency calculations for improved accuracy
2025-02-17 06:18:46 +00:00
Yiltan Hassan Temucin 8d74c7b73e Validate signal after put signal operations 2025-02-06 08:17:22 -06:00
avinashkethineedi e40e6a63fa Updated default case of functional tests with empty test 2024-12-26 19:33:23 +00:00
Avinash Kethineedi c5902afe28 Merge pull request #19 from avinashkethineedi/teams_split_API 2024-12-23 20:42:09 +05:30
avinashkethineedi cb8b9094b4 Fix rocshmem_team_split_strided API 2024-12-21 18:16:42 +00:00
Yiltan Temucin 83a588ee2b Commented function that fails functional tests 2024-12-20 14:48:54 -06:00
Yiltan Temucin fa0858833e Remove comparisons of signed to unsigned values 2024-12-12 10:21:08 -06:00
avinashkethineedi d8ce066adc Merge branch PR #55 into naming_scheme 2024-12-04 21:46:38 +00:00
Brandon Potter fd8dbc7fb6 Use new naming scheme 2024-11-25 14:25:29 -06:00
Yiltan Temucin f710a301fe Added functional tests 2024-11-22 15:36:17 -06:00
avinashkethineedi 9a524046fe Remove active-set-based broadcast test from the functional tests suite 2024-10-29 16:18:46 +00:00
avinashkethineedi e9484bbb86 Remove active-set-based reduction test from the functional tests suite 2024-10-28 21:22:46 +00:00
Yiltan Temucin 9576ff6440 Cleaned up how we print the output 2024-10-28 13:37:33 -05:00
avinashkethineedi 18a1bdd0ac Use C++ iota function to reset buffers and use its values for verification
* Update functional test script to include new tests
2024-10-15 20:23:25 +00:00
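
The iota-based reset and verification mentioned in the commit above could look roughly like the following. It is a host-only sketch: `reset_buffer` and `verify_buffer` are hypothetical names, and the actual functional tests may organize this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Fills the buffer with start, start + 1, start + 2, ... so that every
// element has a distinct, predictable value.
void reset_buffer(std::vector<int>& buf, int start) {
  std::iota(buf.begin(), buf.end(), start);
}

// Checks that the buffer still holds the sequence written by reset_buffer.
bool verify_buffer(const std::vector<int>& buf, int start) {
  for (std::size_t i = 0; i < buf.size(); ++i) {
    if (buf[i] != start + static_cast<int>(i)) {
      return false;
    }
  }
  return true;
}
```

Because each element is unique, a misplaced or missing transfer shows up as a mismatch at a specific index, which a constant fill pattern would not reveal.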
avinashkethineedi b6d31ac7ef Add tiled version of puts and gets at wavefront level to the functional test suite
* Implemented tiled version of put*_wave and get*_wave functions
* Maintain single kernel that supports both tiled and untiled versions
* Disable IPC in the default RO build script
2024-09-07 16:06:36 -07:00
avinashkethineedi d226922733 Add tiled version of puts and gets at the workgroup level to the functional test suite 2024-09-07 15:58:14 -07:00
avinashkethineedi ff954237dd add functional tests for puts and gets at wavefront level
* These functional tests are simple puts and gets, where every wave will get/put the same amount of data
* Enabled workgroup level puts and gets tests
2024-09-05 14:52:48 -07:00
Edgar Gabriel 7e7bc4b0a9 silence warnings in functional testsuite
check the return code of hip functions in order to silence some
warnings.
2024-07-02 10:07:43 -07:00
Brandon Potter ea8f264a11 Transfer files from RAD repository 2024-07-01 09:57:08 -05:00