* add support for compiling all backends
also include the logic to select backends either based on user requests
or through some heuristics
* checkpoint for compiling all backends
* final checkpoint
all tests seem to pass when compiling all three backends simultaneasly
and forcing to use any of the three Backends.
* update PR to new envvar system
* Add host APIs for querying device ctx and remote heap pointer
* Host API to query device pointer for ROCSHMEM_DEFAULT_CONTEXT,
this is needed to support dynamic module initialization via device kernel
library bitcode.
* Host API to query remote symmetric heap pointer that can be used in
custom device kernel for RMA operations.
* Added rocshmem_ptr implementation within the Host Context class
* Enables pointer retrieval functionality for symmetric data objects
* Copy IPC pointers to host memory in RO host context
---------
Co-authored-by: avinashkethineedi <avinash.kethineedi@amd.com>
* Remove unused forward_list
* Remove unused __read_clock function
* Replace wallClk code with hip function
* Remove unused unit test for ipc
* Remove slab heap
* Remove unused EBO spinlock
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
- Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
- Added support into both IPC and RO conduits
- Added functional tests to cover all `barrier` APIs
- Removed collective operations on default context
* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
- Implemented `sync` APIs for thread, wavefront, and workgroup scopes
- Added support into both IPC and RO conduits
- Added functional tests to cover all `sync` APIs
* update naming convention for context-based `barrier` APIs
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
- Added separate pSync buffers for each device context
- Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts
* Update barrier_all functional tests for multi-context support
* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
- Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
- Added support in both IPC and RO conduits
- Updated functional tests to cover all `barrier_all` APIs
* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
- Implemented sync_all APIs for thread, wavefront, and workgroup scopes
- Added support into both IPC and RO conduits
- Added functional tests to cover all `sync_all` APIs
* add team-barrier implementation
add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.
* add team_barrier_tests to functional tests