* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
- Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
- Added support into both IPC and RO conduits
- Added functional tests to cover all `barrier` APIs
- Removed collective operations on default context
* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
- Implemented `sync` APIs for thread, wavefront, and workgroup scopes
- Added support into both IPC and RO conduits
- Added functional tests to cover all `sync` APIs
* update naming convention for context-based `barrier` APIs
[ROCm/rocshmem commit: dc61bca066]
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
- Added separate pSync buffers for each device context
- Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts
* Update barrier_all functional tests for multi-context support
* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
- Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
- Added support in both IPC and RO conduits
- Updated functional tests to cover all `barrier_all` APIs
* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
- Implemented sync_all APIs for thread, wavefront, and workgroup scopes
- Added support into both IPC and RO conduits
- Added functional tests to cover all `sync_all` APIs
[ROCm/rocshmem commit: c652f58cef]
* README: update documentation for RO support
update the README and the install_dependencies script to match the
requirements of the RO conduit.
* add CODEOWNERS file
[ROCm/rocshmem commit: 4e48c9748e]
* Update HIP version check for compatibility with versions >= 5.5
* Update memory allocator for context BlockHandle
- Replaced `HIPAllocator` with `HIPDefaultFinegrainedAllocator` for context `BlockHandle`.
* Update run commands for `rocshmem_g` and `rocshmem_p` functional tests
[ROCm/rocshmem commit: c16b0d6952]
* add team-barrier implementation
add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.
* add team_barrier_tests to functional tests
[ROCm/rocshmem commit: bcbc42e78f]
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration
* Added missing ptr in ipc_policy
[ROCm/rocshmem commit: 3428957de9]
* Update primitive tests for multi-workgroup support
* Update workgroup primitive tests for multi-workgroup support
* Update workfront primitive tests for multi-workgroup support
* Update team based primitive tests for multi-workgroup support
* Update RMA functional tests to capture timing after quiet call
- Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.
* Improve error handling and memory management
- Replaced `cout` with `cerr` for improved error reporting.
- Ensured all allocated memory is freed when `rocshmem_malloc` fails.
* Update start time in primitive tests and latency calculations
- Modified primitive tests to capture the earliest start time.
- Updated latency calculations in functional tests.
* Remove `GetSwarmTester`
* Update start time in team primitive tests
* Invoke quiet call from a single thread within a block on a rocshmem context
[ROCm/rocshmem commit: aa3121a967]
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts
- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh
- Allocate rocshmem API return buffers based on number of device contexts.
- Associate status flag address with blocking calls and remove threadId dependency
- Associated the status flag address with each blocking call request to notify the GPU thread.
- Removed dependency on threadId for determining the appropriate status flag index.
- Move status flag buffer allocation to backend.
- Initialize allocated memeory to zero
[ROCm/rocshmem commit: df4ad2c04d]
* Rearrange CMakefile
* Enable linking to external rocshmem library
* Minor fix for the functional test driver
* ROCSHMEM_HOME detection fixed
[ROCm/rocshmem commit: 96424a59a8]
* Update install_dependencies.sh
* Updated to ROCm repos
* Merge pull request #37 from ROCm/depBuild
locked specific version on ompi and ucx
* locked specific version on ompi and ucx
* [IPC] Fix ROCSHMEM_SIGNAL_ADD
* Generate CMake Package Configuration Files
---------
Co-authored-by: akolliasAMD <99202231+akolliasAMD@users.noreply.github.com>
Co-authored-by: akolliasAMD <akollias@amd.com>
[ROCm/rocshmem commit: 785e31aa48]
* Implemented tiled version of put*_wave and get*_wave functions
* Maintain single kernel that supports both tiled and untiled versions
* Disable IPC in the default RO build script
[ROCm/rocshmem commit: b6d31ac7ef]
* These functional tests are simple puts and gets, where every wave will get/put the same amount of data
* Enabled workgroup level puts and gets tests
[ROCm/rocshmem commit: ff954237dd]
* add backend_ipc.{cpp & hpp}
* rename context_ipc.{cpp & hpp} to context_ipc_device.{cpp & hpp}
* add host interface to IPC backend
* add context_ipc_host.{cpp & hpp} to support host interface
* add USE_RO compile flag to enable support for single backend interface at a time
* add ipc_single script to build rocSHMEM with IPC backend
[ROCm/rocshmem commit: 49779863c2]