* Allocate default context buffers and initialize queues to manage them
- Allocated the status flag, `g` return, and atomic return buffers for
the default context.
- Initialized `AtomicWFQueueProxy` instances to manage concurrent access
to these buffers.
* Update `BlockHandle` with default context buffers
* Add default context flag and update buffer retrieval functions
- Added a flag to distinguish the default context from other contexts.
- Modified the return buffer functions and the `get_status_flag` function to accommodate
the default context.
* Add default context primitive tests
- Cover the get, put, get_nbi, put_nbi, g, and p APIs.
[ROCm/rocshmem commit: 867519e1d0]
* README: update documentation for RO support
update the README and the install_dependencies script to match the
requirements of the RO conduit.
* add CODEOWNERS file
[ROCm/rocshmem commit: 4e48c9748e]
* feat: Add AtomicWFQueue implementation
- Implemented a wavefront-safe atomic FIFO queue ensuring first-come, first-served order
- Added efficient synchronization using atomics
- Enhanced `dequeue` to wait until an element is available
* test: Add GTest for AtomicWFQueue
- Implemented unit tests for AtomicWFQueue using GoogleTest framework
- Added tests for `enqueue`, `dequeue`, and edge cases
- Ensured synchronization behavior and correctness under concurrent conditions
* Add assert in `enqueue` and update atomics
- Added an assert in the `enqueue` function to ensure it fails if the queue is full
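The queue behavior described above can be illustrated with a host-side C++ sketch. It covers first-come, first-served ordering, a `dequeue` that waits for an element, and an assert when `enqueue` hits a full queue.

This is not the rocSHMEM `AtomicWFQueue` implementation. The following are all assumptions made for illustration:
- the class name `TicketFifoSketch`;
- the per-slot sequence-number scheme;
- the use of `std::atomic` in place of HIP device atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical host-side analogue of a bounded atomic FIFO queue.
// Not the rocSHMEM AtomicWFQueue API; std::atomic stands in for device atomics.
template <typename T, std::size_t Capacity>
class TicketFifoSketch {
 public:
  TicketFifoSketch() {
    // Slot i starts out free for the producer holding ticket i.
    for (std::size_t i = 0; i < Capacity; ++i) {
      slots_[i].seq.store(i, std::memory_order_relaxed);
    }
  }

  void enqueue(const T& value) {
    // Tickets are handed out in arrival order, which yields FIFO ordering.
    std::size_t ticket = tail_.fetch_add(1, std::memory_order_relaxed);
    // Fail loudly if this enqueue would overflow the queue.
    assert(ticket < head_.load(std::memory_order_acquire) + Capacity &&
           "enqueue on a full queue");
    Slot& slot = slots_[ticket % Capacity];
    // Wait until the consumer from the previous round has freed this slot.
    while (slot.seq.load(std::memory_order_acquire) != ticket) {
      std::this_thread::yield();
    }
    slot.value = value;
    // Publish: the consumer holding this ticket may now read the slot.
    slot.seq.store(ticket + 1, std::memory_order_release);
  }

  T dequeue() {
    std::size_t ticket = head_.fetch_add(1, std::memory_order_relaxed);
    Slot& slot = slots_[ticket % Capacity];
    // Block until the matching producer has published its element.
    while (slot.seq.load(std::memory_order_acquire) != ticket + 1) {
      std::this_thread::yield();
    }
    T value = slot.value;
    // Free the slot for the producer holding ticket + Capacity.
    slot.seq.store(ticket + Capacity, std::memory_order_release);
    return value;
  }

 private:
  struct Slot {
    std::atomic<std::size_t> seq{0};
    T value{};
  };
  Slot slots_[Capacity];
  std::atomic<std::size_t> head_{0};  // next consumer ticket
  std::atomic<std::size_t> tail_{0};  // next producer ticket
};
```

A `dequeue` on an empty queue spins until a producer publishes, which mirrors the waiting `dequeue` noted above. The per-slot sequence number is a ticket-based variant of the scheme used in Dmitry Vyukov's bounded MPMC queue. It keeps a producer and a consumer from touching the same slot at the same time.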
[ROCm/rocshmem commit: b84b5638cf]
* Update HIP version check for compatibility with versions >= 5.5
* Update memory allocator for context BlockHandle
- Replaced `HIPAllocator` with `HIPDefaultFinegrainedAllocator` for context `BlockHandle`.
* Update run commands for `rocshmem_g` and `rocshmem_p` functional tests
[ROCm/rocshmem commit: c16b0d6952]
* add team-barrier implementation
add a team-barrier API and implementation in the IPC and RO conduits.
Clean up some of the logic in the RO conduit to distinguish between
sync, sync_all, barrier, and barrier_all.
* add team_barrier_tests to functional tests
[ROCm/rocshmem commit: bcbc42e78f]
* RO/collectives: add linear algorithms using Rput/Rget
- make broadcast, alltoall, and fcollect use a simple linear algorithm
using MPI_Rput/MPI_Rget, without blocking execution
- remove the to_all interfaces, since they have been deprecated.
- remove the active-set interfaces, since they have been removed from
rocSHMEM.
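The single-process C++ sketch below illustrates the "simple linear algorithm" for alltoall mentioned above by simulating every PE in one process. In the RO conduit each copy would presumably be an `MPI_Rput` whose request is completed later. Here a `std::copy` stands in for that put. The function name `linear_alltoall_sketch` and the buffer layout are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-process simulation of a linear alltoall.
// send[pe] holds num_pes blocks of block_size elements; block j is destined for PE j.
// recv[pe] receives the block from PE i at offset i * block_size.
void linear_alltoall_sketch(const std::vector<std::vector<int>>& send,
                            std::vector<std::vector<int>>& recv,
                            std::size_t block_size) {
  const std::size_t num_pes = send.size();
  assert(recv.size() == num_pes);
  for (std::size_t me = 0; me < num_pes; ++me) {
    // Linear algorithm: each PE issues one put per peer, one after another.
    for (std::size_t peer = 0; peer < num_pes; ++peer) {
      auto src = send[me].begin() + static_cast<std::ptrdiff_t>(peer * block_size);
      auto dst = recv[peer].begin() + static_cast<std::ptrdiff_t>(me * block_size);
      // Stand-in for a non-blocking put of my block for `peer`.
      std::copy(src, src + static_cast<std::ptrdiff_t>(block_size), dst);
    }
  }
}
```

In an MPI version, the requests returned by each `MPI_Rput` could be collected and completed together with `MPI_Waitall`. That way no single transfer stalls the loop over peers.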
* avoid notification after barrier
Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
* disable allocation of ata_buffer
A temporary 128 MB buffer was allocated when creating a team. In
previous versions of the code, some collective operations used that
buffer, but none do now. Therefore, do not allocate the buffer for
now. I am not removing the member itself from the structure, since
we might need it again in future versions.
---------
Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
[ROCm/rocshmem commit: 908bd5bda3]
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration
* Added missing ptr in ipc_policy
[ROCm/rocshmem commit: 3428957de9]
* Update primitive tests for multi-workgroup support
* Update workgroup primitive tests for multi-workgroup support
* Update wavefront primitive tests for multi-workgroup support
* Update team-based primitive tests for multi-workgroup support
* Update RMA functional tests to capture timing after quiet call
- Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.
* Improve error handling and memory management
- Replaced `cout` with `cerr` for improved error reporting.
- Ensured all allocated memory is freed when `rocshmem_malloc` fails.
* Update start time in primitive tests and latency calculations
- Modified primitive tests to capture the earliest start time.
- Updated latency calculations in functional tests.
* Remove `GetSwarmTester`
* Update start time in team primitive tests
* Invoke quiet call from a single thread within a block on a rocshmem context
[ROCm/rocshmem commit: aa3121a967]
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts
- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh
- Allocate rocshmem API return buffers based on the number of device contexts.
- Associate status flag address with blocking calls and remove threadId dependency
- Associated the status flag address with each blocking call request to notify the GPU thread.
- Removed dependency on threadId for determining the appropriate status flag index.
- Move status flag buffer allocation to backend.
- Initialize allocated memory to zero
[ROCm/rocshmem commit: df4ad2c04d]
* Rearrange CMakefile
* Enable linking to external rocshmem library
* Minor fix for the functional test driver
* ROCSHMEM_HOME detection fixed
[ROCm/rocshmem commit: 96424a59a8]
* Update(DeviceProxy): Dynamically determine memory allocation size and remove compile-time size calculations
- Modified the `DeviceProxy` class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.
[ROCm/rocshmem commit: eb5a38e806]
- Added multi-workgroup support for the All-to-all, Fcollect, Broadcast, Barrier, and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect
[ROCm/rocshmem commit: 57d60aa727]
* Update install_dependencies.sh
* Updated to ROCm repos
* Merge pull request #37 from ROCm/depBuild
locked specific versions of ompi and ucx
* locked specific versions of ompi and ucx
* [IPC] Fix ROCSHMEM_SIGNAL_ADD
* Generate CMake Package Configuration Files
---------
Co-authored-by: akolliasAMD <99202231+akolliasAMD@users.noreply.github.com>
Co-authored-by: akolliasAMD <akollias@amd.com>
[ROCm/rocshmem commit: 785e31aa48]