- use the reduce_psync buffers, not the barrier_psync buffers, for
synchronization in allreduce.
- execute a wg barrier after the allreduce operation. Internal discussion
determined that this barrier is required for correctness.
[ROCm/rocshmem commit: 6f512e92a5]
* Refactor `barrier_all` and `sync_all` to use the default context
- Removed context-specific implementations of barrier_all and sync_all
- Added barrier_all and sync_all to the default context implementation
- Updated functional tests to use the default context for barrier_all and sync_all
* Update `barrier_all` and `sync_all` API usage in the documentation
* Update `CHANGELOG`
---------
Co-authored-by: Yiltan <ytemucin@amd.com>
[ROCm/rocshmem commit: bf48bcabf2]
* Update the naming convention for collective APIs to ensure consistency across the interface.
* Move all collective API declarations to `rocshmem_COLL.hpp`
* The following APIs were updated as part of this change:
- `barrier`
- `barrier_all`
- `sync`
- `sync_all`
- `all_to_all`
- `broadcast`
- `fcollect`
- `all_reduce`
* Update header file generation code for collective APIs
[ROCm/rocshmem commit: 68421895d6]