* add code for bootstrapping
The bootstrapping code has been extracted from the MSCCLPP library,
which is partly based on code from NVIDIA. The code has been
modified to match the specific requirements of the rocSHMEM library.
* add code to use the new uniqueId bootstrapping
* adjust init_attr example
Extend the rocshmem_init_attr example to use two disjoint groups
of processes in order to trigger the new code path.
* add env variable for bootstrap timeout
* Update examples/rocshmem_init_attr_test.cc
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
* Update src/rocshmem.cpp
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
---------
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
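A bootstrap timeout taken from the environment could be read roughly as follows. This is a minimal sketch only: the variable name `EXAMPLE_BOOTSTRAP_TIMEOUT`, the 30-second default, and the helper `bootstrap_timeout_seconds` are hypothetical, since the commit message names none of them.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical variable name and default; the real ones are not stated here.
constexpr const char* kTimeoutEnv = "EXAMPLE_BOOTSTRAP_TIMEOUT";
constexpr long kDefaultTimeoutSec = 30;

// Returns the timeout in seconds, falling back to the default when the
// variable is unset, empty, not a whole number, or not strictly positive.
long bootstrap_timeout_seconds() {
  const char* raw = std::getenv(kTimeoutEnv);
  if (raw == nullptr || *raw == '\0') {
    return kDefaultTimeoutSec;
  }
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  if (errno != 0 || *end != '\0' || value <= 0) {
    return kDefaultTimeoutSec;
  }
  return value;
}
```

Falling back to the default on malformed input keeps a typo in the environment from turning into an immediate bootstrap failure; a stricter implementation might prefer to report the error instead.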
* Update the naming convention for collective APIs to ensure consistency across the interface.
* Move all collective API declarations to rocshmem_COLL.hpp
* The following APIs were updated as part of this change:
- `barrier`
- `barrier_all`
- `sync`
- `sync_all`
- `all_to_all`
- `broadcast`
- `fcollect`
- `all_reduce`
* Update header file generation code for collective APIs
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
- Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
- Added support in both IPC and RO conduits
- Added functional tests to cover all `barrier` APIs
- Removed collective operations on default context
* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
- Implemented `sync` APIs for thread, wavefront, and workgroup scopes
- Added support in both IPC and RO conduits
- Added functional tests to cover all `sync` APIs
* Update naming convention for context-based `barrier` APIs
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
- Added separate pSync buffers for each device context
- Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts
* Update `barrier_all` functional tests for multi-context support
* Add thread, wavefront, and workgroup-level `barrier_all` APIs in IPC and RO conduits
- Implemented `barrier_all` APIs at thread, wavefront, and workgroup granularity
- Added support in both IPC and RO conduits
- Updated functional tests to cover all `barrier_all` APIs
* Add thread, wavefront, and workgroup-level `sync_all` APIs in IPC and RO conduits
- Implemented `sync_all` APIs for thread, wavefront, and workgroup scopes
- Added support in both IPC and RO conduits
- Added functional tests to cover all `sync_all` APIs
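The per-context pSync fix described above can be pictured as giving every context ID its own slice of synchronization storage, so that barriers issued from different contexts never touch the same flags. The sketch below is host-side C++ under stated assumptions: `ExampleSyncPool`, `kSyncWords`, and the flat layout are hypothetical and do not reflect the actual IPC conduit data structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical number of synchronization words each context needs.
constexpr std::size_t kSyncWords = 8;

// Owns one contiguous allocation and hands out a disjoint slice per context.
class ExampleSyncPool {
 public:
  explicit ExampleSyncPool(std::size_t max_contexts)
      : storage_(max_contexts * kSyncWords, 0), max_contexts_(max_contexts) {}

  // Returns the slice owned by ctx_id, or nullptr if ctx_id is out of range.
  long* psync_for(std::size_t ctx_id) {
    if (ctx_id >= max_contexts_) {
      return nullptr;
    }
    return storage_.data() + ctx_id * kSyncWords;
  }

 private:
  std::vector<long> storage_;
  std::size_t max_contexts_;
};
```

With a single shared buffer, two contexts entering a barrier at the same time can consume each other's arrival flags; indexing by context ID removes that interference.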
Add the interfaces required to support rocshmem initialization
through the uniqueID mechanism. At the moment this still maps to
MPI initialization under the hood, but adding the functions might
simplify porting some applications to rocshmem. In addition, if
we need to transition away from MPI one day, this is a step in
that direction.
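The general shape of a uniqueID-style initialization is: one process per group creates an ID, shares it out of band with the other members, and each member then initializes with its rank, the group size, and that ID. The following is a self-contained simulation of that flow. All names (`ExampleUniqueId`, `ExampleInitAttr`, `example_bootstrap_group`) are stand-ins invented for illustration and are not the rocSHMEM interfaces.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Stand-in types for illustration only; NOT the rocSHMEM API.
struct ExampleUniqueId {
  std::uint64_t value;
};

struct ExampleInitAttr {
  int rank;
  int nranks;
  ExampleUniqueId id;
};

// The root of a group creates an ID that no other group shares.
ExampleUniqueId example_get_unique_id() {
  static std::atomic<std::uint64_t> next{1};
  return ExampleUniqueId{next.fetch_add(1)};
}

// Each member records its rank, the group size, and the shared ID.
ExampleInitAttr example_make_attr(int rank, int nranks, ExampleUniqueId id) {
  return ExampleInitAttr{rank, nranks, id};
}

// Simulates one group. The "broadcast" of the ID is a plain copy here;
// a real application would use sockets, a shared file, or MPI.
std::vector<ExampleInitAttr> example_bootstrap_group(int nranks) {
  const ExampleUniqueId id = example_get_unique_id();
  std::vector<ExampleInitAttr> attrs;
  attrs.reserve(static_cast<std::size_t>(nranks));
  for (int rank = 0; rank < nranks; ++rank) {
    attrs.push_back(example_make_attr(rank, nranks, id));
  }
  return attrs;
}
```

Calling `example_bootstrap_group` twice models two disjoint groups of processes: members of one group share an ID, while the two groups hold different IDs and therefore initialize independently.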
* add team-barrier implementation
Add a team-barrier API and implementation in the IPC and RO conduits.
Clean up some of the logic in the RO conduit to distinguish between
sync, sync_all, barrier, and barrier_all.
* add team_barrier_tests to functional tests
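For readers less familiar with the four operations being distinguished, the sketch below encodes the difference as the OpenSHMEM specification describes it: barrier variants also complete previously issued memory updates, sync variants only synchronize, and the `_all` variants span every PE rather than a team or subset. The enum and helper names are hypothetical and this is not the RO conduit's actual logic.

```cpp
// Hypothetical classification of the four collectives, following the
// OpenSHMEM semantics rather than any rocSHMEM internals.
enum class ExampleCollective { Sync, SyncAll, Barrier, BarrierAll };

// Barrier variants ensure prior memory updates are complete; sync variants
// only register arrival at the synchronization point.
constexpr bool completes_memory_ops(ExampleCollective op) {
  return op == ExampleCollective::Barrier ||
         op == ExampleCollective::BarrierAll;
}

// The *_all variants involve every PE; the others act on a team or subset.
constexpr bool spans_all_pes(ExampleCollective op) {
  return op == ExampleCollective::SyncAll ||
         op == ExampleCollective::BarrierAll;
}
```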