Commit Graph

133 Commits

Author SHA1 Message Date
Aurelien Bouteiller 87179b1ffd Remove unused parts of dlmalloc to improve coverity score (#106) 2025-05-07 13:05:04 -04:00
Aurelien Bouteiller b835de6cd5 Substitute pow2bin allocator with a dlmalloc based allocator (#71)
* Add dlmalloc_strat allocator strategy
 - Use mspace variant to ease encapsulation
 - Make pow2bins and dlmalloc cmake selectable
* Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers
accordingly
 - add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used
 - Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap
* bugfix: dlmalloc exposed that the pingpong test would write past end of
allocation with -w 32
* iostream leakage/mixed usage of cerr and fprintf(stderr

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
2025-05-01 11:55:23 -04:00
Edgar Gabriel db74307195 unify env variables and use DPRINTF (#89)
* unify handling of env variables

create a class containing all (most?) environment variables used by rocshmem and an object that is instantiated
before library_init, since some of the environment variables need to be
set before we start the bootstrapping process.

This allows us to remove two files from the bootstrap directory.

* replace INFO and TRACE macros with DPRINTF

to be more consistent with the rest of the rocSHMEM code
2025-04-29 06:05:25 -05:00
Edgar Gabriel e3b0353fa9 use correct id when accessing ipc-bases (#88)
we need to use the position of that process in the local ipc-bases
array, not the global rank.
2025-04-17 10:11:32 -05:00
Aurelien Bouteiller 9befbe8293 bugfix: do not dereference ctx during create_ctx if we did run out (#83) 2025-04-16 10:37:44 -04:00
Avinash Kethineedi f6ef19f5a9 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`
2025-04-15 15:37:53 -05:00
Aurelien Bouteiller a1a0560ca3 Remove dev-mono-linear (#81)
* Remove dev_mono_linear (followup to removal of slab_heap)

* cleanup: use CHECK_HIP rather than ad-hoc error checking

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
2025-04-15 09:57:59 -04:00
Edgar Gabriel b5830a623b Revamp the uniqueId code to support subgroups of processes (#80)
* add code for bootstrapping

the bootstrapping code has been extracted from the MSCCLPP library,
which in parts is based on the code from NVIDIA. The code has been
modified to match the specific requirements of the rocSHMEM library.

* add code to use the new uniqueId bootstrapping

* adjust init_attr example

extend the rocshmem_init_attr example to use two disjoint groups
of processes, in order to trigger the new code path.

* add env variable for bootstrap timeout

* Update examples/rocshmem_init_attr_test.cc

Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

* Update src/rocshmem.cpp

Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

---------

Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
2025-04-14 12:02:09 -05:00
Avinash Kethineedi 05755847f5 Update backend to use provided MPI communicator during library initialization (#79)
* Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD`

* Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs
   - Return values from backend if initialized; otherwise, fall back to MPI_Singleton.
2025-04-14 09:18:57 -05:00
Brandon Potter 0fd628458c Cleanup unused code in repository (#75)
* Remove unused forward_list

* Remove unused __read_clock function

* Replace wallClk code with hip function

* Remove unused unit test for ipc

* Remove slab heap

* Remove unused EBO spinlock
2025-04-10 14:47:24 -05:00
Avinash Kethineedi 68421895d6 Update collective APIs naming (#77)
* Update the naming convention for collective APIs to ensure consistency across the interface.

* Move all collective API declarations to rocshmem_COLL.hpp

* The following APIs were updated as part of this change:
  - `barrier`
  - `barrier_all`
  - `sync`
  - `sync_all`
  - `all_to_all`
  - `broadcast`
  - `fcollect`
  - `all_reduce`

* Update header file generation code for collective APIs
2025-04-10 12:14:47 -05:00
Avinash Kethineedi dc61bca066 Update Barrier and Sync APIs (#73)
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
 - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
 - Added support into both IPC and RO conduits
 - Added functional tests to cover all `barrier` APIs
 - Removed collective operations on default context

* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
  - Implemented `sync` APIs for thread, wavefront, and workgroup scopes
  - Added support into both IPC and RO conduits
  - Added functional tests to cover all `sync` APIs

* update naming convention for context-based `barrier` APIs
2025-04-08 11:25:31 -05:00
Avinash Kethineedi c652f58cef Update Barrier_All and Sync_All APIs (#72)
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support into both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs
2025-04-02 11:58:55 -05:00
Edgar Gabriel e9f6227d75 add uniqueID initialization (#69)
add the interfaces required to support rocshmem initialization
through the uniqueID mechanism. At the moment this still maps to
MPI initialization under the hood, but adding the functions might
simplify the porting of some applications to rocshmem. In addition, if
we need to transition away from MPI one day, this is also one step in
that direction.
2025-03-28 16:34:00 -05:00
Edgar Gabriel 12561783de Performance tuning for inter-node communication (#66)
This PR addresses two issues:
 - reduce the number of contexts supported by the host-interface by
   default to 1, we are not using those at the moment, and hence
   we now create fewer MPI_Win at the startup
 - introduces a micro-sleep in RO progress engine in case there are no
   pending requests. This leads to significant performance improvements
   observed for inter-node communication with Thor2 NICs.
2025-03-26 21:09:26 -05:00
Avinash Kethineedi 867519e1d0 Implement default RO context (#64)
* Allocate default context buffers and initialize queue for management

- Allocated the status flag, g return, and atomic return buffers for
  the default context.
- Initialized `AtomicWFQueueProxy` instances to manage these buffers
  efficiently for concurrent access.

* Update `BlockHandle` with default context buffers

* Add default context flag and update buffer retrieval functions

- Added a flag to distinguish the default context from other contexts.
- Modified return buffer functions and `get_status_flag` function to accommodate
  the default context

* Add default context primitive tests

-  get, put, get_nbi, put_nbi, g, and p APIs.
2025-03-25 18:51:54 -05:00
Avinash Kethineedi b84b5638cf Add AtomicWFQueue implementation and tests (#62)
* feat: Add AtomicWFQueue implementation
  - Implemented wavefront-safe atomic FIFO queue ensuring first-come, first-serve order
  - Added efficient synchronization using atomics
  - Enhanced `dequeue` to wait until an element is available

* test: Add GTest for AtomicWFQueue
  - Implemented unit tests for AtomicWFQueue using GoogleTest framework
  - Added tests for `enqueue`, `dequeue`, and edge cases
  - Ensured synchronization behavior and correctness under concurrent conditions

* Add assert in `enqueue` and update atomics
  - Added an assert in the `enqueue` function to ensure it fails if the queue is full
2025-03-25 00:45:19 -05:00
Avinash Kethineedi c16b0d6952 Fix/RO Backend Hang Issue (#53)
* Update HIP version check for compatibility with versions >= 5.5

* Update memory allocator for context BlockHandle
   - Replaced `HIPAllocator` with `HIPDefaultFinegrainedAllocator` for context `BlockHandle`.

* Update run commands for `rocshmem_g` and `rocshmem_p` functional tests
2025-03-24 22:54:07 -05:00
Edgar Gabriel bcbc42e78f add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests
2025-03-24 11:23:03 -05:00
Yiltan 658bf2a3b5 Removed GPU_IB (#59) 2025-03-24 09:04:52 -04:00
Avinash Kethineedi 1210b6419f Remove support code for GFX940 and GFX941 targets (#55) 2025-03-21 14:31:49 -05:00
Edgar Gabriel 908bd5bda3 RO/collectives: add linear algorithms using RPut/Rget (#58)
* RO/collectives: add linear algorithms using RPut/Rget

- make broadcast, alltoall and fcollect use a simple linear algorithm
  using MPI_RPut/Rget, but without blocking in the execution
- remove the to_all interfaces, since they have been deprecated.
- remove the active-set interfaces, since they have been removed from
  rocSHMEM

* avoid notification after barrier

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>

* disable allocation of ata_buffer

a temporary buffer of 128MB was allocated when creating a team. In
previous versions of the code, that buffer was used by some collective
operations. This is not the case for now. Therefore, do not allocate the
buffer for now. I am not removing the element itself from the
structure, since we might need it in future versions again.

---------

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
2025-03-21 12:49:39 -05:00
Yiltan 3428957de9 Sync Reverse Offload Scripts (#52)
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration

* Added missing ptr in ipc_policy
2025-03-19 14:31:07 -04:00
Yiltan b7f3839f27 Updated IPC detection logic (#51)
* Added environment variable to enable/disable IPC at runtime

* Fixed IPC detection logic to allow for different process mappings

* Updated README.md
2025-03-17 11:36:11 -04:00
Avinash Kethineedi df4ad2c04d Refactor RO backend data structures (#49)
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts

- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh

- Allocate rocshmem API return buffers based on number of device contexts.

- Associate status flag address with blocking calls and remove threadId dependency
   - Associated the status flag address with each blocking call request to notify the GPU thread.
   - Removed dependency on threadId for determining the appropriate status flag index.

- Move status flag buffer allocation to backend.

- Initialize allocated memory to zero
2025-03-14 10:49:44 -05:00
Avinash Kethineedi eb5a38e806 Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48)
* Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations

- Modified the Device proxy class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.
2025-02-24 15:11:46 -06:00
Yiltan 487e5b7d0f Fix ROCm 6.4 warnings (#47)
* Removed __AMDGCN_WAVEFRONT_SIZE

* Added unit test to validate WF_SIZE
2025-02-24 13:34:13 -05:00
avinashkethineedi 21dbd5cc5e Remove rocshmem_timer function 2025-02-17 17:10:51 +00:00
avinashkethineedi 540cd4b918 RO Backend: Add support for char, signed char and unsigned char 2025-02-12 20:10:03 +00:00
Yiltan 495cd6970b Merge pull request #38 from Yiltan/ro/implement-sigops
Implements Signalling Operations for RO
2025-02-10 15:10:07 -05:00
Yiltan 944444cf12 Merge pull request #39 from Yiltan/ro/fix-teamreduce
Fix Team reduction intra-node
2025-02-10 14:56:27 -05:00
Yiltan Hassan Temucin 022b2c27e7 Fix Team reduction intra-node 2025-02-07 08:39:35 -06:00
Avinash Kethineedi d97e5ba2c8 Merge pull request #36 from avinashkethineedi/fix/rocshmem-ctx-wg-team-sync
Fix `rocshmem_ctx_wg_team_sync` API
2025-02-06 13:41:16 -06:00
Yiltan Hassan Temucin f1c25f7e19 [RO] implemented signaling operations 2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 21171deeb8 [RO] added MPI_UNSIGNED_LONG as type 2025-02-06 10:17:32 -06:00
avinashkethineedi c5b548c398 Fix rocshmem_ctx_wg_team_sync API
- Updated `rocshmem_ctx_wg_team_sync` to utilize a team-specific memory buffer for synchronization
2025-02-05 19:09:07 +00:00
avinashkethineedi e311400d15 Fix rocshmem_ctx_my_pe and rocshmem_ctx_n_pes APIs to return PE numbering and size relative to the team in a team-specific context. 2025-02-05 03:41:40 +00:00
Avinash Kethineedi 248972b30b Merge pull request #31 from avinashkethineedi/rocshmem_g
Implement `rocshmem_g` API and optimize memory usage
2025-02-04 11:15:41 -06:00
Yiltan Hassan Temucin fd3eaa3f69 [IPC] Fix ROCSHMEM_SIGNAL_ADD 2025-02-03 09:59:28 -08:00
avinashkethineedi 757d7e53ca Implement rocshmem_g API and optimize memory usage
- Implement `rocshmem_g` API
- Free up memory space allocated for `rocshmem_g` and atomic operations' return values
2025-02-02 05:56:46 +00:00
avinashkethineedi 1ef2d3a6b7 Replace raw pointers for host_interface with shared_ptr to enable automatic memory handling 2025-01-13 20:58:43 +00:00
Yiltan Temucin c0e4a32ca2 IPC backend now aborts with rocshmem global_exit() 2024-12-23 11:03:04 -06:00
avinashkethineedi cb8b9094b4 Fix rocshmem_team_split_strided API 2024-12-21 18:16:42 +00:00
Yiltan Temucin fa0858833e Remove comparisons of signed to unsigned values 2024-12-12 10:21:08 -06:00
Yiltan Temucin 658915ed35 Renamed debug.hpp to rocshmem_debug.hpp 2024-12-06 15:49:50 -06:00
avinashkethineedi 6486e29078 Rename config.h to roc_shmem_config.h 2024-12-06 01:08:13 +00:00
avinashkethineedi d8ce066adc Merge branch PR #55 into naming_scheme 2024-12-04 21:46:38 +00:00
Brandon Potter fd8dbc7fb6 Use new naming scheme 2024-11-25 14:25:29 -06:00
Yiltan Temucin d8f44e4436 Added Signalling Operations 2024-11-22 15:36:17 -06:00
Yiltan a59e946e44 Merge pull request #51 from Yiltan/roc_shmemx_correction
Removing instances of `roc_shmemx`
2024-11-19 13:28:05 -05:00