56 Commits

Author SHA1 Message Date
Edgar Gabriel bcbc42e78f add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests
2025-03-24 11:23:03 -05:00
Yiltan 658bf2a3b5 Removed GPU_IB (#59) 2025-03-24 09:04:52 -04:00
Avinash Kethineedi eb5a38e806 Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48)
* Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations

- Modified the Device proxy class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.
2025-02-24 15:11:46 -06:00
avinashkethineedi c5b548c398 Fix rocshmem_ctx_wg_team_sync API
- Updated `rocshmem_ctx_wg_team_sync` to utilize a team-specific memory buffer for synchronization
2025-02-05 19:09:07 +00:00
Avinash Kethineedi 248972b30b Merge pull request #31 from avinashkethineedi/rocshmem_g
Implement `rocshmem_g` API and optimize memory usage
2025-02-04 11:15:41 -06:00
Yiltan Hassan Temucin fd3eaa3f69 [IPC] Fix ROCSHMEM_SIGNAL_ADD 2025-02-03 09:59:28 -08:00
avinashkethineedi 757d7e53ca Implement rocshmem_g API and optimize memory usage
- Implement `rocshmem_g` API
- Free up memory space allocated for `rocshmem_g` and atomic operations' return values
2025-02-02 05:56:46 +00:00
avinashkethineedi 1ef2d3a6b7 Replace raw pointers for host_interface with shared_ptr to enable automatic memory handling 2025-01-13 20:58:43 +00:00
Yiltan Temucin c0e4a32ca2 IPC backend now aborts with rocshmem global_exit() 2024-12-23 11:03:04 -06:00
Yiltan Temucin fa0858833e Remove comparisons of signed to unsigned values 2024-12-12 10:21:08 -06:00
avinashkethineedi 6486e29078 Rename config.h to roc_shmem_config.h 2024-12-06 01:08:13 +00:00
avinashkethineedi d8ce066adc Merge branch PR #55 into naming_scheme 2024-12-04 21:46:38 +00:00
Brandon Potter fd8dbc7fb6 Use new naming scheme 2024-11-25 14:25:29 -06:00
Yiltan Temucin d8f44e4436 Added Signalling Operations 2024-11-22 15:36:17 -06:00
Avinash Kethineedi 2cb5cab038 Merge pull request #52 from avinashkethineedi/IPC_puts/gets
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
avinashkethineedi d1ee997542 Update puts and gets to include a fence following data movement, ensuring data visibility 2024-11-12 16:52:07 +00:00
avinashkethineedi 5e3d94c705 Update collective APIs to use teams interface
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
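As a rough illustration of the log_stride to stride change above: OpenSHMEM active sets describe members as PE_start + i * 2^logPE_stride, while a team can use an arbitrary stride. The sketch below is not rocSHMEM code; the function names and parameters are hypothetical and only show the index arithmetic.

```python
def world_pe_from_log_stride(pe_start: int, log_stride: int, team_idx: int) -> int:
    """Active-set style: the stride is restricted to powers of two."""
    return pe_start + team_idx * (1 << log_stride)


def world_pe_from_stride(pe_start: int, stride: int, team_idx: int) -> int:
    """Team style: any positive stride is allowed (hypothetical helper)."""
    if stride < 1:
        raise ValueError("stride must be positive")
    return pe_start + team_idx * stride


# A log_stride of 2 and a stride of 4 select the same world PEs.
members_log = [world_pe_from_log_stride(1, 2, i) for i in range(4)]
members_lin = [world_pe_from_stride(1, 4, i) for i in range(4)]

# A stride of 3 has no log_stride equivalent.
members_odd = [world_pe_from_stride(0, 3, i) for i in range(4)]
```

With plain strides, a team such as every third PE can be expressed directly, which the power-of-two form cannot represent.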
Yiltan Hassan Temucin 997eb69b5a modified team based to_all -> reduce 2024-11-06 09:46:43 -06:00
avinashkethineedi b2b0d559cb Merge branch 'ROCm:develop' into active_set_APIs 2024-11-05 23:02:44 +00:00
Yiltan Hassan Temucin fe767d9abf remove cooperative groups 2024-10-30 20:10:21 +00:00
avinashkethineedi 5975b8c621 Update broadcast function to use stride calculations instead of log_stride 2024-10-29 19:10:05 +00:00
avinashkethineedi abec29bd6a Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
Edgar Gabriel 11df5427a6 add ascii art for ring allreduce 2024-10-24 15:08:32 +00:00
Edgar Gabriel a4b4281f50 fix odd-case allreduce scenarios
if the number of elements to be used in the allreduce operation is not
an exact multiple of the work-array buffer size and number of pe's, we need
to adjust the algorithm to:
 - initially perform a ring_allreduce on n_segments * chunk_size (which
   is the integer division of the number of elements and the work-buffer
   size, i.e. will not cover the entire buffer)
 - perform another ring_allreduce where chunk_size is reduced to match
   the remaining elements
 - if the remaining elements from the previous step cannot evenly be
   divided by the number of pe's, we need to perform a direct_allreduce on
   the outstanding number of elements.
2024-10-24 15:08:32 +00:00
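The three-phase split described in the commit above can be sketched as plain arithmetic. This is only one reading of the message, not the actual rocSHMEM logic: the function name, the tuple fields, and the assumption that a full ring segment covers (work_elems // n_pes) * n_pes elements are all hypothetical.

```python
from typing import NamedTuple


class AllreducePlan(NamedTuple):
    full_ring_elems: int   # covered by ring_allreduce with full-size chunks
    tail_ring_elems: int   # covered by a second ring_allreduce, smaller chunks
    direct_elems: int      # leftover handled by direct_allreduce


def plan_allreduce(n_elems: int, work_elems: int, n_pes: int) -> AllreducePlan:
    """Split n_elems into the three phases (assumed interpretation)."""
    chunk_size = work_elems // n_pes      # assumed: one chunk per PE per segment
    if chunk_size == 0:
        raise ValueError("work buffer smaller than the number of PEs")
    segment = chunk_size * n_pes          # elements reduced per full segment
    n_segments = n_elems // segment       # integer division, may leave a tail
    full = n_segments * segment
    remaining = n_elems - full
    tail_chunk = remaining // n_pes       # reduced chunk size for the tail
    tail = tail_chunk * n_pes
    direct = remaining - tail             # not evenly divisible by n_pes
    return AllreducePlan(full, tail, direct)


even_plan = plan_allreduce(1000, 256, 4)
odd_plan = plan_allreduce(1003, 256, 4)
```

In this sketch the three counts always add up to n_elems, so every element is reduced by exactly one phase.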
Edgar Gabriel 87db7f7d38 fix barrier synchronization on gfx90a 2024-10-24 15:08:28 +00:00
Edgar Gabriel 1fbb89bc73 ipc: add ring_allreduce algorithms
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
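For readers unfamiliar with the technique named in the commit above, here is a host-side simulation of the textbook ring allreduce (reduce-scatter followed by allgather) using sum as the reduction. It is a sketch of the general algorithm, not the IPC conduit's device code; the function name and the list-of-lists representation of per-PE buffers are assumptions made for illustration.

```python
def ring_allreduce_sum(buffers):
    """Simulate a sum ring allreduce over one buffer per PE."""
    n = len(buffers)
    length = len(buffers[0])
    if length % n != 0:
        raise ValueError("buffer length must be a multiple of the PE count")
    chunk = length // n
    data = [list(b) for b in buffers]     # work on copies

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: in step s, PE p sends chunk (p - s) % n to PE p + 1,
    # which accumulates it. Payloads are snapshotted before any update.
    for s in range(n - 1):
        sends = [(p, (p - s) % n, data[p][span((p - s) % n)]) for p in range(n)]
        for p, c, payload in sends:
            dst = (p + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # PE p now owns the fully reduced chunk (p + 1) % n.
    # Allgather: circulate the finished chunks around the ring.
    for s in range(n - 1):
        sends = [(p, (p + 1 - s) % n, data[p][span((p + 1 - s) % n)]) for p in range(n)]
        for p, c, payload in sends:
            data[(p + 1) % n][span(c)] = payload
    return data


inputs = [[1, 2, 3, 4, 5, 6],
          [10, 20, 30, 40, 50, 60],
          [100, 200, 300, 400, 500, 600]]
result = ring_allreduce_sum(inputs)
```

Each PE only exchanges one chunk with its ring neighbor per step, which is why the approach suits reductions too large to stage in a single work buffer.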
Edgar Gabriel ba21cb7b85 ipc/to_all: add direct allreduce algorithm
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
Avinash Kethineedi 8a16968cf2 Merge pull request #41 from avinashkethineedi/collective_routine_buffers
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
avinashkethineedi d5ea5868e3 Fix quiet and fence of default context
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi 6685d0ab60 Add fine grained memory buffers for work/sync arrays
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan Hassan Temucin 722a5f0731 updated *_wait* APIs to use int rather than roc_shmem_cmps 2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin bcf3fdff10 *_wait* routines changed parameter from ptr to ivars to match OpenSHMEM 2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin 509277c034 fixed notifier bug 2024-10-10 06:45:43 -07:00
Yiltan Hassan Temucin b1134e8633 added notifier->sync() when we are not using cooperative groups
updated scope bug
2024-10-09 13:11:28 -07:00
Yiltan Hassan Temucin 63667a3167 Added Cooperative Groups configure option and header 2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin 1baa071edf Fix initialization order bug 2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin e2f6a65284 fixed barrier issue on MI250X 2024-10-08 13:18:04 -07:00
avinashkethineedi 92fb1abaf2 Add team information to the context
* Update roc_shmem_ctx_fence API to use team-relative PE numbering
* Update backend to populate team_opaque member of ROC_SHMEM_CTX_DEFAULT (used to store information about the team wrt TEAM_WORLD)
2024-10-04 17:56:15 +00:00
avinashkethineedi 979aed105a Add fence and quiet functionality
* Perform atomic stores to enforce memory ordering
2024-10-03 06:28:12 +00:00
Avinash Kethineedi e58077e3cf Merge branch 'ipc_bringup' into ipc_atomics 2024-09-09 14:22:55 -05:00
Edgar Gabriel dfcacdc4a3 remove pSync from internal_bcast functions
remove the pSync arguments from the internal_broadcast functions,
they are not used anyway.
2024-09-09 12:06:30 -07:00
avinashkethineedi 7bbf34d334 remove local_pe calculation from puts, gets and atomics functions
* All the PEs are assumed to be accessible using IPC backend
2024-09-05 11:52:00 -07:00
Edgar Gabriel aae6295460 ipc/context_ipc_device.cpp: set barrier_sync
set the barrier_sync variable on the context during
object creation
2024-08-28 09:41:05 -07:00
avinashkethineedi e1e1ac6df6 Add atomics
* Add atomic_add, atomic_set, atomic_cas, atomic_fetch_add and atomic_fetch_cas to IPC backend
2024-08-28 08:30:46 -07:00
avinashkethineedi 45a8cb3354 Update IPC object
* Update the IPC object in the context class with the instance created in the IPC backend
2024-08-28 08:14:38 -07:00
Edgar Gabriel 0de3b5e6fc first cut on collectives and sync
code is based on the GPUIB implementations of the routines, which seem
however generic enough to work also for the IPC conduit.

Some code is in for broadcast, fcollect, and alltoall.
2024-08-27 15:03:38 -07:00
Edgar Gabriel e2e30b5339 remove device wait_until functions
adding the device versions of the wait_until* and test functions in the
ipc folder leads to linking errors of the functional tests. Remove them
and use for now the upper level versions of the functions, similarly to
the RO conduit. Might have to revisit this later again.
2024-08-27 15:03:32 -07:00
avinashkethineedi a9571ec002 Add buffers required for collectives 2024-08-22 09:28:09 -07:00
avinashkethineedi a59bdd4f6b Add IPC teams 2024-08-22 09:15:44 -07:00
avinashkethineedi c8b0f2378e Add gets and puts functionality to IPC context 2024-08-15 13:17:44 -07:00