Yiltan Temucin
c0e4a32ca2
IPC backend now aborts with rocshmem global_exit()
2024-12-23 11:03:04 -06:00
Yiltan Temucin
fa0858833e
Remove comparisons of signed to unsigned values
2024-12-12 10:21:08 -06:00
avinashkethineedi
6486e29078
Rename config.h to roc_shmem_config.h
2024-12-06 01:08:13 +00:00
avinashkethineedi
d8ce066adc
Merge branch PR #55 into naming_scheme
2024-12-04 21:46:38 +00:00
Brandon Potter
fd8dbc7fb6
Use new naming scheme
2024-11-25 14:25:29 -06:00
Yiltan Temucin
d8f44e4436
Added Signalling Operations
2024-11-22 15:36:17 -06:00
Avinash Kethineedi
2cb5cab038
Merge pull request #52 from avinashkethineedi/IPC_puts/gets
...
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
avinashkethineedi
d1ee997542
Update puts and gets to include a fence following data movement, ensuring data visibility
2024-11-12 16:52:07 +00:00
avinashkethineedi
5e3d94c705
Update collective APIs to use teams interface
...
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
Yiltan Hassan Temucin
997eb69b5a
modified team based to_all -> reduce
2024-11-06 09:46:43 -06:00
avinashkethineedi
b2b0d559cb
Merge branch 'ROCm:develop' into active_set_APIs
2024-11-05 23:02:44 +00:00
Yiltan Hassan Temucin
fe767d9abf
remove cooperative groups
2024-10-30 20:10:21 +00:00
avinashkethineedi
5975b8c621
Update broadcast function to use stride calculations instead of log_stride
2024-10-29 19:10:05 +00:00
avinashkethineedi
abec29bd6a
Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
...
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
Edgar Gabriel
11df5427a6
add ascii art for ring allreduce
2024-10-24 15:08:32 +00:00
Edgar Gabriel
a4b4281f50
fix odd-case allreduce scenarios
...
if the number of elements to be used in the allreduce operation is not an
exact multiple of the work-array buffer size and number of pe's, we need
to adjust the algorithm to:
- initially perform a ring_allreduce on n_segments * chunk_size (which
is the integer division of the number of elements and the work-buffer
size, i.e. will not cover the entire buffer)
- perform another ring_allreduce where chunk_size is reduced to match
the remaining elements
- if the remaining elements from the previous step cannot be evenly
divided by the number of pe's, we need to perform a direct_allreduce on
the outstanding number of elements.
2024-10-24 15:08:32 +00:00
Edgar Gabriel
87db7f7d38
fix barrier synchronization on gfx90a
2024-10-24 15:08:28 +00:00
Edgar Gabriel
1fbb89bc73
ipc: add ring_allreduce algorithms
...
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
Edgar Gabriel
ba21cb7b85
ipc/to_all: add direct allreduce algorithm
...
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
Avinash Kethineedi
8a16968cf2
Merge pull request #41 from avinashkethineedi/collective_routine_buffers
...
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
avinashkethineedi
d5ea5868e3
Fix quiet and fence of default context
...
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi
6685d0ab60
Add fine grained memory buffers for work/sync arrays
...
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan Hassan Temucin
722a5f0731
updated *_wait* APIs to use int rather than roc_shmem_cmps
2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin
bcf3fdff10
*_wait* routines changed parameter from ptr to ivars to match OpenSHMEM
2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin
509277c034
fixed notifier bug
2024-10-10 06:45:43 -07:00
Yiltan Hassan Temucin
b1134e8633
added notifier->sync() when we are not using cooperative groups
...
updated scope bug
2024-10-09 13:11:28 -07:00
Yiltan Hassan Temucin
63667a3167
Added Cooperative Groups configure option and header
2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin
1baa071edf
Fix initialization order bug
2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin
e2f6a65284
fixed barrier issue on MI250X
2024-10-08 13:18:04 -07:00
avinashkethineedi
92fb1abaf2
Add team information to the context
...
* Update roc_shmem_ctx_fence API to use team-relative PE numbering
* Update backend to populate team_opaque member of ROC_SHMEM_CTX_DEFAULT (used to store information about the team wrt TEAM_WORLD)
2024-10-04 17:56:15 +00:00
avinashkethineedi
979aed105a
Add fence and quiet functionality
...
* Perform atomic stores to enforce memory ordering
2024-10-03 06:28:12 +00:00
Avinash Kethineedi
e58077e3cf
Merge branch 'ipc_bringup' into ipc_atomics
2024-09-09 14:22:55 -05:00
Edgar Gabriel
dfcacdc4a3
remove pSync from internal_bcast functions
...
remove the pSync arguments from the internal_broadcast functions,
they are not used anyway.
2024-09-09 12:06:30 -07:00
avinashkethineedi
7bbf34d334
remove local_pe calculation from puts, gets and atomics functions
...
* All the PEs are assumed to be accessible using IPC backend
2024-09-05 11:52:00 -07:00
Edgar Gabriel
aae6295460
ipc/context_ipc_device.cpp: set barrier_sync
...
set the barrier_sync variable on the context during
object creation
2024-08-28 09:41:05 -07:00
avinashkethineedi
e1e1ac6df6
Add atomics
...
* Add atomic_add, atomic_set, atomic_cas, atomic_fetch_add and atomic_fetch_cas to IPC backend
2024-08-28 08:30:46 -07:00
avinashkethineedi
45a8cb3354
Update IPC object
...
* Update the IPC object in the context class with the instance created in the IPC backend
2024-08-28 08:14:38 -07:00
Edgar Gabriel
0de3b5e6fc
first cut on collectives and sync
...
code is based on the GPUIB implementations of the routines, which seem
generic enough to also work for the IPC conduit.
Some code is in for broadcast, fcollect, and alltoall.
2024-08-27 15:03:38 -07:00
Edgar Gabriel
e2e30b5339
remove device wait_until functions
...
adding the device versions of the wait_until* and test functions in the
ipc folder leads to linking errors of the functional tests. Remove them
and use for now the upper level versions of the functions, similarly to
the RO conduit. Might have to revisit this later again.
2024-08-27 15:03:32 -07:00
avinashkethineedi
a9571ec002
Add buffers required for collectives
2024-08-22 09:28:09 -07:00
avinashkethineedi
a59bdd4f6b
Add IPC teams
2024-08-22 09:15:44 -07:00
avinashkethineedi
c8b0f2378e
Add gets and puts functionality to IPC context
2024-08-15 13:17:44 -07:00
avinashkethineedi
b68867ee17
remove ipc_policy.{hpp & cpp} and context_ipc.{hpp & cpp}
...
* move ipc_policy.{hpp & cpp} to src
* rename context_ipc.{hpp & cpp} to context_ipc_device.{hpp & cpp}
2024-08-15 08:52:06 -07:00
avinashkethineedi
49779863c2
Add IPC backend
...
* add backend_ipc.{cpp & hpp}
* rename context_ipc.{cpp & hpp} to context_ipc_device.{cpp & hpp}
* add host interface to IPC backend
* add context_ipc_host.{cpp & hpp} to support host interface
* add USE_RO compile flag to enable support for single backend interface at a time
* add ipc_single script to build rocSHMEM with IPC backend
2024-08-14 22:59:02 -07:00
avinashkethineedi
24375a949e
Code refactor
...
move ipc_policy.hpp and ipc_policy.cpp files to src, since they are used by all the conduits.
2024-08-14 20:44:35 -07:00
Brandon Potter
58c5a98b5d
Add ipc unit_tests
2024-08-07 12:18:12 -07:00
Edgar Gabriel
de750ddacc
src/ipc: add context_ipc barebone code
2024-07-26 08:22:33 -07:00
Edgar Gabriel
1183006e20
src/ipc: IPC folder refactor
...
mv ipc_policy.{hpp,cpp} into a separate folder as a start for the
standalone IPC conduit.
Unit tests and functional tests pass on the development system.
2024-07-25 07:33:41 -07:00