avinashkethineedi
cb8b9094b4
Fix rocshmem_team_split_strided API
2024-12-21 18:16:42 +00:00
Yiltan Temucin
fa0858833e
Remove comparisons of signed to unsigned values
2024-12-12 10:21:08 -06:00
Yiltan Temucin
658915ed35
Renamed debug.hpp to rocshmem_debug.hpp
2024-12-06 15:49:50 -06:00
avinashkethineedi
6486e29078
Rename config.h to roc_shmem_config.h
2024-12-06 01:08:13 +00:00
avinashkethineedi
d8ce066adc
Merge branch PR #55 into naming_scheme
2024-12-04 21:46:38 +00:00
Brandon Potter
fd8dbc7fb6
Use new naming scheme
2024-11-25 14:25:29 -06:00
Yiltan Temucin
d8f44e4436
Added Signalling Operations
2024-11-22 15:36:17 -06:00
Yiltan
a59e946e44
Merge pull request #51 from Yiltan/roc_shmemx_correction
...
Removing instances of `roc_shmemx`
2024-11-19 13:28:05 -05:00
Avinash Kethineedi
2cb5cab038
Merge pull request #52 from avinashkethineedi/IPC_puts/gets
...
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
avinashkethineedi
d1ee997542
Update puts and gets to include a fence following data movement, ensuring data visibility
2024-11-12 16:52:07 +00:00
Yiltan Temucin
c2b736ef3d
converted roc_shmemx to roc_shmem
2024-11-12 08:37:56 -06:00
avinashkethineedi
5e3d94c705
Update collective APIs to use teams interface
...
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
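For context on the log_stride to stride change: OpenSHMEM active sets describe their members with a log2 stride, while the teams interface uses a plain stride. The following is a minimal sketch of the two numbering schemes, not rocSHMEM's implementation; the function names `active_set_pes`, `strided_pes`, and `team_to_world_pe` are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: members of an active set described by
// (pe_start, log_pe_stride, pe_size), where the stride is 2^log_pe_stride.
std::vector<int> active_set_pes(int pe_start, int log_pe_stride, int pe_size) {
  assert(pe_start >= 0 && log_pe_stride >= 0 && pe_size >= 0);
  std::vector<int> pes;
  for (int i = 0; i < pe_size; ++i) {
    pes.push_back(pe_start + i * (1 << log_pe_stride));
  }
  return pes;
}

// Hypothetical helper: members of a strided team described by
// (pe_start, stride, pe_size). The stride no longer has to be a power of two.
std::vector<int> strided_pes(int pe_start, int stride, int pe_size) {
  assert(pe_start >= 0 && stride >= 1 && pe_size >= 0);
  std::vector<int> pes;
  for (int i = 0; i < pe_size; ++i) {
    pes.push_back(pe_start + i * stride);
  }
  return pes;
}

// Hypothetical helper: translate a team-relative PE index into the
// corresponding index in the parent (world) numbering.
int team_to_world_pe(int pe_start, int stride, int team_pe) {
  return pe_start + team_pe * stride;
}
```

With a plain stride, a team such as PEs 1, 4, 7 (start 1, stride 3, size 3) can be expressed, which a log2 stride cannot represent.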
Yiltan Hassan Temucin
997eb69b5a
modified team based to_all -> reduce
2024-11-06 09:46:43 -06:00
avinashkethineedi
b2b0d559cb
Merge branch 'ROCm:develop' into active_set_APIs
2024-11-05 23:02:44 +00:00
Yiltan Hassan Temucin
fe767d9abf
remove cooperative groups
2024-10-30 20:10:21 +00:00
avinashkethineedi
5975b8c621
Update broadcast function to use stride calculations instead of log_stride
2024-10-29 19:10:05 +00:00
avinashkethineedi
e1ff06913c
Remove device-side active-set-based broadcast API interface from rocSHMEM
2024-10-29 19:04:49 +00:00
avinashkethineedi
abec29bd6a
Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
...
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
avinashkethineedi
c22048112e
Remove the device-side active-set-based reduction API interface from rocSHMEM
2024-10-28 21:35:14 +00:00
Yiltan
794b888d69
Merge pull request #43 from ROCm/LWPRHMEM-75-API-differences-bug-fix
...
LWPRHMEM-75 API differences bug fix
2024-10-28 15:45:15 -04:00
Yiltan Temucin
98afb41263
API bug fix in IB conduit
2024-10-24 11:52:03 -05:00
Yiltan Temucin
e210020e9b
API change bug fix
2024-10-24 11:52:03 -05:00
Edgar Gabriel
11df5427a6
add ASCII art for ring allreduce
2024-10-24 15:08:32 +00:00
Edgar Gabriel
a4b4281f50
fix odd-case allreduce scenarios
...
if the number of elements to be used in the allreduce operation is not
an exact multiple of the work-array buffer size and the number of PEs, we
need to adjust the algorithm to:
- initially perform a ring_allreduce on n_segments * chunk_size (which
comes from the integer division of the number of elements by the
work-buffer size, i.e. it will not cover the entire buffer)
- perform another ring_allreduce where chunk_size is reduced to match
the remaining elements
- if the remaining elements from the previous step cannot be evenly
divided by the number of PEs, we need to perform a direct_allreduce on
the outstanding number of elements.
2024-10-24 15:08:32 +00:00
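The three-phase split described in the commit above can be sketched as a small planning function. This is an illustration under stated assumptions, not the code from the commit: it assumes one full ring segment covers `chunk_size * n_pes` elements (one chunk per PE), and the names `AllreducePlan` and `plan_allreduce` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: how many elements each phase handles.
struct AllreducePlan {
  std::size_t ring_full_elems;    // phase 1: full-size ring_allreduce segments
  std::size_t ring_tail_elems;    // phase 2: ring_allreduce with a reduced chunk size
  std::size_t reduced_chunk_size; // chunk size used in phase 2
  std::size_t direct_elems;       // phase 3: leftover handled by direct_allreduce
};

// Assumption: a ring segment spans chunk_size * n_pes elements.
AllreducePlan plan_allreduce(std::size_t nelems, std::size_t chunk_size,
                             std::size_t n_pes) {
  assert(chunk_size > 0 && n_pes > 0);
  const std::size_t segment_elems = chunk_size * n_pes;

  // Phase 1: as many full segments as fit (integer division).
  const std::size_t n_segments = nelems / segment_elems;
  const std::size_t ring_full = n_segments * segment_elems;

  // Phase 2: shrink the chunk so one more ring pass covers most of the rest.
  const std::size_t remaining = nelems - ring_full;
  const std::size_t reduced_chunk = remaining / n_pes;
  const std::size_t ring_tail = reduced_chunk * n_pes;

  // Phase 3: whatever does not divide evenly among the PEs.
  const std::size_t direct = remaining - ring_tail;

  return AllreducePlan{ring_full, ring_tail, reduced_chunk, direct};
}
```

The three element counts always sum to `nelems`, and the direct phase handles fewer than `n_pes` elements.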
Edgar Gabriel
87db7f7d38
fix barrier synchronization on gfx90a
2024-10-24 15:08:28 +00:00
Edgar Gabriel
1fbb89bc73
ipc: add ring_allreduce algorithms
...
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
Edgar Gabriel
ba21cb7b85
ipc/to_all: add direct allreduce algorithm
...
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
Avinash Kethineedi
8a16968cf2
Merge pull request #41 from avinashkethineedi/collective_routine_buffers
...
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
avinashkethineedi
d5ea5868e3
Fix quiet and fence of default context
...
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi
6685d0ab60
Add fine grained memory buffers for work/sync arrays
...
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan Hassan Temucin
8b3854b252
updated atomic_fetch() parameters
2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin
722a5f0731
updated *_wait* APIs to use int rather than roc_shmem_cmps
2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin
bcf3fdff10
*_wait* routines changed parameter from ptr to ivars to match OpenSHMEM
2024-10-11 13:34:28 -07:00
Brandon Potter
e419a8b963
Merge pull request #29 from ROCm/improve-ib-latency
...
Vectorize WQE segment writes
2024-10-11 11:55:48 -05:00
Yiltan Hassan Temucin
509277c034
fixed notifier bug
2024-10-10 06:45:43 -07:00
Yiltan Hassan Temucin
b1134e8633
added notifier->sync() when we are not using cooperative groups
...
updated scope bug
2024-10-09 13:11:28 -07:00
Yiltan Hassan Temucin
63667a3167
Added Cooperative Groups configure option and header
2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin
1baa071edf
Fix initialization order bug
2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin
e2f6a65284
fixed barrier issue on MI250X
2024-10-08 13:18:04 -07:00
avinashkethineedi
92fb1abaf2
Add team information to the context
...
* Update roc_shmem_ctx_fence API to use team-relative PE numbering
* Update backend to populate team_opaque member of ROC_SHMEM_CTX_DEFAULT (used to store information about the team wrt TEAM_WORLD)
2024-10-04 17:56:15 +00:00
avinashkethineedi
979aed105a
Add fence and quiet functionality
...
* Perform atomic stores to enforce memory ordering
2024-10-03 06:28:12 +00:00
Brandon Potter
787cf0ff3f
Merge pull request #31 from BKP/ipc_bringup_fine_unit_09-26-24
...
Add IPC Simple Buffer Fine-grained Unit Tests
2024-10-01 15:12:30 -05:00
Brandon Potter
24b928a007
Poll the signal from one thread instead of all
2024-10-01 15:01:37 -05:00
Brandon Potter
f85c46ec0a
Bugfixes for the ipc unit tests
2024-09-26 13:40:05 -05:00
Edgar Gabriel
c133ea18a5
fix assembly switch/case instruction
...
move the case statement out of the architecture specific section.
2024-09-20 20:25:40 +00:00
Muhammad Awad
3162d49b56
Vectorize WQE segment writes
...
Signed-off-by: Muhammad Awad <MuhammadAbdelghaffar.Awad@amd.com>
2024-09-17 20:34:18 -05:00
Brandon Potter
86a2f34539
Add missing header file
2024-09-10 09:35:02 -07:00
Brandon Potter
7411c45591
Conservatively use SEQ_CST atomics in IPC conduit
2024-09-10 09:34:45 -07:00
Brandon Potter
2806e1be79
Intermediate commit for rebase
2024-09-10 07:10:22 -07:00
Brandon Potter
45c29e7734
Minor updates to Notifier sync method
2024-09-10 07:10:21 -07:00