Commit Graph

91 Commits

Author SHA1 Message Date
avinashkethineedi cb8b9094b4 Fix rocshmem_team_split_strided API 2024-12-21 18:16:42 +00:00
Yiltan Temucin fa0858833e Remove comparisons of signed to unsigned values 2024-12-12 10:21:08 -06:00
Yiltan Temucin 658915ed35 Renamed debug.hpp to rocshmem_debug.hpp 2024-12-06 15:49:50 -06:00
avinashkethineedi 6486e29078 Rename config.h to roc_shmem_config.h 2024-12-06 01:08:13 +00:00
avinashkethineedi d8ce066adc Merge branch PR #55 into naming_scheme 2024-12-04 21:46:38 +00:00
Brandon Potter fd8dbc7fb6 Use new naming scheme 2024-11-25 14:25:29 -06:00
Yiltan Temucin d8f44e4436 Added Signalling Operations 2024-11-22 15:36:17 -06:00
Yiltan a59e946e44 Merge pull request #51 from Yiltan/roc_shmemx_correction
Removing instances of `roc_shmemx`
2024-11-19 13:28:05 -05:00
Avinash Kethineedi 2cb5cab038 Merge pull request #52 from avinashkethineedi/IPC_puts/gets
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
avinashkethineedi d1ee997542 Update puts and gets to include a fence following data movement, ensuring data visibility 2024-11-12 16:52:07 +00:00
Yiltan Temucin c2b736ef3d converted roc_shmemx to roc_shmem 2024-11-12 08:37:56 -06:00
avinashkethineedi 5e3d94c705 Update collective APIs to use teams interface
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
Yiltan Hassan Temucin 997eb69b5a modified team based to_all -> reduce 2024-11-06 09:46:43 -06:00
avinashkethineedi b2b0d559cb Merge branch 'ROCm:develop' into active_set_APIs 2024-11-05 23:02:44 +00:00
Yiltan Hassan Temucin fe767d9abf remove cooperative groups 2024-10-30 20:10:21 +00:00
avinashkethineedi 5975b8c621 Update broadcast function to use stride calculations instead of log_stride 2024-10-29 19:10:05 +00:00
avinashkethineedi e1ff06913c Remove device-side active-set-based broadcast API interface from rocSHMEM 2024-10-29 19:04:49 +00:00
avinashkethineedi abec29bd6a Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
avinashkethineedi c22048112e Remove the device-side active-set-based reduction API interface from rocSHMEM 2024-10-28 21:35:14 +00:00
Yiltan 794b888d69 Merge pull request #43 from ROCm/LWPRHMEM-75-API-differences-bug-fix
Lwprhmem 75 api differences bug fix
2024-10-28 15:45:15 -04:00
Yiltan Temucin 98afb41263 API bug fix in IB conduit 2024-10-24 11:52:03 -05:00
Yiltan Temucin e210020e9b API change bug fix 2024-10-24 11:52:03 -05:00
Edgar Gabriel 11df5427a6 add ascii art for ring allreduce 2024-10-24 15:08:32 +00:00
Edgar Gabriel a4b4281f50 fix odd-case allreduce scenarios
if the number of elements to be used in the allreduce operation is not an
exact multiple of the work-array buffer size and number of pe's, we need
to adjust the algorithm to:
 - initially perform a ring_allreduce on n_segments * chunk_size (which
   is the integer division of the number of elements and the work-buffer
   size, i.e. will not cover the entire buffer)
 - perform another ring_allreduce where chunk_size is reduced to match
   the remaining elements
 - if the remaining elements from the previous step cannot be evenly
   divided by the number of pe's, we need to perform a direct_allreduce on
   the outstanding number of elements.
2024-10-24 15:08:32 +00:00
Edgar Gabriel 87db7f7d38 fix barrier synchronization on gfx90a 2024-10-24 15:08:28 +00:00
Edgar Gabriel 1fbb89bc73 ipc: add ring_allreduce algorithms
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
Edgar Gabriel ba21cb7b85 ipc/to_all: add direct allreduce algorithm
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
Avinash Kethineedi 8a16968cf2 Merge pull request #41 from avinashkethineedi/collective_routine_buffers
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
avinashkethineedi d5ea5868e3 Fix quiet and fence of default context
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi 6685d0ab60 Add fine grained memory buffers for work/sync arrays
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan Hassan Temucin 8b3854b252 updated atomic_fetch() parameters 2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin 722a5f0731 updated *_wait* APIs to use int rather than roc_shmem_cmps 2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin bcf3fdff10 *_wait* routines changed parameter from ptr to ivars to match OpenSHMEM 2024-10-11 13:34:28 -07:00
Brandon Potter e419a8b963 Merge pull request #29 from ROCm/improve-ib-latency
Vectorize WQE segments writes
2024-10-11 11:55:48 -05:00
Yiltan Hassan Temucin 509277c034 fixed notifier bug 2024-10-10 06:45:43 -07:00
Yiltan Hassan Temucin b1134e8633 added notifier->sync() when we are not using cooperative groups
updated scope bug
2024-10-09 13:11:28 -07:00
Yiltan Hassan Temucin 63667a3167 Added Cooperative Groups configure option and header 2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin 1baa071edf Fix initialization order bug 2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin e2f6a65284 fixed barrier issue on MI250X 2024-10-08 13:18:04 -07:00
avinashkethineedi 92fb1abaf2 Add team information to the context
* Update roc_shmem_ctx_fence API to use team-relative PE numbering
* Update backend to populate team_opaque member of ROC_SHMEM_CTX_DEFAULT (used to store information about the team wrt TEAM_WORLD)
2024-10-04 17:56:15 +00:00
avinashkethineedi 979aed105a Add fence and quiet functionality
* Perform atomic stores to enforce memory ordering
2024-10-03 06:28:12 +00:00
Brandon Potter 787cf0ff3f Merge pull request #31 from BKP/ipc_bringup_fine_unit_09-26-24
Add IPC Simple Buffer Fine-grained Unit Tests
2024-10-01 15:12:30 -05:00
Brandon Potter 24b928a007 Poll the signal from one thread instead of all 2024-10-01 15:01:37 -05:00
Brandon Potter f85c46ec0a Bugfixes for the ipc unit tests 2024-09-26 13:40:05 -05:00
Edgar Gabriel c133ea18a5 fix assembly switch/case instruction
move the case statement out of the architecture specific section.
2024-09-20 20:25:40 +00:00
Muhammad Awad 3162d49b56 Vectorize WQE segments writes
Signed-off-by: Muhammad Awad <MuhammadAbdelghaffar.Awad@amd.com>
2024-09-17 20:34:18 -05:00
Brandon Potter 86a2f34539 Add missing header file 2024-09-10 09:35:02 -07:00
Brandon Potter 7411c45591 Conservatively use SEQ_CST atomics in IPC conduit 2024-09-10 09:34:45 -07:00
Brandon Potter 2806e1be79 Intermediate commit for rebase 2024-09-10 07:10:22 -07:00
Brandon Potter 45c29e7734 Minor updates to Notifier sync method 2024-09-10 07:10:21 -07:00
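
The message of commit a4b4281f50 ("fix odd-case allreduce scenarios") describes splitting an allreduce into up to three phases when the element count does not divide evenly. The following host-side sketch only illustrates that arithmetic; it is not taken from the rocSHMEM sources. The struct, the function name `plan_allreduce`, and the reading that one ring pass handles `chunk * n_pes` elements bounded by the work buffer are all assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical description of how many elements each phase would handle.
struct AllreducePlan {
  std::size_t n_segments;       // full-size ring passes
  std::size_t full_chunk;       // per-PE chunk size in a full ring pass
  std::size_t full_ring_elems;  // elements covered by the full ring passes
  std::size_t tail_chunk;       // reduced per-PE chunk size for the second ring pass
  std::size_t tail_ring_elems;  // elements covered by the reduced ring pass
  std::size_t direct_elems;     // leftover elements for a direct allreduce
};

// Assumption: a ring pass processes chunk * n_pes elements, and that product
// must fit in the work buffer (work_buf_elems). Names are illustrative only.
inline AllreducePlan plan_allreduce(std::size_t n_elems,
                                    std::size_t work_buf_elems,
                                    std::size_t n_pes) {
  AllreducePlan p{};
  if (n_pes == 0) {
    return p;  // nothing sensible to plan without PEs
  }

  p.full_chunk = work_buf_elems / n_pes;
  if (p.full_chunk == 0) {
    // Work buffer too small for even one element per PE: fall back to direct.
    p.direct_elems = n_elems;
    return p;
  }

  // Phase 1: as many full-size ring passes as fit (integer division).
  const std::size_t segment = p.full_chunk * n_pes;
  p.n_segments = n_elems / segment;
  p.full_ring_elems = p.n_segments * segment;

  // Phase 2: one more ring pass with the chunk size reduced to the remainder.
  const std::size_t remaining = n_elems - p.full_ring_elems;
  p.tail_chunk = remaining / n_pes;
  p.tail_ring_elems = p.tail_chunk * n_pes;

  // Phase 3: whatever does not divide evenly by n_pes goes to direct allreduce.
  p.direct_elems = remaining - p.tail_ring_elems;
  return p;
}
```

Under these assumptions, `plan_allreduce(1003, 256, 4)` yields three full ring passes over 768 elements, a reduced ring pass over 232 elements with a per-PE chunk of 58, and 3 elements left for the direct allreduce.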