Commit Graph

150 Commits

Author SHA1 Message Date
Yiltan Temucin 3f857718fd Fixed bug in functional and unit tests driver.sh
- The driver previously did not propagate errors correctly
- Adjusted gtest filters

driver edit
2024-11-15 10:50:31 -06:00
Avinash Kethineedi 02834a66a8 Merge pull request #53 from avinashkethineedi/CMake_file
CMake file for examples folder
2024-11-15 10:20:05 -06:00
avinashkethineedi 1f3b242e12 Add CMake file for examples folder 2024-11-14 19:50:23 +00:00
Avinash Kethineedi 2cb5cab038 Merge pull request #52 from avinashkethineedi/IPC_puts/gets
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
Avinash Kethineedi 3edf881b40 Merge pull request #50 from avinashkethineedi/teams_interface
Update collective APIs to use teams interface
2024-11-12 15:31:42 -06:00
avinashkethineedi d1ee997542 Update puts and gets to include a fence following data movement, ensuring data visibility 2024-11-12 16:52:07 +00:00
avinashkethineedi 5e3d94c705 Update collective APIs to use teams interface
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
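The switch from log_stride to stride in commit 5e3d94c705 can be illustrated with a small sketch. In OpenSHMEM active sets, the stride between consecutive PEs is given as a base-2 logarithm, whereas a team-based interface can work with a plain stride and team-relative PE indices. The helper names below (`stride_from_log`, `team_to_world_pe`) are hypothetical and are not taken from the rocSHMEM sources; this only shows the arithmetic under those assumptions.

```cpp
#include <cassert>

// Hypothetical helper: convert an active-set style log2 stride into a plain stride.
int stride_from_log(int log_stride) {
  assert(log_stride >= 0 && log_stride < 31);
  return 1 << log_stride;
}

// Hypothetical helper: map a team-relative PE index to a global PE number,
// assuming the team is described by a starting PE and a constant stride.
int team_to_world_pe(int pe_start, int stride, int team_pe) {
  assert(stride > 0 && team_pe >= 0);
  return pe_start + team_pe * stride;
}
```

With a plain stride, a team no longer has to be spaced by a power of two, which is one plausible motivation for the change.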
Yiltan 958575d8a4 Merge pull request #45 from Yiltan/to_all_reduce
Fixed Function Signature for `to_all` APIs
2024-11-06 10:54:36 -05:00
Yiltan Temucin 799d9d5ed7 updated examples to use new APIs 2024-11-06 09:49:06 -06:00
Yiltan Hassan Temucin 9aa9aea7e6 removed external access to non-team based reduce 2024-11-06 09:46:47 -06:00
Yiltan Hassan Temucin 997eb69b5a modified team based to_all -> reduce 2024-11-06 09:46:43 -06:00
Avinash Kethineedi 75ece02048 Merge pull request #46 from avinashkethineedi/active_set_APIs
Remove device-side active-set-based APIs
2024-11-05 18:43:40 -06:00
avinashkethineedi b2b0d559cb Merge branch 'ROCm:develop' into active_set_APIs 2024-11-05 23:02:44 +00:00
Yiltan b141f354cf Merge pull request #47 from Yiltan/revert-pr36-coopgroups
Remove Cooperative Groups (Partially Revert #PR36)
2024-11-04 09:35:15 -05:00
Yiltan Hassan Temucin fe767d9abf remove cooperative groups 2024-10-30 20:10:21 +00:00
avinashkethineedi 68c893d790 Add example code demonstrating team-based broadcast and alltoall API usage
* Update all_reduce test to keep the naming convention uniform across the examples
2024-10-30 19:09:17 +00:00
avinashkethineedi 5975b8c621 Update broadcast function to use stride calculations instead of log_stride 2024-10-29 19:10:05 +00:00
avinashkethineedi e1ff06913c Remove device-side active-set-based broadcast API interface from rocSHMEM 2024-10-29 19:04:49 +00:00
avinashkethineedi 9a524046fe Remove active-set-based broadcast test from the functional tests suite 2024-10-29 16:18:46 +00:00
avinashkethineedi abec29bd6a Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
avinashkethineedi c22048112e Remove the device-side active-set-based reduction API interface from rocSHMEM 2024-10-28 21:35:14 +00:00
avinashkethineedi e9484bbb86 Remove active-set-based reduction test from the functional tests suite 2024-10-28 21:22:46 +00:00
Yiltan 9885f984f6 Merge pull request #44 from ROCm/fix-printing
Clean up functional tests output
2024-10-28 15:45:28 -04:00
Yiltan 794b888d69 Merge pull request #43 from ROCm/LWPRHMEM-75-API-differences-bug-fix
Lwprhmem 75 api differences bug fix
2024-10-28 15:45:15 -04:00
Yiltan Temucin 9576ff6440 Cleaned up how we print the output 2024-10-28 13:37:33 -05:00
Edgar Gabriel 7ae0a54550 Merge pull request #32 from edgargabriel/topic/to_all_first_version
ipc/to_all: add direct allreduce algorithm
2024-10-24 15:59:24 -05:00
Yiltan Temucin 98afb41263 API bug fix in IB conduit 2024-10-24 11:52:03 -05:00
Yiltan Temucin e210020e9b API change bug fix 2024-10-24 11:52:03 -05:00
Edgar Gabriel 11df5427a6 add ascii art for ring allreduce 2024-10-24 15:08:32 +00:00
Edgar Gabriel a4b4281f50 fix odd-case allreduce scenarios
if the number of elements to be used in the allreduce operation is not
an exact multiple of the work-array buffer size and number of pe's, we need
to adjust the algorithm to:
 - initially perform a ring_allreduce on n_segments * chunk_size (which
   is the integer division of the number of elements and the work-buffer
   size, i.e. will not cover the entire buffer)
 - perform another ring_allreduce where chunk_size is reduced to match
   the remaining elements
 - if the remaining elements from the previous step cannot evenly be
   divided by the number of pe's, we need to perform a direct_allreduce on
   the outstanding number of elements.
2024-10-24 15:08:32 +00:00
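The three-phase split described in commit a4b4281f50 can be sketched as simple index arithmetic. This is one reading of the commit message, not the rocSHMEM implementation: the struct `AllreducePlan`, the function `plan_allreduce`, and the parameter names are hypothetical, and the sketch assumes that a full work buffer is itself evenly divisible among the PEs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plan: how many elements each phase of the allreduce handles.
struct AllreducePlan {
  std::size_t ring_full;     // phase 1: ring allreduce over full work-buffer-sized segments
  std::size_t ring_partial;  // phase 2: ring allreduce with a reduced chunk size
  std::size_t direct;        // phase 3: direct allreduce on the outstanding elements
};

// Split nelems into the three phases (assumed interpretation of the commit message).
AllreducePlan plan_allreduce(std::size_t nelems,
                             std::size_t work_buf_elems,
                             std::size_t n_pes) {
  assert(work_buf_elems > 0 && n_pes > 0);
  // Integer division: number of segments that fill the work buffer completely.
  std::size_t n_segments = nelems / work_buf_elems;
  std::size_t ring_full = n_segments * work_buf_elems;
  // Elements not covered by the full segments.
  std::size_t remaining = nelems - ring_full;
  // Elements that cannot be divided evenly among the PEs go to the direct algorithm.
  std::size_t direct = remaining % n_pes;
  std::size_t ring_partial = remaining - direct;
  return {ring_full, ring_partial, direct};
}
```

For example, with 1003 elements, a 256-element work buffer, and 4 PEs, this split assigns 768 elements to the first ring pass, 232 to the second, and 3 to the direct allreduce.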
Edgar Gabriel 87db7f7d38 fix barrier synchronization on gfx90a 2024-10-24 15:08:28 +00:00
Edgar Gabriel a0ac7b2d60 add some example code
first examples include a getmem testcase and an allreduce (to_all)
example.
2024-10-24 15:07:17 +00:00
Edgar Gabriel 1fbb89bc73 ipc: add ring_allreduce algorithms
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
Edgar Gabriel ba21cb7b85 ipc/to_all: add direct allreduce algorithm
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
Brandon Potter 416dffa129 Merge pull request #34 from BKP/ipc_parameterized_simple_tests_10-01-24
IPC Parameterized Simple Tests
2024-10-24 08:23:26 -05:00
Avinash Kethineedi 8a16968cf2 Merge pull request #41 from avinashkethineedi/collective_routine_buffers
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
Avinash Kethineedi e594a9cb85 Merge pull request #42 from avinashkethineedi/fix_quiet/fence
Fix quiet and fence of default context
2024-10-22 12:56:43 -05:00
avinashkethineedi d5ea5868e3 Fix quiet and fence of default context
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi 6685d0ab60 Add fine grained memory buffers for work/sync arrays
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan b922bdcf4c Merge pull request #39 from Yiltan/LWPRHMEM-75-API-differences
LWPRHMEM-75 API Differences
2024-10-18 15:27:34 -04:00
Avinash Kethineedi 892b4e2436 Merge pull request #40 from avinashkethineedi/functional_tests/puts_gets
Functional tests {wave, wg} puts and gets
2024-10-17 17:23:05 -05:00
avinashkethineedi 18a1bdd0ac Use C++ iota function to reset buffers and use its values for verification
* Update functional test script to include new tests
2024-10-15 20:23:25 +00:00
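The approach named in commit 18a1bdd0ac, resetting buffers with `std::iota` and verifying against the same values, might look roughly like the following host-side sketch. The helper names `reset_buffer` and `verify_buffer` are hypothetical, and the real functional tests operate on rocSHMEM buffers rather than a `std::vector`.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: fill the buffer with 0, 1, 2, ... so every element is distinct.
void reset_buffer(std::vector<int>& buf) {
  std::iota(buf.begin(), buf.end(), 0);
}

// Hypothetical helper: check that the buffer still holds the sequence written by reset_buffer.
bool verify_buffer(const std::vector<int>& buf) {
  for (std::size_t i = 0; i < buf.size(); ++i) {
    if (buf[i] != static_cast<int>(i)) {
      return false;
    }
  }
  return true;
}
```

Using position-dependent values rather than a constant fill lets a test detect misplaced or reordered elements, not only missing ones.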
Avinash Kethineedi e981f61693 Merge branch 'ROCm:develop' into functional_tests/puts_gets 2024-10-14 10:27:54 -05:00
Yiltan Hassan Temucin 8b3854b252 updated atomic_fetch() parameters 2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin 722a5f0731 updated *_wait* APIs to use int rather than roc_shmem_cmps 2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin bcf3fdff10 *_wait* routines changed parameter from ptr to ivars to match OpenSHMEM 2024-10-11 13:34:28 -07:00
Brandon Potter ce0ca36d37 Merge branch 'ROCm:develop' into ipc_parameterized_simple_tests_10-01-24 2024-10-11 12:49:56 -05:00
Brandon Potter e419a8b963 Merge pull request #29 from ROCm/improve-ib-latency
Vectorize WQE segments writes
2024-10-11 11:55:48 -05:00
Yiltan 8015a453ff Merge pull request #36 from Yiltan/LWPRHMEM-71-add-coop-groups
Add cooperative groups for sync collective
2024-10-11 12:55:33 -04:00
Yiltan Hassan Temucin 509277c034 fixed notifier bug 2024-10-10 06:45:43 -07:00