158 commits

Author SHA1 Message Date
Yiltan Temucin ff8aab522b Fixed typo in examples 2024-11-22 15:36:17 -06:00
Yiltan Temucin ec72aad517 Create put_signal example 2024-11-22 15:36:17 -06:00
Yiltan Temucin d8f44e4436 Added Signalling Operations 2024-11-22 15:36:17 -06:00
Yiltan 308816bc5e Merge pull request #49 from Yiltan/unit-tests-driver-bug
driver should now return a fail code if any of the mpirun invocations fail
2024-11-22 16:35:36 -05:00
Yiltan a59e946e44 Merge pull request #51 from Yiltan/roc_shmemx_correction
Removing instances of `roc_shmemx`
2024-11-19 13:28:05 -05:00
Yiltan Temucin 50e46847c6 Explicitly require rocPRIM and rocThrust. 2024-11-19 08:54:18 -06:00
Yiltan Temucin 4ad24b5aab Propagate errors from build scripts so CI doesn't silently fail 2024-11-15 11:17:33 -06:00
Yiltan Temucin 3f857718fd Fixed bug in functional and unit tests driver.sh
- The driver previously did not propagate errors correctly
- Adjusted gtest filters

driver edit
2024-11-15 10:50:31 -06:00
Avinash Kethineedi 02834a66a8 Merge pull request #53 from avinashkethineedi/CMake_file
CMake file for examples folder
2024-11-15 10:20:05 -06:00
avinashkethineedi 1f3b242e12 Add CMake file for examples folder 2024-11-14 19:50:23 +00:00
Avinash Kethineedi 2cb5cab038 Merge pull request #52 from avinashkethineedi/IPC_puts/gets
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
Avinash Kethineedi 3edf881b40 Merge pull request #50 from avinashkethineedi/teams_interface
Update collective APIs to use teams interface
2024-11-12 15:31:42 -06:00
avinashkethineedi d1ee997542 Update puts and gets to include a fence following data movement, ensuring data visibility 2024-11-12 16:52:07 +00:00
Yiltan Temucin c2b736ef3d converted roc_shmemx to roc_shmem 2024-11-12 08:37:56 -06:00
avinashkethineedi 5e3d94c705 Update collective APIs to use teams interface
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
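In OpenSHMEM-style active sets the stride is encoded as `log_stride`, so the actual stride is `2^log_stride`. A strided team instead stores the stride directly, and team-relative PE `i` corresponds to world PE `start + i * stride`. The helpers below are a minimal sketch of that index arithmetic, assuming a positive stride. `stride_from_log_stride`, `team_to_world_pe` and `world_to_team_pe` are hypothetical names for illustration, not rocSHMEM functions.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not part of rocSHMEM.

// Legacy active sets encode the PE stride as a power of two.
int stride_from_log_stride(int log_stride) { return 1 << log_stride; }

// Map a team-relative PE index to a world PE index for a strided team
// described by (start, stride, size). Assumes stride > 0.
int team_to_world_pe(int team_pe, int start, int stride, int size) {
  assert(team_pe >= 0 && team_pe < size);  // only team members have a mapping
  return start + team_pe * stride;
}

// Inverse mapping: returns -1 if world_pe is not a member of the team.
int world_to_team_pe(int world_pe, int start, int stride, int size) {
  if (world_pe < start || (world_pe - start) % stride != 0) return -1;
  const int team_pe = (world_pe - start) / stride;
  return team_pe < size ? team_pe : -1;
}
```

For example, a team with `start = 1`, `stride = 2` (that is, `log_stride = 1`) and `size = 4` contains world PEs 1, 3, 5 and 7. A stride of 3 has no `log_stride` equivalent, which is one practical reason to store the stride directly.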
Yiltan 958575d8a4 Merge pull request #45 from Yiltan/to_all_reduce
Fixed Function Signature for `to_all` APIs
2024-11-06 10:54:36 -05:00
Yiltan Temucin 799d9d5ed7 updated examples to use new APIs 2024-11-06 09:49:06 -06:00
Yiltan Hassan Temucin 9aa9aea7e6 removed external access to non-team based reduce 2024-11-06 09:46:47 -06:00
Yiltan Hassan Temucin 997eb69b5a modified team based to_all -> reduce 2024-11-06 09:46:43 -06:00
Avinash Kethineedi 75ece02048 Merge pull request #46 from avinashkethineedi/active_set_APIs
Remove device-side active-set-based APIs
2024-11-05 18:43:40 -06:00
avinashkethineedi b2b0d559cb Merge branch 'ROCm:develop' into active_set_APIs 2024-11-05 23:02:44 +00:00
Yiltan b141f354cf Merge pull request #47 from Yiltan/revert-pr36-coopgroups
Remove Cooperative Groups (Partially Revert #PR36)
2024-11-04 09:35:15 -05:00
Yiltan Hassan Temucin fe767d9abf remove cooperative groups 2024-10-30 20:10:21 +00:00
avinashkethineedi 68c893d790 Add example code demonstrating team-based broadcast and alltoall API usage
* Update all_reduce test to keep the naming convention uniform across the examples
2024-10-30 19:09:17 +00:00
avinashkethineedi 5975b8c621 Update broadcast function to use stride calculations instead of log_stride 2024-10-29 19:10:05 +00:00
avinashkethineedi e1ff06913c Remove device-side active-set-based broadcast API interface from rocSHMEM 2024-10-29 19:04:49 +00:00
avinashkethineedi 9a524046fe Remove active-set-based broadcast test from the functional tests suite 2024-10-29 16:18:46 +00:00
avinashkethineedi abec29bd6a Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
avinashkethineedi c22048112e Remove the device-side active-set-based reduction API interface from rocSHMEM 2024-10-28 21:35:14 +00:00
avinashkethineedi e9484bbb86 Remove active-set-based reduction test from the functional tests suite 2024-10-28 21:22:46 +00:00
Yiltan 9885f984f6 Merge pull request #44 from ROCm/fix-printing
Clean up functional tests output
2024-10-28 15:45:28 -04:00
Yiltan 794b888d69 Merge pull request #43 from ROCm/LWPRHMEM-75-API-differences-bug-fix
LWPRHMEM-75 API differences bug fix
2024-10-28 15:45:15 -04:00
Yiltan Temucin 9576ff6440 Cleaned up how we print the output 2024-10-28 13:37:33 -05:00
Edgar Gabriel 7ae0a54550 Merge pull request #32 from edgargabriel/topic/to_all_first_version
ipc/to_all: add direct allreduce algorithm
2024-10-24 15:59:24 -05:00
Yiltan Temucin 98afb41263 API bug fix in IB conduit 2024-10-24 11:52:03 -05:00
Yiltan Temucin e210020e9b API change bug fix 2024-10-24 11:52:03 -05:00
Edgar Gabriel 11df5427a6 add ascii art for ring allreduce 2024-10-24 15:08:32 +00:00
Edgar Gabriel a4b4281f50 fix odd-case allreduce scenarios
if the number of elements to be used in the allreduce operation is not
an exact multiple of the work-array buffer size and the number of PEs,
we need to adjust the algorithm to:
 - initially perform a ring_allreduce on n_segments * chunk_size
   elements (where n_segments is the integer division of the number of
   elements by the work-buffer size, i.e. this will not cover the entire
   buffer)
 - perform another ring_allreduce where chunk_size is reduced to match
   the remaining elements
 - if the remaining elements from the previous step cannot be evenly
   divided by the number of PEs, perform a direct_allreduce on the
   outstanding number of elements.
2024-10-24 15:08:32 +00:00
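The three-phase split in this commit message reduces to integer arithmetic on the element count. The sketch below is an interpretation, not code from the rocSHMEM sources. It assumes `chunk_size` is the number of elements one ring pass can stage in the work array and is a multiple of the PE count. `AllreducePlan` and `plan_allreduce` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical breakdown of n elements over the three allreduce phases.
struct AllreducePlan {
  std::size_t ring_full;     // covered by ring_allreduce with the full chunk_size
  std::size_t ring_reduced;  // covered by a second ring_allreduce with a smaller chunk
  std::size_t direct;        // leftover elements for direct_allreduce (< npes)
};

// Assumption: chunk_size is the per-pass work-array capacity, a multiple of npes.
AllreducePlan plan_allreduce(std::size_t n, std::size_t chunk_size, std::size_t npes) {
  assert(chunk_size > 0 && npes > 0);
  AllreducePlan p{};
  const std::size_t n_segments = n / chunk_size;  // integer division
  p.ring_full = n_segments * chunk_size;
  const std::size_t remaining = n - p.ring_full;  // always < chunk_size
  p.ring_reduced = (remaining / npes) * npes;     // largest multiple of npes that fits
  p.direct = remaining - p.ring_reduced;          // equals remaining % npes
  return p;
}
```

With `n = 1003`, `chunk_size = 256` and 8 PEs, this gives 768 elements for the first ring pass, 232 for the reduced-chunk ring pass and 3 for the direct allreduce.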
Edgar Gabriel 87db7f7d38 fix barrier synchronization on gfx90a 2024-10-24 15:08:28 +00:00
Edgar Gabriel a0ac7b2d60 add some example code
first examples include a getmem testcase and an allreduce (to_all)
example.
2024-10-24 15:07:17 +00:00
Edgar Gabriel 1fbb89bc73 ipc: add ring_allreduce algorithms
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
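A ring allreduce is conventionally built from a reduce-scatter phase followed by an allgather phase. Each phase takes `npes - 1` steps, and in each step every PE sends one chunk to its right-hand neighbour. The host-side simulation below illustrates only that textbook scheme. It is not the IPC conduit's implementation. It models PEs as plain `std::vector` buffers rather than symmetric GPU memory, and it assumes the element count is a multiple of the PE count. `ring_allreduce_sum` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of a ring allreduce (sum); bufs[pe] is each PE's buffer.
// Assumption: all buffers have the same length, a multiple of the PE count.
void ring_allreduce_sum(std::vector<std::vector<int>>& bufs) {
  const std::size_t P = bufs.size();
  if (P < 2) return;  // nothing to exchange
  const std::size_t n = bufs[0].size();
  assert(n % P == 0);
  const std::size_t chunk = n / P;

  struct Msg { std::size_t dst, idx; std::vector<int> data; };

  // One ring step: PE r sends one chunk to PE (r + 1) % P. Messages are
  // collected first so all sends in a step behave as if they were concurrent.
  auto step = [&](std::size_t s, bool reduce) {
    std::vector<Msg> msgs;
    for (std::size_t r = 0; r < P; ++r) {
      // reduce-scatter sends chunk (r - s) mod P; allgather sends chunk (r + 1 - s) mod P
      const std::size_t c = (reduce ? r + P - s : r + 1 + P - s) % P;
      const auto first = bufs[r].begin() + static_cast<std::ptrdiff_t>(c * chunk);
      msgs.push_back({(r + 1) % P, c,
                      std::vector<int>(first, first + static_cast<std::ptrdiff_t>(chunk))});
    }
    for (const Msg& m : msgs)
      for (std::size_t i = 0; i < chunk; ++i) {
        int& d = bufs[m.dst][m.idx * chunk + i];
        d = reduce ? d + m.data[i] : m.data[i];  // accumulate, or overwrite in allgather
      }
  };

  for (std::size_t s = 0; s + 1 < P; ++s) step(s, true);   // PE r now owns reduced chunk (r + 1) % P
  for (std::size_t s = 0; s + 1 < P; ++s) step(s, false);  // circulate the reduced chunks
}
```

Each PE sends and receives roughly `2 * (npes - 1) / npes` times the buffer size regardless of the PE count. This is why ring allreduce is usually preferred over direct reads as reductions grow.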
Edgar Gabriel ba21cb7b85 ipc/to_all: add direct allreduce algorithm
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
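A direct allreduce is the simplest scheme. Every PE reads all contributions and reduces them locally, so each PE performs `npes` reads of the full buffer. That cost grows with the PE count, which suits small element counts such as the leftovers mentioned in the odd-case commit above. The host-side sketch below is hypothetical (`direct_allreduce_sum` is not a rocSHMEM function). It models each remote read as indexing into `src`, and the staggered read order is an assumed detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of a direct allreduce (sum). src[pe] is each PE's input;
// every PE reads all npes contributions and reduces them into its own result.
std::vector<std::vector<int>> direct_allreduce_sum(const std::vector<std::vector<int>>& src) {
  const std::size_t npes = src.size();
  const std::size_t n = npes ? src[0].size() : 0;
  std::vector<std::vector<int>> dst(npes, std::vector<int>(n, 0));
  for (std::size_t me = 0; me < npes; ++me) {
    // Reads are staggered as (me + k) % npes so PEs start with different peers.
    for (std::size_t k = 0; k < npes; ++k) {
      const std::vector<int>& peer = src[(me + k) % npes];  // stands in for a remote get
      assert(peer.size() == n);
      for (std::size_t i = 0; i < n; ++i) dst[me][i] += peer[i];
    }
  }
  return dst;
}
```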
Brandon Potter 416dffa129 Merge pull request #34 from BKP/ipc_parameterized_simple_tests_10-01-24
IPC Parameterized Simple Tests
2024-10-24 08:23:26 -05:00
Avinash Kethineedi 8a16968cf2 Merge pull request #41 from avinashkethineedi/collective_routine_buffers
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
Avinash Kethineedi e594a9cb85 Merge pull request #42 from avinashkethineedi/fix_quiet/fence
Fix quiet and fence of default context
2024-10-22 12:56:43 -05:00
avinashkethineedi d5ea5868e3 Fix quiet and fence of default context
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi 6685d0ab60 Add fine grained memory buffers for work/sync arrays
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan b922bdcf4c Merge pull request #39 from Yiltan/LWPRHMEM-75-API-differences
LWPRHMEM-75 API Differences
2024-10-18 15:27:34 -04:00
Avinash Kethineedi 892b4e2436 Merge pull request #40 from avinashkethineedi/functional_tests/puts_gets
Functional tests {wave, wg} puts and gets
2024-10-17 17:23:05 -05:00
avinashkethineedi 18a1bdd0ac Use C++ iota function to reset buffers and use its values for verification
* Update functional test script to include new tests
2024-10-15 20:23:25 +00:00
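Filling buffers with `std::iota` gives every element a distinct, predictable value (`base`, `base + 1`, ...). A test can therefore detect misplaced, missing or stale elements simply by regenerating the expected sequence. The sketch below illustrates that pattern. `reset_buffer`, `fake_transfer` and `verify_buffer` are hypothetical helpers, not the rocSHMEM functional-test code, and `fake_transfer` is a host-side stand-in for the put or get under test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Reset a buffer to base, base + 1, base + 2, ... so each element is distinct.
void reset_buffer(std::vector<int>& buf, int base) {
  std::iota(buf.begin(), buf.end(), base);
}

// Stand-in for the put/get under test: copies src into dst on the host.
void fake_transfer(std::vector<int>& dst, const std::vector<int>& src) {
  assert(dst.size() >= src.size());
  std::copy(src.begin(), src.end(), dst.begin());
}

// Verify dst by regenerating the expected sequence with std::iota.
bool verify_buffer(const std::vector<int>& dst, int base) {
  std::vector<int> expected(dst.size());
  std::iota(expected.begin(), expected.end(), base);
  return dst == expected;
}
```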