Yiltan Temucin
ff8aab522b
Fixed typo in examples
2024-11-22 15:36:17 -06:00
Yiltan Temucin
ec72aad517
Create put_signal example
2024-11-22 15:36:17 -06:00
Yiltan Temucin
d8f44e4436
Added Signalling Operations
2024-11-22 15:36:17 -06:00
Yiltan
308816bc5e
Merge pull request #49 from Yiltan/unit-tests-driver-bug
...
driver should now return a fail code if any of the mpirun invocations fail
2024-11-22 16:35:36 -05:00
Yiltan
a59e946e44
Merge pull request #51 from Yiltan/roc_shmemx_correction
...
Removing instances of `roc_shmemx`
2024-11-19 13:28:05 -05:00
Yiltan Temucin
50e46847c6
Explicitly require rocPRIM and rocThrust.
2024-11-19 08:54:18 -06:00
Yiltan Temucin
4ad24b5aab
Propagate errors from build scripts so CI doesn't silently fail
2024-11-15 11:17:33 -06:00
Yiltan Temucin
3f857718fd
Fixed bug in functional and unit tests driver.sh
...
- The driver previously did not propagate errors correctly
- Adjusted gtest filters
driver edit
2024-11-15 10:50:31 -06:00
Avinash Kethineedi
02834a66a8
Merge pull request #53 from avinashkethineedi/CMake_file
...
CMake file for examples folder
2024-11-15 10:20:05 -06:00
avinashkethineedi
1f3b242e12
Add CMake file for examples folder
2024-11-14 19:50:23 +00:00
Avinash Kethineedi
2cb5cab038
Merge pull request #52 from avinashkethineedi/IPC_puts/gets
...
Update puts and gets with fence call
2024-11-14 13:19:24 -06:00
Avinash Kethineedi
3edf881b40
Merge pull request #50 from avinashkethineedi/teams_interface
...
Update collective APIs to use teams interface
2024-11-12 15:31:42 -06:00
avinashkethineedi
d1ee997542
Update puts and gets to include a fence following data movement, ensuring data visibility
2024-11-12 16:52:07 +00:00
Yiltan Temucin
c2b736ef3d
converted roc_shmemx to roc_shmem
2024-11-12 08:37:56 -06:00
avinashkethineedi
5e3d94c705
Update collective APIs to use teams interface
...
* Use team-relative numbering in collective functions
* Replace log_stride with stride
2024-11-06 17:50:23 +00:00
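The "Replace log_stride with stride" change above can be illustrated with a small sketch. In the OpenSHMEM active-set convention, the stride between participating PEs is given as a base-2 logarithm, while a team-style interface can carry the stride directly. The function names below are hypothetical and are not taken from rocSHMEM; they only show how the two conventions map a team-relative index to a PE number.

```cpp
// Hypothetical helpers, not rocSHMEM internals.

// Active-set style: the stride is stored as log2(stride), so only
// power-of-two strides can be expressed.
inline int pe_from_log_stride(int pe_start, int log_stride, int idx) {
  return pe_start + (idx << log_stride);
}

// Team style: the stride is stored directly, so any positive stride works.
inline int pe_from_stride(int pe_start, int stride, int idx) {
  return pe_start + idx * stride;
}
```

For a power-of-two stride both forms agree, for example `pe_from_log_stride(2, 1, 3)` and `pe_from_stride(2, 2, 3)` both give PE 8. A stride of 3 can only be written in the second form.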
Yiltan
958575d8a4
Merge pull request #45 from Yiltan/to_all_reduce
...
Fixed Function Signature for `to_all` APIs
2024-11-06 10:54:36 -05:00
Yiltan Temucin
799d9d5ed7
updated examples to use new APIs
2024-11-06 09:49:06 -06:00
Yiltan Hassan Temucin
9aa9aea7e6
removed external access to non-team based reduce
2024-11-06 09:46:47 -06:00
Yiltan Hassan Temucin
997eb69b5a
modified team based to_all -> reduce
2024-11-06 09:46:43 -06:00
Avinash Kethineedi
75ece02048
Merge pull request #46 from avinashkethineedi/active_set_APIs
...
Remove device-side active-set-based APIs
2024-11-05 18:43:40 -06:00
avinashkethineedi
b2b0d559cb
Merge branch 'ROCm:develop' into active_set_APIs
2024-11-05 23:02:44 +00:00
Yiltan
b141f354cf
Merge pull request #47 from Yiltan/revert-pr36-coopgroups
...
Remove Cooperative Groups (Partially Revert #PR36)
2024-11-04 09:35:15 -05:00
Yiltan Hassan Temucin
fe767d9abf
remove cooperative groups
2024-10-30 20:10:21 +00:00
avinashkethineedi
68c893d790
Add example code demonstrating team-based broadcast and alltoall API usage
...
* Update all_reduce test to keep the naming convention uniform across the examples
2024-10-30 19:09:17 +00:00
avinashkethineedi
5975b8c621
Update broadcast function to use stride calculations instead of log_stride
2024-10-29 19:10:05 +00:00
avinashkethineedi
e1ff06913c
Remove device-side active-set-based broadcast API interface from rocSHMEM
2024-10-29 19:04:49 +00:00
avinashkethineedi
9a524046fe
Remove active-set-based broadcast test from the functional tests suite
2024-10-29 16:18:46 +00:00
avinashkethineedi
abec29bd6a
Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
...
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
avinashkethineedi
c22048112e
Remove the device-side active-set-based reduction API interface from rocSHMEM
2024-10-28 21:35:14 +00:00
avinashkethineedi
e9484bbb86
Remove active-set-based reduction test from the functional tests suite
2024-10-28 21:22:46 +00:00
Yiltan
9885f984f6
Merge pull request #44 from ROCm/fix-printing
...
Clean up functional tests output
2024-10-28 15:45:28 -04:00
Yiltan
794b888d69
Merge pull request #43 from ROCm/LWPRHMEM-75-API-differences-bug-fix
...
LWPRHMEM-75 API differences bug fix
2024-10-28 15:45:15 -04:00
Yiltan Temucin
9576ff6440
Cleaned up how we print the output
2024-10-28 13:37:33 -05:00
Edgar Gabriel
7ae0a54550
Merge pull request #32 from edgargabriel/topic/to_all_first_version
...
ipc/to_all: add direct allreduce algorithm
2024-10-24 15:59:24 -05:00
Yiltan Temucin
98afb41263
API bug fix in IB conduit
2024-10-24 11:52:03 -05:00
Yiltan Temucin
e210020e9b
API change bug fix
2024-10-24 11:52:03 -05:00
Edgar Gabriel
11df5427a6
add ascii art for ring allreduce
2024-10-24 15:08:32 +00:00
Edgar Gabriel
a4b4281f50
fix odd-case allreduce scenarios
...
if the number of elements to be used in the allreduce operation is not
an exact multiple of the work-array buffer size and the number of PEs, we
need to adjust the algorithm to:
- initially perform a ring_allreduce on n_segments * chunk_size (which
is the integer division of the number of elements by the work-buffer
size, i.e. it will not cover the entire buffer)
- perform another ring_allreduce where chunk_size is reduced to match
the remaining elements
- if the remaining elements from the previous step cannot be evenly
divided by the number of PEs, we need to perform a direct_allreduce on
the outstanding number of elements.
2024-10-24 15:08:32 +00:00
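The three-phase split described in the "fix odd-case allreduce scenarios" commit above can be sketched as plain arithmetic. This is one reading of the commit message, not the rocSHMEM implementation. It assumes that a full ring segment covers `work_elems` elements (the work-buffer size in elements), and the struct and function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical plan: how many elements each phase would handle.
struct AllreducePlan {
  std::size_t ring_full;    // phase 1: full-size ring_allreduce segments
  std::size_t ring_tail;    // phase 2: ring_allreduce with a reduced chunk size
  std::size_t direct_tail;  // phase 3: leftover elements for direct_allreduce
};

// Assumption: one full ring segment processes work_elems elements.
inline AllreducePlan plan_allreduce(std::size_t nelems,
                                    std::size_t work_elems,
                                    std::size_t n_pes) {
  AllreducePlan plan{0, 0, 0};
  if (work_elems == 0 || n_pes == 0) {
    // Degenerate input: nothing can be chunked, leave it all to phase 3.
    plan.direct_tail = nelems;
    return plan;
  }
  // Phase 1: as many full segments as fit (integer division).
  const std::size_t n_segments = nelems / work_elems;
  plan.ring_full = n_segments * work_elems;

  // Phase 2: shrink the chunk so that n_pes chunks fit in the remainder.
  const std::size_t remaining = nelems - plan.ring_full;
  const std::size_t tail_chunk = remaining / n_pes;
  plan.ring_tail = tail_chunk * n_pes;

  // Phase 3: whatever does not divide evenly among the PEs.
  plan.direct_tail = remaining - plan.ring_tail;
  return plan;
}
```

With 1003 elements, a 256-element work buffer, and 4 PEs, this sketch assigns 768 elements to the full ring passes, 232 to the reduced-chunk ring pass, and 3 to the direct allreduce.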
Edgar Gabriel
87db7f7d38
fix barrier synchronization on gfx90a
2024-10-24 15:08:28 +00:00
Edgar Gabriel
a0ac7b2d60
add some example code
...
first examples include a getmem testcase and an allreduce (to_all)
example.
2024-10-24 15:07:17 +00:00
Edgar Gabriel
1fbb89bc73
ipc: add ring_allreduce algorithms
...
add the ring allreduce algorithm to the ipc conduit in order to be able
to execute slightly larger reductions.
2024-10-24 15:07:17 +00:00
Edgar Gabriel
ba21cb7b85
ipc/to_all: add direct allreduce algorithm
...
add a simple version of an allreduce algorithm as a starting point.
2024-10-24 15:07:14 +00:00
Brandon Potter
416dffa129
Merge pull request #34 from BKP/ipc_parameterized_simple_tests_10-01-24
...
IPC Parameterized Simple Tests
2024-10-24 08:23:26 -05:00
Avinash Kethineedi
8a16968cf2
Merge pull request #41 from avinashkethineedi/collective_routine_buffers
...
Fine grained memory buffers for work/sync arrays
2024-10-23 23:33:48 -05:00
Avinash Kethineedi
e594a9cb85
Merge pull request #42 from avinashkethineedi/fix_quiet/fence
...
Fix quiet and fence of default context
2024-10-22 12:56:43 -05:00
avinashkethineedi
d5ea5868e3
Fix quiet and fence of default context
...
* Update tinfo of default context
2024-10-22 16:18:05 +00:00
avinashkethineedi
6685d0ab60
Add fine grained memory buffers for work/sync arrays
...
* Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays
* Add condition check to ensure all MPI processes are on the same compute node for IPC conduit
2024-10-21 15:28:39 +00:00
Yiltan
b922bdcf4c
Merge pull request #39 from Yiltan/LWPRHMEM-75-API-differences
...
LWPRHMEM-75 API Differences
2024-10-18 15:27:34 -04:00
Avinash Kethineedi
892b4e2436
Merge pull request #40 from avinashkethineedi/functional_tests/puts_gets
...
Functional tests {wave, wg} puts and gets
2024-10-17 17:23:05 -05:00
avinashkethineedi
18a1bdd0ac
Use C++ iota function to reset buffers and use its values for verification
...
* Update functional test script to include new tests
2024-10-15 20:23:25 +00:00
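The iota-based reset and verification mentioned in the last commit above might look roughly like the following host-side sketch. It uses `std::iota` from the C++ standard library; the helper names and the use of `std::vector<int>` are assumptions for illustration and do not reflect the actual functional test code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: fill the buffer with start, start + 1, start + 2, ...
inline void reset_buffer(std::vector<int>& buf, int start = 0) {
  std::iota(buf.begin(), buf.end(), start);
}

// Hypothetical helper: check that every slot still holds its iota value.
inline bool verify_buffer(const std::vector<int>& buf, int start = 0) {
  for (std::size_t i = 0; i < buf.size(); ++i) {
    if (buf[i] != start + static_cast<int>(i)) {
      return false;
    }
  }
  return true;
}
```

Because every slot holds a distinct, predictable value, a misplaced or missing element after a put or get shows up as a mismatch at a specific index, which a constant fill pattern would not reveal.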