39 Commits

Author SHA1 Message Date
Yiltan 55aab4d62e [Docs] Clarify ROCSHMEM_HEAP_SIZE (#392)
* clarify ROCSHMEM_HEAP_SIZE

* Apply suggestions from code review

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

---------

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: 0496586829]
2026-01-20 17:22:18 -05:00
yugang-amd bcd9119dbc Bump rocm-docs-core to 1.31.2 (#387)
[ROCm/rocshmem commit: 491739c9b4]
2026-01-15 13:17:51 -05:00
dependabot[bot] 12d9d45667 Bump urllib3 from 2.6.0 to 2.6.3 in /docs/sphinx (#383)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: f9fc022ed5]
2026-01-09 08:27:43 -05:00
dependabot[bot] 645236aadd Bump pynacl from 1.5.0 to 1.6.2 in /docs/sphinx (#379)
Bumps [pynacl](https://github.com/pyca/pynacl) from 1.5.0 to 1.6.2.
- [Changelog](https://github.com/pyca/pynacl/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/pynacl/compare/1.5.0...1.6.2)

---
updated-dependencies:
- dependency-name: pynacl
  dependency-version: 1.6.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: fb644ddfa9]
2026-01-07 14:39:00 -05:00
dependabot[bot] 750d3f8b2e Bump urllib3 from 2.5.0 to 2.6.0 in /docs/sphinx (#365)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: 166a591216]
2025-12-19 09:55:42 -05:00
yugang-amd 195fe4e5ee GDA docs style edits (#362)
* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/sphinx/_toc.yml.in

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Apply suggestions from code review

Co-authored-by: yugang-amd <yugang.wang@amd.com>

---------

Co-authored-by: Yiltan <ytemucin@amd.com>

[ROCm/rocshmem commit: bbad1d8539]
2025-12-10 17:03:58 -05:00
Yiltan 258d264ecc Add default context alltoall API (#350)
[ROCm/rocshmem commit: fddbe7b15d]
2025-12-10 11:43:15 -05:00
Anatolii Rozanov f98c72d627 Add host API for *_on_stream operations (#340)
* Add functional test for barrier_all_on_stream

* Add rocshmem_barrier_all_on_stream support for GDA and RO backends

Implements rocshmem_barrier_all_on_stream operation for
GPU Direct Access and Reverse Offload backends.

Previously, rocshmem_barrier_all_on_stream was only supported for IPC backend.

* Add functional test for rocshmem_broadcastmem_on_stream

* Add host-side rocshmem_broadcastmem_on_stream API

Implement stream-based broadcast collective operation

- Add rocshmem_broadcastmem_on_stream host API and kernel implementation
- Add functional test TeamBroadcastmemOnStreamTester with multi-stream
  support and correctness verification
- Use per-workgroup contexts to avoid contention across parallel streams

API:
rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream)

* Add functional test for rocshmem_getmem_on_stream

* Add host-side rocshmem_getmem_on_stream API

Implement stream-based point-to-point RMA get operation

- Add rocshmem_getmem_on_stream host API and kernel implementation
- Support for asynchronous getmem operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group collective getmem for efficient memory transfer

API:
rocshmem_getmem_on_stream(dest, source, nelems, pe, stream)

(AI Assist)

* Add host-side rocshmem_putmem_on_stream API

- Add rocshmem_putmem_on_stream for asynchronous remote writes
- Support for concurrent RMA operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group device collective operation

API:
rocshmem_putmem_on_stream(dest, source, bytes, pe, stream)

(AI Assist)

* Add functional test for rocshmem_putmem_on_stream

* Add host-side rocshmem_putmem_signal_on_stream API

Enables asynchronous putmem operations with signaling on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_putmem_signal_kernel
- Host interface putmem_signal_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Public API

Function signature:
void rocshmem_putmem_signal_on_stream(void *dest, const void *source,
                                      size_t bytes, uint64_t *sig_addr,
                                      uint64_t signal, int sig_op,
                                      int pe, hipStream_t stream);

* Add functional test for rocshmem_putmem_signal_on_stream

* Add host-side rocshmem_signal_wait_until_on_stream API

Enables asynchronous signal wait operations on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_signal_wait_until_kernel
- Host interface signal_wait_until_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Native uint64_t support in wait_until API (generated from P2P_SYNC.py)

Function signature:
void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp,
                                          uint64_t cmp_value,
                                          hipStream_t stream);

(AI Assist)

* Add functional test for rocshmem_signal_wait_until_on_stream

* Add documentation for stream API functions

This commit adds API documentation for the following host-side
stream functions:

- rocshmem_barrier_all_on_stream (collective routines)
- rocshmem_broadcastmem_on_stream (collective routines)
- rocshmem_getmem_on_stream (RMA operations)
- rocshmem_putmem_on_stream (RMA operations)
- rocshmem_putmem_signal_on_stream (signaling operations)
- rocshmem_signal_wait_until_on_stream (point-to-point sync)

The documentation includes function signatures, parameter descriptions,
and detailed explanations of asynchronous behavior and stream handling.

(AI Assist)

* Rename "bytes" -> "nelems"

* Add "_TEST_" to the variables used in tests

* Remove incorrect hipStreamDefault usage

hipStreamDefault is not a default stream. This is a flag.

If stream == nullptr, just pass it to the kernel launch. The kernel is then launched on the default stream.

[ROCm/rocshmem commit: d0c8380650]
2025-12-09 08:55:46 -06:00
Yiltan 9b77387067 Fix docs rendering issue (#349)
[ROCm/rocshmem commit: d5bcb3a201]
2025-12-08 15:54:06 -05:00
Anatolii Rozanov 4b04b540bf Add host API for alltoallmem_on_stream collective operation (#333)
* Add host-side rocshmem_alltoallmem_on_stream function

Function signature:
  rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                 const void *source, size_t size,
                                 hipStream_t stream)

- The function launches rocshmem_alltoallmem_kernel which calls
device-side alltoall<char> workgroup collective through default context.
- Uses dynamic block size determination via occupancy API.
- Implemented for all backends.

* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends

When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.

* Add functional test for team_alltoallmem_on_stream

This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.

* Add documentation for rocshmem_alltoallmem_on_stream

This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:

[ROCm/rocshmem commit: 5577feb70d]
2025-12-03 08:40:24 -05:00
Yiltan 0f32739b52 Updated important missing environment variables (#344)
[ROCm/rocshmem commit: 8b350a51fe]
2025-12-02 11:40:30 -05:00
Adel Johar 2c243feb1b [Docs] Move environment variables to separate page (#341)
[ROCm/rocshmem commit: ba77bdd9a6]
2025-12-01 14:25:27 -05:00
Yiltan 2079193495 Update docs for GDA (#337)
[ROCm/rocshmem commit: 5606fdafd6]
2025-12-01 09:38:11 -05:00
Yiltan f9caef6908 Add rocshmem_int64_p (#335)
[ROCm/rocshmem commit: d9e2890222]
2025-11-26 10:31:23 -05:00
Edgar Gabriel db4c6293cc add relaxed_ordering option (#324)
* add relaxed_ordering option

add an environment variable that controls whether the
IBV_ACCESS_RELAXED_ORDERING flag is set when registering memory with the
ibv_reg_mr* functions.

* missed a spot
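
The flag handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the environment variable name `EXAMPLE_RELAXED_ORDERING`, the helper names, and the rule that any non-empty value other than "0" enables the option are all assumptions, and the flag constant is a local stand-in so the sketch compiles without libibverbs.

```cpp
#include <cstdlib>
#include <cstring>

// Local stand-in for IBV_ACCESS_RELAXED_ORDERING. The value is illustrative;
// real code should take the constant from <infiniband/verbs.h>.
constexpr unsigned int kAccessRelaxedOrdering = 1u << 20;

// Hypothetical variable name; the commit message does not state the real one.
constexpr const char *kRelaxedOrderingEnv = "EXAMPLE_RELAXED_ORDERING";

// Assumed parsing rule: unset, empty, or "0" means disabled; anything else
// means enabled.
inline bool relaxed_ordering_enabled(const char *value) {
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Returns base_flags, with the relaxed-ordering bit OR-ed in when the
// option is enabled. The result would be passed as the access argument
// of a memory registration call.
inline unsigned int mr_access_flags(unsigned int base_flags, const char *value) {
  return relaxed_ordering_enabled(value) ? (base_flags | kAccessRelaxedOrdering)
                                         : base_flags;
}

// Convenience wrapper that reads the process environment.
inline unsigned int mr_access_flags_from_env(unsigned int base_flags) {
  return mr_access_flags(base_flags, std::getenv(kRelaxedOrderingEnv));
}
```

Keeping the parsing separate from the `getenv` call makes the rule easy to check without touching the process environment.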

[ROCm/rocshmem commit: 2ae2033648]
2025-11-20 08:20:25 -06:00
Avinash Kethineedi b771a26916 Add ROCSHMEM_CTX_INVALID for invalid context handling (#287)
* Add `ROCSHMEM_CTX_INVALID` for invalid context handling
  - Define `ROCSHMEM_CTX_INVALID` as {nullptr, nullptr}
  - Add == and != operators to rocshmem_ctx_t
  - Use `ROCSHMEM_CTX_INVALID` on failed context creation
  - Skip ctx destroy if context is invalid

* Update docs for context create and destroy APIs usage and behavior
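
A rough sketch of the sentinel pattern listed above, assuming a two-pointer handle. The type `ctx_handle_t`, its field names, and the `*_sketch` functions are hypothetical stand-ins; only the {nullptr, nullptr} sentinel, the ==/!= operators, and the create/destroy behavior come from the commit message.

```cpp
// Hypothetical stand-in for rocshmem_ctx_t; the real field names and
// layout are not given in the commit message.
struct ctx_handle_t {
  void *ctx_opaque;   // assumed: pointer to the backend context
  void *team_opaque;  // assumed: pointer to the owning team
};

// Sentinel with both pointers null, mirroring {nullptr, nullptr}.
constexpr ctx_handle_t CTX_INVALID{nullptr, nullptr};

// Two handles compare equal when both pointers match.
constexpr bool operator==(const ctx_handle_t &a, const ctx_handle_t &b) {
  return a.ctx_opaque == b.ctx_opaque && a.team_opaque == b.team_opaque;
}

constexpr bool operator!=(const ctx_handle_t &a, const ctx_handle_t &b) {
  return !(a == b);
}

// Hypothetical create: on failure, store the sentinel and return nonzero.
inline int ctx_create_sketch(bool backend_ok, void *backend_ctx, void *team,
                             ctx_handle_t *out) {
  if (!backend_ok) {
    *out = CTX_INVALID;
    return -1;
  }
  *out = ctx_handle_t{backend_ctx, team};
  return 0;
}

// Hypothetical destroy: skip invalid handles. Returns true only when
// there was something to release.
inline bool ctx_destroy_sketch(ctx_handle_t ctx) {
  if (ctx == CTX_INVALID) {
    return false;
  }
  // A real implementation would release backend resources here.
  return true;
}
```

With this shape, a caller can compare the handle against the sentinel after a failed create, and an unconditional destroy in cleanup code stays safe.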

[ROCm/rocshmem commit: 955c22aeed]
2025-10-22 12:00:56 -05:00
Yiltan 92a7904656 Implement rocshmem_pe_quiet() (#282)
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: c3eeae473b]
2025-10-20 11:42:39 -04:00
Yiltan c269577b89 Updated docs for ROCm 7.x.x (#239)
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Co-authored-by: yugang-amd <yugang.wang@amd.com>

[ROCm/rocshmem commit: 9338c84480]
2025-10-17 12:10:37 -04:00
Aurelien Bouteiller 225746b0f0 Make ROCSHMEM_DISABLE_MIXED_IPC a synonym for ROCSHMEM_RO_DISABLE_IPC, ROCSHMEM_DISABLE_IPC (#273)
* Make ROCSHMEM_DISABLE_IPC a synonym for ROCSHMEM_RO_DISABLE_IPC

* Introduce ROCSHMEM_DISABLE_MIXED_IPC and deprecate old variants

[ROCm/rocshmem commit: db8e5f1086]
2025-10-09 19:57:53 -04:00
Aurelien Bouteiller 8837414042 Cleanup/wg init (#260)
* remove wg_init and wg_finalize from functional tests

* Remove wg_init and wg_finalize from examples

* deprecate wg_init/finalize

* Updated docs

* Typo in documentation

---------

Co-authored-by: Yiltan <yiltan@amd.com>

[ROCm/rocshmem commit: 6e7277b544]
2025-10-07 14:34:18 -04:00
yugang-amd ac13b22edc remove dead link (#271)
[ROCm/rocshmem commit: 2bf1f889ad]
2025-10-06 11:07:52 -04:00
Yiltan 2b75fe7bf9 Improve qp mapping (#259)
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: 7ebf03fe2f]
2025-09-25 10:24:59 -04:00
yugang-amd 526784233b remove broken link etc. (#234)
[ROCm/rocshmem commit: 4a760d741a]
2025-09-10 09:48:28 -04:00
yugang-amd e0fb92c2e2 Update descriptions about hardware support (#236)
[ROCm/rocshmem commit: b3e2e72f29]
2025-09-08 13:26:05 -04:00
akolliasAMD 1d7206d0d7 Added ability to build for local gpu by env Variable (#204)
* Added the ability to compile for Local gpu by environment variable

* add gfx950 by default only on ROCm 7.0 and above

* Updated docs

* removed xnack+ on specific gfx targets

---------

Co-authored-by: Yiltan Hassan Temucin <yiltan.temucin@amd.com>

[ROCm/rocshmem commit: be630d9b93]
2025-08-11 12:35:50 -06:00
Aurelien Bouteiller 93cf1b680e Documentation for RO (#189)
* Update documentation to include RO and how to use it

* Clarify supported configuration

Co-authored-by: yugang-amd <yugang.wang@amd.com>


[ROCm/rocshmem commit: 42e28835ad]
2025-07-10 18:49:10 -04:00
dependabot[bot] 86ab9a8f89 Bump urllib3 from 2.4.0 to 2.5.0 in /docs/sphinx (#170)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.4.0...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: 47bd7ec0d8]
2025-06-25 11:08:42 -04:00
dependabot[bot] a33dbbee03 Bump requests from 2.32.3 to 2.32.4 in /docs/sphinx (#169)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: 49f7f1bab1]
2025-06-25 11:08:28 -04:00
Edgar Gabriel e167f50803 Introduce support for executing the IPC conduit without MPI (#153)
* relax MPI dependency from code

This commit (series) removes the strict dependency on MPI in the code base.
rocSHMEM will still be compiled with MPI, but the goal is to make the
code work even if MPI_Init_thread has not been invoked, at least for
certain well-defined scenarios. Hence, the goal is not to remove every
mention of MPI from rocSHMEM, but to ensure correct execution of the
ipc conduit even if the library has been initialized using other means.

Details:
 - add non-MPI version of remote_heap and WindowInfo classes
 - host interfaces work on WindowInfoMPI, they will not work with the
   non-MPI code path. Since it is unclear whether we plan to support the
   host interfaces at all, this is probably not a major limitation.

* update symmetric_heap structures and backend

* first cut on initialization

and enabling non-MPI initialization of the IPCBackend

* add non-MPI hostInterface methods

at the moment, only barrier_all and sync_all are explicitly supported.

* add non-mpi version of ipc_policy

and a number of smaller fixes required in other files.
A small init/finalize test already passes now with the branch.

* add non-mpi team_split_strided code

* minor fixes for non-MPI use-case

* disable symmetric-heap-window-info test

disable this test for now just to make the compilation pass. Will have
to rework it.

* make no-mpi great again

after rebasing on top of the MPI singleton changes.

* enable running functional tests with uuid init

to run the functional tests using rocshmem_init_attr and the uuid
mechanism requires
a) a PMIx installation on the system
b) setting the environment variable ROCSHMEM_TEST_UUID=1

* fix multi-team creation bug

fix a bug occurring when creating many teams, which was the result of
incorrectly applying two indices in our own implementation of Allreduce.

* make unit tests pass again

* reverse offload was impacted by code change

fix the RO conduit to cope with the non-MPI path introduced for the IPC
conduit.

* update to cmake logic to find pmix

* Update src/memory/window_info.hpp

Co-authored-by: Yiltan <ytemucin@amd.com>

* Update CMakeLists.txt

Co-authored-by: Yiltan <ytemucin@amd.com>

* document ROCSHMEM_UNIQUEID_NO_MPI

* rename env. variable to UNIQUEID_WITH_MPI

* update host.cpp to use USE_HDP_FLUSH macro

instead of the deprecated USE_COHERENT_HEAP.

* add note for running example with RO conduit

add a note clarifying that running init_attr_test from the example
directory requires setting an additional environment variable with the
RO conduit.

* Find PMIx in more cases; only apply PMIx build options to the test that
needs it; abort if OMPI_COMM_WORLD_LOCAL_RANK is not set

---------

Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>

[ROCm/rocshmem commit: 6ea5edc951]
2025-06-21 13:23:11 -05:00
Avinash Kethineedi 14756a73b1 Refactor Barrier_all and Sync_all APIs to use default context (#159)
* Refactor `Barrier_all` and `Sync_all` to use default context

- Removed context-specific implementations of barrier_all and sync_all
- Added barrier_all and sync_all to the default context implementation
- Updated functional tests to use the default context for barrier_all and sync_all

* Update `Barrier_all` and `Sync_all` API usage in documentation

* Update `CHANGELOG`

---------

Co-authored-by: Yiltan <ytemucin@amd.com>

[ROCm/rocshmem commit: bf48bcabf2]
2025-06-17 11:16:18 -05:00
dependabot[bot] 5727670930 Bump tornado from 6.4.2 to 6.5.1 in /docs/sphinx (#143)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.2 to 6.5.1.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.2...v6.5.1)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: e0c9ee45a7]
2025-05-27 11:10:07 -04:00
yugang-amd 8fbb892cc1 Final edits (#126)
* final edits

* more edits per review

* more edits

* attempt to fix dead link

[ROCm/rocshmem commit: 8a266e698c]
2025-05-21 16:59:00 -04:00
Yiltan 6a7644e467 Updated ROCm-docs to match the current status of the repository (#117)
* Updated docs to match the current status of the repository

Co-authored-by: yugang-amd <yugang.wang@amd.com>

[ROCm/rocshmem commit: f43e3cf4fa]
2025-05-16 09:26:59 -04:00
yugang-amd 17cde51fb7 Style edits (#122)
[ROCm/rocshmem commit: 67bff9ca30]
2025-05-13 16:26:28 -04:00
alexxu-amd 2f82ed9bf0 move requirements.txt from docs/ to docs/sphinx/ (#118)
[ROCm/rocshmem commit: 9088383dab]
2025-05-08 15:37:58 -04:00
Yiltan 1667e63e30 Initial ROCm-docs (#92)
* Initial ROCm-docs commit

Co-authored-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
Co-authored-by: Alex Xu <alex.xu@amd.com>
Co-authored-by: yugang-amd <yugang.wang@amd.com>

[ROCm/rocshmem commit: f693c98fb2]
2025-05-08 13:39:28 -04:00
Yiltan Temucin 3164874941 Use ROCm-CMake
[ROCm/rocshmem commit: b60a460681]
2024-12-06 15:49:41 -06:00
Brandon Potter 913ce47ef1 Use new naming scheme
[ROCm/rocshmem commit: fd8dbc7fb6]
2024-11-25 14:25:29 -06:00
Brandon Potter ad4ab69c19 Transfer files from RAD repository
[ROCm/rocshmem commit: ea8f264a11]
2024-07-01 09:57:08 -05:00