Commit graph

15 commits

Author SHA1 Message Date
Edgar Gabriel d0c2845031 add support for GPUs using wavefront size of 32 (#285)
* add gfx1100 support

Add support for Radeon 7900 GPUs (RX and PRO), and 7800 PRO.

I considered adding gfx1101 and gfx1102 GPUs as well, but those are lower-end models that are less likely to be used for compute-intensive jobs. In addition, I do not have access to them to test the support.

* update WF_SIZE for different options

Radeon systems use a warp size of 32, unlike current Instinct systems,
which use a warp size of 64. For the device side, a gfx-specific #ifdef
is sufficient. For the host side, we need to query the device
properties.

* adjust functional tests to wf_size of 32

* update unit tests to handle wf_size of 32

* address reviewer comments
2025-10-22 16:04:58 -05:00
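Because the wavefront size now differs between Radeon (32) and Instinct (64) GPUs, host code can no longer hard-code 64. A minimal sketch, assuming a hypothetical helper that derives the number of wavefronts per block from a size queried at runtime (in HIP, the `warpSize` field of `hipDeviceProp_t` filled in by `hipGetDeviceProperties`); the name is illustrative, not part of rocSHMEM:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper, not part of rocSHMEM: number of wavefronts needed to
// cover a thread block of `block_threads` threads. `wf_size` is meant to come
// from the device properties at runtime (e.g. hipDeviceProp_t::warpSize),
// not from a hard-coded 64.
std::size_t wavefronts_per_block(std::size_t block_threads, std::size_t wf_size) {
  if (wf_size == 0) {
    throw std::invalid_argument("wavefront size must be non-zero");
  }
  // Round up: a partially filled wavefront still occupies a full wavefront.
  return (block_threads + wf_size - 1) / wf_size;
}
```

For a 256-thread block this yields 4 wavefronts on a wave64 device but 8 on a wave32 device, so any per-wavefront bookkeeping sized under a wave64 assumption would be half as large as needed.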
Edgar Gabriel a1269e3db5 allow all three backends to co-exist in a single build (#270)
* add support for compiling all backends

also include the logic to select backends either based on user requests
or through some heuristics

* checkpoint for compiling all backends

* final checkpoint

all tests seem to pass when compiling all three backends simultaneously
and forcing the use of any one of the three backends.

* update PR to new envvar system
2025-10-07 10:49:20 -05:00
Omri Mor a0fcbf8d35 Unify environment variable management (#235)
* Add environment variable configuration infrastructure
  - Namespace rocshmem::envvar
  - Track all config env vars in per-category lists
  - Remove duplicates from list of allowed env var types
  - Reject negative inputs for unsigned integer types
  - Accept empty strings for std::string
  - Print error source location using C++20 std::source_location
  - Unit tests
* Port environment variables
  - ROCSHMEM_UNIQUEID_WITH_MPI
  - ROCSHMEM_RO_DISABLE_IPC
  - ROCSHMEM_BOOTSTRAP_TIMEOUT
  - ROCSHMEM_BOOTSTRAP_HOSTID
  - ROCSHMEM_BOOTSTRAP_SOCKET_IFNAME
  - ROCSHMEM_RO_PROGRESS_DELAY
  - ROCSHMEM_BOOTSTRAP_SOCKET_FAMILY
  - ROCSHMEM_MAX_NUM_CONTEXTS
    + Merge the independent per-backend copies into a single variable
      that is used by all three backends (IPC, RO, GDA).
    + Set default to 32 (for GDA); prior default for IPC and RO was 1024.
  - ROCSHMEM_MAX_NUM_HOST_CONTEXTS
  - ROCSHMEM_MAX_WF_BUFFERS
  - ROCSHMEM_SQ_SIZE
  - ROCSHMEM_RO_NET_CPU_QUEUE
    + Renamed from RO_NET_CPU_QUEUE
    + Change env var input type to bool, default to false
    + Invert code logic: previously, setting RO_NET_CPU_QUEUE to any value
      would /disable/ the variable gpu_queue, which defaulted to true.
      The variable is now named config::ro::net_cpu_queue,
      with all prior checks for gpu_queue inverted.
  - ROCSHMEM_USE_IB_HCA
  - ROCSHMEM_HEAP_SIZE
    + Defaults to 1L << 30, i.e., 1 GiB,
      taken from the default heap size in memory/heap_memory.hpp.
  - ROCSHMEM_MAX_NUM_TEAMS
    + Unlike other env vars, this can be referenced from devices.
    + The function currently narrows from size_t to int; its uses need to be
      audited for safety and correctness before switching to size_t directly.
  - ROCSHMEM_GDA_ALTERNATE_QP_PORTS
* New env var ROCSHMEM_DEBUG
  - Debug levels:
    + NONE
    + VERSION
    + WARN
    + INFO
    + TRACE
  - Currently unused; uses will be added later
  - Mirrors RCCL debug control
* Remove rocshmem::rocshmem_env_config
* Change interface for GetClosestNicToGpu
  to accept const char** instead of char**:
  the pointed-to string does not need to be modified
  - Files were not audited for includes of util.hpp that only served the env vars
---------
Signed-off-by: Omri Mor <Omri.Mor@amd.com>
2025-10-06 10:05:57 -07:00
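Some of the parsing rules listed in this commit (reject negative input for unsigned types, accept the empty string for std::string, the ROCSHMEM_DEBUG levels) can be sketched in standard C++. This is an illustration under stated assumptions: the function names are hypothetical, not the rocshmem::envvar API, and the accepted ROCSHMEM_DEBUG spellings are assumed to be the exact upper-case level names.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Debug levels listed for ROCSHMEM_DEBUG, in increasing verbosity.
enum class DebugLevel { NONE, VERSION, WARN, INFO, TRACE };

// Hypothetical parser for unsigned settings. std::strtoull accepts a leading
// '-' and wraps the result (e.g. "-1" becomes ULLONG_MAX), so a minus sign is
// rejected explicitly before converting.
std::optional<unsigned long long> parse_unsigned(const std::string& text) {
  const std::size_t first = text.find_first_not_of(" \t\n\v\f\r");
  if (first == std::string::npos || text[first] == '-') {
    return std::nullopt;
  }
  errno = 0;
  char* end = nullptr;
  const char* begin = text.c_str() + first;
  const unsigned long long value = std::strtoull(begin, &end, 10);
  if (end == begin || *end != '\0' || errno == ERANGE) {
    return std::nullopt;  // not a number, trailing junk, or out of range
  }
  return value;
}

// Hypothetical string reader: an unset variable (nullptr) yields the default,
// while a variable set to the empty string is a valid value and is kept.
std::string read_string(const char* raw, const std::string& fallback) {
  return raw == nullptr ? fallback : std::string(raw);
}

// Hypothetical ROCSHMEM_DEBUG parser; unknown names are rejected.
std::optional<DebugLevel> parse_debug_level(const std::string& text) {
  if (text == "NONE") return DebugLevel::NONE;
  if (text == "VERSION") return DebugLevel::VERSION;
  if (text == "WARN") return DebugLevel::WARN;
  if (text == "INFO") return DebugLevel::INFO;
  if (text == "TRACE") return DebugLevel::TRACE;
  return std::nullopt;
}
```

Returning std::optional keeps "invalid input" distinct from any legitimate value, so the caller can report the error and fall back to the default.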
Aurelien Bouteiller 801d2c5012 Enable GDA+IPC (#249)
* Enable GDA+IPC
Fix ROCSHMEM_DISABLE_IPC for both RO and GDA

* add more functionality to bootstrap class

we need a few more functions in the bootstrap class to fully
handle the rocshmem requirements:
 - add a function to return the list of local ranks
 - provide a groupAllgather operation which takes a vector of ranks
   participating
 - provide a groupAlltoall operation which takes a vector of ranks
   participating

Also, update the functionality of the gda-Alltoall and gda-Allreduce
operations to take advantage of these functions.

* ipc_policy adapted to use bootstrap groupallgather

* bugfix: there was a mistake in computing sendto in groupallgather

* bugfix: shm_size and shm_rank were set in a local variable rather than
the class member

* mpi-bootstrap: remove an unnecessary allgather

---------

Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
2025-09-16 11:54:53 -04:00
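The groupAllgather operation takes an explicit list of participating ranks. One common way to implement such an operation is a ring; the sketch below assumes that algorithm (the commit does not say which one is used) and simulates it in a single process. All names are hypothetical. What it illustrates is that the send and receive peers are derived from a rank's position in the group vector, then mapped back to global ranks.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Position of global rank `rank` inside `group`; throws if it is not a member.
std::size_t group_position(const std::vector<int>& group, int rank) {
  for (std::size_t i = 0; i < group.size(); ++i) {
    if (group[i] == rank) return i;
  }
  throw std::invalid_argument("rank is not part of the group");
}

// Ring neighbors are computed on group positions and only then translated
// back to global ranks; using my_rank + 1 directly would be wrong whenever
// the group is not a contiguous range starting at 0.
int ring_sendto(const std::vector<int>& group, int my_rank) {
  const std::size_t p = group_position(group, my_rank);
  return group[(p + 1) % group.size()];
}

int ring_recvfrom(const std::vector<int>& group, int my_rank) {
  const std::size_t p = group_position(group, my_rank);
  return group[(p + group.size() - 1) % group.size()];
}

// In-process simulation of a ring allgather among n group members:
// result[p] is what member p holds at the end, indexed by group position.
std::vector<std::vector<int>> simulate_group_allgather(const std::vector<int>& contributions) {
  const std::size_t n = contributions.size();
  std::vector<std::vector<int>> buf(n, std::vector<int>(n, 0));
  for (std::size_t p = 0; p < n; ++p) buf[p][p] = contributions[p];
  // n - 1 steps: in step s, member p forwards block (p - s) mod n to p + 1.
  for (std::size_t step = 0; step + 1 < n; ++step) {
    std::vector<std::vector<int>> next = buf;
    for (std::size_t p = 0; p < n; ++p) {
      const std::size_t block = (p + n - step) % n;
      next[(p + 1) % n][block] = buf[p][block];
    }
    buf = next;
  }
  return buf;
}
```

For the group {4, 7, 9}, rank 9 sends to rank 4 and rank 4 receives from rank 9, which a global-rank computation would get wrong.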
Aurelien Bouteiller 63a79892b2 rocshmem_config.h has a different include path in the installed and build directories (#186)
* rocshmem_config.h needs to be in a similar directory structure for
includes to work both when building testers in the build directory and
against an installed library

* Do not change installed rocshmem.hpp
2025-07-02 16:51:38 -04:00
Edgar Gabriel 6ea5edc951 Introduce support for executing the IPC conduit without MPI (#153)
* relax MPI dependency from code

This commit (series) removes the strict dependency on MPI from the code base.
rocSHMEM will still be compiled with MPI, but the goal is to make the
code work even if MPI_Init_thread has not been invoked, at least for
certain well-defined scenarios. Hence, the goal is not to remove every
mention of MPI from rocSHMEM, but to ensure correct execution of the
IPC conduit even if the library has been initialized by other means.

Details:
 - add non-MPI version of remote_heap and WindowInfo classes
 - host interfaces work on WindowInfoMPI and will not work with the
   non-MPI code path. Since it is unclear whether we plan to support the
   host interfaces at all, this is probably not a major limitation.

* update symmetric_heap structures and backend

* first cut on initialization

and enabling non-MPI initialization of the IPCBackend

* add non-MPI hostInterface methods

at the moment, only barrier_all and sync_all are explicitly supported.

* add non-mpi version of ipc_policy

and a number of smaller fixes required in other files.
A small init/finalize test already passes now with the branch.

* add non-mpi team_split_strided code

* minor fixes for non-MPI use-case

* disable symmetric-heap-window-info test

disable this test for now just to make the compilation pass. Will have
to rework it.

* make no-mpi great again

after rebasing on top of the MPI singleton changes.

* enable running functional tests with uuid init

running the functional tests using rocshmem_init_attr and the uuid
mechanism requires
a) a PMIx installation on the system
b) setting the environment variable ROCSHMEM_TEST_UUID=1

* fix multi-team creation bug

fix a bug occurring when creating many teams, which was the result of
incorrectly applying two indices in our own implementation of Allreduce.

* make unit tests pass again

* reverse offload was impacted by code change

fix the RO conduit to cope with the non-MPI path introduced for the IPC
conduit.

* update to cmake logic to find pmix

* Update src/memory/window_info.hpp

Co-authored-by: Yiltan <ytemucin@amd.com>

* Update CMakeLists.txt

Co-authored-by: Yiltan <ytemucin@amd.com>

* document ROCSHMEM_UNIQUEID_NO_MPI

* rename env. variable to UNIQUEID_WITH_MPI

* update host.cpp to use USE_HDP_FLUSH macro

instead of the deprecated USE_COHERENT_HEAP.

* add note for running example with RO conduit

add a note clarifying that running init_attr_test from the example
directory requires setting an additional environment variable with the
RO conduit.

* Find PMIx in more cases, only apply PMIx build options to the test that
needs it, and abort if OMPI_COMM_WORLD_LOCAL_RANK is not set

---------

Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
2025-06-21 13:23:11 -05:00
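The last bullet of this commit makes the tester abort when Open MPI's node-local rank variable, OMPI_COMM_WORLD_LOCAL_RANK, is missing. A minimal sketch of such a check, assuming the tester reads the variable directly from the environment; the helper names are hypothetical and the commit does not show the actual code.

```cpp
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <optional>

// Hypothetical helper: read the node-local rank exported by Open MPI's
// launcher. Returns nullopt when the variable is unset, empty, or not a
// non-negative integer that fits in an int.
std::optional<int> local_rank_from_env() {
  const char* raw = std::getenv("OMPI_COMM_WORLD_LOCAL_RANK");
  if (raw == nullptr || *raw == '\0') return std::nullopt;
  char* end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value < 0 || value > INT_MAX) return std::nullopt;
  return static_cast<int>(value);
}

// Hypothetical fail-fast wrapper: abort with a message if the launcher did
// not provide the variable.
int require_local_rank() {
  const std::optional<int> rank = local_rank_from_env();
  if (!rank) {
    std::fprintf(stderr, "OMPI_COMM_WORLD_LOCAL_RANK is not set\n");
    std::abort();
  }
  return *rank;
}
```

Separating the lookup from the abort keeps the parsing testable while still failing early, before any device or network setup, when the tester is launched outside Open MPI.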
Edgar Gabriel db74307195 unify env variables and use DPRINTF (#89)
* unify handling of env variables

create a class containing all (most?) environment variables used by rocshmem and an object that is instantiated
before library_init, since some of the environment variables need to be
set before we start the bootstrapping process.

This allows us to remove two files from the bootstrap directory.

* replace INFO and TRACE macros with DPRINTF

to be more consistent with the rest of the rocSHMEM code
2025-04-29 06:05:25 -05:00
Avinash Kethineedi f6ef19f5a9 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`
2025-04-15 15:37:53 -05:00
Brandon Potter 0fd628458c Cleanup unused code in repository (#75)
* Remove unused forward_list

* Remove unused __read_clock function

* Replace wallClk code with hip function

* Remove unused unit test for ipc

* Remove slab heap

* Remove unused EBO spinlock
2025-04-10 14:47:24 -05:00
Edgar Gabriel 12561783de Performance tuning for inter-node communication (#66)
This PR addresses two issues:
 - reduce the default number of contexts supported by the host interface
   to 1; we are not using those at the moment, and hence we now create
   fewer MPI_Win objects at startup
 - introduce a micro-sleep in the RO progress engine when there are no
   pending requests. This led to significant performance improvements
   for inter-node communication with Thor2 NICs.
2025-03-26 21:09:26 -05:00
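The idle micro-sleep can be sketched as one pass of a progress loop. This is a simplified illustration, not the RO engine itself: requests are modeled as ints in a std::deque, the names are hypothetical, and the delay is assumed to be a configurable value (possibly what ROCSHMEM_RO_PROGRESS_DELAY, listed in the environment-variable commit, controls; that link is an assumption).

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <thread>

// Hypothetical single pass of a host-side progress engine. When there is
// nothing to service, the thread sleeps for `idle_delay` instead of spinning,
// leaving the CPU core to other work rather than burning it on polling.
std::size_t progress_once(std::deque<int>& pending,
                          const std::function<void(int)>& handle,
                          std::chrono::microseconds idle_delay) {
  if (pending.empty()) {
    std::this_thread::sleep_for(idle_delay);
    return 0;
  }
  std::size_t serviced = 0;
  while (!pending.empty()) {
    handle(pending.front());  // process the oldest request first
    pending.pop_front();
    ++serviced;
  }
  return serviced;
}
```

The sleep only happens on an empty pass, so a busy engine pays no extra latency; the trade-off is up to `idle_delay` of added latency for the first request that arrives while the engine is idle.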
Yiltan b7f3839f27 Updated IPC detection logic (#51)
* Added environment variable to enable/disable IPC at runtime

* Fixed IPC detection logic to allow for different process mappings

* Updated README.md
2025-03-17 11:36:11 -04:00
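The commit does not say how the updated IPC detection works. One mapping-independent approach, shown here as an assumption, is to compare a host identifier gathered from every rank instead of assuming that node-local ranks are numbered consecutively. The helper name is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given the host identifier of every rank (for example,
// gathered with an allgather during bootstrap), return the ranks that share
// my host and could therefore use IPC. Because it compares identifiers rather
// than rank arithmetic, it works for block, round-robin, or arbitrary
// process mappings.
std::vector<int> ipc_peers(const std::vector<std::string>& host_of_rank, int my_rank) {
  std::vector<int> peers;
  const std::string& mine = host_of_rank.at(static_cast<std::size_t>(my_rank));
  for (std::size_t r = 0; r < host_of_rank.size(); ++r) {
    if (host_of_rank[r] == mine) peers.push_back(static_cast<int>(r));
  }
  return peers;
}
```

With a round-robin mapping over two hosts, rank 0 finds peers {0, 2}, whereas a check that assumes consecutive local ranks would wrongly pair 0 with 1.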
avinashkethineedi 6486e29078 Rename config.h to roc_shmem_config.h 2024-12-06 01:08:13 +00:00
Brandon Potter 862ef5713f Move inline asm into separate file 2024-07-30 14:53:19 -05:00
Brandon Potter 73303ca2d2 Move inline assembly into arch defines blocks 2024-07-30 12:56:32 -05:00
Brandon Potter ea8f264a11 Transfer files from RAD repository 2024-07-01 09:57:08 -05:00