* Implement `rocshmem_ptr` in IPC conduit
* tests: add functional test for `rocshmem_ptr`
- Add a safety check for pointer access and a condition check before printing results in the `rocshmem_ptr` test
- Use `rocshmem_put` to store `rocshmem_ptr` availability for data validation
* Add host APIs for querying device ctx and remote heap pointer
* Host API to query the device pointer for ROCSHMEM_DEFAULT_CONTEXT;
this is needed to support dynamic module initialization via device kernel
library bitcode.
* Host API to query the remote symmetric heap pointer, which can be used in
custom device kernels for RMA operations.
* Added rocshmem_ptr implementation within the Host Context class
* Enables pointer retrieval functionality for symmetric data objects
* Copy IPC pointers to host memory in RO host context
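As background for `rocshmem_ptr` in the IPC conduit: a pointer into the local symmetric heap can be translated to the same object in a peer's IPC-mapped heap by offset arithmetic. A minimal sketch, assuming a hypothetical `IpcHeapTable` that holds each PE's locally mapped heap base (not rocSHMEM's actual classes); like OpenSHMEM's `shmem_ptr`, it returns a null pointer when direct access is not possible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table of IPC-mapped symmetric heap bases, one per PE.
// A nullptr entry means that PE is not reachable via IPC (e.g. off-node).
struct IpcHeapTable {
  std::vector<char*> heap_bases;  // each PE's heap base, as mapped in this process
  std::size_t heap_size;          // size of every symmetric heap in bytes
  int my_pe;                      // index of the calling PE

  // Translate a local symmetric address into the address of the same
  // object on PE `pe`; returns nullptr if it cannot be accessed directly.
  void* ptr(const void* dest, int pe) const {
    if (pe < 0 || static_cast<std::size_t>(pe) >= heap_bases.size()) return nullptr;
    char* remote_base = heap_bases[pe];
    if (remote_base == nullptr) return nullptr;  // PE not IPC-reachable
    auto addr = reinterpret_cast<std::uintptr_t>(dest);
    auto local = reinterpret_cast<std::uintptr_t>(heap_bases[my_pe]);
    if (addr < local || addr >= local + heap_size) return nullptr;  // not symmetric
    return remote_base + (addr - local);  // same offset in the peer's heap
  }
};
```

Returning a null pointer instead of aborting lets callers fall back to regular RMA calls when direct load/store access is unavailable.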
---------
Co-authored-by: avinashkethineedi <avinash.kethineedi@amd.com>
* rocshmem_config.h needs to be in a matching directory structure so that
includes work both when building testers in the build tree and against an
installed library
* Do not change installed rocshmem.hpp
* relax the MPI dependency in the code
This commit (series) removes the strict dependency on MPI in the code base.
rocSHMEM will still be compiled with MPI, but the goal is to make the
code work even if MPI_Init_thread has not been invoked, at least for
certain, well-defined scenarios. The goal is therefore not to remove every
mention of MPI from rocSHMEM, but to ensure correct execution of the
IPC conduit even if the library has been initialized by other means.
Details:
- add non-MPI version of remote_heap and WindowInfo classes
- host interfaces work on WindowInfoMPI and will not work with the
non-MPI code path. Since it is unclear whether we plan to support the
host interfaces at all, this is probably not a major limitation.
* update symmetric_heap structures and backend
* first cut at initialization,
enabling non-MPI initialization of the IPCBackend
* add non-MPI hostInterface methods
At the moment, only barrier_all and sync_all are explicitly supported.
* add non-mpi version of ipc_policy
and a number of smaller fixes required in other files.
A small init/finalize test now passes on this branch.
* add non-mpi team_split_strided code
* minor fixes for non-MPI use-case
* disable symmetric-heap-window-info test
Disable this test for now so that compilation passes; it will have
to be reworked.
* make no-mpi great again
after rebasing on top of the MPI singleton changes.
* enable running functional tests with uuid init
Running the functional tests using rocshmem_init_attr and the uuid
mechanism requires:
a) a PMIx installation on the system
b) setting the environment variable ROCSHMEM_TEST_UUID=1
* fix multi-team creation bug
Fix a bug occurring when creating many teams, which was caused by two
indices being applied incorrectly in our own implementation of Allreduce.
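The class of bug described above (two indices applied incorrectly) is easy to hit in a hand-written reduction. As an illustration only, not rocSHMEM's Allreduce, a host-side sketch of an element-wise sum over per-PE contributions, where the outer index selects the PE and the inner index selects the element:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// contrib[pe][i] is element i contributed by PE `pe`.
// result[i] = sum over all PEs of contrib[pe][i].
// Swapping the indices (contrib[i][pe]) reads the wrong elements, or goes
// out of bounds, as soon as the number of PEs differs from the element count.
std::vector<long> allreduce_sum(const std::vector<std::vector<long>>& contrib) {
  if (contrib.empty()) return {};
  const std::size_t count = contrib[0].size();
  std::vector<long> result(count, 0);
  for (std::size_t pe = 0; pe < contrib.size(); ++pe) {
    assert(contrib[pe].size() == count);  // every PE contributes `count` elements
    for (std::size_t i = 0; i < count; ++i) result[i] += contrib[pe][i];
  }
  return result;
}
```

Testing with a PE count that differs from the element count (for example 2 PEs and 3 elements) is a cheap way to expose swapped indices, since the square case often hides them.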
* make unit tests pass again
* reverse offload was impacted by code change
Fix the RO conduit to cope with the non-MPI path introduced for the IPC
conduit.
* update CMake logic to find PMIx
* Update src/memory/window_info.hpp
Co-authored-by: Yiltan <ytemucin@amd.com>
* Update CMakeLists.txt
Co-authored-by: Yiltan <ytemucin@amd.com>
* document ROCSHMEM_UNIQUEID_NO_MPI
* rename env. variable to UNIQUEID_WITH_MPI
* update host.cpp to use USE_HDP_FLUSH macro
instead of the deprecated USE_COHERENT_HEAP.
* add note for running example with RO conduit
add a note clarifying that running init_attr_test from the example
directory requires setting an additional environment variable with the
RO conduit.
* Find PMIx in more cases, apply PMIx build options only to the test that
needs it, and abort if OMPI_COMM_WORLD_LOCAL_RANK is not set
---------
Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
* Rework cmakery:
* detect rocm/hip/rocshmem better, make sure that ROCM_PATH and
ROCM_ROOT don't conflict and are honored by default
* add /opt/rocm as a fallback when nothing else found
* obtain hipcc in a sanitized way (ensure we use the same logic we
use to later find_package hip)
* factor out redundancies
* export GPU_TARGETS as part of the cmake target for librocshmem,
this helps with a clean error when an application tries to link
with the wrong offload-target flag (rather than a cryptic link error)
* phase out ROCSHMEM_HOME in favor of rocshmem_ROOT (the
CMake-blessed way)
* Remove references to ROCSHMEM_HOME, we prefer ROCSHMEM_ROOT
* Use the CMAKE_PREFIX_PATH method to find hip/rocm consistently
* Undo the use of LANGUAGE HIP in this PR; maybe revisit later
* Use only rocmcmakebuildtools as recommended from 6.4 onward
* Refactor `Barrier_all` and `Sync_all` to use default context
- Removed context-specific implementations of barrier_all and sync_all
- Added barrier_all and sync_all to the default context implementation
- Updated functional tests to use the default context for barrier_all and sync_all
* Update `Barrier_all` and `Sync_all` API usage in documentation
* Update `CHANGELOG`
---------
Co-authored-by: Yiltan <ytemucin@amd.com>
* Revert "SWDEV-536571 - Include assert header. (#157)"
This reverts commit bcc14b1a34.
* Fix use of assert/abort and required includes
* Disable IPC AMO testers for non-implemented functions
* Use FineGrained allocator for heap by default, consolidate all types of
allocators under saner cmake controls
Co-authored-by: Yiltan <ytemucin@amd.com>
* Uncached may not be for debug only
The rocshmem config must be included; otherwise the build is inconsistent,
with different allocators used in different files
* Undo the addition of the presumably useless hip_host_allocator_noncoherent in this PR
* Rename HEAP_IS_COHERENT/USE_COHERENT_HEAP to USE_HDP_FLUSH as the former
was misleading
* Remove unused __roc_inv()
---------
Co-authored-by: Yiltan <ytemucin@amd.com>
* Use a single printf per line (reduce chances of lines being cut in logs)
* team_comm can be an int or a pointer depending on MPI impl.
"Received" is confusing (since we are on the origin); use "submitted"
instead
* Print arguments to calls when using DEBUG
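The single-printf-per-line change can be illustrated as follows. A minimal sketch with hypothetical `format_log_line`/`log_line` helpers and an assumed `[PE n]` prefix (not rocSHMEM's actual logging code): the prefix and the message are formatted into one buffer and written with one stdio call, which reduces (but does not eliminate) the chance that output from concurrent processes is interleaved or cut mid-line.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Format a complete log line ("[PE n] message\n") into one string.
std::string vformat_log_line(int pe, const char* fmt, va_list args) {
  char msg[512];
  std::vsnprintf(msg, sizeof(msg), fmt, args);  // truncates safely if too long
  char line[600];
  std::snprintf(line, sizeof(line), "[PE %d] %s\n", pe, msg);
  return std::string(line);
}

std::string format_log_line(int pe, const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  std::string s = vformat_log_line(pe, fmt, args);
  va_end(args);
  return s;
}

// Emit the whole line with a single call, instead of one printf for the
// prefix and another for the message.
void log_line(int pe, const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  std::string s = vformat_log_line(pe, fmt, args);
  va_end(args);
  std::fputs(s.c_str(), stderr);
}
```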
---------
Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>
* Add dlmalloc_strat allocator strategy
- Use mspace variant to ease encapsulation
- Make pow2bins and dlmalloc cmake selectable
* Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers
accordingly
- add dlmalloc get_used/get_avail, and give all strategy allocators a get_used
- Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap
* bugfix: dlmalloc exposed that the pingpong test would write past the end of
the allocation with -w 32
* iostream leakage/mixed usage of cerr and fprintf(stderr, ...)
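As background for the allocator strategies named above, a pow2bins-style strategy typically serves each request from a bin whose size is the next power of two. A minimal sketch of the bin computation, assuming a hypothetical minimum bin size of 64 bytes; rocSHMEM's actual bin sizes and alignment rules may differ (the commit notes that bin size is per strategy).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimum bin size: every request is served from a bin whose
// size is a power of two and at least this large.
constexpr std::size_t kMinBin = 64;

// Round a request up to its power-of-two bin size.
std::size_t pow2_bin_size(std::size_t request) {
  std::size_t bin = kMinBin;
  while (bin < request) bin <<= 1;
  return bin;
}

// Index of the bin serving `request` (0 for kMinBin, 1 for 2 * kMinBin, ...).
std::size_t pow2_bin_index(std::size_t request) {
  std::size_t idx = 0;
  for (std::size_t bin = kMinBin; bin < request; bin <<= 1) ++idx;
  return idx;
}
```

Because such rounding leaves slack after most allocations, a small overrun can go unnoticed with a binned allocator, whereas an allocator that packs blocks more tightly, such as dlmalloc, is more likely to surface it.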
---------
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
* use correct MPI initialization method
rocSHMEM requires that the MPI library be initialized with
MPI_THREAD_MULTIPLE support, so use MPI_Init_thread in our
examples.
* Update examples/rocshmem_init_attr_test.cc
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
---------
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
* unify handling of env variables
create a class containing all (or most) of the environment variables used by rocshmem, and an object that is instantiated
before library_init, since some of the environment variables need to be
set before we start the bootstrapping process.
This allows us to remove two files from the bootstrap directory.
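A minimal sketch of the idea, assuming a hypothetical `EnvConfig` class with two illustrative settings (`ROCSHMEM_TEST_UUID` is mentioned earlier in this log; `ROCSHMEM_HEAP_SIZE` and its 1 GiB default are assumptions here). Every variable is read once in the constructor, and a function-local static guarantees the object exists as soon as init code first asks for it, i.e. before bootstrapping begins.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical: all environment-driven settings are read once into one
// object, so later code queries fields instead of calling getenv directly.
class EnvConfig {
 public:
  EnvConfig()
      : test_uuid(read_bool("ROCSHMEM_TEST_UUID", false)),
        heap_size(read_ull("ROCSHMEM_HEAP_SIZE", 1ull << 30)) {}

  const bool test_uuid;                // select the uuid-based init path in tests
  const unsigned long long heap_size;  // assumed setting: heap size in bytes

 private:
  static bool read_bool(const char* name, bool fallback) {
    const char* v = std::getenv(name);
    if (v == nullptr || *v == '\0') return fallback;
    return std::string(v) != "0";
  }
  static unsigned long long read_ull(const char* name, unsigned long long fallback) {
    const char* v = std::getenv(name);
    if (v == nullptr || *v == '\0') return fallback;
    char* end = nullptr;
    unsigned long long x = std::strtoull(v, &end, 10);
    return (end != v && *end == '\0') ? x : fallback;  // reject malformed values
  }
};

// One process-wide instance, created on first use; calling env_config() at
// the start of library init ensures it is populated before bootstrapping.
inline const EnvConfig& env_config() {
  static const EnvConfig cfg;
  return cfg;
}
```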
* replace INFO and TRACE macros with DPRINTF
to be more consistent with the rest of the rocSHMEM code
Show and log what the functional test driver is running
* Log errors in the log file
* list all failed tests at the end
* pretty colors :x
* Print stderr when the test has failed
---------
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>