Commit graph

237 commits

Author SHA1 Message Date
Aurelien Bouteiller 19e7b4798e Show and log what the functional test driver is running (#70)
Show and log what the functional test driver is running
* Log errors in the log file
* list all failed tests at the end
* pretty colors :x
* Print stderr when the test has failed

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: 67bc5b9e5a]
2025-04-23 10:21:35 -04:00
Edgar Gabriel 32f11bd5e5 use correct id when accessing ipc-bases (#88)
we need to use the position of that process in the local ipc-bases
array, not its global rank.
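
The difference between a global rank and a position in the node-local array can be sketched as follows. This is a minimal illustration, not the rocSHMEM code: `local_ipc_index`, `ipc_base_for`, and `node_ranks` are hypothetical names, and the assumption is that the IPC base pointers are stored in the same order as the list of ranks sharing the node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: `node_ranks` holds the global ranks of the processes
// on this node, in the order their IPC base pointers are stored. Returns the
// position of `global_rank` in that local array, or -1 if it is not local.
inline int local_ipc_index(const std::vector<int>& node_ranks, int global_rank) {
  auto it = std::find(node_ranks.begin(), node_ranks.end(), global_rank);
  if (it == node_ranks.end()) return -1;
  return static_cast<int>(it - node_ranks.begin());
}

// Hypothetical lookup: index the local array by local position, never by
// the global rank (which may exceed the size of the local array).
inline char* ipc_base_for(const std::vector<char*>& ipc_bases,
                          const std::vector<int>& node_ranks, int global_rank) {
  const int idx = local_ipc_index(node_ranks, global_rank);
  assert(idx >= 0 && static_cast<std::size_t>(idx) < ipc_bases.size());
  return ipc_bases[static_cast<std::size_t>(idx)];
}
```

With ranks 4 to 7 on one node, global rank 6 maps to local position 2; indexing a four-entry array with 6 directly would read out of bounds.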

[ROCm/rocshmem commit: e3b0353fa9]
2025-04-17 10:11:32 -05:00
Aurelien Bouteiller ccf6833c6f bugfix: do not dereference ctx during create_ctx if we did run out (#83)
[ROCm/rocshmem commit: 9befbe8293]
2025-04-16 10:37:44 -04:00
Avinash Kethineedi c4de6833f6 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`

[ROCm/rocshmem commit: f6ef19f5a9]
2025-04-15 15:37:53 -05:00
Yiltan ea2df2aa26 Added sphinx dependencies (#84)
[ROCm/rocshmem commit: 5ee0c3407e]
2025-04-15 11:28:16 -04:00
Aurelien Bouteiller b7613a38c1 Remove dev-mono-linear (#81)
* Remove dev_mono_linear (followup to removal of slab_heap)

* cleanup: use CHECK_HIP rather than ad-hoc error checking

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: a1a0560ca3]
2025-04-15 09:57:59 -04:00
Edgar Gabriel bac7769483 Revamp the uniqueId code to support subgroups of processes (#80)
* add code for bootstrapping

the bootstrapping code has been extracted from the MSCCLPP library,
which is in part based on code from NVIDIA. The code has been
modified to match the specific requirements of the rocSHMEM library.

* add code to use the new uniqueId bootstrapping

* adjust init_attr example

extend the rocshmem_init_attr example to use two disjoint groups
of processes, in order to trigger the new code path.

* add env variable for bootstrap timeout

* Update examples/rocshmem_init_attr_test.cc

Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

* Update src/rocshmem.cpp

Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

---------

Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

[ROCm/rocshmem commit: b5830a623b]
2025-04-14 12:02:09 -05:00
Avinash Kethineedi be35f3ef93 Update backend to use provided MPI communicator during library initialization (#79)
* Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD`

* Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs
   - Return values from backend if initialized; otherwise, fall back to MPI_Singleton.

[ROCm/rocshmem commit: 05755847f5]
2025-04-14 09:18:57 -05:00
Brandon Potter a3a211a677 Cleanup unused code in repository (#75)
* Remove unused forward_list

* Remove unused __read_clock function

* Replace wallClk code with hip function

* Remove unused unit test for ipc

* Remove slab heap

* Remove unused EBO spinlock

[ROCm/rocshmem commit: 0fd628458c]
2025-04-10 14:47:24 -05:00
Avinash Kethineedi 41d5d739e2 Update collective APIs naming (#77)
* Update the naming convention for collective APIs to ensure consistency across the interface.

* Move all collective API declarations to rocshmem_COLL.hpp

* The following APIs were updated as part of this change:
  - `barrier`
  - `barrier_all`
  - `sync`
  - `sync_all`
  - `all_to_all`
  - `broadcast`
  - `fcollect`
  - `all_reduce`

* Update header file generation code for collective APIs

[ROCm/rocshmem commit: 68421895d6]
2025-04-10 12:14:47 -05:00
Edgar Gabriel 5b22ddd1ff add new flag to build instructions (#78)
This flag is required to link a pytorch use-case correctly.
It doesn't seem to impact the rocSHMEM code.

[ROCm/rocshmem commit: 5e49567b6c]
2025-04-10 08:39:54 -05:00
Yiltan 4d6fd799ef Enable RO CI (#65)
[ROCm/rocshmem commit: 25e7109b64]
2025-04-08 16:12:22 -04:00
Avinash Kethineedi 9bd2b04899 Update Barrier and Sync APIs (#73)
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
 - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
 - Added support in both IPC and RO conduits
 - Added functional tests to cover all `barrier` APIs
 - Removed collective operations on default context

* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
  - Implemented `sync` APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync` APIs

* update naming convention for context-based `barrier` APIs

[ROCm/rocshmem commit: dc61bca066]
2025-04-08 11:25:31 -05:00
Avinash Kethineedi 426bbf525b Update Barrier_All and Sync_All APIs (#72)
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs
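
The per-context pSync buffers mentioned in the first item above can be pictured with a small host-side sketch. It assumes one contiguous allocation split into equal regions indexed by context ID; `PSyncPool`, `region`, and the region size are hypothetical and are not taken from the rocSHMEM sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool that gives every context its own pSync region, so
// barriers issued concurrently from different contexts never share flags.
class PSyncPool {
 public:
  PSyncPool(std::size_t num_contexts, std::size_t words_per_context)
      : words_per_context_(words_per_context),
        storage_(num_contexts * words_per_context, 0L) {}

  // Return the start of the region owned by context `ctx_id`.
  long* region(std::size_t ctx_id) {
    assert((ctx_id + 1) * words_per_context_ <= storage_.size());
    return storage_.data() + ctx_id * words_per_context_;
  }

 private:
  std::size_t words_per_context_;
  std::vector<long> storage_;
};
```

Because each context writes only into its own region, a barrier in progress on one context cannot observe or overwrite the synchronization flags of another.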

[ROCm/rocshmem commit: c652f58cef]
2025-04-02 11:58:55 -05:00
Yiltan 0cde5f53dc Update GTEST version (#68)
[ROCm/rocshmem commit: e16ca7a1e3]
2025-03-31 08:58:30 -04:00
Edgar Gabriel 2ab585ce8d add uniqueID initialization (#69)
add the interfaces required to support rocshmem initialization
through the uniqueID mechanism. At the moment this still maps to
MPI initialization under the hood, but adding the functions might
simplify the porting of some applications to rocshmem. In addition, if
we need to transition away from MPI one day, this is also a step in
that direction.

[ROCm/rocshmem commit: e9f6227d75]
2025-03-28 16:34:00 -05:00
Edgar Gabriel c7297b1d6b Performance tuning for inter-node communication (#66)
This PR addresses two issues:
 - reduce the default number of contexts supported by the host interface
   to 1; we are not using those at the moment, so we now create fewer
   MPI_Win objects at startup
 - introduce a micro-sleep in the RO progress engine when there are no
   pending requests. This leads to significant performance improvements
   for inter-node communication with Thor2 NICs.
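
The micro-sleep idea can be sketched on the host as a polling loop that backs off when its queue is empty. This is only an illustration under stated assumptions: the commit does not say how long the sleep is or which call implements it, so `progress_loop`, `ProgressStats`, the `std::deque` of pending requests, and the sleep interval are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <thread>

// Counters so the behavior of the loop can be observed.
struct ProgressStats {
  std::size_t processed = 0;
  std::size_t idle_sleeps = 0;
};

// Hypothetical progress loop: service one request per iteration when work is
// pending, otherwise sleep briefly instead of spinning on an empty queue.
inline ProgressStats progress_loop(std::deque<int>& pending,
                                   std::size_t max_iterations,
                                   std::chrono::microseconds idle_sleep) {
  assert(idle_sleep.count() >= 0);
  ProgressStats stats;
  for (std::size_t i = 0; i < max_iterations; ++i) {
    if (!pending.empty()) {
      pending.pop_front();  // stand-in for handling one request
      ++stats.processed;
    } else {
      std::this_thread::sleep_for(idle_sleep);  // back off while idle
      ++stats.idle_sleeps;
    }
  }
  return stats;
}
```

The design trade-off is latency against contention: a sleeping progress thread reacts slightly later to a new request, but it stops competing for CPU and memory bandwidth while nothing is pending.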

[ROCm/rocshmem commit: 12561783de]
2025-03-26 21:09:26 -05:00
Edgar Gabriel 44b81efdab Update CODEOWNERS (#67)
[ROCm/rocshmem commit: 607c6bd044]
2025-03-26 12:35:39 -05:00
Avinash Kethineedi 7a4d1ac064 Implement default RO context (#64)
* Allocate default context buffers and initialize queue for management

- Allocated the status flag, g return, and atomic return buffers for
  the default context.
- Initialized `AtomicWFQueueProxy` instances to manage these buffers
  efficiently for concurrent access.

* Update `BlockHandle` with default context buffers

* Add default context flag and update buffer retrieval functions

- Added a flag to distinguish the default context from other contexts.
- Modified return buffer functions and `get_status_flag` function to accommodate
  the default context

* Add default context primitive tests

-  get, put, get_nbi, put_nbi, g, and p APIs.

[ROCm/rocshmem commit: 867519e1d0]
2025-03-25 18:51:54 -05:00
Edgar Gabriel 7aecbdec4d update README documentation for RO (#63)
* README: update documentation for RO support

update the README and the install_dependencies script to match the
requirements of the RO conduit.

* add CODEOWNERS file

[ROCm/rocshmem commit: 4e48c9748e]
2025-03-25 07:50:15 -05:00
Avinash Kethineedi 370e2dda09 Add AtomicWFQueue implementation and tests (#62)
* feat: Add AtomicWFQueue implementation
  - Implemented wavefront-safe atomic FIFO queue ensuring first-come, first-serve order
  - Added efficient synchronization using atomics
  - Enhanced `dequeue` to wait until an element is available

* test: Add GTest for AtomicWFQueue
  - Implemented unit tests for AtomicWFQueue using GoogleTest framework
  - Added tests for `enqueue`, `dequeue`, and edge cases
  - Ensured synchronization behavior and correctness under concurrent conditions

* Add assert in `enqueue` and update atomics
  - Added an assert in the `enqueue` function to ensure it fails if the queue is full
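
The behavior described above (first-come, first-served order, an assert when `enqueue` would overflow, and a `dequeue` that waits for an element) can be illustrated with a host-side analogue. This sketch is not the rocSHMEM `AtomicWFQueue`: it runs on CPU threads rather than GPU wavefronts, and `BoundedTicketFifo` with its per-slot sequence numbers is a hypothetical design chosen only to show the semantics.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Host-side sketch of a bounded FIFO that hands out tickets with atomics.
template <typename T>
class BoundedTicketFifo {
 public:
  explicit BoundedTicketFifo(std::size_t capacity)
      : capacity_(capacity), slots_(capacity), seq_(capacity) {
    assert(capacity > 0);
    // Slot i is initially free for the producer holding ticket i.
    for (std::size_t i = 0; i < capacity_; ++i) seq_[i].store(i);
  }

  void enqueue(const T& value) {
    // Fail loudly if the caller tries to exceed the capacity.
    const std::size_t in_flight = size_.fetch_add(1, std::memory_order_acq_rel);
    assert(in_flight < capacity_ && "enqueue on a full queue");
    (void)in_flight;

    const std::size_t ticket = tail_.fetch_add(1, std::memory_order_relaxed);
    std::atomic<std::size_t>& seq = seq_[ticket % capacity_];
    // Wait until the consumer of the previous lap has released this slot.
    while (seq.load(std::memory_order_acquire) != ticket) std::this_thread::yield();
    slots_[ticket % capacity_] = value;
    seq.store(ticket + 1, std::memory_order_release);  // publish the element
  }

  T dequeue() {
    // Tickets are served in the order in which they were taken.
    const std::size_t ticket = head_.fetch_add(1, std::memory_order_relaxed);
    std::atomic<std::size_t>& seq = seq_[ticket % capacity_];
    // Spin until the matching element has been published.
    while (seq.load(std::memory_order_acquire) != ticket + 1) std::this_thread::yield();
    T value = slots_[ticket % capacity_];
    size_.fetch_sub(1, std::memory_order_acq_rel);
    seq.store(ticket + capacity_, std::memory_order_release);  // free the slot
    return value;
  }

 private:
  const std::size_t capacity_;
  std::vector<T> slots_;
  std::vector<std::atomic<std::size_t>> seq_;
  std::atomic<std::size_t> size_{0};
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};
```

A queue like this suits a pool of reusable buffer indices: a caller dequeues an index, uses the buffer, and enqueues the index again, so the queue never holds more entries than its capacity.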

[ROCm/rocshmem commit: b84b5638cf]
2025-03-25 00:45:19 -05:00
Avinash Kethineedi baca5fd7a1 Fix/RO Backend Hang Issue (#53)
* Update HIP version check for compatibility with versions >= 5.5

* Update memory allocator for context BlockHandle
   - Replaced `HIPAllocator` with `HIPDefaultFinegrainedAllocator` for context `BlockHandle`.

* Update run commands for `rocshmem_g` and `rocshmem_p` functional tests

[ROCm/rocshmem commit: c16b0d6952]
2025-03-24 22:54:07 -05:00
Edgar Gabriel 1ee9b72449 add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests

[ROCm/rocshmem commit: bcbc42e78f]
2025-03-24 11:23:03 -05:00
Aurelien Bouteiller 413346525b Update README (#60)
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: e8ba20c5f5]
2025-03-24 10:19:12 -04:00
Yiltan 1ed4512106 Removed GPU_IB (#59)
[ROCm/rocshmem commit: 658bf2a3b5]
2025-03-24 09:04:52 -04:00
Avinash Kethineedi 6f78b4300f Remove support code for GFX940 and GFX941 targets (#55)
[ROCm/rocshmem commit: 1210b6419f]
2025-03-21 14:31:49 -05:00
Edgar Gabriel 033253fbdf RO/collectives: add linear algorithms using RPut/Rget (#58)
* RO/collectives: add linear algorithms using RPut/Rget

- make broadcast, alltoall and fcollect use a simple linear algorithm
  using MPI_RPut/Rget, but without blocking in the execution
- remove the to_all interfaces, since they have been deprecated.
- remove the active-set interfaces, since they have been removed from
  rocSHMEM
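
A linear broadcast of the kind described above can be modeled without MPI by treating each PE's buffer as a vector. In this sketch the assignment stands in for one request-based one-sided put from the root to a peer; `linear_broadcast` and the in-memory buffers are hypothetical, and the real code would issue all puts first and wait for completion afterwards rather than copying synchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of a linear broadcast: the root sends its buffer to every other PE,
// one transfer per peer. Returns the number of transfers issued.
inline std::size_t linear_broadcast(std::vector<std::vector<int>>& pe_buffers,
                                    std::size_t root) {
  assert(root < pe_buffers.size());
  const std::vector<int> source = pe_buffers[root];  // snapshot of root data
  std::size_t transfers = 0;
  for (std::size_t pe = 0; pe < pe_buffers.size(); ++pe) {
    if (pe == root) continue;  // the root already holds the data
    pe_buffers[pe] = source;   // stand-in for one put to `pe`
    ++transfers;
  }
  return transfers;
}
```

The algorithm issues N - 1 transfers for N PEs, which is simple and predictable but puts all of the traffic on the root.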

* avoid notification after barrier

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>

* disable allocation of ata_buffer

a temporary buffer of 128MB was allocated when creating a team. In
previous versions of the code, that buffer was used by some collective
operations. This is not the case for now. Therefore, do not allocate the
buffer for now. I am not removing the element itself from the
structure, since we might need it in future versions again.

---------

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>

[ROCm/rocshmem commit: 908bd5bda3]
2025-03-21 12:49:39 -05:00
Yiltan 1380f43156 ROCm 6.4.0rc3 bug fix (#56)
[ROCm/rocshmem commit: 68a1646399]
2025-03-19 15:37:58 -04:00
Yiltan 6d6dccfebe Sync Reverse Offload Scripts (#52)
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration

* Added missing ptr in ipc_policy

[ROCm/rocshmem commit: 3428957de9]
2025-03-19 14:31:07 -04:00
Yiltan c4b81768df Bug fix for PR43 (#54)
[ROCm/rocshmem commit: 7d9e82fb34]
2025-03-19 09:39:07 -04:00
Avinash Kethineedi e16bb62767 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context
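
The timing changes listed above (taking the earliest start time across workgroups and stopping the clock after `quiet`) amount to a small calculation, sketched here under stated assumptions. The function name, the tick-based timestamps, and the `ticks_per_us` conversion factor are hypothetical; the tests themselves may compute latency differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical latency helper: `start_ticks` holds one start timestamp per
// workgroup and `end_tick` is read after the quiet call has returned.
inline double average_latency_us(const std::vector<std::uint64_t>& start_ticks,
                                 std::uint64_t end_tick, double ticks_per_us,
                                 std::uint64_t num_ops) {
  assert(!start_ticks.empty() && num_ops > 0 && ticks_per_us > 0.0);
  // Measure from the earliest start so no workgroup's work is left out.
  const std::uint64_t earliest =
      *std::min_element(start_ticks.begin(), start_ticks.end());
  assert(end_tick >= earliest);
  const double elapsed_us = static_cast<double>(end_tick - earliest) / ticks_per_us;
  return elapsed_us / static_cast<double>(num_ops);
}
```

For start ticks 120, 100, and 110, an end tick of 1100, 100 ticks per microsecond, and 10 operations, the elapsed time is 10 microseconds and the average latency is 1 microsecond per operation.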

[ROCm/rocshmem commit: aa3121a967]
2025-03-18 14:39:57 -05:00
Yiltan 9b187a2e44 Updated IPC detection logic (#51)
* Added environment variable to enable/disable IPC at runtime

* Fixed IPC detection logic to allow for different process mappings

* Updated README.md

[ROCm/rocshmem commit: b7f3839f27]
2025-03-17 11:36:11 -04:00
Avinash Kethineedi 7f3879ff31 Refactor RO backend data structures (#49)
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts

- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh

- Allocate rocshmem API return buffers based on number of device contexts.

- Associate status flag address with blocking calls and remove threadId dependency
   - Associated the status flag address with each blocking call request to notify the GPU thread.
   - Removed dependency on threadId for determining the appropriate status flag index.

- Move status flag buffer allocation to backend.

- Initialize allocated memory to zero
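
Attaching the status flag address to each blocking request, instead of deriving a flag index from the thread ID, can be sketched as follows. The `BlockingRequest` layout, `complete_request`, and `is_complete` are hypothetical and are shown on the host with `std::atomic`; the actual RO backend structures are not reproduced here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical request record: it carries the address of the flag that the
// waiting caller polls, so the service side needs no thread-ID lookup.
struct BlockingRequest {
  int opcode;
  std::atomic<int>* status_flag;
};

// Service side: signal completion through the address stored in the request.
inline void complete_request(const BlockingRequest& req) {
  assert(req.status_flag != nullptr);
  req.status_flag->store(1, std::memory_order_release);
}

// Caller side: check whether the flag it owns has been set.
inline bool is_complete(const std::atomic<int>& flag) {
  return flag.load(std::memory_order_acquire) == 1;
}
```

Since the request names its own flag, the number and placement of flags can follow the number of contexts rather than the number of threads.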

[ROCm/rocshmem commit: df4ad2c04d]
2025-03-14 10:49:44 -05:00
Yiltan a16492cdf9 Added option to build only tests and link to an external rocshmem library (#43)
* Rearrange CMakefile

* Enable linking to external rocshmem library

* Minor fix for the functional test driver

* ROCSHMEM_HOME detection fixed

[ROCm/rocshmem commit: 96424a59a8]
2025-03-13 15:49:50 -04:00
Avinash Kethineedi 1831a1b33c Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48)
* Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations

- Modified the Device proxy class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.

[ROCm/rocshmem commit: eb5a38e806]
2025-02-24 15:11:46 -06:00
Yiltan 95c4c0d428 Fix ROCm 6.4 warnings (#47)
* Removed __AMDGCN_WAVEFRONT_SIZE

* Added unit test to validate WF_SIZE

[ROCm/rocshmem commit: 487e5b7d0f]
2025-02-24 13:34:13 -05:00
Avinash Kethineedi 65b4ff4c41 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect

[ROCm/rocshmem commit: 57d60aa727]
2025-02-19 10:31:53 -06:00
Yiltan e1ed36e58f Sync develop with amd-mainline (#46)
* Update install_dependencies.sh

* Updated to ROCm repos

* Merge pull request #37 from ROCm/depBuild

locked specific version on ompi and ucx

* locked specific version on ompi and ucx

* [IPC] Fix ROCSHMEM_SIGNAL_ADD

* Generate CMake Package Configuration Files

---------

Co-authored-by: akolliasAMD <99202231+akolliasAMD@users.noreply.github.com>
Co-authored-by: akolliasAMD <akollias@amd.com>

[ROCm/rocshmem commit: 785e31aa48]
2025-02-18 12:30:34 -05:00
avinashkethineedi 6c70aee32e Remove rocshmem_timer function
[ROCm/rocshmem commit: 21dbd5cc5e]
2025-02-17 17:10:51 +00:00
avinashkethineedi dba989733f Update bandwidth and latency calculations
- Refined bandwidth and latency calculations for improved accuracy


[ROCm/rocshmem commit: c155636da4]
2025-02-17 06:18:46 +00:00
avinashkethineedi 539e991b2a RO Backend: Add support for char, signed char and unsigned char
[ROCm/rocshmem commit: 540cd4b918]
2025-02-12 20:10:03 +00:00
Yiltan 1f3881fa21 Merge pull request #38 from Yiltan/ro/implement-sigops
Implements Signalling Operations for RO

[ROCm/rocshmem commit: 495cd6970b]
2025-02-10 15:10:07 -05:00
Yiltan 87e049f9c9 Merge pull request #34 from Yiltan/sigops-test-fix
Fix Signalling Operations Functional Test

[ROCm/rocshmem commit: 94144f4460]
2025-02-10 14:56:45 -05:00
Yiltan f64c76b31c Merge pull request #39 from Yiltan/ro/fix-teamreduce
Fix Team reduction intra-node

[ROCm/rocshmem commit: 944444cf12]
2025-02-10 14:56:27 -05:00
Yiltan Hassan Temucin 76981d6374 Fix Team reduction intra-node
[ROCm/rocshmem commit: 022b2c27e7]
2025-02-07 08:39:35 -06:00
Avinash Kethineedi 4d919faf55 Merge pull request #36 from avinashkethineedi/fix/rocshmem-ctx-wg-team-sync
Fix `rocshmem_ctx_wg_team_sync` API

[ROCm/rocshmem commit: d97e5ba2c8]
2025-02-06 13:41:16 -06:00
Yiltan Hassan Temucin e50460af83 Use the precalculated num_warps variable
[ROCm/rocshmem commit: b83ff2fa84]
2025-02-06 13:21:25 -06:00
Yiltan Hassan Temucin c4f2ccd48f [RO] implemented signaling operations
[ROCm/rocshmem commit: f1c25f7e19]
2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 90b8f191d6 [RO] added MPI_UNSIGNED_LONG as type
[ROCm/rocshmem commit: 21171deeb8]
2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 257610bdc5 Validate signal after put signal operations
[ROCm/rocshmem commit: 8d74c7b73e]
2025-02-06 08:17:22 -06:00