Commit graph

259 commits

Author SHA1 Message Date
Edgar Gabriel 7aecbdec4d update README documentation for RO (#63)
* README: update documentation for RO support

update the README and the install_dependencies script to match the
requirements of the RO conduit.

* add CODEOWNERS file

[ROCm/rocshmem commit: 4e48c9748e]
2025-03-25 07:50:15 -05:00
Avinash Kethineedi 370e2dda09 Add AtomicWFQueue implementation and tests (#62)
* feat: Add AtomicWFQueue implementation
  - Implemented wavefront-safe atomic FIFO queue ensuring first-come, first-serve order
  - Added efficient synchronization using atomics
  - Enhanced `dequeue` to wait until an element is available

* test: Add GTest for AtomicWFQueue
  - Implemented unit tests for AtomicWFQueue using GoogleTest framework
  - Added tests for `enqueue`, `dequeue`, and edge cases
  - Ensured synchronization behavior and correctness under concurrent conditions

* Add assert in `enqueue` and update atomics
  - Added an assert in the `enqueue` function to ensure it fails if the queue is full

[ROCm/rocshmem commit: b84b5638cf]
2025-03-25 00:45:19 -05:00
Avinash Kethineedi baca5fd7a1 Fix/RO Backend Hang Issue (#53)
* Update HIP version check for compatibility with versions >= 5.5

* Update memory allocator for context BlockHandle
   - Replaced `HIPAllocator` with `HIPDefaultFinegrainedAllocator` for context `BlockHandle`.

* Update run commands for `rocshmem_g` and `rocshmem_p` functional tests

[ROCm/rocshmem commit: c16b0d6952]
2025-03-24 22:54:07 -05:00
Edgar Gabriel 1ee9b72449 add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests

[ROCm/rocshmem commit: bcbc42e78f]
2025-03-24 11:23:03 -05:00
Aurelien Bouteiller 413346525b Update README (#60)
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: e8ba20c5f5]
2025-03-24 10:19:12 -04:00
Yiltan 1ed4512106 Removed GPU_IB (#59)
[ROCm/rocshmem commit: 658bf2a3b5]
2025-03-24 09:04:52 -04:00
Avinash Kethineedi 6f78b4300f Remove support code for GFX940 and GFX941 targets (#55)
[ROCm/rocshmem commit: 1210b6419f]
2025-03-21 14:31:49 -05:00
Edgar Gabriel 033253fbdf RO/collectives: add linear algorithms using RPut/Rget (#58)
* RO/collectives: add linear algorithms using RPut/Rget

- make broadcast, alltoall and fcollect use a simple linear algorithm
  using MPI_RPut/Rget, but without blocking in the execution
- remove the to_all interfaces, since they have been deprecated.
- remove the active-set interfaces, since they have been removed from
  rocSHMEM

* avoid notification after barrier

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>

* disable allocation of ata_buffer

a temporary buffer of 128MB was allocated when creating a team. In
previous versions of the code, that buffer was used by some collective
operations. This is not the case for now. Therefore, do not allocate the
buffer for now. I am not removing the element itself from the
structure, since we might need it in future versions again.

---------

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>

[ROCm/rocshmem commit: 908bd5bda3]
2025-03-21 12:49:39 -05:00
Yiltan 1380f43156 ROCm 6.4.0rc3 bug fix (#56)
[ROCm/rocshmem commit: 68a1646399]
2025-03-19 15:37:58 -04:00
Yiltan 6d6dccfebe Sync Reverse Offload Scripts (#52)
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration

* Added missing ptr in ipc_policy

[ROCm/rocshmem commit: 3428957de9]
2025-03-19 14:31:07 -04:00
Yiltan c4b81768df Bug fix for PR43 (#54)
[ROCm/rocshmem commit: 7d9e82fb34]
2025-03-19 09:39:07 -04:00
Avinash Kethineedi e16bb62767 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context

[ROCm/rocshmem commit: aa3121a967]
2025-03-18 14:39:57 -05:00
Yiltan 9b187a2e44 Updated IPC detection logic (#51)
* Added environment variable to enable/disable IPC at runtime

* Fixed IPC detection logic to allow for different process mappings

* Updated README.md

[ROCm/rocshmem commit: b7f3839f27]
2025-03-17 11:36:11 -04:00
Avinash Kethineedi 7f3879ff31 Refactor RO backend data structures (#49)
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts

- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh

- Allocate rocshmem API return buffers based on number of device contexts.

- Associate status flag address with blocking calls and remove threadId dependency
   - Associated the status flag address with each blocking call request to notify the GPU thread.
   - Removed dependency on threadId for determining the appropriate status flag index.

- Move status flag buffer allocation to backend.

- Initialize allocated memory to zero

[ROCm/rocshmem commit: df4ad2c04d]
2025-03-14 10:49:44 -05:00
Yiltan a16492cdf9 Added option to build only tests and link to an external rocshmem library (#43)
* Rearrange CMakefile

* Enable linking to external rocshmem library

* Minor fix for the functional test driver

* ROCSHMEM_HOME detection fixed

[ROCm/rocshmem commit: 96424a59a8]
2025-03-13 15:49:50 -04:00
Avinash Kethineedi 1831a1b33c Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48)
* Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations

- Modified the Device proxy class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.

[ROCm/rocshmem commit: eb5a38e806]
2025-02-24 15:11:46 -06:00
Yiltan 95c4c0d428 Fix ROCm 6.4 warnings (#47)
* Removed __AMDGCN_WAVEFRONT_SIZE

* Added unit test to validate WF_SIZE

[ROCm/rocshmem commit: 487e5b7d0f]
2025-02-24 13:34:13 -05:00
Avinash Kethineedi 65b4ff4c41 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect

[ROCm/rocshmem commit: 57d60aa727]
2025-02-19 10:31:53 -06:00
Yiltan e1ed36e58f Sync develop with amd-mainline (#46)
* Update install_dependencies.sh

* Updated to ROCm repos

* Merge pull request #37 from ROCm/depBuild

locked specific version on ompi and ucx

* locked specific version on ompi and ucx

* [IPC] Fix ROCSHMEM_SIGNAL_ADD

* Generate CMake Package Configuration Files

---------

Co-authored-by: akolliasAMD <99202231+akolliasAMD@users.noreply.github.com>
Co-authored-by: akolliasAMD <akollias@amd.com>

[ROCm/rocshmem commit: 785e31aa48]
2025-02-18 12:30:34 -05:00
Avinash Kethineedi f2990df7f2 Merge pull request #44 from avinashkethineedi/fix/time_calculations
Update bandwidth and latency calculations

[ROCm/rocshmem commit: f8701a44fa]
2025-02-17 12:48:33 -06:00
avinashkethineedi 6c70aee32e Remove rocshmem_timer function
[ROCm/rocshmem commit: 21dbd5cc5e]
2025-02-17 17:10:51 +00:00
avinashkethineedi dba989733f Update bandwidth and latency calculations
- Refined bandwidth and latency calculations for improved accuracy


[ROCm/rocshmem commit: c155636da4]
2025-02-17 06:18:46 +00:00
Avinash Kethineedi 04889cb71c Merge pull request #40 from avinashkethineedi/RO_data_types
RO Backend: Add support for char, signed char and unsigned char

[ROCm/rocshmem commit: 40bd8a38a0]
2025-02-12 14:40:05 -06:00
avinashkethineedi 539e991b2a RO Backend: Add support for char, signed char and unsigned char
[ROCm/rocshmem commit: 540cd4b918]
2025-02-12 20:10:03 +00:00
Yiltan 1f3881fa21 Merge pull request #38 from Yiltan/ro/implement-sigops
Implements Signalling Operations for RO

[ROCm/rocshmem commit: 495cd6970b]
2025-02-10 15:10:07 -05:00
Yiltan 87e049f9c9 Merge pull request #34 from Yiltan/sigops-test-fix
Fix Signalling Operations Functional Test

[ROCm/rocshmem commit: 94144f4460]
2025-02-10 14:56:45 -05:00
Yiltan f64c76b31c Merge pull request #39 from Yiltan/ro/fix-teamreduce
Fix Team reduction intra-node

[ROCm/rocshmem commit: 944444cf12]
2025-02-10 14:56:27 -05:00
Yiltan Hassan Temucin 76981d6374 Fix Team reduction intra-node
[ROCm/rocshmem commit: 022b2c27e7]
2025-02-07 08:39:35 -06:00
Avinash Kethineedi 4d919faf55 Merge pull request #36 from avinashkethineedi/fix/rocshmem-ctx-wg-team-sync
Fix `rocshmem_ctx_wg_team_sync` API

[ROCm/rocshmem commit: d97e5ba2c8]
2025-02-06 13:41:16 -06:00
Avinash Kethineedi 614d5c7c81 Merge pull request #35 from avinashkethineedi/fix/team-context-pe-numbering
Fix `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes` APIs

[ROCm/rocshmem commit: 5861346a8e]
2025-02-06 13:39:28 -06:00
Yiltan Hassan Temucin e50460af83 Use the precalculated num_warps variable
[ROCm/rocshmem commit: b83ff2fa84]
2025-02-06 13:21:25 -06:00
Yiltan Hassan Temucin c4f2ccd48f [RO] implemented signaling operations
[ROCm/rocshmem commit: f1c25f7e19]
2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 90b8f191d6 [RO] added MPI_UNSIGNED_LONG as type
[ROCm/rocshmem commit: 21171deeb8]
2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 257610bdc5 Validate signal after put signal operations
[ROCm/rocshmem commit: 8d74c7b73e]
2025-02-06 08:17:22 -06:00
avinashkethineedi fca7471d67 Fix rocshmem_ctx_wg_team_sync API
- Updated `rocshmem_ctx_wg_team_sync` to utilize a team-specific memory buffer for synchronization


[ROCm/rocshmem commit: c5b548c398]
2025-02-05 19:09:07 +00:00
avinashkethineedi 71af1b366d Fix rocshmem_ctx_my_pe and rocshmem_ctx_n_pes APIs to return PE numbering and size relative to the team in a team-specific context.
[ROCm/rocshmem commit: e311400d15]
2025-02-05 03:41:40 +00:00
Yiltan Hassan Temucin 9317172fab Fix sigops functional test
- Ensure quiet is called on the correct context


[ROCm/rocshmem commit: bae1641311]
2025-02-04 13:30:31 -08:00
Avinash Kethineedi 2214d21491 Merge pull request #31 from avinashkethineedi/rocshmem_g
Implement `rocshmem_g` API and optimize memory usage

[ROCm/rocshmem commit: 248972b30b]
2025-02-04 11:15:41 -06:00
Yiltan f967be4f54 Merge pull request #32 from Yiltan/ipc/sigop-bug
[IPC] Fix ROCSHMEM_SIGNAL_ADD

[ROCm/rocshmem commit: 2d9d09ea01]
2025-02-03 16:48:05 -05:00
Yiltan Hassan Temucin ffdce76fe4 [IPC] Fix ROCSHMEM_SIGNAL_ADD
[ROCm/rocshmem commit: fd3eaa3f69]
2025-02-03 09:59:28 -08:00
avinashkethineedi 5af3fdeacb Implement rocshmem_g API and optimize memory usage
- Implement `rocshmem_g` API
- Free up memory space allocated for `rocshmem_g` and atomic operations' return values


[ROCm/rocshmem commit: 757d7e53ca]
2025-02-02 05:56:46 +00:00
Yiltan 86c3c5ff39 Merge pull request #29 from Yiltan/multi-node
Updated RO build script and functional test driver

[ROCm/rocshmem commit: 7e5b533904]
2025-01-27 14:44:22 -05:00
Yiltan c39e737eef Merge pull request #28 from Yiltan/cmakefiles-create
Generate CMake Package Configuration Files

[ROCm/rocshmem commit: f851411ac5]
2025-01-24 10:50:27 -05:00
Yiltan 3a071f1d69 Merge pull request #25 from mawad-amd/muhaawad/build_examples_option
Add `BUILD_EXAMPLES` CMake option

[ROCm/rocshmem commit: a458ea2ef4]
2025-01-24 10:50:10 -05:00
Yiltan Hassan Temucin adf66d04f4 Updated RO build script and functional test driver for multi-node support
[ROCm/rocshmem commit: 3a8b0d4647]
2025-01-23 16:46:19 -06:00
Yiltan Hassan Temucin 90f2bf1ec8 Generate CMake Package Configuration Files
[ROCm/rocshmem commit: 00824385ba]
2025-01-22 11:24:41 -06:00
Yiltan b159f0ecb4 Merge pull request #27 from ROCm/package-bug-fix
Minor fixes for packaging

[ROCm/rocshmem commit: bacced0cc3]
2025-01-21 09:13:29 -05:00
Yiltan fc8007bec6 Minor fixes for packaging
[ROCm/rocshmem commit: fa90f4b0ac]
2025-01-20 18:15:07 +00:00
Yiltan 498f76f72c Merge pull request #24 from Yiltan/install-script
Added script to install dependencies

[ROCm/rocshmem commit: 469e2a0167]
2025-01-20 11:02:47 -05:00
Muhammad Awad 6688cf8fd6 Add BUILD_EXAMPLES CMake option
Signed-off-by: Muhammad Awad <MuhammadAbdelghaffar.Awad@amd.com>


[ROCm/rocshmem commit: 7a6b3261a3]
2025-01-18 15:26:20 -06:00