Commit graph

256 commits

Author SHA1 Message Date
Edgar Gabriel bcbc42e78f add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests
2025-03-24 11:23:03 -05:00
Aurelien Bouteiller e8ba20c5f5 Update README (#60)
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
2025-03-24 10:19:12 -04:00
Yiltan 658bf2a3b5 Removed GPU_IB (#59) 2025-03-24 09:04:52 -04:00
Avinash Kethineedi 1210b6419f Remove support code for GFX940 and GFX941 targets (#55) 2025-03-21 14:31:49 -05:00
Edgar Gabriel 908bd5bda3 RO/collectives: add linear algorithms using RPut/Rget (#58)
* RO/collectives: add linear algorithms using RPut/Rget

- make broadcast, alltoall and fcollect use a simple linear algorithm
  using MPI_RPut/Rget, but without blocking in the execution
- remove the to_all interfaces, since they have been deprecated.
- remove the active-set interfaces, since they have been removed from
  rocSHMEM

* avoid notification after barrier

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>

* disable allocation of ata_buffer

a temporary buffer of 128MB was allocated when creating a team. In
previous versions of the code, that buffer was used by some collective
operations. This is not the case for now. Therefore, do not allocate the
buffer for now. I am not removing the element itself from the
structure, since we might need it in future versions again.

---------

Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
2025-03-21 12:49:39 -05:00
Yiltan 68a1646399 ROCm 6.4.0rc3 bug fix (#56) 2025-03-19 15:37:58 -04:00
Yiltan 3428957de9 Sync Reverse Offload Scripts (#52)
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration

* Added missing ptr in ipc_policy
2025-03-19 14:31:07 -04:00
Yiltan 7d9e82fb34 Bug fix for PR43 (#54) 2025-03-19 09:39:07 -04:00
Avinash Kethineedi aa3121a967 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context
2025-03-18 14:39:57 -05:00
Yiltan b7f3839f27 Updated IPC detection logic (#51)
* Added environment variable to enable/disable IPC at runtime

* Fixed IPC detection logic to allow for different process mappings

* Updated README.md
2025-03-17 11:36:11 -04:00
Avinash Kethineedi df4ad2c04d Refactor RO backend data structures (#49)
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts

- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh

- Allocate rocshmem API return buffers based on number of device contexts.

- Associate status flag address with blocking calls and remove threadId dependency
   - Associated the status flag address with each blocking call request to notify the GPU thread.
   - Removed dependency on threadId for determining the appropriate status flag index.

- Move status flag buffer allocation to backend.

- Initialize allocated memory to zero
2025-03-14 10:49:44 -05:00
Yiltan 96424a59a8 Added option to build only tests and link to an external rocshmem library (#43)
* Rearrange CMakefile

* Enable linking to external rocshmem library

* Minor fix for the functional test driver

* ROCSHMEM_HOME detection fixed
2025-03-13 15:49:50 -04:00
Avinash Kethineedi eb5a38e806 Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48)
* Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations

- Modified the Device proxy class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.
2025-02-24 15:11:46 -06:00
Yiltan 487e5b7d0f Fix ROCm 6.4 warnings (#47)
* Removed __AMDGCN_WAVEFRONT_SIZE

* Added unit test to validate WF_SIZE
2025-02-24 13:34:13 -05:00
Avinash Kethineedi 57d60aa727 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect
2025-02-19 10:31:53 -06:00
Yiltan 785e31aa48 Sync develop with amd-mainline (#46)
* Update install_dependencies.sh

* Updated to ROCm repos

* Merge pull request #37 from ROCm/depBuild

locked specific version on ompi and ucx

* locked specific version on ompi and ucx

* [IPC] Fix ROCSHMEM_SIGNAL_ADD

* Generate CMake Package Configuration Files

---------

Co-authored-by: akolliasAMD <99202231+akolliasAMD@users.noreply.github.com>
Co-authored-by: akolliasAMD <akollias@amd.com>
2025-02-18 12:30:34 -05:00
Avinash Kethineedi f8701a44fa Merge pull request #44 from avinashkethineedi/fix/time_calculations
Update bandwidth and latency calculations
2025-02-17 12:48:33 -06:00
avinashkethineedi 21dbd5cc5e Remove rocshmem_timer function 2025-02-17 17:10:51 +00:00
avinashkethineedi c155636da4 Update bandwidth and latency calculations
- Refined bandwidth and latency calculations for improved accuracy
2025-02-17 06:18:46 +00:00
Avinash Kethineedi 40bd8a38a0 Merge pull request #40 from avinashkethineedi/RO_data_types
RO Backend: Add support for char, signed char and unsigned char
2025-02-12 14:40:05 -06:00
avinashkethineedi 540cd4b918 RO Backend: Add support for char, signed char and unsigned char 2025-02-12 20:10:03 +00:00
Yiltan 495cd6970b Merge pull request #38 from Yiltan/ro/implement-sigops
Implements Signalling Operations for RO
2025-02-10 15:10:07 -05:00
Yiltan 94144f4460 Merge pull request #34 from Yiltan/sigops-test-fix
Fix Signalling Operations Functional Test
2025-02-10 14:56:45 -05:00
Yiltan 944444cf12 Merge pull request #39 from Yiltan/ro/fix-teamreduce
Fix Team reduction intra-node
2025-02-10 14:56:27 -05:00
Yiltan Hassan Temucin 022b2c27e7 Fix Team reduction intra-node 2025-02-07 08:39:35 -06:00
Avinash Kethineedi d97e5ba2c8 Merge pull request #36 from avinashkethineedi/fix/rocshmem-ctx-wg-team-sync
Fix `rocshmem_ctx_wg_team_sync` API
2025-02-06 13:41:16 -06:00
Avinash Kethineedi 5861346a8e Merge pull request #35 from avinashkethineedi/fix/team-context-pe-numbering
Fix `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes` APIs
2025-02-06 13:39:28 -06:00
Yiltan Hassan Temucin b83ff2fa84 Use the precalculated num_warps variable 2025-02-06 13:21:25 -06:00
Yiltan Hassan Temucin f1c25f7e19 [RO] implemented signaling operations 2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 21171deeb8 [RO] added MPI_UNSIGNED_LONG as type 2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 8d74c7b73e Validate signal after put signal operations 2025-02-06 08:17:22 -06:00
avinashkethineedi c5b548c398 Fix rocshmem_ctx_wg_team_sync API
- Updated `rocshmem_ctx_wg_team_sync` to utilize a team-specific memory buffer for synchronization
2025-02-05 19:09:07 +00:00
avinashkethineedi e311400d15 Fix rocshmem_ctx_my_pe and rocshmem_ctx_n_pes APIs to return PE numbering and size relative to the team in a team-specific context. 2025-02-05 03:41:40 +00:00
Yiltan Hassan Temucin bae1641311 Fix sigops functional test
- Ensure quiet is called on the correct context
2025-02-04 13:30:31 -08:00
Avinash Kethineedi 248972b30b Merge pull request #31 from avinashkethineedi/rocshmem_g
Implement `rocshmem_g` API and optimize memory usage
2025-02-04 11:15:41 -06:00
Yiltan 2d9d09ea01 Merge pull request #32 from Yiltan/ipc/sigop-bug
[IPC] Fix ROCSHMEM_SIGNAL_ADD
2025-02-03 16:48:05 -05:00
Yiltan Hassan Temucin fd3eaa3f69 [IPC] Fix ROCSHMEM_SIGNAL_ADD 2025-02-03 09:59:28 -08:00
avinashkethineedi 757d7e53ca Implement rocshmem_g API and optimize memory usage
- Implement `rocshmem_g` API
- Free up memory space allocated for `rocshmem_g` and atomic operations' return values
2025-02-02 05:56:46 +00:00
Yiltan 7e5b533904 Merge pull request #29 from Yiltan/multi-node
Updated RO builds script and functional test driver
2025-01-27 14:44:22 -05:00
Yiltan f851411ac5 Merge pull request #28 from Yiltan/cmakefiles-create
Generate CMake Package Configuration Files
2025-01-24 10:50:27 -05:00
Yiltan a458ea2ef4 Merge pull request #25 from mawad-amd/muhaawad/build_examples_option
Add `BUILD_EXAMPLES` CMake option
2025-01-24 10:50:10 -05:00
Yiltan Hassan Temucin 3a8b0d4647 Updated RO builds script and functional test driver for multi-node support 2025-01-23 16:46:19 -06:00
Yiltan Hassan Temucin 00824385ba Generate CMake Package Configuration Files 2025-01-22 11:24:41 -06:00
Yiltan bacced0cc3 Merge pull request #27 from ROCm/package-bug-fix
Minor fixes for packaging
2025-01-21 09:13:29 -05:00
Yiltan fa90f4b0ac Minor fixes for packaging 2025-01-20 18:15:07 +00:00
Yiltan 469e2a0167 Merge pull request #24 from Yiltan/install-script
Added script to install dependencies
2025-01-20 11:02:47 -05:00
Muhammad Awad 7a6b3261a3 Add BUILD_EXAMPLES CMake option
Signed-off-by: Muhammad Awad <MuhammadAbdelghaffar.Awad@amd.com>
2025-01-18 15:26:20 -06:00
Yiltan 0fb673e186 Update scripts/install_dependencies.sh
Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
2025-01-16 13:38:08 -05:00
Yiltan Temucin 5de0371bec Added script to install dependencies 2025-01-16 10:06:39 -06:00
Avinash Kethineedi 17b7afdcba Merge pull request #23 from avinashkethineedi/bugfix/memory-usage
Automatic Memory Management with `shared_ptr` for host interface
2025-01-15 02:34:14 +05:30