Commit Graph

249 Commits

Author SHA1 Message Date
Yiltan 7d9e82fb34 Bug fix for PR43 (#54) 2025-03-19 09:39:07 -04:00
Avinash Kethineedi aa3121a967 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context
2025-03-18 14:39:57 -05:00
Yiltan b7f3839f27 Updated IPC detection logic (#51)
* Added environment variable to enable/disable IPC at runtime

* Fixed IPC detection logic to allow for different process mappings

* Updated README.md
2025-03-17 11:36:11 -04:00
Avinash Kethineedi df4ad2c04d Refactor RO backend data structures (#49)
- Remove hdp and ipc pointers from BlockHandle, align RO stats with RO contexts

- Add run commands for `rocshmem_g` and `rocshmem_p` API tests in driver.sh

- Allocate rocshmem API return buffers based on number of device contexts.

- Associate status flag address with blocking calls and remove threadId dependency
   - Associated the status flag address with each blocking call request to notify the GPU thread.
   - Removed dependency on threadId for determining the appropriate status flag index.

- Move status flag buffer allocation to backend.

- Initialize allocated memory to zero
2025-03-14 10:49:44 -05:00
Yiltan 96424a59a8 Added option to build only tests and link to an external rocshmem library (#43)
* Rearrange CMakefile

* Enable linking to external rocshmem library

* Minor fix for the functional test driver

* ROCSHMEM_HOME detection fixed
2025-03-13 15:49:50 -04:00
Avinash Kethineedi eb5a38e806 Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48)
* Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations

- Modified the Device proxy class to determine memory allocation size at runtime.
- Updated all classes that include the Device proxy to use dynamic memory allocation.
- Removed compile-time memory size calculations.
- Ensured the allocated number of backend queue data structures matches the number of RO device contexts.
2025-02-24 15:11:46 -06:00
Yiltan 487e5b7d0f Fix ROCm 6.4 warnings (#47)
* Removed __AMDGCN_WAVEFRONT_SIZE

* Added unit test to validate WF_SIZE
2025-02-24 13:34:13 -05:00
Avinash Kethineedi 57d60aa727 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect
2025-02-19 10:31:53 -06:00
Yiltan 785e31aa48 Sync develop with amd-mainline (#46)
* Update install_dependencies.sh

* Updated to ROCm repos

* Merge pull request #37 from ROCm/depBuild

locked specific version on ompi and ucx

* locked specific version on ompi and ucx

* [IPC] Fix ROCSHMEM_SIGNAL_ADD

* Generate CMake Package Configuration Files

---------

Co-authored-by: akolliasAMD <99202231+akolliasAMD@users.noreply.github.com>
Co-authored-by: akolliasAMD <akollias@amd.com>
2025-02-18 12:30:34 -05:00
Avinash Kethineedi f8701a44fa Merge pull request #44 from avinashkethineedi/fix/time_calculations
Update bandwidth and latency calculations
2025-02-17 12:48:33 -06:00
avinashkethineedi 21dbd5cc5e Remove rocshmem_timer function 2025-02-17 17:10:51 +00:00
avinashkethineedi c155636da4 Update bandwidth and latency calculations
- Refined bandwidth and latency calculations for improved accuracy
2025-02-17 06:18:46 +00:00
Avinash Kethineedi 40bd8a38a0 Merge pull request #40 from avinashkethineedi/RO_data_types
RO Backend: Add support for char, signed char and unsigned char
2025-02-12 14:40:05 -06:00
avinashkethineedi 540cd4b918 RO Backend: Add support for char, signed char and unsigned char 2025-02-12 20:10:03 +00:00
Yiltan 495cd6970b Merge pull request #38 from Yiltan/ro/implement-sigops
Implements Signalling Operations for RO
2025-02-10 15:10:07 -05:00
Yiltan 94144f4460 Merge pull request #34 from Yiltan/sigops-test-fix
Fix Signalling Operations Functional Test
2025-02-10 14:56:45 -05:00
Yiltan 944444cf12 Merge pull request #39 from Yiltan/ro/fix-teamreduce
Fix Team reduction intra-node
2025-02-10 14:56:27 -05:00
Yiltan Hassan Temucin 022b2c27e7 Fix Team reduction intra-node 2025-02-07 08:39:35 -06:00
Avinash Kethineedi d97e5ba2c8 Merge pull request #36 from avinashkethineedi/fix/rocshmem-ctx-wg-team-sync
Fix `rocshmem_ctx_wg_team_sync` API
2025-02-06 13:41:16 -06:00
Avinash Kethineedi 5861346a8e Merge pull request #35 from avinashkethineedi/fix/team-context-pe-numbering
Fix `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes` APIs
2025-02-06 13:39:28 -06:00
Yiltan Hassan Temucin b83ff2fa84 Use the precalculated num_warps variable 2025-02-06 13:21:25 -06:00
Yiltan Hassan Temucin f1c25f7e19 [RO] implemented signaling operations 2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 21171deeb8 [RO] added MPI_UNSIGNED_LONG as type 2025-02-06 10:17:32 -06:00
Yiltan Hassan Temucin 8d74c7b73e Validate signal after put signal operations 2025-02-06 08:17:22 -06:00
avinashkethineedi c5b548c398 Fix rocshmem_ctx_wg_team_sync API
- Updated `rocshmem_ctx_wg_team_sync` to utilize a team-specific memory buffer for synchronization
2025-02-05 19:09:07 +00:00
avinashkethineedi e311400d15 Fix rocshmem_ctx_my_pe and rocshmem_ctx_n_pes APIs to return PE numbering and size relative to the team in a team-specific context. 2025-02-05 03:41:40 +00:00
Yiltan Hassan Temucin bae1641311 Fix sigops functional test
- Ensure quiet is called on the correct context
2025-02-04 13:30:31 -08:00
Avinash Kethineedi 248972b30b Merge pull request #31 from avinashkethineedi/rocshmem_g
Implement `rocshmem_g` API and optimize memory usage
2025-02-04 11:15:41 -06:00
Yiltan 2d9d09ea01 Merge pull request #32 from Yiltan/ipc/sigop-bug
[IPC] Fix ROCSHMEM_SIGNAL_ADD
2025-02-03 16:48:05 -05:00
Yiltan Hassan Temucin fd3eaa3f69 [IPC] Fix ROCSHMEM_SIGNAL_ADD 2025-02-03 09:59:28 -08:00
avinashkethineedi 757d7e53ca Implement rocshmem_g API and optimize memory usage
- Implement `rocshmem_g` API
- Free up memory space allocated for `rocshmem_g` and atomic operations' return values
2025-02-02 05:56:46 +00:00
Yiltan 7e5b533904 Merge pull request #29 from Yiltan/multi-node
Updated RO build script and functional test driver
2025-01-27 14:44:22 -05:00
Yiltan f851411ac5 Merge pull request #28 from Yiltan/cmakefiles-create
Generate CMake Package Configuration Files
2025-01-24 10:50:27 -05:00
Yiltan a458ea2ef4 Merge pull request #25 from mawad-amd/muhaawad/build_examples_option
Add `BUILD_EXAMPLES` CMake option
2025-01-24 10:50:10 -05:00
Yiltan Hassan Temucin 3a8b0d4647 Updated RO build script and functional test driver for multi-node support 2025-01-23 16:46:19 -06:00
Yiltan Hassan Temucin 00824385ba Generate CMake Package Configuration Files 2025-01-22 11:24:41 -06:00
Yiltan bacced0cc3 Merge pull request #27 from ROCm/package-bug-fix
Minor fixes for packaging
2025-01-21 09:13:29 -05:00
Yiltan fa90f4b0ac Minor fixes for packaging 2025-01-20 18:15:07 +00:00
Yiltan 469e2a0167 Merge pull request #24 from Yiltan/install-script
Added script to install dependencies
2025-01-20 11:02:47 -05:00
Muhammad Awad 7a6b3261a3 Add BUILD_EXAMPLES CMake option
Signed-off-by: Muhammad Awad <MuhammadAbdelghaffar.Awad@amd.com>
2025-01-18 15:26:20 -06:00
Yiltan 0fb673e186 Update scripts/install_dependencies.sh
Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
2025-01-16 13:38:08 -05:00
Yiltan Temucin 5de0371bec Added script to install dependencies 2025-01-16 10:06:39 -06:00
Avinash Kethineedi 17b7afdcba Merge pull request #23 from avinashkethineedi/bugfix/memory-usage
Automatic Memory Management with `shared_ptr` for host interface
2025-01-15 02:34:14 +05:30
avinashkethineedi 1ef2d3a6b7 Replace raw pointers for host_interface with shared_ptr to enable automatic memory handling 2025-01-13 20:58:43 +00:00
Avinash Kethineedi 4a3c3d54fb Merge pull request #22 from avinashkethineedi/functional_tests 2024-12-27 20:49:55 +05:30
avinashkethineedi 23172c9150 Updated driver.sh and tester.hpp with sequential numbering for test identification
* Enabled Ping Pong tests
* Removed test commands for multi-workgroup collective tests
2024-12-26 21:28:21 +00:00
avinashkethineedi e40e6a63fa Updated default case of functional tests with empty test 2024-12-26 19:33:23 +00:00
Edgar Gabriel 12aeab1a59 Merge pull request #20 from edgargabriel/topic/remove-internal-dir
remove internal directory
2024-12-26 09:21:09 -06:00
Yiltan c87fa8183a Merge pull request #21 from Yiltan/global-exit-fix
IPC backend now aborts with rocshmem_global_exit()
2024-12-24 08:52:04 -05:00
Yiltan Temucin c0e4a32ca2 IPC backend now aborts with rocshmem_global_exit() 2024-12-23 11:03:04 -06:00