
86 Commits

Author SHA1 Message Date
Aurelien Bouteiller 2bbe21db56 func-tests: Don't rely on asserts to catch invalid argv/env params (#96) 2025-05-02 12:00:35 -04:00
Aurelien Bouteiller f0501550f7 cleanup leftovers from SOS testers removal (#97)
Followup to pr#85
2025-05-02 11:59:52 -04:00
Aurelien Bouteiller b835de6cd5 Substitute pow2bin allocator with a dlmalloc based allocator (#71)
* Add dlmalloc_strat allocator strategy
 - Use mspace variant to ease encapsulation
 - Make pow2bins and dlmalloc cmake selectable
* Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers
accordingly
 - add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used
 - Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap
* bugfix: dlmalloc exposed that the pingpong test would write past end of
allocation with -w 32
* iostream leakage/mixed usage of cerr and fprintf(stderr)

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
2025-05-01 11:55:23 -04:00
Yiltan edcd1ed57e Added XNACK support (#94)
* Added xnack flags
* Updated examples compile command
2025-04-30 08:57:55 -04:00
Yiltan c81722c339 Check RMA functional test data in GPU kernel (#91) 2025-04-28 16:06:05 -04:00
Aurelien Bouteiller 9befbe8293 bugfix: do not dereference ctx during create_ctx if we did run out (#83) 2025-04-16 10:37:44 -04:00
Avinash Kethineedi f6ef19f5a9 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`
2025-04-15 15:37:53 -05:00
Brandon Potter 0fd628458c Cleanup unused code in repository (#75)
* Remove unused forward_list

* Remove unused __read_clock function

* Replace wallClk code with hip function

* Remove unused unit test for ipc

* Remove slab heap

* Remove unused EBO spinlock
2025-04-10 14:47:24 -05:00
Avinash Kethineedi 68421895d6 Update collective APIs naming (#77)
* Update the naming convention for collective APIs to ensure consistency across the interface.

* Move all collective API declarations to rocshmem_COLL.hpp

* The following APIs were updated as part of this change:
  - `barrier`
  - `barrier_all`
  - `sync`
  - `sync_all`
  - `all_to_all`
  - `broadcast`
  - `fcollect`
  - `all_reduce`

* Update header file generation code for collective APIs
2025-04-10 12:14:47 -05:00
Avinash Kethineedi dc61bca066 Update Barrier and Sync APIs (#73)
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
 - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
 - Added support in both IPC and RO conduits
 - Added functional tests to cover all `barrier` APIs
 - Removed collective operations on default context

* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
  - Implemented `sync` APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync` APIs

* update naming convention for context-based `barrier` APIs
2025-04-08 11:25:31 -05:00
Avinash Kethineedi c652f58cef Update Barrier_All and Sync_All APIs (#72)
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs
2025-04-02 11:58:55 -05:00
Yiltan e16ca7a1e3 Update GTEST version (#68) 2025-03-31 08:58:30 -04:00
Avinash Kethineedi 867519e1d0 Implement default RO context (#64)
* Allocate default context buffers and initialize queue for management

- Allocated the status flag, g return, and atomic return buffers for
  the default context.
- Initialized `AtomicWFQueueProxy` instances to manage these buffers
  efficiently for concurrent access.

* Update `BlockHandle` with default context buffers

* Add default context flag and update buffer retrieval functions

- Added a flag to distinguish the default context from other contexts.
- Modified the return buffer functions and the `get_status_flag` function to accommodate
  the default context

* Add default context primitive tests

- Tests cover the get, put, get_nbi, put_nbi, g, and p APIs.
2025-03-25 18:51:54 -05:00
Avinash Kethineedi b84b5638cf Add AtomicWFQueue implementation and tests (#62)
* feat: Add AtomicWFQueue implementation
  - Implemented wavefront-safe atomic FIFO queue ensuring first-come, first-serve order
  - Added efficient synchronization using atomics
  - Enhanced `dequeue` to wait until an element is available

* test: Add GTest for AtomicWFQueue
  - Implemented unit tests for AtomicWFQueue using GoogleTest framework
  - Added tests for `enqueue`, `dequeue`, and edge cases
  - Ensured synchronization behavior and correctness under concurrent conditions

* Add assert in `enqueue` and update atomics
  - Added an assert in the `enqueue` function to ensure it fails if the queue is full
2025-03-25 00:45:19 -05:00
Edgar Gabriel bcbc42e78f add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests
2025-03-24 11:23:03 -05:00
Yiltan 68a1646399 ROCm 6.4.0rc3 bug fix (#56) 2025-03-19 15:37:58 -04:00
Yiltan 3428957de9 Sync Reverse Offload Scripts (#52)
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration

* Added missing ptr in ipc_policy
2025-03-19 14:31:07 -04:00
Avinash Kethineedi aa3121a967 Update RMA functional tests (#50)
* Update primitive tests for multi-workgroup support

* Update workgroup primitive tests for multi-workgroup support

* Update wavefront primitive tests for multi-workgroup support

* Update team based primitive tests for multi-workgroup support

* Update RMA functional tests to capture timing after quiet call
   - Modified RMA functional tests to record the time after a `quiet` call in thread, wavefront, and workgroup RMA calls.

* Improve error handling and memory management
   - Replaced `cout` with `cerr` for improved error reporting.
   - Ensured all allocated memory is freed when `rocshmem_malloc` fails.

* Update start time in primitive tests and latency calculations
   - Modified primitive tests to capture the earliest start time.
   - Updated latency calculations in functional tests.

* Remove `GetSwarmTester`

* Update start time in team primitive tests

* Invoke quiet call from a single thread within a block on a rocshmem context
2025-03-18 14:39:57 -05:00
Yiltan 96424a59a8 Added option to build only tests and link to an external rocshmem library (#43)
* Rearrange CMakefile

* Enable linking to external rocshmem library

* Minor fix for the functional test driver

* ROCSHMEM_HOME detection fixed
2025-03-13 15:49:50 -04:00
Yiltan 487e5b7d0f Fix ROCm 6.4 warnings (#47)
* Removed __AMDGCN_WAVEFRONT_SIZE

* Added unit test to validate WF_SIZE
2025-02-24 13:34:13 -05:00
Avinash Kethineedi 57d60aa727 Add multi work-group support for collective functional tests (#45)
- Added multi-work group support for the All-to-all, Fcollect, Broadcast, Barrier and Sync collective functional tests
- Renamed All-to-all and Fcollect tests to TeamAlltoAll and TeamFcollect
2025-02-19 10:31:53 -06:00
avinashkethineedi c155636da4 Update bandwidth and latency calculations
- Refined bandwidth and latency calculations for improved accuracy
2025-02-17 06:18:46 +00:00
Yiltan Hassan Temucin b83ff2fa84 Use the precalculated num_warps variable 2025-02-06 13:21:25 -06:00
Yiltan Hassan Temucin 8d74c7b73e Validate signal after put signal operations 2025-02-06 08:17:22 -06:00
Yiltan Hassan Temucin bae1641311 Fix sigops functional test
- Ensure quiet is called on the correct context
2025-02-04 13:30:31 -08:00
avinashkethineedi 23172c9150 Updated driver.sh and tester.hpp with sequential numbering for test identification
* Enabled Ping Pong tests
* Removed test commands for multi-workgroup collective tests
2024-12-26 21:28:21 +00:00
avinashkethineedi e40e6a63fa Updated default case of functional tests with empty test 2024-12-26 19:33:23 +00:00
Avinash Kethineedi c5902afe28 Merge pull request #19 from avinashkethineedi/teams_split_API 2024-12-23 20:42:09 +05:30
avinashkethineedi cb8b9094b4 Fix rocshmem_team_split_strided API 2024-12-21 18:16:42 +00:00
Yiltan Temucin 83a588ee2b Commented out function that fails functional tests 2024-12-20 14:48:54 -06:00
Brandon Potter b1f6621f33 Fix signal calculation bug for fine-tiled unit tests 2024-12-19 18:34:47 +00:00
Yiltan Temucin fa0858833e Remove comparisons of signed to unsigned values 2024-12-12 10:21:08 -06:00
Yiltan Temucin b60a460681 Use ROCm-CMake 2024-12-06 15:49:41 -06:00
avinashkethineedi d8ce066adc Merge branch PR #55 into naming_scheme 2024-12-04 21:46:38 +00:00
Yiltan 0c5524d7df Merge pull request #54 from Yiltan/sig-ops
Added SHMEM Signalling Operations
2024-11-26 15:38:12 -05:00
Brandon Potter 46f0b42ac3 Merge pull request #48 from BKP/ipc_fine_tiled_unit_11-04-24
Add tiled fine-grained unit tests
2024-11-25 14:36:04 -06:00
Brandon Potter fd8dbc7fb6 Use new naming scheme 2024-11-25 14:25:29 -06:00
Yiltan Temucin f710a301fe Added functional tests 2024-11-22 15:36:17 -06:00
Yiltan 308816bc5e Merge pull request #49 from Yiltan/unit-tests-driver-bug
driver should now return a failure code if any of the mpirun invocations fail
2024-11-22 16:35:36 -05:00
Yiltan Temucin 50e46847c6 Explicitly require rocPRIM and rocThrust. 2024-11-19 08:54:18 -06:00
Brandon Potter 03719bbb0e Update tests/unit_tests/ipc_impl_tiled_fine_gtest.cpp
Co-authored-by: Avinash Kethineedi <avinash.kethineedi@amd.com>
2024-11-14 13:08:20 -06:00
Yiltan Temucin c2b736ef3d converted roc_shmemx to roc_shmem 2024-11-12 08:37:56 -06:00
Yiltan Hassan Temucin 997eb69b5a modified team based to_all -> reduce 2024-11-06 09:46:43 -06:00
avinashkethineedi b2b0d559cb Merge branch 'ROCm:develop' into active_set_APIs 2024-11-05 23:02:44 +00:00
Brandon Potter d241015e0f Add tiled fine-grained unit tests 2024-11-04 17:16:07 -06:00
Brandon Potter 749d9f0781 Convert simple fine tests into parameterized tests 2024-11-04 10:46:50 -06:00
Yiltan Hassan Temucin fe767d9abf remove cooperative groups 2024-10-30 20:10:21 +00:00
avinashkethineedi 9a524046fe Remove active-set-based broadcast test from the functional tests suite 2024-10-29 16:18:46 +00:00
avinashkethineedi abec29bd6a Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays
* Change log_stride calculations to stride calculations
* Update all_reduce example code to use team based interface
2024-10-28 22:10:18 +00:00
avinashkethineedi e9484bbb86 Remove active-set-based reduction test from the functional tests suite 2024-10-28 21:22:46 +00:00