93 commits

Author SHA1 Message Date
Aurelien Bouteiller ede2adfe49 new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.

[ROCm/rocshmem commit: cca7872bcf]
2026-01-16 10:40:48 -05:00
Edgar Gabriel e38f98fad5 fix reduction test for gfx1201 (#374)
* fix reduction for gfx942 and 1201

match the synchronization of internal_putmem_wg and internal_getmem_wg
to their non-internal counterparts. the internal_putmem_wg is used in
the ipc reduction

* move specialization to internal_putmem

[ROCm/rocshmem commit: 8d2504d6c1]
2026-01-06 10:15:38 -06:00
Edgar Gabriel cc727261de disable the putmem_signal_on_stream on RO (#376)
it fails in about 50% of the cases. Will revisit later why it fails,
but RO is at the moment lower priority, so disabling the test for now.

[ROCm/rocshmem commit: ed2f75f1de]
2026-01-06 08:10:46 -06:00
Aurelien Bouteiller dde4902844 Fix driver.sh script for systems where neither amd-smi nor rocm-smi is found (#370)

[ROCm/rocshmem commit: 5eaa152010]
2025-12-19 10:00:11 -05:00
Edgar Gabriel f9fd5d3cdd use 64 threads for reduction test (#360)
* use 64 threads for reduction test

much faster with IPC backend.

* change all relevant collective tests.

[ROCm/rocshmem commit: c35210f174]
2025-12-15 08:14:18 -06:00
Anatolii Rozanov f98c72d627 Add host API for *_on_stream operations (#340)
* Add functional test for barrier_all_on_stream

* Add rocshmem_barrier_all_on_stream support for GDA and RO backends

Implements rocshmem_barrier_all_on_stream operation for
GPU Direct Access and Reverse Offload backends.

Previously, rocshmem_barrier_all_on_stream was only supported for IPC backend.

* Add functional test for rocshmem_broadcastmem_on_stream

* Add host-side rocshmem_broadcastmem_on_stream API

Implement stream-based broadcast collective operation

- Add rocshmem_broadcastmem_on_stream host API and kernel implementation
- Add functional test TeamBroadcastmemOnStreamTester with multi-stream
  support and correctness verification
- Use per-workgroup contexts to avoid contention across parallel streams

API:
rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream)

* Add functional test for rocshmem_getmem_on_stream

* Add host-side rocshmem_getmem_on_stream API

Implement stream-based point-to-point RMA get operation

- Add rocshmem_getmem_on_stream host API and kernel implementation
- Support for asynchronous getmem operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group collective getmem for efficient memory transfer

API:
rocshmem_getmem_on_stream(dest, source, nelems, pe, stream)

(AI Assist)

* Add host-side rocshmem_putmem_on_stream API

- Add rocshmem_putmem_on_stream for asynchronous remote writes
- Support for concurrent RMA operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group device collective operation

API:
rocshmem_putmem_on_stream(dest, source, bytes, pe, stream)

(AI Assist)

* Add functional test for rocshmem_putmem_on_stream

* Add host-side rocshmem_putmem_signal_on_stream API

Enables asynchronous putmem operations with signaling on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_putmem_signal_kernel
- Host interface putmem_signal_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Public API

Function signature:
void rocshmem_putmem_signal_on_stream(void *dest, const void *source,
                                      size_t bytes, uint64_t *sig_addr,
                                      uint64_t signal, int sig_op,
                                      int pe, hipStream_t stream);

* Add functional test for rocshmem_putmem_signal_on_stream

* Add host-side rocshmem_signal_wait_until_on_stream API

Enables asynchronous signal wait operations on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_signal_wait_until_kernel
- Host interface signal_wait_until_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Native uint64_t support in wait_until API (generated from P2P_SYNC.py)

Function signature:
void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp,
                                          uint64_t cmp_value,
                                          hipStream_t stream);

(AI Assist)

* Add functional test for rocshmem_signal_wait_until_on_stream

* Add documentation for stream API functions

This commit adds API documentation for the following host-side
stream functions:

- rocshmem_barrier_all_on_stream (collective routines)
- rocshmem_broadcastmem_on_stream (collective routines)
- rocshmem_getmem_on_stream (RMA operations)
- rocshmem_putmem_on_stream (RMA operations)
- rocshmem_putmem_signal_on_stream (signaling operations)
- rocshmem_signal_wait_until_on_stream (point-to-point sync)

The documentation includes function signatures, parameter descriptions,
and detailed explanations of asynchronous behavior and stream handling.

(AI Assist)

* Rename "bytes" -> "nelems"

* Add "_TEST_" to the variables used in tests

* Remove incorrect hipStreamDefault usage

hipStreamDefault is not the default stream; it is a flag.

If stream == nullptr, just pass it to the kernel launch. The kernel will then run on the default stream.

[ROCm/rocshmem commit: d0c8380650]
2025-12-09 08:55:46 -06:00
Anatolii Rozanov 4b04b540bf Add host API for alltoallmem_on_stream collective operation (#333)
* Add host-side rocshmem_alltoallmem_on_stream function

Function signature:
  rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                 const void *source, size_t size,
                                 hipStream_t stream)

- The function launches rocshmem_alltoallmem_kernel which calls
device-side alltoall<char> workgroup collective through default context.
- Uses dynamic block size determination via occupancy API.
- Implemented for all backends.

* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends

When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.

* Add functional test for team_alltoallmem_on_stream

This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.

* Add documentation for rocshmem_alltoallmem_on_stream

This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:

[ROCm/rocshmem commit: 5577feb70d]
2025-12-03 08:40:24 -05:00
Edgar Gabriel 5e6a4e15f6 disable memory tests (#310)
disable fine-grain and coarse-grain memory tests until a fix is
available in ROCm 7.1 and/or our CI image. Otherwise we might miss other
errors due to constant CI failures.

[ROCm/rocshmem commit: 4fc5541d78]
2025-11-07 08:04:31 -06:00
Aurelien Bouteiller 51cf7c6c05 python venv madness round 2: use ensurepip if installed (#308)
When creating a python venv during the install_dependencies script, we try to use ensurepip if it is installed, as it deals better with cases where multiple venvs are active simultaneously (as seen in the CI buildbot).

[ROCm/rocshmem commit: b7a6d86c6b]
2025-11-05 10:52:22 -05:00
Aurelien Bouteiller 76e8750d88 Add backend type query method, use it to disable 32bit amo testers on gda (#307)
* Add backend type query method, use it to disable 32bit amo testers on
gda

* The infrateam testers work

[ROCm/rocshmem commit: 8c175315f2]
2025-11-05 10:24:07 -05:00
Aurelien Bouteiller e622398337 install_dependencies pip issues with ubuntu 24 (#302)
* The install_dependencies script would fail on ubuntu 24.04
they changed how pip works so we need to create a venv first now

* Fix install_dependencies for ubuntu 22

* Make sure we build in the builddir and install in the installdir
combine installdir for ucx and ompi when user-provided by INSTALL_DIR
retain prior behavior if not overridden to avoid breaking CI scripts

[ROCm/rocshmem commit: e155af8704]
2025-10-31 16:34:36 -04:00
Aurelien Bouteiller bdb30e2984 Tests/syncall (#291)
* SyncAll test case would run Sync

* Despecialized name for argument reader

* Rename sync-test to team-sync-test as it uses teams

* Another stab at probing NUM_GPUS

[ROCm/rocshmem commit: 054bc33dc4]
2025-10-23 13:40:41 -04:00
Edgar Gabriel d37af80d7e add support for GPUs using wavefront size of 32 (#285)
* add gfx1100 support

Add support for Radeon 7900 GPUs (RX and PRO), and 7800 PRO.

I considered adding gfx1101 and gfx1102 GPUs as well, but those are the lower-end models that are less likely to be used for compute-intensive jobs. In addition, I do not have access to them to test the support.

* update WF_SIZE for different options

Radeon systems use a WarpSize of 32, unlike current Instinct systems,
which use a warp size of 64. For the device side, a gfx specific ifdef
is sufficient. For the host side, we need to query the device
properties.

* adjust functional tests to wf_size of 32

* update unit tests to handle wf_size of 32

* address reviewer comments
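
The wavefront-size note above can be illustrated with a small host-side sketch. It is not taken from the rocSHMEM sources: `wavefronts_per_block` and `lane_id` are hypothetical helpers, and the lane count is passed in as a plain parameter. In real host code it would presumably come from a device-properties query (for example the `warpSize` field of `hipDeviceProp_t`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of rocSHMEM): number of wavefronts needed to
// cover a workgroup of `block_size` threads when each wavefront has
// `wavefront_size` lanes (32 on the Radeon parts mentioned above, 64 on
// current Instinct parts).
constexpr std::size_t wavefronts_per_block(std::size_t block_size,
                                           std::size_t wavefront_size) {
  return (block_size + wavefront_size - 1) / wavefront_size;  // ceiling division
}

// Hypothetical helper: index of a thread within its wavefront.
constexpr std::size_t lane_id(std::size_t thread_id,
                              std::size_t wavefront_size) {
  return thread_id % wavefront_size;
}
```

With a 256-thread workgroup this gives 4 wavefronts at a size of 64 and 8 at a size of 32, which is why tests that hard-code 64 need adjusting.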

[ROCm/rocshmem commit: d0c2845031]
2025-10-22 16:04:58 -05:00
Edgar Gabriel 6bc1cc63ae update tester for RO (#281)
update the tester script to only test the amo functions on RO that are
expected to pass. We can revisit the non-passing tests later, but they
currently prevent CI from passing, and RO is simply lower priority
than other asks.

[ROCm/rocshmem commit: 6f74cdfd75]
2025-10-20 09:03:17 -05:00
Aurelien Bouteiller bb8406b013 Runtime selection of IONIC (#272)
* Split ionic code to a subdirectory; dyld libionicl; move the fntable to provider_gda_xxx.hpp
pass the pattr to ionic_setup_pd, include endian.hpp
Enable building IONIC conduit for runtime selection

* Uniform style for the fntable between ionic and the rest

* Move mlx5 gda conduit to a subdir; resolve conflict with backend_can_run
function

* Don't forget to init qp for ionic, move mlx5 specialized init qp code to
the mlx5 subdir

* Don't add cmakecaches...

Typo: GDA_BXNT

* Add gda-ionic to all_backends build scripts

* Apply suggestion from reviews

Co-authored-by: Omri Mor <omri50@gmail.com>
Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com>

* Remove duplicate definition of DLSYM macros

---------

Co-authored-by: Omri Mor <omri50@gmail.com>
Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com>

[ROCm/rocshmem commit: 3cfe76522e]
2025-10-16 15:53:01 -04:00
Edgar Gabriel 192c549d40 allow all three backends to co-exist in a single build (#270)
* add support for compiling all backends

also include the logic to select backends either based on user requests
or through some heuristics

* checkpoint for compiling all backends

* final checkpoint

all tests seem to pass when compiling all three backends simultaneously
and forcing the use of any of the three backends.

* update PR to new envvar system

[ROCm/rocshmem commit: a1269e3db5]
2025-10-07 10:49:20 -05:00
Yiltan 722a8de453 [GDA] Implement fetching atomics for BNXT (#253)
* Indent driver script
* Implemented fetching atomics BNXT

[ROCm/rocshmem commit: f5aefd15f3]
2025-09-18 09:50:42 -04:00
Aurelien Bouteiller e607bfbb7b Remove unused scripts from functional tests (#237)
[ROCm/rocshmem commit: 38a7820aa8]
2025-09-12 10:14:33 -04:00
Edgar Gabriel a19c98b20a add an all-ro flag (#252)
to specify the subset of tests that we want to run in Jenkins with the
RO conduit

[ROCm/rocshmem commit: b6b5a82d2b]
2025-09-11 16:32:08 -05:00
Aurelien Bouteiller 6ec9247a54 Make it possible to test RMA GET and PUT separately (#250)
DISABLE_GET removed from ALL, idea is that the CI scripts will invoke a
subset that is known to work rather than ALL

[ROCm/rocshmem commit: 5dc7d4539e]
2025-09-11 16:44:48 -04:00
Yiltan 94547d8936 [GDA] implement rocshmem_p (#247)
[ROCm/rocshmem commit: 2abeebbb6d]
2025-09-11 09:24:43 -04:00
Avinash Kethineedi 6860bc1275 GDA get* APIs (#243)
* feat(GDA): add `get*` and `get*_nbi` APIs for mlx and bnxt NICs
   - implemented thread, wave and wg variants of `get*` and `get_nbi`.

* test(GDA): enable functional tests for `get*` and `get*_nbi` APIs

[ROCm/rocshmem commit: 671f8187f4]
2025-09-10 11:24:53 -05:00
Yiltan b79b9f4b60 Unify common BNXT and MLX5 initialization code (#233)
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: cb39f7a313]
2025-09-10 09:13:36 -04:00
Yiltan 9b8404693c Implemented workgroup puts (#238)
[ROCm/rocshmem commit: 58f96af7ec]
2025-09-08 10:57:39 -04:00
Aurelien Bouteiller 70294d8e8c Import gda_devel back into develop (#206)
* Import gda_devel back into develop

Squashed commit of the following:

commit d9e2fed2f7e55d266c7dfcacc4641b92a3b008ed
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Thu Jul 24 14:50:47 2025 -0500

    Only issue a single completion per wavefront (#199)

commit 6b6e41ef3c955d914c83cc77cecbf8c4ec6a363e
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date:   Thu Jul 24 14:12:35 2025 -0400

    non-fetching amos are implicit nbi, we do not need the terminal quiet. (#179)

commit 78feb0e15ba864b6bfd1b4ae3365e0312d7170c5
Author: Alsop, John <johnathan.alsop@amd.com>
Date:   Tue Jul 8 10:25:43 2025 -0700

    Relax ibgda synchronization (#191)

    * rocshmem mcm: relax ibgda orderings

    convert all SEQ_CST orderings in queue_pair to RELAXED except:
    -system scope ring_doorbell access: required to flush push buffer
     (unless data is uncached - in which case a waitcnt is sufficient)
    -agent scope leader thread read in post_qpe_rma: unclear why this
     is necessary, but when relaxed, the code breaks. either the waitcnt
     or the L1inv associated with agent scope SEQ_CST is needed for
     functionality.

    * Undo changing atomic_signal_fence from SEQ_CST to RELAXED as this
    appears to have no performance advantage and we are not entirely sure
    it is correct

    ---------

    Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>

commit 9eb45465775f5f00140788c65adceeabf83d4268
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date:   Mon Jul 7 13:56:19 2025 -0500

    Make gda_devel branch work without MPI library  (#188)

    * First cut on adding the no-mpi path to gpu_ib

    more functions to follow.
    add mpi_init_singleton stuff

    * make gda compile with no-mpi support

    * gda_device without mpi support

    * fixes for functional tests

    - disable the mpi_init_singleton tests in the unit tests.
      There is no point in fixing them on this branch to adjust to the new structure/logic.
    - replace MPI_Barrier with rocshmem_barrier_all in tester.cpp
    - I missed one Allgather statement in gda_device.cpp, add the non-MPI
      version for that call as well

    * Update src/gpu_ib/gda_device.cpp

    Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

    * Update tests/functional_tests/CMakeLists.txt

    Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

    ---------

    Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

commit 3766e4293c070efde091b9d1675aeef3cccdf701
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Thu Jun 26 19:12:49 2025 -0500

    Check for counter load order update in send queue (#178)

commit 255c240b2d001cea13a3c8c77cc0a049dd598631
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date:   Thu Jun 26 15:10:44 2025 -0500

    Refactor Barrier_all and Sync_all to use default context (GDA) (#175)

    - Removed context-specific implementations of barrier_all and sync_all
    - Added barrier_all and sync_all to the default context implementation
    - Updated functional tests to use the default context for barrier_all and sync_all

commit 1c5d004eb56f420ede1cc7cbf563c618a2d6c5d8
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date:   Tue Jun 24 14:24:48 2025 -0400

    Re-enable Release by default (#168)

commit c7b90bc78a605da418912b51af339fb3747c3b74
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Tue Jun 24 12:20:22 2025 -0500

    Fix issues with queue_pair (#167)

    * Add amo fetch_add and non_fetch add self tester

    * Validate both ways

    * Intermediate debug for atomic hang

    * Fixes for amo test

    * Convert to release build

    * Revert SYSTEM to AGENT for scope

    * Restore tester arguments

    * Make nonfetch amo into blocking call

commit caec3441855135510a4747b64d5d8ebc88a8eea0
Author: Aurelien Bouteiller <abouteil@amd.com>
Date:   Mon Jun 23 22:30:00 2025 -0400

    bugfix: prevent reuse of sqe items before they are ready

commit b5c474b7573029b84a7aeee417fc8fbe9402f227
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date:   Tue Jun 17 09:17:24 2025 -0500

    change default compilation mode for gda_devel (#162)

    for the moment, switch to Debug builds being the default, since it seems
    to be more stable with DeepEp

commit 2d771c8f335ffb552589f6f0b3cd60275c87506d
Author: Yiltan <ytemucin@amd.com>
Date:   Thu Jun 12 16:08:32 2025 -0400

    Add Broadcom support for gda_devel (#148)

    * Added bnxt headers

    * Updated bnxt headers to compile with rocSHMEM

    * Preliminary BNXT Support

    * Update direct verbs to 2025/05/30 drop

    * Use umem_reg to create queues

commit 51cf8ee72bf18550947b9bde0926fc5f68900f46
Author: Andrew Boyer <andrew.boyer@amd.com>
Date:   Tue May 20 17:01:39 2025 -0400

    gpu_ib ionic: Address review comment (#137)

commit 822541e7f7ed56857185779d91a62f4fac362fbd
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Tue May 20 15:57:17 2025 -0500

    Check RMA functional test data in GPU kernel (#91) (#132)

    Co-authored-by: Yiltan <ytemucin@amd.com>

commit 5dc74b6fa605f7703b22cbf7035196bbd6ab306a
Author: Andrew Boyer <andrew.boyer@amd.com>
Date:   Tue May 20 16:35:07 2025 -0400

    gpu_ib ionic: add gpu_ib provider for ionic (#133)

    Port gpu_ib ionic changes from earlier proof-of-concept codebase.

    Build with GPUIB_IONIC=1 to enable ionic and disable mlx5.

    Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
    Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>

commit f64fc480e2d960bf7a88c80b1896d6980b0e9fc1
Author: Andrew Boyer <andrew.boyer@amd.com>
Date:   Fri May 16 09:07:43 2025 -0400

    gpu_ib: Cleanups to Mlx5 provider to ease Ionic integration (#129)

    Keep both pd_orig and pd_parent.
    Add some helpers for lane mask etc.
    Add generic defines in a few places.

commit d546e43c71120544366f2fa4496ca1ee32a1ede4
Author: Andrew Boyer <andrew.boyer@amd.com>
Date:   Thu May 15 14:07:33 2025 -0400

    gpu_ib: Fix up putmem_wave() (#128)

    Add a thread ID check to GPUIBContext::putmem_wave() so that only one
    thread gets through.

    Since the context layer checks, the QP layer doesn't need to. Thus
    QueuePair::put_nbi() and QueuePair::put_nbi_wave() are the same and
    can be combined.

    Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
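
The guard described above can be sketched as a host-side simulation. This is not the rocSHMEM code: `FakeQueue`, `putmem_wave_sim`, and `run_wave` are hypothetical names, and the choice of lane 0 as the thread that gets through is an assumption, since the commit message does not say which thread is selected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a queue pair; it only records which lane posted.
struct FakeQueue {
  std::vector<std::size_t> posted_by;
  void put_nbi(std::size_t lane) { posted_by.push_back(lane); }
};

// Simulated wave-level put: every lane of the wavefront calls in, but the
// guard at this (context) layer lets only one lane through. Lane 0 is an
// assumption made for this sketch.
inline void putmem_wave_sim(FakeQueue& q, std::size_t lane) {
  if (lane != 0) return;  // all other lanes return without posting
  q.put_nbi(lane);        // the queue layer needs no second check
}

// Runs all lanes of one simulated wavefront and returns the number of posts.
inline std::size_t run_wave(std::size_t wavefront_size) {
  FakeQueue q;
  for (std::size_t lane = 0; lane < wavefront_size; ++lane) {
    putmem_wave_sim(q, lane);
  }
  return q.posted_by.size();
}
```

Because the check lives in one layer only, the per-thread and per-wave post routines end up doing the same work, which matches the commit's reasoning for combining them.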

commit cf6231593a5b6d9370c605bc0f63a8806baf73bc
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date:   Thu May 15 11:41:21 2025 -0500

    re-add code to select closest NIC to a GPU (#127)

commit e7f3911f173f42caf48d05d4ec41f69a1e4569fc
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Mon May 12 17:09:00 2025 -0500

    Fix MPI_Comm bug (#123)

commit 866d52768b1131d9ba2b85c537e0a425039189a1
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date:   Fri May 9 13:13:08 2025 -0500

    Fix Barrier API implementation and add missing variants (#121)

    - Fixed issues in the existing Barrier API
    - Allocated sync buffers of team using the symmetric heap
    - Added missing thread-level and wavefront-level Barrier APIs
    - Updated functional tests to cover all Barrier variants

commit caa4dc3c4ed3330a98b69d88aba57699b1c135b4
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date:   Thu May 8 16:58:58 2025 -0400

    Missing variable in ibgda branch and use create_ctx to avoid default ctx (#120)

    in num_pes and my_pe

commit 483636e380bbaf67d92ae386b8ab99156415a078
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Thu May 8 14:36:46 2025 -0500

    Refactor several classes and bugfixes (#115)

    * Merge backend connection and network classes

    * Use agent scope instead of system scope for counters

    * Remove monitor thread

commit 83e7a0487194c6ed34fcf9449e0f02e9d5934229
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date:   Thu May 8 14:52:52 2025 -0400

    Add verification, fix only rank0 runs the test (#114)

commit 3469aea496e0d7afcb01059969bbc5c99082fa0e
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date:   Thu May 8 10:55:40 2025 -0400

    new tester: put to all pes from all lanes concurrently - ibgda (#113)

    * Add put to all pes from all lanes concurrently

    * This runs on ro 64(8x8) pes, the workload increases with the num_pes so it gets very slow at scale

    * Adapt for ibgda branch

commit 3e10a287e8ff9f8b4e2d40bb4969b226049726f6
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date:   Wed May 7 18:20:12 2025 -0500

    Fix and extend Barrier_All API support (#110)

    - Fixed issues in the existing Barrier_All API implementation
    - Added missing thread-level and wavefront-level Barrier_All APIs
    - Updated functional tests to cover all Barrier_All variants

commit c5a369c2247b67555eb773aa8b2c77d723e28104
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Wed May 7 11:45:12 2025 -0500

    Serialize entrance into queue pair code by PE (#108)

commit 5e916dad8757fdd7cb7294aef9d1148074c367d6
Author: Yiltan <ytemucin@amd.com>
Date:   Wed May 7 12:38:58 2025 -0400

    Fix ibv_reg_mr when using subcommunicators (#104)

commit aa65c8a7ecdb495da30a08cedd336ae9256ce5b5
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date:   Tue May 6 11:10:12 2025 -0500

    add code for determining closest NIC to a GPU (#100)

    add code for detecting the closest NIC given a GPU device ID.
    The code is based on the same functionality in Transferbench, and has
    been stripped down to the required functionality in rocSHMEM. (Note,
    there is probably more code that could be removed or simplified).

    There are two interfaces that are of interest:
      - int GetClosestNicToGpu(int gpuIndex, char **dev_name): returns the
        id of the NIC in the device list as well as the name of the device
        (if dev_name is not a nullptr);
      - void DisplayTopology(bool outputToCsv): prints out the entire
        topology detected on the node. This does not happen automatically,
        but could be integrated in the future with some debugging output
        when the user sets an environment variable.

commit 1bd5c302759dcfd0201d549c2620d53f57cf011f
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Tue May 6 11:09:56 2025 -0500

    Fix several bugs of gda_devel branch (#103)

    * Revert "Use 32-bit counter values"

    This reverts commit 464374e5f7157cb4124d01d662103056a04a933c.

    * Call hipMemset after allocation on QueuePair members

    * Undo previous relaxations and use SEQ_CST atomics

    * Remove placement new on QueuePair creation

    * Bugfix on outstanding wqe table off by one

commit 8f1fef97a809d829ee05495b0f55cf43e610b99f
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Tue May 6 11:09:40 2025 -0500

    Remove unused code (#102)

    * Remove unused code

    * Remove unused connection method

commit dd675b459db961d1f366c33a95528c4946179c02
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Fri May 2 15:45:51 2025 -0500

    Add AMO support

commit 025569c252b91ad4be46a70cd94ec2af117b9167
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 23:13:34 2025 -0500

    Change names around

commit b80ccee955edecdda320b38d4c395ee9aeb4ae43
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 22:34:48 2025 -0500

    Remove unused code

commit b2227a72817c130535bfad43e931878da3d799b1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 22:23:47 2025 -0500

    Replace do-while with while

commit 464374e5f7157cb4124d01d662103056a04a933c
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 22:11:22 2025 -0500

    Use 32-bit counter values

commit 6e6c2c9587c8e57ea669b49ed5b8d40aa17da4e0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 22:08:22 2025 -0500

    Relax synchronization

commit 5c95d2967b675d3cb568b2624923d1cefdb6d26e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 21:58:42 2025 -0500

    Remove unused method

commit 91bdc47b4c2c488315aa7b0e27235cd6046032bb
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 21:48:53 2025 -0500

    Use __shfl for broadcast

commit 5a04575d731c37fed6ed8be7ad49483ee23781f1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 21:40:44 2025 -0500

    Relax order

commit ccd29bb037a8a726351e9cde4f3df9e64242545f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 21:29:58 2025 -0500

    Relax synchronization

commit 21f26f2d31549e3c9c71c2b0ddabe2548d2c59f6
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 21:10:27 2025 -0500

    Rename sq variables

commit d921a5165d63bed51f8fed6cf84a0c47f7df94f4
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 20:37:54 2025 -0500

    Rename variables in quiet

commit 8d83f6bfb9dac1315748f0a9337c993e4ce4609e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 20:27:59 2025 -0500

    Rename quiet counter variables

commit 41d303d37fed401fd807fcf85af74637c3bcb68d
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 20:24:15 2025 -0500

    Refactor quiet

commit 6fdae5426a8cf30e2208af8a3f0e5e31c78674f6
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 20:05:19 2025 -0500

    Replace some lds broadcasts with __shfl

commit 5ed8835fa8e4008519dd3ca5abed7155a51ea825
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 19:42:47 2025 -0500

    Use constant for wavefront size instead of literal

commit d9f24ff7f7bd87a574febd6a07e5d79fbe71b708
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 19:38:42 2025 -0500

    Remove debug statements

commit 7fa040d11fe0f886b319e87b536748558dda8de8
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 15:55:59 2025 -0500

    Fixed several bugs - stable

commit a923e6eecacb9398031f41350e6a464895064479
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 12:10:29 2025 -0500

    Fix bug in post_wqe_rma

commit d55ce6183a199ef06d6ff16f9aefa329e99e3875
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 08:22:17 2025 -0500

    Use better variable name

commit 4536ccde50119e05abdc38450e19c665f21ecc8b
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 30 08:18:42 2025 -0500

    Remove atomics for cqe64 access

commit 48722eaf3145507bda7346cce14c6c62995aa342
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 29 22:40:28 2025 -0500

    Use volatile on cqe polling

commit e08e52ab9476a564a015d27ba23c14690f0dd425
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 29 21:35:15 2025 -0500

    Debug synchronization

commit e7ebad19140caf3f6ecba0d770f0a127cd3db421
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 28 11:18:25 2025 -0500

    Minor changes

commit ed41a9635f3058b8b2ee7cd58486d54f3bd35d4f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 28 09:39:16 2025 -0500

    Implement mt queues

commit f91cefd62058b44c79c382a746787145d11bf953
Merge: eb18b0c0 59908366
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 28 10:07:48 2025 -0500

    Merge branch 'abouteil/gpuib_bare-dlmalloc' into bpotter/gpuib_bare-04_28_25-devel

commit 59908366d9c436ef4dd8c77038a9ce31da49f202
Author: Aurelien Bouteiller <abouteil@amd.com>
Date:   Mon Apr 28 10:13:08 2025 -0400

    dlmalloc: resolve drift with ibgda branch

commit fe527fa9bf8f604056151ab5baf617ff1d686be6
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date:   Wed Apr 9 11:57:07 2025 -0400

    Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers accordingly

     * add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used
     * Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap

    Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

commit 8714dc647a2b5982812ff4405ad82ad43ebc509e
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date:   Fri Mar 28 14:17:49 2025 -0400

    Add dlmalloc_strat allocator strategy; use mspace variant to ease encapsulation; make pow2bins and dlmalloc cmake selectable

    Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

commit eb18b0c0e22616f6154a8d02963ad28b63ec4733
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 27 15:56:54 2025 -0500

    Use SND DBR offset

commit 114e8df3f0f265546e1a1f876a9af79b7f9aa547
Merge: ed7fb58a d192f5b6
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Sun Apr 27 11:09:03 2025 -0500

    Merge pull request #74 from ROCm/ytemucin/gpuib_bare-04-25-25

    Ytemucin/gpuib bare 04 25 25

commit d192f5b6164f9b4bb5305118688f262bc95993e6
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 16:50:02 2025 -0500

    Check default_ctx_ ptr before freeing

commit 60b641a2e5f1fe3ec2848366a5bc0daae7652bb9
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date:   Mon Apr 14 09:18:57 2025 -0500

    Update backend to use provided MPI communicator during library initialization (#79)

    * Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD`

    * Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs
       - Return values from backend if initialized; otherwise, fallback to MPI_Singleton.

commit 474929f8ae3fd254a740626ce50935a223992b6c
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date:   Mon Apr 14 12:02:09 2025 -0500

    Revamp the uniqueId code to support subgroups of processes (#80)

    * add code for bootstrapping

    the bootstrapping code has been extracted from the MSCCLPP library,
    which in parts is based on the code from NVIDIA. The code has been
    modified to match the specific requirements of the rocSHMEM library.

    * add code to use the new uniqueId bootstrapping

    * adjust init_attr example

    extend the rocshmem_init_attr example to use two disjoint groups
    of processes, in order to trigger the new code path.

    * add env variable for bootstrap timeout

    * Update examples/rocshmem_init_attr_test.cc

    Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

    * Update src/rocshmem.cpp

    Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

    ---------

    Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>

commit b123c12f10bfd170da8daf3fbfcfd48d153d8f45
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 25 11:48:59 2025 -0500

    Required changes to compile with deepep
    - three missing apis (barriers and fence)
    - Enable -fpic

commit ed7fb58aa54907d493d02e6492c08eb58e0b10ad
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 05:01:05 2025 -0500

    Cleanup debug statements

commit b161c046f472ed4c4519b874bceffb9afb5c02e1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 04:52:21 2025 -0500

    Disabler tester and TicketMutex

commit e51a24ea9d494d472c0f5b759f0e275982940777
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 04:41:35 2025 -0500

    Remove monitor thread

commit 36159551971025520af1f2207c65231eca8b65e5
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 04:32:25 2025 -0500

    Revert "Revert "Remove print statements""

    This reverts commit 763fd7032f9c09a2f642184baa9cb927da414e64.

commit ce7db5f03b796c4e7e994b98b3e5a88922607a23
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 04:31:13 2025 -0500

    Revert "Revert "Turn off debug""

    This reverts commit c9e1c3b1c4300fb7f6b65ba9882f9651d4362221.

commit 8e0d801f8643566f02592a70e50b7de1b25322c7
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 04:30:33 2025 -0500

    Fix THE OTHER bug

commit c9e1c3b1c4300fb7f6b65ba9882f9651d4362221
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 04:02:33 2025 -0500

    Revert "Turn off debug"

    This reverts commit 03303d10ad2155911632888676e045baaea3c2ca.

commit 763fd7032f9c09a2f642184baa9cb927da414e64
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 03:53:16 2025 -0500

    Revert "Remove print statements"

    This reverts commit ae65f024a00e4dc416e3c6efd8f10e4665e0dbbd.

commit 03303d10ad2155911632888676e045baaea3c2ca
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 22 20:06:32 2025 -0500

    Turn off debug

commit ae65f024a00e4dc416e3c6efd8f10e4665e0dbbd
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 03:46:29 2025 -0500

    Remove print statements

commit a63ceff9a74e8811b6749215859e0836ee11ae40
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 03:41:44 2025 -0500

    Fixes THE bug

commit e5276bb50eefc717dca52cf726d3e14aaaa198c6
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 03:04:24 2025 -0500

    Undo tester changes

commit 6ce6be2618bc27cb8fdfc0a0c772c3f44952631a
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 02:58:27 2025 -0500

    Voila?

commit 29f9a063697302641a3a4f18d29eef9db633d966
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 02:24:12 2025 -0500

    Add debug statements for dest_info

commit 89647908c7c1a7ceb25f242e41e6db6c8242b9f3
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 02:08:36 2025 -0500

    Flip ctx destroy

commit a703401ccd1409bfe7c73dccace002df2be4e10e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 01:56:13 2025 -0500

    Move ctx out of shared memory

commit 10e458e76b3b79e3be847bd124d6b423e7d01874
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 23 01:43:32 2025 -0500

    Add a second context create

commit 29ad27b160dbcbbda22ef0f246bd71bdd862ccd9
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 22 20:04:33 2025 -0500

    Simplify CQE checks

commit 65fefc2436adfa59146ce67d36e89cdd0c8eb2fc
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 22 18:09:28 2025 -0500

    Use DPRINTF instead of printf

commit d296987d01a326d26838cb16f07159a2b6ae23c0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 22 13:13:35 2025 -0500

    Remove ibv_fork_init

commit f80c18e8c23ad9be784bf9f1699369dd4972eea0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 21:15:48 2025 -0500

    Try to use hipHostMalloc

commit 89ca8ab6b710a8c2d257075cc83321bd6c90503f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 21:05:11 2025 -0500

    Use hipHostMalloc instead of default allocator

commit a4ceb2c7c8717a7a09a4ff50b9d8681f21ad66b0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 16:05:20 2025 -0500

    rkey/lkey debug

commit bfef39a60609626938d2fb58374e9dbc32b60292
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 15:32:46 2025 -0500

    Convert rkey/lkey back to BE

commit 573a7391afd238fabe6be80093f4a2a0c95b1164
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 15:29:35 2025 -0500

    rkey/lkey debug

commit ca5ad13b943028c1f561b5a37e0935dea135ee8e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 14:49:03 2025 -0500

    Add monitor thread

commit a952a997ca80ca5363cc52c8a3753035d0377cb5
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 11:50:31 2025 -0500

    Add more debug messages

commit 2d2a7813f980ed549951f852e652a7a718bc5928
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 11:17:53 2025 -0500

    Minor changes to debug statements

commit eef3a7fb22df67c0117c7909702d6e8b366b8b37
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sat Apr 19 10:41:42 2025 -0500

    Allocate network queue pair memory in host memory

commit ea412f1de3a320a422d37b170f2bbd9727ec56e5
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Fri Apr 18 01:29:37 2025 -0500

    dbrec debugging

commit ad0be143f90c4727439046316491ccc9731e11f1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Fri Apr 18 01:02:19 2025 -0500

    Dump qp debug info

commit 6c5224fc5a05d3fbac43db6c6127131eedd927da
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Fri Apr 18 00:33:29 2025 -0500

    More debug info

commit 4063a883f3cd8dab48574688499e289a6ef9a668
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 23:48:16 2025 -0500

    Debug information

commit a95ac6d3220571537dd8e1f13545d9b530202cea
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 23:20:06 2025 -0500

    Change init attr cap

commit 774324af9a21f93777324a63eee6efc12d33aac3
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 23:08:57 2025 -0500

    Bugfix on param type

commit f1117d0444f8e6384b006d5bface2a3586ff7e68
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 23:05:13 2025 -0500

    More debug

commit eabfa80f11ed65736f4ce07767c5e7e546d50c51
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 22:20:01 2025 -0500

    Debug effort

commit 88f45b8e56fb8fdbad9d0923c65593288b4d1f58
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 14:14:25 2025 -0500

    Remove unused functions

commit f2267c7ec5dfa5c6370ee23d80988cc8f32ca00d
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 14:03:08 2025 -0500

    Remove host-side calls into the qps

commit 2659ed54a2b26699a4f41a4ad609f2f1387f1ea3
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 13:07:30 2025 -0500

    Add device object file

commit d1a143a3e19c19b47a0440980c60e3afe72e6c7b
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 13:07:16 2025 -0500

    Add ticket mutex file

commit 93e6e87497a24e2520bd0a09e2122fe2435f43bc
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 13:05:59 2025 -0500

    Try to protect doorbell with mutex

commit 0d8f0de88c5bfed013a5edc83ae4bea098647c42
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 12:03:31 2025 -0500

    Cleanup doorbell ringing code

commit d3acb6a090743ca70c386464cbbe2110353246f0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 10:43:03 2025 -0500

    more doorbell prints

commit 46f9ad9d3e3228cdcaa35c8fe09ff4469999c782
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 10:33:01 2025 -0500

    Add print statements

commit 8cb1ce8468348b17e14055182eac15bbddc12757
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 10:14:04 2025 -0500

    Increase blueflame back to two reg and add prints

commit 6276f16edaac719ccc3a98c6013e399deb1a5210
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 17 10:02:52 2025 -0500

    Add print statements

commit 27d2c1384de697cfdf0a91e7b516a8b305cb92ed
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 16 20:52:46 2025 -0500

    Minor modifications to printf debug

commit 5e03fd5261e621916d190a231419b88e1870cd62
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 16 20:46:54 2025 -0500

    Remove ipc unit tests

commit 548d040ef5dea7f415512774fffdfb151e10716a
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 19:33:53 2025 -0500

    Add print statements

commit ebc1198c4e545bcbac1ea628fdbff3329aea663e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 16:25:00 2025 -0500

    Remove optional doorbell ringing support

commit b487be0f0b23731276549a7e5d7badfe559082ab
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 14:40:51 2025 -0500

    Only allocate space for one blueflame register

commit 3b9ba1321c4a525f6a60d29a39ab07756678d7ae
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 14:25:05 2025 -0500

    Convert protected members to private

commit 52b561e4835a8f4cb6552f8a57a98773139dd011
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 13:53:25 2025 -0500

    Fixes

commit dc0da4a2e406bc74a6a1ad7b847363717ef502ae
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 12:13:49 2025 -0500

    Debug - omit address

commit acc9af949a7102c753f67efb223918a2c24d982d
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 12:05:02 2025 -0500

    Uncomment some code

commit 71387ed3b6d01403c516257c537b64a2cf244d20
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 11:39:41 2025 -0500

    Modify print

commit f402241c2fa09d0e68cbcfdeaebfaa9f5c347ff0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 11:36:37 2025 -0500

    Change tester arguments

commit d39a9402558fb2ca4aac02de880eb40463c09f6e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 11:34:18 2025 -0500

    Add prints

commit ba1b432112168845fce2684d84e47c239eefa554
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 11:32:05 2025 -0500

    Add print statements

commit 98cbfd1ab5143409db810d4a98868c8b50c36602
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 11:27:47 2025 -0500

    Add device-side print

commit 19bab0ee0f49426b6068b2cecc4594c4ea9a1b5c
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 11:18:04 2025 -0500

    Add wqe debug host print

commit 6cb777e0a1f4ff54271cb6fd57bf3002e4b92629
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 10:55:13 2025 -0500

    Initialize wqe fields without host post call

commit 83c8988c1381ef48581222f7eef64e9a547da1de
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 10:30:06 2025 -0500

    Remove endian conversion since it's done on host

commit c967c2831efb8ea8df73b501b6cec5a724605abe
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 10:27:27 2025 -0500

    Set rkey/lkey using backend

commit 3e9ecfca664cf3c4c53c9ab57c952b37500b9155
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 09:02:51 2025 -0500

    bugfix endian

commit 2ceb456263d31972db5da64d15cc10a9a05922d9
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 08:59:08 2025 -0500

    endian conversion

commit cebc34e78506db5964cc14465e3290f8c2acd351
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 08:38:14 2025 -0500

    Enable tester

commit 05ad6d72322cb50b980940974e27df3bfd00f295
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 08:31:39 2025 -0500

    Add in rkey/lkey writes

commit 7340f209de4e00958224e639ad7d1b8cdfb685d7
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 08:28:15 2025 -0500

    Add rkey/lkey check

commit c22cf537c726c14d5a3a374b152a4d755e23f95f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 15 08:13:29 2025 -0500

    Add documentation, pseudocode, and modify

commit d984f5b11a26360b55d34b5d46b15b6f012f870b
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 21:32:20 2025 -0500

    Finish removing fence

commit c4bd2fbccd523bc0120da99df102697a5ee4180f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 21:20:23 2025 -0500

    Remove fence

commit 5ba16a45b233a739c52d2e2c0ea2cb63314b8812
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 20:53:42 2025 -0500

    Style change

commit 23ef8626de0df20bc9fa420847ca64d7612d5e25
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 20:50:51 2025 -0500

    Remove comments

commit 4f98c3b8a763d479de4ce6995b6faf6f38fed90e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 20:42:45 2025 -0500

    Straight line code

commit 265c812ecb1aab73608e0f7e97a4ebe5b711e2b2
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 18:01:42 2025 -0500

    Remove singlethreadpolicy

commit 6472ce17427eb6ca4a7bada95499cdfdcf9fb82e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 17:49:00 2025 -0500

    Minor fixes

commit f0ebda6a7b5f9232856831cd4367e2eb58d9ff1e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 14:49:32 2025 -0500

    Style changes for backend

commit 8a5a46d939e9cdfc6d6064c7978adbac7b9576dc
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 14:28:05 2025 -0500

    Minor fixes

commit a914cb31cba2480b30fba737beb528f53ae59a75
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 14:18:19 2025 -0500

    Remove inlining mechanism

commit 319649049827458464a887318b898be092745f08
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 13:59:43 2025 -0500

    Remove unused header file

commit 40facc053b1317701a8bb1376ef4d9921531ba33
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 13:54:31 2025 -0500

    Fix comment and variable name

commit 4358b0e4161e8f0a7558a3e3d3cb83d1d2754213
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 13:31:05 2025 -0500

    Encapsulate members in queue_pair

commit 99347175b1d83d2590ea01fa255905598053fcb8
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 13:19:17 2025 -0500

    Style change

commit 45c258c703693cef3b33baf269c22da53da24be1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 13:16:06 2025 -0500

    Cleanup for queue_pair class

commit 06aa050ffa85aa4ab8dcdc9df9e07a5675d7e0cc
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 12:53:33 2025 -0500

    Add documentation for segments

commit 1dbd02f4ab4860bebaa495cf5d31ab17dd38700e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 12:10:35 2025 -0500

    Remove unused struct

commit d290cbf477878a84f9a1f4e5edcae41d6a047bac
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 12:07:33 2025 -0500

    Remove method

commit df37aa233a3f616630585667e2e7313cc296e5d1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 12:04:16 2025 -0500

    Remove unused variable

commit 0ade3759464a56c1830b59b78dff744846edf10b
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 11:52:19 2025 -0500

    Cleanup files

commit e9a20b1d762c702141fe386b13263db526a02920
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 11:35:56 2025 -0500

    Style changes for queue_pair and segment_builder

commit f09ce8a7b9a4dae0158747199840d8a7c8ad16fd
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 11:16:14 2025 -0500

    Remove weird + 1 offset

commit 964b4a1e99bbf43406a4d735606b7aacb43e553e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 11:13:49 2025 -0500

    Rename sq fields

commit e5a2eb170a8cd05ba481cbb226528c8a6c8810ce
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 10:55:31 2025 -0500

    Remove unused headers

commit 90e53b9240900bf9eeb2427a93a602ce30a4edff
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Mon Apr 14 10:49:30 2025 -0500

    Cleanup gpu_ib context files

commit 93de9e9c1a8037f4f1f9570e90115132310b86ca
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 23:03:50 2025 -0500

    Continue documenting MLX structures

commit b900305b0fe88cce5de4729d6b54c189caf2fe48
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 22:46:54 2025 -0500

    Document gpu queue-pair MLX structures

commit a9e2ea1c78e753343759f46e295e97d0dd21b6a1
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 22:08:21 2025 -0500

    Bugfix for host RDMA_WRITE WQEs

commit 330d4f47383c42cd3aeec250fdb88bd263932820
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 22:04:16 2025 -0500

    Add host-side initial RDMA_WRITE WQEs back

commit f63d61363300ab60b91d11466634ac4b5799ce4e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 21:46:16 2025 -0500

    Try to remove host-side post_wqe

commit 180d088b65680a8c352b6159e05f5d7f9eb83c9d
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 21:39:40 2025 -0500

    Always allocate queues in gpu memory

commit 8376221faabfd6eb00a379ec6d4ef1be9e03bc5e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 21:16:14 2025 -0500

    Bugfix for connection class

commit 980827e582b6914f034bd773a2b48351ba310184
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 21:07:45 2025 -0500

    Refactor connection class

commit 2206134e0e8eede6bb6ebb02be6da4dc33b9ddf2
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 20:53:28 2025 -0500

    Refactor some files

commit d608c5c77ab23e4609e43917c8c2aa9d3ad79d1b
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 20:18:05 2025 -0500

    Update connection class

commit 5aa7017f473509585aa1acb702adb86391ddb89f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 19:12:06 2025 -0500

    Cleanup connection and network classes

commit 0159d8d615364c24b2bee2ca31a76d1a5f44fbd9
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 18:50:50 2025 -0500

    Remove unused member

commit 4dd77e49ab2ac6fc8d6b4e380f16d2dfcd6f9de4
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 17:54:29 2025 -0500

    Add uncached heap option

commit a62b0ccec7e0b19677d4dac3669392e8c6838921
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 17:05:53 2025 -0500

    Device mem for cq/sq queues

commit 9877461f13fde11f7b647b84fd74208fca7dc4ad
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 16:49:40 2025 -0500

    Change heap allocation policies

commit 73217f1a50edd38514092bc1a32827b53ee466ae
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 16:15:36 2025 -0500

    Remove compile options and cleanup

commit 41f9349df1ce17a052981e0945c4b81309ae74bb
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 15:24:29 2025 -0500

    Cleanup coalescer files

commit 0b4559e87e5bdb590cb5864f4978a2a36628cb8c
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 15:20:26 2025 -0500

    Cleanup files

commit 70a654e1f4828f210eb70ea2fd49a0b4748d1bb7
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 15:11:17 2025 -0500

    Cleanup rocshmemgpu and team files

commit d0a5f62192d6d4d3f9d6bfa5e5699ca9c920e1d5
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 14:56:34 2025 -0500

    Cleanup gpu ib team files

commit 0e4fa1472af4b6d4b0de9b091ad1b74a65a781c8
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 14:38:53 2025 -0500

    Add inline and cleanup

commit 145095464e10de9d0fb78c8ccaab8b279b602808
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 14:25:27 2025 -0500

    Cleanup file

commit 2e09aa27640e10d057da9d00f83ceca8d90efce5
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 13:50:20 2025 -0500

    Cleanup host files

commit 7d870601f63652495d3ce792c3e2385788529275
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 13:18:50 2025 -0500

    Minor style changes to context_device

commit 1ce38ae151fdf4603bfef0ae4af5c8bbc6dbf0f7
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 13:03:30 2025 -0500

    Remove unused constants

commit 55e3b6f8e96f979febe234d5e4b64d6ed5de2a8e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 12:57:57 2025 -0500

    Remove unnecessary init functions

commit dbd208ea6e3526d80778d5fca0f07a6e6d5dc869
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 12:45:48 2025 -0500

    Remove manage memory stubs

commit dcd655ca3c6ba7a3367fe7f1e2f59349b2010306
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 12:34:55 2025 -0500

    Remove comment

commit 4e5fea4ca3789f3d5fb2b7334a124b79125138c7
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 12:23:41 2025 -0500

    Remove unused ThreadImpl types

commit ff3ed3d4f5cd51a2e7f1cb95a4553887ff12ca2f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 12:03:34 2025 -0500

    Move constant into different file

commit 90598f25b324f0ae899e47e56cc5eea1acb4b098
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 11:53:50 2025 -0500

    Remove g_ret mechanisms

commit 7ec81c5ce211f287b51091b95df0cf0822cc4997
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 11:36:42 2025 -0500

    Remove unused externSharedBytes method

commit b62944d2193ce5c026b266426fb9d3f8a0c57938
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Sun Apr 13 11:25:52 2025 -0500

    Remove unused variables

commit b5cd7f266f3d942258b94522ef5408a17a988263
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Fri Apr 11 11:15:24 2025 -0500

    Tear out internal references to removed atomics

commit db21ad6b145a903306fad6c76b87a3b4992556a9
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Fri Apr 11 10:57:54 2025 -0500

    Remove unused atomic types

commit 123cfb5a9c2b9e5c59c5cb3e8d058878a570143c
Merge: 684dd1c4 c26503f3
Author: Brandon Potter <BKP@users.noreply.github.com>
Date:   Fri Apr 11 10:18:00 2025 -0500

    Merge pull request #71 from Yiltan/yiltan-cleanup-april-11

    Yiltan cleanup april 11

commit c26503f345cf9f7412fc96184a12641985ad46de
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 11 10:09:02 2025 -0500

    removed unused collective buffers

commit 4079afa7191c4bd538152f9b55e72b27464f8d34
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 11 10:02:05 2025 -0500

    removed USE_SINGLE_NODE

commit 3dcb1edb7e79788a0505eb89cf64f7a38e000df3
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 11 10:00:38 2025 -0500

    removed network impl off

commit a35d9e32b65138b26281bbd7c29da58a89b6cf23
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 11 08:42:38 2025 -0500

    merged reliable connection into connection

commit 99177002760b2e598d3ec69a5ae69b20495a9c80
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 11 08:37:34 2025 -0500

    remove rocshmem_calc.hpp

commit 2bd9fb4e99b38af6c661e28348c972aa75ca3adc
Author: Yiltan <yiltan@amd.com>
Date:   Fri Apr 11 08:33:17 2025 -0500

    Removed more unused files

commit 684dd1c43452982f8b1791aae868d744b9d61d91
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 16:58:20 2025 -0500

    Remove straggler wait_until variants

commit 7358ce0b3e128894729bdf08638f2e55e99f9147
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 16:51:14 2025 -0500

    Remove get variants

commit 3559a12eb5ed316c2fedb010d5db3667cf2bb215
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 14:42:32 2025 -0500

    Remove unnecessary interfaces

commit 9836fdf63cefbd9d8cdaf028fa21f688e20d865f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 12:12:43 2025 -0500

    Tear out SYNC, WG_RMA, related functional tests

commit b35b8d86dc7e24a837021a9a6901c48795e5904b
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 11:13:55 2025 -0500

    Tear out signal ops from include and dependencies

commit 54cb94c46f561be054ed9cfdf592c023eed2f3c8
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 10:46:37 2025 -0500

    Remove debug header

commit 99e1fee3554b680bdd194f6127ee16ff02165c87
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 10:42:21 2025 -0500

    Tear out collectives from include and dependencies

commit 6cc6dfecdf0092507df51eea67531a7d4b844067
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:46:57 2025 -0500

    Remove empty RC functions

commit 811a4b48d6a43f2381d43c9f8fdb020e1bd24916
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:36:03 2025 -0500

    Remove qe dumper and debug

commit 785d15563d5a4ca50c832769d0c4ed46e60ddd69
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:32:06 2025 -0500

    Remove helper_macros header since dependencies removed

commit 30ef452825c5d7dae949d2bead11ca4818e59784
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:28:41 2025 -0500

    Remove dev_mono_linear strategy

commit ad8106bae31fffe8c731fd21007f7ca9bad9abe0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:20:40 2025 -0500

    Remove container strategies

commit 2d3e883a1bf67c9fee171214832802f96bf2e9bf
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:15:05 2025 -0500

    Remove bitwise gtest and matrix container

commit 54c6e819185237901745396b03d76230c4f75594
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Thu Apr 10 08:10:18 2025 -0500

    Remove array container

commit f92ffc42822dc46e9e887eaff8608a666a2b0116
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 23:46:33 2025 -0500

    Remove DC transport files

commit 9a89d0450e7629a97fec70b48b6b94d1d65eb9ff
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 22:37:33 2025 -0500

    Remove relative pathing for includes

commit 5b8c7294d780dc08a11fa1dbc93b48a96a75c13f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 22:07:21 2025 -0500

    Remove todo notes

commit 7e920054296a152a410bb576f680adb85063ae45
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 21:58:23 2025 -0500

    Remove extra line

commit 53cbd975f97937f62f6f7673f8529124723c494e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 21:49:29 2025 -0500

    Merge backend classes

commit d5f10791544ce14d7d6d7149bd926e7d034926dc
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 14:19:27 2025 -0500

    Remove USE_RO and USE_IPC conditions

commit 3f60ae758fb8378f5d97f0663e603cda3d3d51f0
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 13:52:36 2025 -0500

    Tear out IPC call points

commit 7b0239fc69a79c34fa5f6eecc46ac767ef9b65a3
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 10:28:48 2025 -0500

    Tear out hdp_policy

commit db02bd53c6d2b92cb5bbd78268f9e6dd48e868fe
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 10:19:29 2025 -0500

    Convert backend_type to GPUIB only

commit 86af41f0e38af3017376666c23231e03bccc7d67
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 10:06:24 2025 -0500

    Tear out IPC conduit

commit 48cc0f326686af412ab6b37d8a208061c85bfc33
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 10:01:55 2025 -0500

    Tear out RO conduit

commit ebdbeb08464633f79f8dd9c315d9b714565d1b83
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 09:57:25 2025 -0500

    Tear out atomic and notifier files

commit 9c7d699b6fa03a974a6cfba0584e912a9026dcd2
Merge: 4d1213ef 62906a5f
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 09:13:31 2025 -0500

    Merge branch 'gpu-ib-working-draft' into bpotter/gpuib_mini

commit 62906a5fbd1882a92c807057b49c6b929b6c829e
Merge: 81693634 d2326a15
Author: Yiltan <ytemucin@amd.com>
Date:   Wed Apr 9 10:09:19 2025 -0400

    Merge pull request #67 from Yiltan/gpu-ib-working-draft

    Removed HDP code and added error checking to ibv_* functions

commit 4d1213ef2d6d413883c67453295c7d5c88667a3e
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 08:40:46 2025 -0500

    Remove unused wrapper class

commit 8d80c0dd18579c2d42d96ac6d91d3fe09ffb5f71
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Wed Apr 9 00:05:57 2025 -0500

    Remove unused EBO spinlock

commit 282f6dbb6203a3cbed909cf2d660f5e148da4960
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 23:56:15 2025 -0500

    Remove slab heap

commit f83b20bd713ae25313d93967c3c5877fa794f90d
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 23:46:33 2025 -0500

    Remove unused unit test for ipc

commit 8eaf49e37cbccf5e5bd4e5743f81fe4264c23580
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 23:44:45 2025 -0500

    Fix store_asm function and util memcpy funcs

commit 376357961cd69e2bc1fd96ea69f9e296a6114ce6
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 23:05:42 2025 -0500

    Replace wallClk code with hip function

commit 8c34400fd1dc040a95974e0bb5db362d7a6e7550
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 22:44:11 2025 -0500

    Remove unused __read_clock function

commit eea7e817537451013ca8a498bd413833b464f766
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 22:18:13 2025 -0500

    Remove unused forward_list

commit 8869129a122d7b1c42a3f00754960a331be051e6
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 22:01:15 2025 -0500

    Disable verification of functional tests

commit 6e35940cdbb041cac2f1e03590be79934516b281
Author: Brandon Potter <brandon.potter@amd.com>
Date:   Tue Apr 8 22:00:52 2025 -0500

    Increase functional test loop size to 200

commit d2326a15cff0a17ab90c9e9885c4a5aff3de1137
Author: Yiltan <yiltan@amd.com>
Date:   Mon Apr 7 12:55:32 2025 -0500

    Added error checking to verbs functions

commit 3a1034d64f8a086687ce6143b392651f9c12b0b9
Author: Yiltan <yiltan@amd.com>
Date:   Mon Apr 7 10:21:06 2025 -0500

    removed unused file

commit e874bba7865806cbbf7a758c74115ec32e8b7867
Author: Yiltan <yiltan@amd.com>
Date:   Mon Apr 7 10:19:07 2025 -0500

    removed hdp comments

commit 45ee7e3c69afdb0679785ad63c84f586ee4df41c
Author: Yiltan <yiltan@amd.com>
Date:   Mon Apr 7 09:54:46 2025 -0500

    fixed dc

commit 81693634b31507ada6fb892e9ce645f29fb70841
Author: Yiltan <yiltan@amd.com>
Date:   Thu Apr 3 14:17:54 2025 -0500

    can't lock cq if on device mem

commit 34acee47d6ba5c6a9a5af669e8f5f855d479c292
Author: Yiltan <yiltan@amd.com>
Date:   Thu Apr 3 08:06:37 2025 -0500

    null ptr

commit a7431148884c5c782b1e301020ee8c30dee82643
Author: Yiltan <yiltan@amd.com>
Date:   Wed Apr 2 15:25:44 2025 -0500

    comment out hdp

commit a0865b4d1b77e2a55726af20fda628cf9ee33c94
Author: Yiltan <yiltan@amd.com>
Date:   Wed Apr 2 12:50:00 2025 -0500

    GPU_IB Compiles

commit a1940e0d99b9c59b879b4ba97cdcdd349e2c8396
Author: Yiltan <yiltan@amd.com>
Date:   Wed Apr 2 10:07:04 2025 -0500

    Add GPU IB back

* Revert "Only issue a single completion per wavefront (#199)" (#205)

This reverts commit d9e2fed2f7e55d266c7dfcacc4641b92a3b008ed.

(cherry picked from commit c931145560e357b267a9b693c56de6915458702f)

* GDA Cmake modifications, move topology to gpu_ib specific folder

* Do not use ../thing.h

* Use WF_SIZE: AMDGCN_WAVEFRONT_SIZE is deprecated

* 2-way merge between context_ipc and context_gpuib

* Select MTU based on network config (#214)

* rocSHMEM GDA BNXT POC (#213)

* rocSHMEM GDA PoC for Thor 2 (233.2.76.0)

(cherry picked from commit abe172f74d32c26ae714ee329088fcb39f07da60)

* Rename gpu_ib to gda

* Renaming part2: includes and cmakery

* Fix DISPATCH macro; use backend_comm when needed; some GPUDevices were
left

* Consolidate GDA_CHECK_NNULL/CHECK_ZERO/CHECK_HIP to look and feel
similar

* Update copyrights to the new style

* Rework default-ctx init, missing heap init, missing qpe field

* backend_gda: single init, use systematic naming for setup/cleanup,
prefix team structures

* setup_wrk_psync must precede setup_teams etc

* silence recasting error

* Some remnants of GDADevice and missing friend classes, public some
fields, it compiles

* Fix redefinition of CHECK_HIP in functional testers, we still have a
duplicate definition that would probably be better having only one

* typo in backend_type

* Undo unneeded change to functional test driver

* Add -lnuma

* ctx must be initialized after qps

* gda: Disable non-functional tests  (#216)

* Do not try to run functional tests that are not implemented

* Revert "Increase functional test loop size to 200"

This reverts commit 6e35940cdbb041cac2f1e03590be79934516b281.

* Make a specific test case for gda

* Disabled further tests that do not currently pass with explanation as to
why disabled

(cherry picked from commit dc0b8e889621b8a6c4685c44394350947f9b547c)

* gda_devel: teams with MPI initialization (#229)

* Fix missing communicator initialization

* Reenable team functional testers

---------

Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
(cherry picked from commit 2c74a458600e848751b8b6f9890fd0e677c2a4c2)

* [GDA] Query for the correct GID index (#215)

* Added GID query code for CX7/Thor2 NICs

(cherry picked from commit 3b7c42d745756ed5866e501614d0d54ef8fc072f)

* Reorder code to make ipc and gda more similar

* Do not double free Wrk_Sync, uniform styling with ipc

* Remove unused includes

* Abort when using not-implemented device functions

* BNXT Compiles

* Silence compiler warnings

* Cleanup unused .h

* Uniform indentation between ipc and gda

* gda: add cleanups, address todos

* Disable pingpong tests, enable defaultctxtest

* Reenable testing non-fetching amos

* build scripts: use a single script backend for all gda variants
enable configuring INSTALL_PREFIX and BUILD_TYPE from the command line
same order in all scripts

* fix: prevent double free in `GDADefaultContextProxy` with custom move assignment

* The default move assignment, invoked during initialization of
  `default_context_proxy_`, caused the default context’s QPs to be freed
  prematurely because the destructor is triggered by the xrvalue after
  initialization.

* Undo changes to the amo standard tester during gda_devel dd675b45, as
they cause RO failures
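
The double free addressed by the custom move assignment above can be illustrated with a minimal host-only sketch. All names here (`ContextProxy`, `QueuePairs`, `live_qps`) are hypothetical stand-ins, not the actual `GDADefaultContextProxy` code; the sketch only assumes a proxy that owns a resource and is initialized by assignment from a temporary.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the queue pairs (QPs) owned by a context proxy.
// `live_qps` counts outstanding allocations so a premature free is visible.
static int live_qps = 0;

struct QueuePairs { int id; };

class ContextProxy {
 public:
  ContextProxy() = default;
  explicit ContextProxy(int id) : qps_(new QueuePairs{id}) { ++live_qps; }

  // Non-copyable: the proxy is the sole owner of its QPs.
  ContextProxy(const ContextProxy&) = delete;
  ContextProxy& operator=(const ContextProxy&) = delete;

  ContextProxy(ContextProxy&& other) noexcept
      : qps_(std::exchange(other.qps_, nullptr)) {}

  // Custom move assignment: release our own resource, take the source's,
  // and null out the source so its destructor has nothing left to free.
  ContextProxy& operator=(ContextProxy&& other) noexcept {
    if (this != &other) {
      release();
      qps_ = std::exchange(other.qps_, nullptr);
    }
    return *this;
  }

  ~ContextProxy() { release(); }

  bool valid() const { return qps_ != nullptr; }

 private:
  void release() {
    if (qps_ != nullptr) {
      delete qps_;
      qps_ = nullptr;
      --live_qps;
    }
  }

  QueuePairs* qps_ = nullptr;
};
```

With this shape, `proxy = ContextProxy(7);` leaves `proxy` owning the QPs after the temporary is destroyed, because the temporary's pointer was nulled during the move.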

---------

Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Co-authored-by: Yiltan <yiltan@amd.com>
Co-authored-by: avinashkethineedi <Avinash.kethineedi@amd.com>
Co-authored-by: bpotter <brandon.potter@amd.com>

[ROCm/rocshmem commit: 69bd4bfe44]
2025-09-05 12:21:18 -04:00
Edgar Gabriel 56eb68bc4a Add extended team tests (#207)
Create teams in the functional test that are not a duplicate of the
ROCSHMEM_TEAM_WORLD. This commit contains only infra-tests to make sure
that n_pes and my_pe on the new teams are indeed correct.

[ROCm/rocshmem commit: e95360961d]
2025-08-01 08:50:14 -05:00
Avinash Kethineedi 2a7416d016 Implement rocshmem_ptr in IPC conduit (#197)
* Implement `rocshmem_ptr` in IPC conduit

* tests: add functional test for `rocshmem_ptr`
  - Add safety check for pointer access and condition check before printing results for `rocshmem_ptr` test
  - Use `rocshmem_put` to store `rocshmem_ptr` availability for data validation

[ROCm/rocshmem commit: 526105d315]
2025-07-28 12:01:02 -05:00
akolliasAMD d2e4a18f11 changed the function tests name on the codebase (#177)
[ROCm/rocshmem commit: ebd92a7b3c]
2025-07-04 13:28:59 -06:00
Aurelien Bouteiller 08d8324f74 Rework cmakery: (#136)
* Rework cmakery:
  * detect rocm/hip/rocshmem better, make sure that ROCM_PATH and
    ROCM_ROOT don't conflict and are taken by default
  * add /opt/rocm as a fallback when nothing else found
  * obtain hipcc in a sanitized way (ensure we use the same logic we
    use to later find_package hip)
  * factorize redundancies
  * export GPU_TARGETS as part of the cmake target for librocshmem,
    this helps with a clean error when an application tries to link
    with the wrong offload-target flag (rather than a cryptic link error)
  * phased out ROCSHMEM_HOME, in favor of rocshmem_ROOT (the cmake
    blessed way)

* Remove references to ROCSHMEM_HOME, we prefer ROCSHMEM_ROOT

* Pick CMAKE_PREFIX_PATH method for consistent finding hip/rocm

* Undo this pr using LANGUAGE HIP, maybe later

* Use only rocmcmakebuildtools as recommended from 6.4 onward

[ROCm/rocshmem commit: ee5363be7a]
2025-06-18 11:46:33 -04:00
Avinash Kethineedi 14756a73b1 Refactor Barrier_all and Sync_all APIs to use default context (#159)
* Refactor `Barrier_all` and `Sync_all` to use default context

- Removed context-specific implementations of barrier_all and sync_all
- Added barrier_all and sync_all to the default context implementation
- Updated functional tests to use the default context for barrier_all and sync_all

* Update `Barrier_all` and `Sync_all` API usage in documentation

* Update `CHANGELOG`

---------

Co-authored-by: Yiltan <ytemucin@amd.com>

[ROCm/rocshmem commit: bf48bcabf2]
2025-06-17 11:16:18 -05:00
akolliasAMD 482490f48f added init example and all_reduce example on the files (#150)
* added init example and all_reduce example on the files

* typo fix on folder name

[ROCm/rocshmem commit: 08a6a733d8]
2025-06-13 15:28:13 -04:00
Aurelien Bouteiller f3345dbf05 Use finegrain allocator by default (#140)
* Use FineGrained allocator for heap by default, consolidate all types of
allocators under saner cmake controls

Co-authored-by: Yiltan <ytemucin@amd.com>

* Uncached may not be only for debug

Need to include the rocshmem config otherwise produce an inconsistent
build with different allocators used in different files

* Undo this pr adding presumably useless hip_host_allocator_noncoherent

* Rename HEAP_IS_COHERENT/USE_COHERENT_HEAP to USE_HDP_FLUSH as the former
was misleading

* Remove unused __roc_inv()

---------

Co-authored-by: Yiltan <ytemucin@amd.com>

[ROCm/rocshmem commit: 41fd9e2d57]
2025-06-13 15:26:26 -04:00
Yiltan 381625f060 [SWDEV-536571] Update OMPI Commit (#152)
Signed-off-by: Yiltan Hassan Temucin <yiltan.temucin@amd.com>

[ROCm/rocshmem commit: e340a220f9]
2025-06-09 11:03:48 -04:00
Jobbins 229e97afef Fix typo (#147)
[ROCm/rocshmem commit: e0ef34a9d1]
2025-06-04 10:46:38 -06:00
Yiltan 92bb6aeaaa [SWDEV-534546] Disable building tests in default build (#141)
[ROCm/rocshmem commit: 9fe166c8e1]
2025-05-26 16:50:22 -04:00
Jobbins 22ea3f0c8b Code Coverage (#82)
code coverage:  generate code coverage reports

* Add instrumentation flags to rocshmem target when adding -DBUILD_CODE_COVERAGE cmake flag
* Add helper script to build all subprojects and generate code coverage reports
* Update README with code coverage instructions

[ROCm/rocshmem commit: 474112d03c]
2025-05-16 09:09:17 -06:00
Aurelien Bouteiller 19e98852af cleanup leftovers from SOS testers removal (#97)
Followup to pr#85

[ROCm/rocshmem commit: f0501550f7]
2025-05-02 11:59:52 -04:00
Aurelien Bouteiller 27d1189ff3 Substitute pow2bin allocator with a dlmalloc based allocator (#71)
* Add dlmalloc_strat allocator strategy
 - Use mspace variant to ease encapsulation
 - Make pow2bins and dlmalloc cmake selectable
* Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers
accordingly
 - add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used
 - Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap
* bugfix: dlmalloc exposed that the pingpong test would write past end of
allocation with -w 32
* iostream leakage/mixed usage of cerr and fprintf(stderr)

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: b835de6cd5]
2025-05-01 11:55:23 -04:00
Aurelien Bouteiller 19e7b4798e Show and log what the functional test driver is running (#70)
Show and log what the functional test driver is running
* Log errors in the log file
* list all failed tests at the end
* pretty colors :x
* Print stderr when the test has failed

---------

Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: 67bc5b9e5a]
2025-04-23 10:21:35 -04:00
Avinash Kethineedi c4de6833f6 Add SPDX license identifiers and update copyright headers (#85)
* Update copyright information and add SPDX license identifier

* Update AUTHORS

* Remove `sos_tests`

[ROCm/rocshmem commit: f6ef19f5a9]
2025-04-15 15:37:53 -05:00
Yiltan ea2df2aa26 Added sphinx dependencies (#84)
[ROCm/rocshmem commit: 5ee0c3407e]
2025-04-15 11:28:16 -04:00
Edgar Gabriel 5b22ddd1ff add new flag to build instructions (#78)
This flag is required to link a pytorch use-case correctly.
It doesn't seem to impact the rocSHMEM code.

[ROCm/rocshmem commit: 5e49567b6c]
2025-04-10 08:39:54 -05:00
Yiltan 4d6fd799ef Enable RO CI (#65)
[ROCm/rocshmem commit: 25e7109b64]
2025-04-08 16:12:22 -04:00
Avinash Kethineedi 9bd2b04899 Update Barrier and Sync APIs (#73)
* Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context
 - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes
 - Added support in both IPC and RO conduits
 - Added functional tests to cover all `barrier` APIs
 - Removed collective operations on default context

* Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits.
  - Implemented `sync` APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync` APIs

* update naming convention for context-based `barrier` APIs

[ROCm/rocshmem commit: dc61bca066]
2025-04-08 11:25:31 -05:00
Avinash Kethineedi 426bbf525b Update Barrier_All and Sync_All APIs (#72)
* Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs
  - Added separate pSync buffers for each device context
  - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts

* Update barrier_all functional tests for multi-context support

* Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits
  - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity
  - Added support in both IPC and RO conduits
  - Updated functional tests to cover all `barrier_all` APIs

* Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits
  - Implemented sync_all APIs for thread, wavefront, and workgroup scopes
  - Added support in both IPC and RO conduits
  - Added functional tests to cover all `sync_all` APIs
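
The per-context pSync buffers from the deadlock fix above can be pictured with a small host-side model. This is a sketch under stated assumptions, not the rocSHMEM implementation: `PerContextPSync`, `slice`, and the slab layout are hypothetical, and the only idea taken from the commit message is that each context ID maps to its own, non-shared sync buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model: one slab of sync words carved into
// equally sized, non-overlapping slices, one slice per context ID.
class PerContextPSync {
 public:
  PerContextPSync(std::size_t num_contexts, std::size_t words_per_context)
      : words_per_context_(words_per_context),
        storage_(num_contexts * words_per_context, 0L) {}

  // Returns the slice owned by `ctx_id`; no two IDs share a word, so
  // concurrent barriers on different contexts never touch the same flags.
  long* slice(std::size_t ctx_id) {
    assert(ctx_id * words_per_context_ < storage_.size());
    return storage_.data() + ctx_id * words_per_context_;
  }

  std::size_t words_per_context() const { return words_per_context_; }

 private:
  std::size_t words_per_context_;
  std::vector<long> storage_;
};
```

Indexing by context ID keeps the lookup constant-time; a barrier in one context that is still spinning on its flags cannot be disturbed by another context resetting a shared buffer.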

[ROCm/rocshmem commit: c652f58cef]
2025-04-02 11:58:55 -05:00
Edgar Gabriel 7aecbdec4d update README documentation for RO (#63)
* README: update documentation for RO support

update the README and the install_dependencies script to match the
requirements of the RO conduit.

* add CODEOWNERS file

[ROCm/rocshmem commit: 4e48c9748e]
2025-03-25 07:50:15 -05:00
Avinash Kethineedi baca5fd7a1 Fix/RO Backend Hang Issue (#53)
* Update HIP version check for compatibility with versions >= 5.5

* Update memory allocator for context BlockHandle
   - Replaced `HIPAllocator` with `HIPDefaultFinegrainedAllocator` for context `BlockHandle`.

* Update run commands for `rocshmem_g` and `rocshmem_p` functional tests

[ROCm/rocshmem commit: c16b0d6952]
2025-03-24 22:54:07 -05:00
Edgar Gabriel 1ee9b72449 add rocshmem_barrier() (#61)
* add team-barrier implementation

add a team-barrier API and implementation in the IPC and RO conduit.
Clean up some of the logic in the RO Conduit to distinguish between
sync, sync_all, barrier, and barrier_all.

* add team_barrier_tests to functional tests

[ROCm/rocshmem commit: bcbc42e78f]
2025-03-24 11:23:03 -05:00
Yiltan 1ed4512106 Removed GPU_IB (#59)
[ROCm/rocshmem commit: 658bf2a3b5]
2025-03-24 09:04:52 -04:00
Yiltan 6d6dccfebe Sync Reverse Offload Scripts (#52)
* Sync Reverse Offload scripts
- Disable IPC unit tests when IPC is not available in the rocSHMEM configuration

* Added missing ptr in ipc_policy

[ROCm/rocshmem commit: 3428957de9]
2025-03-19 14:31:07 -04:00