rocm-systems

Author	SHA1	Message	Date
Edgar Gabriel	72286016c6	add additional runtime checks and gfx1201 fix (#2806 ) * add additional runtime checks and gfx1201 fix This commit contains three fixes: - increase the max. number of files at the beginning of the run to the max. allowed by the system - check for large BAR support. We do not abort if it's not available, but print a warning. - for gfx1201, do not use uncached memory at the moment. * Change get_arch_name to return const char* Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fix C++ new syntax not sure how it compiled before Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * use snprintf instead of strncpy Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * destructor cleanup Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * add const keyword --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-24 13:31:10 -06:00
systems-assistant[bot]	563776a949	[GDA/IONIC] only ring doorbell on active lanes (#2727 ) Co-authored-by: Yiltan <yiltan@amd.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com> Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>	2026-01-23 09:26:46 -05:00
Yiltan	55aab4d62e	[Docs] Clarify ROCSHMEM_HEAP_SIZE (#392 ) * clarify ROCSHMEM_HEAP_SIZE * Apply suggestions from code review Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> --------- Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> [ROCm/rocshmem commit: `0496586829`]	2026-01-20 17:22:18 -05:00
Allen Hubbe	3edd56ca23	gda ionic: ccqe cleanup and error check (#389 ) Delete unreachable ccqe polling path, ionic_poll_wave_ccqe(). Move cqe error check to ionic_quiet_internal_ccqe(). Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> [ROCm/rocshmem commit: `6b00964f32`]	2026-01-20 15:26:53 -05:00
Edgar Gabriel	55e2b501d3	replace memset with hipMemset (#390 ) [ROCm/rocshmem commit: `bc70ce551c`]	2026-01-20 08:14:25 -06:00
akolliasAMD	2606c13155	Tests package (#384 ) * added packaging for the tests and for the driver.sh * making .sh files into programs so they keep permissions [ROCm/rocshmem commit: `e7269cb925`]	2026-01-16 09:10:36 -07:00
Aurelien Bouteiller	ede2adfe49	new tester: put to all pes from all lanes concurrently (#112 ) * Add put to all pes from all lanes concurrently * Remove wg_init, use size_t for size params, 64bit data exchange (more bits for verification masking) * Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time correctly * Add flood tester to the testing script * add to gda test case w/o the _g variant that is not implemented. [ROCm/rocshmem commit: `cca7872bcf`]	2026-01-16 10:40:48 -05:00
Edgar Gabriel	3ce10dc688	fix allreduce tester (#385 ) - use the reduce_psync buffers for synchronization in allreduce, not the barrier_psync. - execute a wwg barrier after the allreduce operation. After internal discussion it was determined that it is required for correctness. [ROCm/rocshmem commit: `6f512e92a5`]	2026-01-16 08:10:25 -06:00
Omri Mor	93493e3e46	ionic: fix byteswap functions (added in #345 ), missed in #368 (#388 ) [ROCm/rocshmem commit: `885e41ec62`]	2026-01-15 14:19:19 -08:00
Omri Mor	3260759dfd	Replace byteswap interface to align with C++23 std::byteswap (#368 ) * byteswap<T> returns by value * replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics * new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction [ROCm/rocshmem commit: `cf8b72a047`]	2026-01-15 13:03:01 -08:00
yugang-amd	bcd9119dbc	Bump rocm-docs-core to 1.31.2 (#387 ) [ROCm/rocshmem commit: `491739c9b4`]	2026-01-15 13:17:51 -05:00
dependabot[bot]	12d9d45667	Bump urllib3 from 2.6.0 to 2.6.3 in /docs/sphinx (#383 ) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.0 to 2.6.3. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.3) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.6.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/rocshmem commit: `f9fc022ed5`]	2026-01-09 08:27:43 -05:00
Aurelien Bouteiller	6cad766d4e	dlclosing the dvlib may leave libibverbs in a broken state (#381 ) * Error out when IPC gets selected when it is impossible to run it. * Use RTLD_LAZY when dlopening * Do not dlclose libbnxt/ionic/mlx5.so as that breaks libibverbs [ROCm/rocshmem commit: `47f6fa6267`]	2026-01-08 13:40:11 -05:00
Yiltan	51d26b7cea	Fix __match_any_sync on ROCm 6.x (#382 ) [ROCm/rocshmem commit: `e47cff7f45`]	2026-01-08 11:25:16 -05:00
dependabot[bot]	645236aadd	Bump pynacl from 1.5.0 to 1.6.2 in /docs/sphinx (#379 ) Bumps [pynacl](https://github.com/pyca/pynacl) from 1.5.0 to 1.6.2. - [Changelog](https://github.com/pyca/pynacl/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/pynacl/compare/1.5.0...1.6.2) --- updated-dependencies: - dependency-name: pynacl dependency-version: 1.6.2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/rocshmem commit: `fb644ddfa9`]	2026-01-07 14:39:00 -05:00
Aurelien Bouteiller	8d2dca4505	Fix DEBUG build (#378 ) [ROCm/rocshmem commit: `27d87b8b67`]	2026-01-07 10:39:57 -05:00
Allen Hubbe	67536a85ef	gda ionic: collapsed cqe (#345 ) * util: dlsym optional helper Like DLSYM_HELPER, but does not return if the symbol is not found. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: sync dv and fw headers Sync dv and fw headers to match out-of-tree libionic and firmware. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: collapsed cqe Detect and enable collapsed cqe if supported by drivers and firmware. Fall back to regular completion queue. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> --------- Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> [ROCm/rocshmem commit: `1494c24f9a`]	2026-01-06 20:42:15 -05:00
Aurelien Bouteiller	bcdf60def6	Enable new a2a (pr 334) on ionic as well (#366 ) * Enable new a2a (pr 334) on ionic as well * Apply suggestions from AI code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> [ROCm/rocshmem commit: `82d91433c9`]	2026-01-06 20:41:51 -05:00
Edgar Gabriel	e38f98fad5	fix reduction test for gfx1201 (#374 ) * fix reduction for gfx942 and 1201 match the synchronization of internal_putmem_wg and internal_getmem_wg to their non-internal counterparts. the internal_putmem_wg is used in the ipc reduction * move specialization to internal_putmem [ROCm/rocshmem commit: `8d2504d6c1`]	2026-01-06 10:15:38 -06:00
Edgar Gabriel	cc727261de	disable the putmem_signal_on_stream on RO (#376 ) it fails in about 50% of the cases. Will revisit later why it fails, but RO is at the moment lower priority, so disabling the test for now. [ROCm/rocshmem commit: `ed2f75f1de`]	2026-01-06 08:10:46 -06:00
Aurelien Bouteiller	abb1e0684a	Do not hardcode wf_size==64 in ionic provider (#367 ) * Do not hardcode wf_size==64 in ionic provider * Simpler same_qp_mask in ionic [ROCm/rocshmem commit: `0c496d83d6`]	2026-01-05 18:36:58 -05:00
Omri Mor	56bfb13644	QueuePair: prefix bnxt functions and variables (#373 ) [ROCm/rocshmem commit: `f5940f6b9a`]	2025-12-22 14:46:17 -08:00
Omri Mor	c43dc136f3	[Bugfix] GDA/bnxt: release SQ lock before return (#372 ) * bnxt_post_wqe_amo_single with fetching = true would return before releasing the send queue lock, resulting in a deadlock. * Release the send queue lock before returning from the function. [ROCm/rocshmem commit: `016e08120a`]	2025-12-22 12:05:00 -08:00
Kutovoi, Vadim	a4b99485a9	gda/ro: validate and exit cleanly when forced GDA config is invalid (#354 ) * gda: validate and exit cleanly when forced GDA config is invalid * ro: validate and exit cleanly when forced RO config is invalid [ROCm/rocshmem commit: `80a710ac0a`]	2025-12-22 10:54:33 +00:00
Omri Mor	ed38201b90	gda: fix incorrect casts from void* to uintptr_t (#369 ) [ROCm/rocshmem commit: `e8fc5e67c4`]	2025-12-19 16:18:49 -08:00
Dimple Prajapati	e21c087f2a	[BugFix] Fix rocshmem_get_device_ctx to return ctx_opaque pointer (#359 ) Changed rocshmem_get_device_ctx() to properly copy the full rocshmem_ctx_t structure and return only the ctx_opaque pointer instead of trying to copy directly to a void pointer. Prior implementation would cause undefined behavior or memory corruption as it was copying 16 bytes of data to 8 bytes. It worked so far because the ctx_opaque field is at the proper offset, but the incorrect memcpy would overwrite some other allocations and cause issues. This fixes the context memory handling when passing device context from host to device kernels. [ROCm/rocshmem commit: `cf6a53e81c`]	2025-12-19 10:01:02 -05:00
Aurelien Bouteiller	dde4902844	Fix driver.sh script for system where neither amd-smi or rocm-smi are (#370 ) found [ROCm/rocshmem commit: `5eaa152010`]	2025-12-19 10:00:11 -05:00
dependabot[bot]	750d3f8b2e	Bump urllib3 from 2.5.0 to 2.6.0 in /docs/sphinx (#365 ) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.6.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/rocshmem commit: `166a591216`]	2025-12-19 09:55:42 -05:00
Yiltan	43363894a8	remove cq lock (#357 ) [ROCm/rocshmem commit: `b28a56bd54`]	2025-12-15 09:23:24 -05:00
Yiltan	5e2ba952f3	[GDA/BNXT] Remove doorbell arbitration (#363 ) [ROCm/rocshmem commit: `fe1a28e409`]	2025-12-15 09:23:01 -05:00
Edgar Gabriel	f9fd5d3cdd	use 64 threads for reduction test (#360 ) * use 64 threads for reduction test much faster with IPC backend. * change all relevant collective tests. [ROCm/rocshmem commit: `c35210f174`]	2025-12-15 08:14:18 -06:00
yugang-amd	195fe4e5ee	GDA docs style edits (#362 ) * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/install.rst Co-authored-by: yugang-amd <yugang.wang@amd.com> * Update docs/sphinx/_toc.yml.in Co-authored-by: yugang-amd <yugang.wang@amd.com> * Apply suggestions from code review Co-authored-by: yugang-amd <yugang.wang@amd.com> --------- Co-authored-by: Yiltan <ytemucin@amd.com> [ROCm/rocshmem commit: `bbad1d8539`]	2025-12-10 17:03:58 -05:00
Aurelien Bouteiller	e783b47388	Bump version number for post-7.2 devel (#356 ) [ROCm/rocshmem commit: `64460f0ec9`]	2025-12-10 13:03:20 -05:00
Yiltan	258d264ecc	Add default context alltoall API (#350 ) [ROCm/rocshmem commit: `fddbe7b15d`]	2025-12-10 11:43:15 -05:00
Aurelien Bouteiller	972893bab2	Reenable building test-only with external MPI (#352 ) [ROCm/rocshmem commit: `1a16b3bedc`]	2025-12-10 11:40:29 -05:00
Aurelien Bouteiller	92459fa840	Update version to 3.2.0 for 7.2.0 rocm release (#351 ) [ROCm/rocshmem commit: `ef5f2be215`]	2025-12-09 10:26:55 -05:00
Anatolii Rozanov	f98c72d627	Add host API for _on_stream operations (#340 ) Add functional test for barrier_all_on_stream * Add rocshmem_barrier_all_on_stream support for GDA and RO backends Implements rocshmem_barrier_all_on_stream operation for GPU Direct Access and Reverse Offload backends. Previously, rocshmem_barrier_all_on_stream was only supported for IPC backend. * Add functional test for rocshmem_broadcastmem_on_stream * Add host-side rocshmem_broadcastmem_on_stream API Implement stream-based broadcast collective operation - Add rocshmem_broadcastmem_on_stream host API and kernel implementation - Add functional test TeamBroadcastmemOnStreamTester with multi-stream support and correctness verification - Use per-workgroup contexts to avoid contention across parallel streams API: rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream) * Add functional test for rocshmem_getmem_on_stream * Add host-side rocshmem_getmem_on_stream API Implement stream-based point-to-point RMA get operation - Add rocshmem_getmem_on_stream host API and kernel implementation - Support for asynchronous getmem operations on HIP streams - Add backend support for GDA, RO, and IPC contexts - Use work-group collective getmem for efficient memory transfer API: rocshmem_getmem_on_stream(dest, source, nelems, pe, stream) (AI Assist) * Add host-side rocshmem_putmem_on_stream API - Add rocshmem_putmem_on_stream for asynchronous remote writes - Support for concurrent RMA operations on HIP streams - Add backend support for GDA, RO, and IPC contexts - Use work-group device collective operation API: rocshmem_putmem_on_stream(dest, source, bytes, pe, stream) (AI Assist) * Add functional test for rocshmem_putmem_on_stream * Add host-side rocshmem_putmem_signal_on_stream API Enables asynchronous putmem operations with signaling on HIP streams. 
The implementation includes: - Kernel wrapper rocshmem_putmem_signal_kernel - Host interface putmem_signal_on_stream method - Context layer support across all backends (IPC, GDA, RO) - Public API Function signature: void rocshmem_putmem_signal_on_stream(void dest, const void source, size_t bytes, uint64_t sig_addr, uint64_t signal, int sig_op, int pe, hipStream_t stream); Add functional test for rocshmem_putmem_signal_on_stream * Add host-side rocshmem_signal_wait_until_on_stream API Enables asynchronous signal wait operations on HIP streams. The implementation includes: - Kernel wrapper rocshmem_signal_wait_until_kernel - Host interface signal_wait_until_on_stream method - Context layer support across all backends (IPC, GDA, RO) - Native uint64_t support in wait_until API (generated from P2P_SYNC.py) Function signature: void rocshmem_signal_wait_until_on_stream(uint64_t sig_addr, int cmp, uint64_t cmp_value, hipStream_t stream); (AI Assist) Add functional test for rocshmem_signal_wait_until_on_stream * Add documentation for stream API functions This commit adds API documentation for the following host-side stream functions: - rocshmem_barrier_all_on_stream (collective routines) - rocshmem_broadcastmem_on_stream (collective routines) - rocshmem_getmem_on_stream (RMA operations) - rocshmem_putmem_on_stream (RMA operations) - rocshmem_putmem_signal_on_stream (signaling operations) - rocshmem_signal_wait_until_on_stream (point-to-point sync) The documentation includes function signatures, parameter descriptions, and detailed explanations of asynchronous behavior and stream handling. (AI Assist) * Rename "bytes" -> "nelems" * Add "_TEST_" to the variables used in tests * Remove incorrect hipStreamDefault usage hipStreamDefault is not a default stream. This is a flag. If stream == nullptr, then just pass it to kernel. It will launch the kernel on the default stream [ROCm/rocshmem commit: `d0c8380650`]	2025-12-09 08:55:46 -06:00
Dimple Prajapati	b9c172de16	Add IBGDA backend flag to enable bitcode generation (#347 ) * Change to enable ibgda bitcode compilation * Apply suggestion from @abouteiller --------- Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> [ROCm/rocshmem commit: `fbe57306b9`]	2025-12-08 16:19:48 -08:00
Avinash Kethineedi	4a0a3cc6e3	Refactor: modularize RMA and AMO WQE posting functions (#331 ) * Refactor: modularize RMA and AMO WQE posting functions - Extract shared logic for SQ/CQ waiting, doorbell ringing, and WQE building * Remove unused variables * Update return buffer address calculation for atomics [ROCm/rocshmem commit: `1acf454048`]	2025-12-08 14:54:41 -06:00
Yiltan	9b77387067	Fix docs rendering issue (#349 ) [ROCm/rocshmem commit: `d5bcb3a201`]	2025-12-08 15:54:06 -05:00
Yiltan	cf1db0529a	Remove unused fence policy (#348 ) [ROCm/rocshmem commit: `ecd4c9f561`]	2025-12-08 14:06:53 -05:00
Aurelien Bouteiller	92c56e7fbd	Functional tests without MPI support (#343 ) * Let functional tests build without external MPI * Fix error conditions when using uuid startup with internal MPI * Do not abort if libibverbs is not found but not using GDA * Enabled RO functional test initialized with TEST_UUID * Reduce load time for ro backend_can_run and prevent mpilib_dlclose crashing * Fix case TEST_UUID=1, ROCSHMEM_BACKEND='' (autoloading gda) [ROCm/rocshmem commit: `c99bc21e10`]	2025-12-08 11:46:16 -05:00
Yiltan	1c3ce17f13	[GDA/BNXT] Optimize Alltoall using put signal (#334 ) * Modularize bnxt * add post_wqe_amo_single * add alltoall with putsignal impl * make ringing the doorbell optional [ROCm/rocshmem commit: `baaf8091b5`]	2025-12-05 12:41:22 -05:00
Avinash Kethineedi	1ecc355062	IPC: insert `__threadfence_system()` after *wg RMA APIs to guarantee global memory visibility (#346 ) [ROCm/rocshmem commit: `f907ef91e4`]	2025-12-04 10:21:25 -06:00
Edgar Gabriel	3d658b558b	reenable gfx1100 (#328 ) * reenable gfx1100 use the modified version of the flat_store_short assembly instruction as suggested by the compiler team (32bit input value instead of 16bit) * add fix for gfx1201 add the same fix for gfx1201 that was introduced for gfx1100 [ROCm/rocshmem commit: `224c969bef`]	2025-12-03 13:49:38 -06:00
Anatolii Rozanov	4b04b540bf	Add host API for alltoallmem_on_stream collective operation (#333 ) * Add host-side rocshmem_alltoallmem_on_stream function Function signature: rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void dest, const void source, size_t size, hipStream_t stream) - The function launches rocshmem_alltoallmem_kernel which calls device-side alltoall<char> workgroup collective through default context. - Uses dynamic block size determination via occupancy API. - Implemented for all backends. * Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends When allocating memory for alltoall_pSync_pool in setup_teams() and teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE instead of ROCSHMEM_ALLTOALL_SYNC_SIZE. * Add functional test for team_alltoallmem_on_stream This commit adds a new functional test to verify the correctness of the host-side rocshmem_team_alltoallmem_on_stream API. * Add documentation for rocshmem_alltoallmem_on_stream This commit adds API documentation for the host-side rocshmem_alltoallmem_on_stream function in the collective routines section. The documentation includes: [ROCm/rocshmem commit: `5577feb70d`]	2025-12-03 08:40:24 -05:00
Yiltan	0f32739b52	Updated important missing environment variables (#344 ) [ROCm/rocshmem commit: `8b350a51fe`]	2025-12-02 11:40:30 -05:00
Aurelien Bouteiller	1e3a161c74	MLX5 cards have a vendor-id that does not match the pci-vendor-id for (#342 ) some reason. Signed-off-by: Aurelien Bouteiller <abouteil@amd.com> [ROCm/rocshmem commit: `0f7da76018`]	2025-12-02 11:32:37 -05:00
Kutovoi, Vadim	dde9e2464e	gda: add check for active interfaces when selecting the GDA backend (#327 ) * gda: add check for active interfaces when selecting the GDA backend * fix __func__ macro in rocshmem_ctx_pe_quiet * gda: switch to more generic RDMA NIC term in has_active_ib_interface * gda: add active MLX5 and Pensando vendor ID checks for backend selection [ROCm/rocshmem commit: `29000a5644`]	2025-12-01 15:49:25 -05:00
Adel Johar	2c243feb1b	[Docs] Move environment variables to separate page (#341 ) [ROCm/rocshmem commit: `ba77bdd9a6`]	2025-12-01 14:25:27 -05:00


421 commits