rocm-systems

Author	SHA1	Message	Date
Allen Hubbe	ed91c8cce2	functional_tests: n, nskip, nloop, nlarge options (#297 ) To make the functional tests more useful for benchmarking, allow user to specify the number of loops and related parameters via command options. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>	2025-10-30 11:54:49 -04:00
Edgar Gabriel	c0285ac0ce	minor change to MPI detection logic (#294 ) somehow the test whether we requested MPI support or not stopped working, although no obvious code change can be located. Make the if-statement more stringent by explicitly testing whether USE_MPI_SUPPORT is "ON".	2025-10-28 12:54:26 -05:00
akolliasAMD	87d87cc881	renamed memcpy to memcpy_lane (#296 )	2025-10-28 09:33:13 -06:00
Yiltan	ff29d139cb	[GDA] Improve error messages (#292 )	2025-10-27 16:51:15 -04:00
Yiltan	6290db319c	[GDA/BNXT] Implemented CQE Collapsing (#279 )	2025-10-23 14:53:44 -04:00
Aurelien Bouteiller	054bc33dc4	Tests/syncall (#291 ) * SyncAll test case would run Sync * Despecialized name for argument reader * Rename sync-test to team-sync-test as it uses teams * Another stab at probing NUM_GPUS	2025-10-23 13:40:41 -04:00
Edgar Gabriel	e2c6bb8bd4	fix Win_flush prototype in function table (#289 ) the bug was exposed when trying to compile a backend with HDP flush support.	2025-10-23 08:43:41 -05:00
Edgar Gabriel	d0c2845031	add support for GPUs using wavefront size of 32 (#285 ) * add gfx1100 support Add support for Radeon 7900 GPUs (RX and PRO), and 7800 PRO. I was contemplating to add gfx1101 and gfx1102 GPUs as well, but those are the lower end models that are more unlikely to be used for compute intensive jobs. In addition, I do not have access to them to test the support. * update WF_SIZe for different options Radeon systems use a WarpSize of 32, unlike current Instinct systems, which use a warp size of 64. For the device side, a gfx specific ifdef is sufficient. For the host side, we need to query the device properties. * adjust functional tests to wf_size of 32 * update unit tests to handle wf_size of 32 * address reviewer comments	2025-10-22 16:04:58 -05:00
Avinash Kethineedi	955c22aeed	Add `ROCSHMEM_CTX_INVALID` for invalid context handling (#287 ) * Add `ROCSHMEM_CTX_INVALID` for invalid context handling - Define `ROCSHMEM_CTX_INVALID` as {nullptr, nullptr} - Add == and != operators to rocshmem_ctx_t - Use `ROCSHMEM_CTX_INVALID` on failed context creation - Skip ctx destroy if context is invalid * Update docs for context create and destroy APIs usage and behavior	2025-10-22 12:00:56 -05:00
Yiltan	b534423de7	Provide an error when there are no NICs on a system (#286 )	2025-10-20 13:07:56 -04:00
Aurelien Bouteiller	c44f4ece1f	Print an error and quit cleanly if GDA required but could not init (#284 )	2025-10-20 13:04:13 -04:00
Yiltan	c3eeae473b	Implement rocshmem_pe_quiet() (#282 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>	2025-10-20 11:42:39 -04:00
Edgar Gabriel	6f74cdfd75	update tester for RO (#281 ) update the tester script to only test the amo functions on RO that are expected to pass. We can revisit the non-passing tests later, but this prevents us from having passing CIs at the moment, while RO is simply lower priority than other asks.	2025-10-20 09:03:17 -05:00
Yiltan	9338c84480	Updated docs for ROCm 7.x.x (#239 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Co-authored-by: yugang-amd <yugang.wang@amd.com>	2025-10-17 12:10:37 -04:00
Aurelien Bouteiller	aef74812ae	Add ROCSHMEM_GDA_PROVIDER envvar (#280 ) * Add ROCSHMEM_GDA_PROVIDER env control * Single name for a single concept: vendor->provider	2025-10-17 10:46:05 -04:00
Edgar Gabriel	ed957302d4	update to build system: (#277 ) - make adding PMIx library to compile time based on the result of finding PMIx support. This is required eg if compiling rocSHMEM with ompi 4.0/4.1, which do not have a built-in PMIx version. - when setting USE_EXTERNAL_MPI=OFF which ensures that we do not check for external MPI libraries (even if one would be available).	2025-10-17 07:42:11 -05:00
Aurelien Bouteiller	3cfe76522e	Runtime selection of IONIC (#272 ) * Split ionic code to a subdirectory; dyld libionicl; move the fntable to provider_gda_xxx.hpp pass the pattr to ionic_setup_pd, include endian.hpp Enable building IONIC conduit for runtime selection * Uniform style for the fntable between ionic and the rest * Move mlx5 gda conduit to a subdir; resolve conflict with backend_can_run function * Don't forget to init qp for ionic, move mlx5 specialized init qp code to the mlx5 subdir * Don't add cmakecaches... Typo: GDA_BXNT * Add gda-ionic to all_backends build scripts * Apply suggestion from reviews Co-authored-by: Omri Mor <omri50@gmail.com> Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com> * Remove duplicate definition of DLSYM macros --------- Co-authored-by: Omri Mor <omri50@gmail.com> Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com>	2025-10-16 15:53:01 -04:00
Dimple Prajapati	a44b581997	Add host API for enqueuing barrier on given stream (#274 ) * add host API for enqueuing barrier on given stream	2025-10-15 14:29:07 -07:00
akolliasAMD	4ecdbc026c	added dl on the list of linked libraries (#275 )	2025-10-10 09:12:59 -06:00
Aurelien Bouteiller	db8e5f1086	Make ROCSHMEM_DISABLE_MIXED_IPC a synonym for ROCSHMEM_RO_DISABLE_IPC, ROCSHMEM_DISABLE_IPC (#273 ) * Make ROCSHMEM_DISABLE_IPC a synonym for ROCSHMEM_RO_DISABLE_IPC * Introduce ROCSHMEM_DISABLE_MIXED_IPC and deprecate old variants	2025-10-09 19:57:53 -04:00
Aurelien Bouteiller	6e7277b544	Cleanup/wg init (#260 ) * remove wg_init and wg_finalize from functional tests * Remove wg_init and wg_finalize from examples * deprecate wg_init/finalize * Updated docs * Typo in documentation --------- Co-authored-by: Yiltan <yiltan@amd.com>	2025-10-07 14:34:18 -04:00
Edgar Gabriel	a1269e3db5	allow all three backends to co-exist in a single build (#270 ) * add support for compiling all backends also include the logic to select backends either based on user requests or through some heuristics * checkpoint for compiling all backends * final checkpoint all tests seem to pass when compiling all three backends simultaneously and forcing to use any of the three Backends. * update PR to new envvar system	2025-10-07 10:49:20 -05:00
Allen Hubbe	c84bbc250b	gda ionic: restore functionality of ionic gda in rocshmem (#269 ) * Revamp findibverbs to find ionic again * gda ionic: rename ionic_sq_buf ionic_cq_buf Avoid duplicating member names used by mlx5 gda. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda: move spin lock to util.hpp Move spin lock out of ionic gda to util.hpp. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: assume latest fwabi changes There is no firmware abi compatibility in this ionic gda code yet, so assume we are using the latest firmware abi as of now. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: allow doorbell with incomplete wqes Use spin lock to ensure doorbell is only written with an increasing producer index. Ring the doorbell after this wave has initialized its wqes. Wqes of other waves might not be fully initialized, but firmware will not process them until the phase/color flag is updated in the respective wqes. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: poll cq for additional completions Keep polling the cq for more than just the minimum number of completions for this wave of threads to make progress, as long as the cq is not empty. A part of wave-optimized cq polling, at the expense of one wave polling additional completions, it was observed that nearly all other waves avoid taking the cq lock at all. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda: max_rd_atomic in rts transition In modify_qp(RTS), specify max_rd_atomic, not max_dest_rd_atomic. By not specifying max_rd_atomic (rather, max_rd_atomic=zero), the local nic may get stuck transmitting the first read or atomic request. One read or atomic request is greater than the initiator depth of zero. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: allow specifying traffic class Allow specifying a traffic class. The network might have a specific traffic class configured as no-drop, for example.
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: tweak uxdma assignment The ideal arrangement will have an equal number of QPs active on each uxdma pipeline. Pre-rebase, the better arrangement for rocshmem functional test benchmarks was [0, 1], [1, 0], [0, 1], [1, 0], ... Now, following changes that add 'ROCSHMEM_GDA_ALTERNATE_QP_PORTS=1' by default, the better arrangement is [0, 1], [0, 1], [0, 1], [0, 1], ... Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> --------- Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Co-authored-by: Aurelien Bouteiller <abouteil@amd.com> Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>	2025-10-07 10:08:19 -04:00
Omri Mor	a0fcbf8d35	Unify environment variable management (#235 ) * Add environment variable configuration infrastructure - Namespace rocshmem::envvar - Track all config env vars in per-category lists - Remove duplicates from list of allowed env var types - Reject negative inputs for unsigned integer types - Accept empty strings for std::string - Print error source location using C++20 std::source_location - Unit tests * Port environment variables - ROCSHMEM_UNIQUEID_WITH_MPI - ROCSHMEM_RO_DISABLE_IPC - ROCSHMEM_BOOTSTRAP_TIMEOUT - ROCSHMEM_BOOTSTRAP_HOSTID - ROCSHMEM_BOOTSTRAP_SOCKET_IFNAME - ROCSHMEM_RO_PROGRESS_DELAY - ROCSHMEM_BOOTSTRAP_SOCKET_FAMILY - ROCSHMEM_MAX_NUM_CONTEXTS + Merge the independent per-backend copies into a single variable that is used by all three backends (IPC, RO, GDA). + Set default to 32 (for GDA); prior default for IPC and RO was 1024. - ROCSHMEM_MAX_NUM_HOST_CONTEXTS - ROCSHMEM_MAX_WF_BUFFERS - ROCSHMEM_SQ_SIZE - ROCSHMEM_RO_NET_CPU_QUEUE + Renamed from RO_NET_CPU_QUEUE + Change env var input type to bool, default to false + Invert code logic: setting RO_NET_CPU_QUEUE to anything would /disable/ a variable gpu_queue, which defaulted to true. Variable is now named config::ro::net_cpu_queue, with all prior checks for gpu_queue inverted. - ROCSHMEM_USE_IB_HCA - ROCSHMEM_HEAP_SIZE + Defaults to 1L << 30 i.e. 1 GiB, from default heap size in memory/heap_memory.hpp. - ROCSHMEM_MAX_NUM_TEAMS + Unlike other env vars, this can be referenced from devices. + Function currently narrows from size_t to int: uses need to be audited for safety and correctness in using size_t directly. 
- ROCSHMEM_GDA_ALTERNATE_QP_PORTS * New env var ROCSHMEM_DEBUG - Debug levels: + NONE + VERSION + WARN + INFO + TRACE - Currently unused - will be added later - Mirrors RCCL debug control * Remove rocshmem::rocshmem_env_config * Change interface for GetClosestNicToGpu to accept const char instead of char: the pointed-to string does not need to be modified - Files were not audited for inclusion of util.hpp only for env vars --------- Signed-off-by: Omri Mor <Omri.Mor@amd.com>	2025-10-06 10:05:57 -07:00
Avinash Kethineedi	0a4f8a83b9	Update atomic functional tests (#262 ) * feat: implement function to return number of blocks in grid. * test: update atomics functional tests - Standard atomic tests: `atomic_add`, `atomic_inc`, `fetch_atomic_add`, `fetch_atomic_inc`, and `fetch_compare_and_swap` - Bitwise atomic tests: `atomic_and`, `atomic_or`, `atomic_xor`, `fetch_atomic_and`, `fetch_atomic_or`, and `fetch_atomic_xor` - Extended atomic tests: `atomic_fetch`, `atomic_set`, and `atomic_swap` * Added two different address modes for atomics. * Added all supported data types for atomics tests.	2025-10-06 10:50:50 -05:00
yugang-amd	2bf1f889ad	remove dead link (#271 )	2025-10-06 11:07:52 -04:00
Edgar Gabriel	e4c427a736	Remove MPI compile-time dependency (#264 ) * use dlsym for MPI functions to allow compiling without MPI support, convert the usage of MPI functions and symbols to be based on a dlopen/dlsym based mechanism. Turns out this cannot be done entirely vendor neutral, slightly different solutions might be required for Open MPI, MPICH and the new MPI ABI. * checkpoint more work to be done. * checkpoint 2 * checkpoint 3 * checkpoint 4 examples compile and link correctly * checkpoint 5 (I think) * Checkpoint 6 * dyld-mpi: adapt GDA * dyldmpi: tests that depend on MPI need to link with it themselves * do not ../mpi_instance.h * dyldmpi: make the symetricHeapTestFixture compile * dyldmpi: Change cmakery, compiles and run gda w/o external MPI * Make it also compile in external MPI mode * dyldmpi: ipc unit tests compile but do not link * dyldmpi: new approach, if external mpi required, link with mpi, otherwise use ompi5 abi * C-style comments in cmakelist.. * dyldmpi: examples: do not fail compiling if MPI not found at build time, instead do not compile the MPI required examples * more updates to CMake logic * convert RO backend and a few other cleanups * update some unit tests to work with the dlopen MPI environment correctly. --------- Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>	2025-10-01 08:06:56 -05:00
Yiltan	6bb46887e8	Fix g/p tests (#266 )	2025-09-29 14:27:25 -04:00
Avinash Kethineedi	df46e80116	feat: add atomic CAS support for bnxt NICs (#267 )	2025-09-29 11:39:35 -05:00
Avinash Kethineedi	98323a6086	GDA 64 bit atomics APIs (#254 ) feat(gda): add support for * Standard atomics - atomic_CAS - atomic_fetch_CAS * Extended atomics - atomic set - atomic swap * Bitwise atomics - atomic_fetch_and - atomic_and - atomic_fetch_or - atomic_or - atomic_fetch_xor - atomic_xor	2025-09-29 11:38:49 -05:00
Aurelien Bouteiller	16a4f10203	Select device NIC vendor code at runtime (#263 ) * Runtime selection of device implementation for post_wqe, quiet, ring_doorbell * Normalize function naming	2025-09-26 00:27:41 -04:00
Aurelien Bouteiller	49e5554636	Modify find-pmix to create imported targets (#242 )	2025-09-25 13:33:07 -04:00
Yiltan	7ebf03fe2f	Improve qp mapping (#259 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>	2025-09-25 10:24:59 -04:00
Yiltan	f4e4ea08a9	Select host NIC vendor code at runtime (#261 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Co-authored-by: Omri Mor <omri50@gmail.com>	2025-09-23 15:33:03 -04:00
Aurelien Bouteiller	96336da78f	Add IPC and/or/xor/swap amos, reenable functional tests (#184 )	2025-09-22 13:02:02 -04:00
Yiltan	f5aefd15f3	[GDA] Implement fetching atomics for BNXT (#253 ) * Indent driver script * Implemented fetching atomics BNXT	2025-09-18 09:50:42 -04:00
Yiltan	758c4a43f6	Improve Error Messages (#257 )	2025-09-18 09:49:32 -04:00
Aurelien Bouteiller	801d2c5012	Enable GDA+IPC (#249 ) * Enable GDA+IPC Fix ROCSHMEM_DISABLE_IPC for both RO and GDA * add more functionality to bootstrap class we need a few more functions in the bootstrap class to be able to fully handle the rocshmem requirements: - add a function to return the list of local ranks - provide a groupAllgather operation which takes a vector of ranks participating - provide a groupAlltoall operation which takes a vector of ranks participating Also, update the functionality of the gda-Alltoall and gda-Allreduce operations to take advantage of these functions. * ipc_policy adapted to use bootstrap groupallgather * bugfix: there was a mistake in computing sendto in groupallgather * bugfix: shm_size and shm_rank were set in a local variable rather than the class member * mpi-bootstrap: remove an unnecessary allgather --------- Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>	2025-09-16 11:54:53 -04:00
Yiltan	3100d71f4b	Update Changelog.md (#256 )	2025-09-16 11:48:39 -04:00
Aurelien Bouteiller	38a7820aa8	Remove unused scripts from functional tests (#237 )	2025-09-12 10:14:33 -04:00
Yiltan	e856fbb0eb	Fix broken atomics from PR #233 (#251 ) * The QueuePair object was out of scope at the end of the for loop. So the destructor was called. * Although correct for C++ to do this, it ignores that we copied the QueuePair object into device memory and have an instance there. * Early destruction resulted in calling ibv_dereg_mr on the atomics memory region. So when the GPU kernel tried to use the memory region it wasn't registered which resulted in a protection domain error. * The solution was to allocate our QueuePair obj with the new operator which leaves memory management to us, then we can manually call the destructor.	2025-09-11 18:11:52 -04:00
Edgar Gabriel	b6b5a82d2b	add an all-ro flag (#252 ) to specify the subset of tests that we want to run in Jenkins with the RO conduit	2025-09-11 16:32:08 -05:00
Edgar Gabriel	99b753f103	move dv functionality to use dlopen (#248 ) abstract out the usage of direct verbs functionality to use tables with the function pointers. This will allow in a second step the library to be simultaneously compiled for multiple NIC vendors/DV libraries and interfaces. For now, the conversion has been done for IB MLX5 and BCOM DV, the Pensando AINIC is to follow soon.	2025-09-11 16:13:31 -05:00
Aurelien Bouteiller	5dc7d4539e	Make it possible to test RMA GET and PUT separately (#250 ) DISABLE_GET removed from ALL, idea is that the CI scripts will invoke a subset that is known to work rather than ALL	2025-09-11 16:44:48 -04:00
Yiltan	2abeebbb6d	[GDA] implement rocshmem_p (#247 )	2025-09-11 09:24:43 -04:00
Omri Mor	f677e5eb59	Update AUTHORS (#246 ) * Update AUTHORS and CODEOWNERS	2025-09-10 11:32:23 -07:00
Omri Mor	0fd966611d	add missing copyright header to rocshmem_info.cpp (#245 )	2025-09-10 10:11:06 -07:00
Avinash Kethineedi	671f8187f4	GDA `get` APIs (#243 ) feat(GDA): add `get` and `get_nbi` APIs for mlx and bnxt NICs - implemented thread, wave and wg variants of `get` and `get_nbi`. test(GDA): enable functional tests for `get` and `get_nbi` APIs	2025-09-10 11:24:53 -05:00
yugang-amd	4a760d741a	remove broken link etc. (#234 )	2025-09-10 09:48:28 -04:00
Yiltan	cb39f7a313	Unify common BNXT and MLX5 initialization code (#233 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>	2025-09-10 09:13:36 -04:00
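
The lifetime bug described in commit e856fbb0eb above can be sketched in plain host-side C++. This is an illustration under stated assumptions, not the rocSHMEM code: `FakeRegistration`, `FakeQueuePair`, `DeviceQueuePairView`, `buggy_setup`, `fixed_setup`, and `teardown` are hypothetical names, the "device copy" is modeled as a small view struct rather than real GPU memory, and flipping a flag stands in for the `ibv_dereg_mr` call mentioned in the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for a registered memory region (not a real verbs type).
struct FakeRegistration {
  bool registered = true;
};

// Hypothetical model of the copy that lives in device memory: it only holds
// a pointer to the registration and never owns it.
struct DeviceQueuePairView {
  const FakeRegistration* reg = nullptr;
  bool usable() const { return reg != nullptr && reg->registered; }
};

// Hypothetical host-side owner. Its destructor releases the registration,
// playing the role that ibv_dereg_mr plays in the commit message.
class FakeQueuePair {
 public:
  explicit FakeQueuePair(FakeRegistration* reg) : reg_(reg) {}
  ~FakeQueuePair() { reg_->registered = false; }
  FakeQueuePair(const FakeQueuePair&) = delete;
  FakeQueuePair& operator=(const FakeQueuePair&) = delete;

  DeviceQueuePairView device_view() const { return DeviceQueuePairView{reg_}; }

 private:
  FakeRegistration* reg_;
};

// Buggy pattern: the owner is a loop-local object, so its destructor runs at
// the end of the iteration while the device-side view is still in use.
DeviceQueuePairView buggy_setup(FakeRegistration& reg) {
  DeviceQueuePairView view;
  for (int i = 0; i < 1; ++i) {
    FakeQueuePair qp(&reg);
    view = qp.device_view();
  }  // qp is destroyed here and the registration is released
  return view;
}

// Fixed pattern: allocate the owner with new so its lifetime is managed
// explicitly and can outlive the setup scope.
struct Setup {
  FakeQueuePair* owner;
  DeviceQueuePairView view;
};

Setup fixed_setup(FakeRegistration& reg) {
  FakeQueuePair* qp = new FakeQueuePair(&reg);
  return Setup{qp, qp->device_view()};
}

// Explicit teardown, called only once the device side is finished.
void teardown(Setup& s) {
  delete s.owner;
  s.owner = nullptr;
}
```

In this sketch the view returned by `buggy_setup` is already unusable when the function returns, whereas the view from `fixed_setup` stays usable until `teardown` is called. The sketch uses `delete` for teardown; the commit message speaks of calling the destructor manually, which may differ in detail.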


387 Commits