rocm-systems

Author	SHA1	Message	Date
Adel Johar	2c243feb1b	[Docs] Move environment variables to separate page (#341 ) [ROCm/rocshmem commit: `ba77bdd9a6`]	2025-12-01 14:25:27 -05:00
Yiltan	77ba8cc76c	Cleanup readme.md [ROCm/rocshmem commit: `774159a08f`]	2025-12-01 10:43:18 -05:00
Yiltan	2079193495	Update docs for GDA (#337 ) [ROCm/rocshmem commit: `5606fdafd6`]	2025-12-01 09:38:11 -05:00
Yiltan	f9caef6908	Add rocshmem_int64_p (#335 ) [ROCm/rocshmem commit: `d9e2890222`]	2025-11-26 10:31:23 -05:00
Yiltan	a40485d292	Alltoall bug fix for RCCL (#329 ) [ROCm/rocshmem commit: `7ab9823169`]	2025-11-20 11:16:00 -05:00
Edgar Gabriel	db4c6293cc	add relaxed_ordering option (#324 ) * add relaxed_ordering option add an environment variable that allows to control setting the IBV_ACCESS_RELAXED_ORDERING flag when registering memory with the ibv_reg_mr* functions. * missed a spot [ROCm/rocshmem commit: `2ae2033648`]	2025-11-20 08:20:25 -06:00
Yiltan	b126537b55	[GDA] Alltoall optimization - single warp (#319 ) * Remove testing of data types As the collective is templated, we are just testing if sizeof(T) works * Added single threaded variants * Applied thread puts optimization to barrier * Apply single threaded optimization to alltoall * This optimization only works on bnxt, so place a switch to protect it * Handle the edge case where the thread count is smaller than the number of PEs [ROCm/rocshmem commit: `1347d5d628`]	2025-11-19 14:25:29 -05:00
Anatolii Rozanov	8d8dffbd43	Fix typo in README.md (#326 ) [ROCm/rocshmem commit: `618bdef082`]	2025-11-19 08:36:01 -06:00
Yiltan	cfb3d42524	Update change log for rocm 7.1.1 (#320 ) (#321 ) (cherry picked from commit 783ab37f68ffb72b4baffda516fcc19e2f28804e) [ROCm/rocshmem commit: `1fb0fc4bd0`]	2025-11-14 13:51:03 -05:00
Edgar Gabriel	2b0bca5c87	disable gfx1100 temporarily (#322 ) [ROCm/rocshmem commit: `ef3ba6cd45`]	2025-11-14 10:46:19 -07:00
Yiltan	a500bc8029	only use rocm_install if we build the tools (#316 ) [ROCm/rocshmem commit: `73786e203e`]	2025-11-12 10:58:49 -05:00
Edgar Gabriel	722b54fddb	replace MPI function call. (#317 ) * replace MPI function call. * add two missing defs for RO [ROCm/rocshmem commit: `e1a7e20b1b`]	2025-11-12 07:38:47 -06:00
Dana Robinson	237f64065f	Fix typo in CONTRIBUTING.md (#315 ) [ROCm/rocshmem commit: `65790c1b4f`]	2025-11-09 12:58:19 -06:00
Yiltan	740cbe6098	Use dlopen for libnuma (#312 ) [ROCm/rocshmem commit: `80f0a39866`]	2025-11-07 10:12:11 -05:00
Edgar Gabriel	3c25349ec1	initial commit for gfx12 support (#305 ) [ROCm/rocshmem commit: `d185fe3555`]	2025-11-07 08:54:03 -06:00
Edgar Gabriel	5e6a4e15f6	disable memory tests (#310 ) disable fine-grain and coarse-grain memory tests until a fix is available in ROCm 7.1 and/or our CI image. Otherwise we might miss other errors due to constant CI failures. [ROCm/rocshmem commit: `4fc5541d78`]	2025-11-07 08:04:31 -06:00
Allen Hubbe	5e82060ba0	gda: fix getmem_nbi_wg source and dest (#311 ) A copy paste mistake in a previous commit caused source and dest to be reversed. Correct the source and dest params. Fixes: `e8a7371007` Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> [ROCm/rocshmem commit: `e2dcf99456`]	2025-11-06 16:21:20 -06:00
Yiltan	7348bac9bf	Memset queues (#313 ) [ROCm/rocshmem commit: `cd9b5ee806`]	2025-11-06 14:16:53 -05:00
Yiltan	bf19d70a29	Added ibv_wrapper which opens library using dlopen (#309 ) [ROCm/rocshmem commit: `110f9c8793`]	2025-11-05 16:12:44 -05:00
Allen Hubbe	e8a7371007	gda ionic: use all threads in wave operations (#295 ) Use all available threads for polling the cq to increase the maximum message rate. Even when posting a single wqe in the wave, use all available threads for polling the cq to reserve space in the sq. Changes were needed in the rocshmem abstraction to avoid disabling gpu threads, like taking turns or using only the first thread in a wave or wavefront. To avoid breaking other gda implementations, reimplement turn-based or single thread strategy in post_wqe_rma_turn and post_wqe_rma_single. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> [ROCm/rocshmem commit: `6de67d5d7c`]	2025-11-05 11:01:14 -06:00
Aurelien Bouteiller	51cf7c6c05	python venv madness round 2: use ensurepip if installed (#308 ) When creating a python venv during the install_dependencies script, we try to use ensurepip if it is installed, as it deals better with cases where multiple venvs are active simultaneously. (as seen in CI buildbot) [ROCm/rocshmem commit: `b7a6d86c6b`]	2025-11-05 10:52:22 -05:00
Aurelien Bouteiller	76e8750d88	Add backend type query method, use it to disable 32bit amo testers on gda (#307 ) * Add backend type query method, use it to disable 32bit amo testers on gda * The infrateam testers work [ROCm/rocshmem commit: `8c175315f2`]	2025-11-05 10:24:07 -05:00
erieaton-amd	4aaa1a27f5	Fix rocshmem_ptr definition signature (#306 ) Makes the signature of the definition match the declaration in rocshmem.hpp. Signed-off-by: Eric Eaton <erieaton@amd.com> [ROCm/rocshmem commit: `7b5765ec0e`]	2025-11-04 12:42:47 -05:00
Aurelien Bouteiller	e622398337	install_dependencies pip issues with ubuntu 24 (#302 ) * The install_dependencies script would fail on ubuntu 24.04 they changed how pip works so we need to create a venv first now * Fix install_dependencies for ubuntu 22 * Make sure we build in the builddir and install in the installdir combine installdir for ucx and ompi when user-provided by INSTALL_DIR retain prior behavior if not overridden to avoid breaking CI scripts [ROCm/rocshmem commit: `e155af8704`]	2025-10-31 16:34:36 -04:00
Yiltan	3535ce8c0a	Alltoall linear parallel Optimization (#303 ) [ROCm/rocshmem commit: `8dd2112ec8`]	2025-10-31 10:26:44 -04:00
Yiltan	2f8a1c02a4	[GDA] Implement internal_direct_barrier_wg (#299 ) [ROCm/rocshmem commit: `5f87bb061b`]	2025-10-31 10:26:24 -04:00
Allen Hubbe	fa7841f0d4	functional_tests: n, nskip, nloop, nlarge options (#297 ) To make the functional tests more useful for benchmarking, allow user to specify the number of loops and related parameters via command options. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> [ROCm/rocshmem commit: `ed91c8cce2`]	2025-10-30 11:54:49 -04:00
Edgar Gabriel	0ad710e537	minor change to MPI detection logic (#294 ) somehow the test whether we requested MPI support or not stopped working, although no obvious code change can be located. Make the if-statement more stringent by explicitly testing whether USE_MPI_SUPPORT is "ON". [ROCm/rocshmem commit: `c0285ac0ce`]	2025-10-28 12:54:26 -05:00
akolliasAMD	6f6719dbab	renamed memcpy to memcpy_lane (#296 ) [ROCm/rocshmem commit: `87d87cc881`]	2025-10-28 09:33:13 -06:00
Yiltan	163148ce7e	[GDA] Improve error messages (#292 ) [ROCm/rocshmem commit: `ff29d139cb`]	2025-10-27 16:51:15 -04:00
Yiltan	0bd07a26be	[GDA/BNXT] Implemented CQE Collapsing (#279 ) [ROCm/rocshmem commit: `6290db319c`]	2025-10-23 14:53:44 -04:00
Aurelien Bouteiller	bdb30e2984	Tests/syncall (#291 ) * SyncAll test case would run Sync * Despecialized name for argument reader * Rename sync-test to team-sync-test as it uses teams * Another stab at probing NUM_GPUS [ROCm/rocshmem commit: `054bc33dc4`]	2025-10-23 13:40:41 -04:00
Edgar Gabriel	3eadf8cc62	fix Win_flush prototype in function table (#289 ) the bug was exposed when trying to compile a backend with HDP flush support. [ROCm/rocshmem commit: `e2c6bb8bd4`]	2025-10-23 08:43:41 -05:00
Edgar Gabriel	d37af80d7e	add support for GPUs using wavefront size of 32 (#285 ) * add gfx1100 support Add support for Radeon 7900 GPUs (RX and PRO), and 7800 PRO. I was contemplating to add gfx1101 and gfx1102 GPUs as well, but those are the lower end models that are more unlikely to be used for compute intensive jobs. In addition, I do not have access to them to test the support. * update WF_SIZE for different options Radeon systems use a WarpSize of 32, unlike current Instinct systems, which use a warp size of 64. For the device side, a gfx specific ifdef is sufficient. For the host side, we need to query the device properties. * adjust functional tests to wf_size of 32 * update unit tests to handle wf_size of 32 * address reviewer comments [ROCm/rocshmem commit: `d0c2845031`]	2025-10-22 16:04:58 -05:00
Avinash Kethineedi	b771a26916	Add `ROCSHMEM_CTX_INVALID` for invalid context handling (#287 ) * Add `ROCSHMEM_CTX_INVALID` for invalid context handling - Define `ROCSHMEM_CTX_INVALID` as {nullptr, nullptr} - Add == and != operators to rocshmem_ctx_t - Use `ROCSHMEM_CTX_INVALID` on failed context creation - Skip ctx destroy if context is invalid * Update docs for context create and destroy APIs usage and behavior [ROCm/rocshmem commit: `955c22aeed`]	2025-10-22 12:00:56 -05:00
Yiltan	dd92cb2af8	Provide an error when there are no NICs on a system (#286 ) [ROCm/rocshmem commit: `b534423de7`]	2025-10-20 13:07:56 -04:00
Aurelien Bouteiller	349d7f6ad3	Print an error and quit cleanly if GDA required but could not init (#284 ) [ROCm/rocshmem commit: `c44f4ece1f`]	2025-10-20 13:04:13 -04:00
Yiltan	92a7904656	Implement rocshmem_pe_quiet() (#282 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> [ROCm/rocshmem commit: `c3eeae473b`]	2025-10-20 11:42:39 -04:00
Edgar Gabriel	6bc1cc63ae	update tester for RO (#281 ) update the tester script to only tests the amo functions on RO that are expected to pass. We can revisit the non-passing tests later, but this prevents us from having passing CIs at the moment, while RO is simply lower priority than other asks. [ROCm/rocshmem commit: `6f74cdfd75`]	2025-10-20 09:03:17 -05:00
Yiltan	c269577b89	Updated docs for ROCm 7.x.x (#239 ) Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Co-authored-by: yugang-amd <yugang.wang@amd.com> [ROCm/rocshmem commit: `9338c84480`]	2025-10-17 12:10:37 -04:00
Aurelien Bouteiller	bba611d702	Add ROCSHMEM_GDA_PROVIDER envvar (#280 ) * Add ROCSHMEM_GDA_PROVIDER env control * Single name for a single concept: vendor->provider [ROCm/rocshmem commit: `aef74812ae`]	2025-10-17 10:46:05 -04:00
Edgar Gabriel	feab645795	update to build system: (#277 ) - make adding PMIx library to compile time based on the result of finding PMIx support. This is required eg if compiling rocSHMEM with ompi 4.0/4.1, which do not have a built-in PMIx version. - when setting USE_EXTERNAL_MPI=OFF which ensures that we do not check for external MPI libraries (even if one would be available). [ROCm/rocshmem commit: `ed957302d4`]	2025-10-17 07:42:11 -05:00
Aurelien Bouteiller	bb8406b013	Runtime selection of IONIC (#272 ) * Split ionic code to a subdirectory; dyld libionicl; move the fntable to provider_gda_xxx.hpp pass the pattr to ionic_setup_pd, include endian.hpp Enable building IONIC conduit for runtime selection * Uniform style for the fntable between ionic and the rest * Move mlx5 gda conduit to a subdir; resolve conflict with backend_can_run function * Don't forget to init qp for ionic, move mlx5 specialized init qp code to the mlx5 subdir * Don't add cmakecaches... Typo: GDA_BXNT * Add gda-ionic to all_backends build scripts * Apply suggestion from reviews Co-authored-by: Omri Mor <omri50@gmail.com> Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com> * Remove duplicate definitiion of DLSYM macros --------- Co-authored-by: Omri Mor <omri50@gmail.com> Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com> [ROCm/rocshmem commit: `3cfe76522e`]	2025-10-16 15:53:01 -04:00
Dimple Prajapati	6c4325d131	Add host API for enqueuing barrier on given stream (#274 ) * add host API for enqueuing barrier on given stream [ROCm/rocshmem commit: `a44b581997`]	2025-10-15 14:29:07 -07:00
akolliasAMD	fc73e4f858	added dl on the list of linked libraries (#275 ) [ROCm/rocshmem commit: `4ecdbc026c`]	2025-10-10 09:12:59 -06:00
Aurelien Bouteiller	225746b0f0	Make ROCSHMEM_DISABLE_MIXED_IPC a synonym for ROCSHMEM_RO_DISABLE_IPC, ROCSHMEM_DISABLE_IPC (#273 ) * Make ROCSHMEM_DISABLE_IPC a synonym for ROCSHMEM_RO_DISABLE_IPC * Introduce ROCSHMEM_DISABLE_MIXED_IPC and deprecate old variants [ROCm/rocshmem commit: `db8e5f1086`]	2025-10-09 19:57:53 -04:00
Aurelien Bouteiller	8837414042	Cleanup/wg init (#260 ) * remove wg_init and wg_finalize from functional tests * Remove wg_init and wg_finalize from examples * deprecate wg_init/finalize * Updated docs * Typo in documentation --------- Co-authored-by: Yiltan <yiltan@amd.com> [ROCm/rocshmem commit: `6e7277b544`]	2025-10-07 14:34:18 -04:00
Edgar Gabriel	192c549d40	allow all three backends to co-exist in a single build (#270 ) * add support for compiling all backends also include the logic to select backends either based on user requests or through some heuristics * checkpoint for compiling all backends * final checkpoint all tests seem to pass when compiling all three backends simultaneously and forcing to use any of the three Backends. * update PR to new envvar system [ROCm/rocshmem commit: `a1269e3db5`]	2025-10-07 10:49:20 -05:00
Allen Hubbe	4b80581422	gda ionic: restore functionality of ionic gda in rocshmem (#269 ) * Revamp findibverbs to find ionic again * gda ionic: rename ionic_sq_buf ionic_cq_buf Avoid duplicating member names used by mlx5 gda. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda: move spin lock to util.hpp Move spin lock out of ionic gda to util.hpp. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: assume latest fwabi changes There is no firmware abi compatibility in this ionic gda code yet, so assume we are using the latest firmware abi as of now. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: allow doorbell with incomplete wqes Use spin lock to ensure doorbell is only written with an increasing producer index. Ring the doorbell after this wave has initialized its wqes. Wqes of other waves might not be fully initialized, but firmware will not process them until the phase/color flag is updated in the respecitve wqes. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: poll cq for additional completions Keep polling the cq for more than just the minimum number of completions for this wave of threads to make progress, as long as the cq is not empty. A part of wave-optimized cq polling, at the expense of one wave polling additional completions, it was observed that nearly all other waves avoid taking the cq lock at all. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda: max_rd_atomic in rts transition In modify_qp(RTS), specify max_rd_atomic, not max_dest_rd_atomic. By not speicfying max_rd_atomic (rather, max_rd_atomic=zero), the local nic may get stuck transmitting the first read or atomic request. One read or atomic request is greater than the initiator depth of zero. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: allow specifying traffic class Allow specifying a traffic class. The network might have a specific traffic class configured as no-drop, for example. 
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> * gda ionic: tweak uxdma assignment The ideal arrangement will have an equal number of QPs active on each uxdma pipeline. Pre-rebase, the better arrangement for rocshmem functional test benchmarks was [0, 1], [1, 0], [0, 1], [1, 0], ... Now, following changes that add 'ROCSHMEM_GDA_ALTERNATE_QP_PORTS=1' by default, the better arrangement is [0, 1], [0, 1], [0, 1], [0, 1], ... Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> --------- Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Co-authored-by: Aurelien Bouteiller <abouteil@amd.com> Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> [ROCm/rocshmem commit: `c84bbc250b`]	2025-10-07 10:08:19 -04:00
Omri Mor	5bc35a7eb6	Unify environment variable management (#235 ) * Add environment variable configuration infrastructure - Namespace rocshmem::envvar - Track all config env vars in per-category lists - Remove duplicates from list of allowed env var types - Reject negative inputs for unsigned integer types - Accept empty strings for std::string - Print error source location using C++20 std::source_location - Unit tests * Port environment variables - ROCSHMEM_UNIQUEID_WITH_MPI - ROCSHMEM_RO_DISABLE_IPC - ROCSHMEM_BOOTSTRAP_TIMEOUT - ROCSHMEM_BOOTSTRAP_HOSTID - ROCSHMEM_BOOTSTRAP_SOCKET_IFNAME - ROCSHMEM_RO_PROGRESS_DELAY - ROCSHMEM_BOOTSTRAP_SOCKET_FAMILY - ROCSHMEM_MAX_NUM_CONTEXTS + Merge the independent per-backend copies into a single variable that is used by all three backends (IPC, RO, GDA). + Set default to 32 (for GDA); prior default for IPC and RO was 1024. - ROCSHMEM_MAX_NUM_HOST_CONTEXTS - ROCSHMEM_MAX_WF_BUFFERS - ROCSHMEM_SQ_SIZE - ROCSHMEM_RO_NET_CPU_QUEUE + Renamed from RO_NET_CPU_QUEUE + Change env var input type to bool, default to false + Invert code logic: setting RO_NET_CPU_QUEUE to anything would /disable/ a variable gpu_queue, which defaulted to true. Variable is now named config::ro::net_cpu_queue, with all prior checks for gpu_queue inverted. - ROCSHMEM_USE_IB_HCA - ROCSHMEM_HEAP_SIZE + Defaults to 1L << 30 i.e. 1 GiB, from default heap size in memory/heap_memory.hpp. - ROCSHMEM_MAX_NUM_TEAMS + Unlike other env vars, this can be referenced from devices. + Function currently narrows from size_t to int: uses need to be audited for safety and correctness in using size_t directly. 
- ROCSHMEM_GDA_ALTERNATE_QP_PORTS * New env var ROCSHMEM_DEBUG - Debug levels: + NONE + VERSION + WARN + INFO + TRACE - Currently unused - will be added later - Mirrors RCCL debug control * Remove rocshmem::rocshmem_env_config * Change interface for GetClosestNicToGpu to accept const char instead of char: the pointed-to string does not need to be modified - Files were not audited for inclusion of util.hpp only for env vars --------- Signed-off-by: Omri Mor <Omri.Mor@amd.com> [ROCm/rocshmem commit: `a0fcbf8d35`]	2025-10-06 10:05:57 -07:00


413 Commits