rocm-systems

Author	SHA1	Message	Date
BertanDogancay	08a7be231b	Merge remote-tracking branch 'nccl/master' into develop	2025-08-28 15:46:28 -05:00
Avinash	a0ec15bafe	[build] Disable MSCCL++ compilation by default (#1879 ) * Enable MSCCLPP on request * Updating docs and README * Updates to CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Updates to CHANGELOG.md * Update CHANGELOG.md Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> * Update CHANGELOG.md Github didn't take the edit to my suggestion properly. --------- Co-authored-by: amd <amd@super3.amd.com> Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>	2025-08-28 08:52:12 -06:00
Nilesh M Negi	d73cee7588	[AzureCI] Switch to ROCm 6.4.1 and add rccl-tests (#1782 ) * Use ROCm 6.4.1 for testing * Extend RCCL-Tests to multi-node * Add HSA_NO_SCRATCH_RECLAIM to UT runs * Limit to single-node rccl-tests for now	2025-08-27 21:07:53 -05:00
jonatluu	4699bff790	fix lintian warning package-contains-timestamped-gzip (#1865 ) * fix lintian warning package-contains-timestamped-gzip * fix lintian warning	2025-08-27 13:29:07 -04:00
Geo Min	f404624d9e	[TheRock CI] Adding single node tests for RCCL (#1876 ) * Add single-node testing * Adding single node test * Adding quotes * fix typo * Adding test flag * No MPI * Adding openmpi install * Adding comment * PR comments * Missing proj * Adding half * Adding rocr runtime * Adding them all' * new sha * Fixing script * Removing confusing skip test case * Adding docs * Update .github/workflows/therock-test-packages-single-node.yml Co-authored-by: Marius Brehler <marius.brehler@amd.com> --------- Co-authored-by: Marius Brehler <marius.brehler@amd.com>	2025-08-27 08:13:10 -07:00
Nusrat Islam	df448862c3	Device allocation tracker (#1878 ) * alloc: add memory allocation tracker * alloc: add tracker for ncclCuMemAlloc() APIs * alloc: add null pointer check during free	2025-08-27 09:30:51 -05:00
Kapil S. Pawar	c9becd89cd	Code coverage tests for param.cc (#1872 ) * Added code coverage unit tests for param.cc * Updated ParamTests.cpp and removed ParamTestsConfFile.txt * Updated ParamTests.cpp * Removed NCCL_LOG_INFO and added sample cofig file --------- Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>	2025-08-27 09:30:37 -05:00
ishkool	c288fbf1b2	Code coverage tests for net_socket.cc (#1840 ) * Code coverage UTs for net_socket.cc * Addressed review comments --------- Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>	2025-08-27 09:24:21 -05:00
Marius Brehler	221205ebd4	Bump TheRock version used for testing (#1885 )	2025-08-27 16:22:27 +02:00
Mustafa Abduljabbar	277747c199	[Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861 ) * Support pipelining codegen and template specialization * Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16) * Remove need for FUNC_INDEX_TOTAL * Add pipeline field to device function key construction logic * Avoid unneeded codegen for LL/LL64 kernels * Modify conditions and add pipeline dtypes env * Optimize selection for both gfx942 and gfx950 * Increase pipeline bitfield width * Use __forceinline__ for all device functions * Realign reduceCopy with original form * Add opt-out option to enable perf debugs * Remove force-reduce-pipelining option from README * Update CHANGELOG.md --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>	2025-08-26 15:03:54 -04:00
Nusrat Islam	b882af9ffd	fixup: remove extra semicolon (#1881 )	2025-08-26 10:57:25 -05:00
Jeffrey Novotny	64f8e01b76	Docs: Fix formatting for Docker guide (#1882 ) * Docs: Fix formatting for Docker guide * Incorporate feedback	2025-08-26 10:18:32 -04:00
Mustafa Abduljabbar	dfad51e3c9	Support gfx950 in topo_expl and resolve dependency on FMT (#1829 ) * Support gfx950 in topo_expl * Fix dependencies and fetch fmt from sources * Remove third_party folder in make clean * Add empty target when fmt is found * Add MI350 example * Update README.md --------- Co-authored-by: isaki001 <ioannissakiotis@gmail.com>	2025-08-26 10:11:38 -04:00
Nusrat Islam	5e7937effb	Add direct allgather algorithm (#1868 ) * add direct allgather algorithm * minor fix * add debug print for memory allocation tracker * add message size threshold for direct allgather * scatter transfers across ranks * update changelog * minor fix * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * enable direct AG when pxn is ON on MI300X or MI350 --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>	2025-08-25 07:55:10 -05:00
corey-derochie-amd	b88c134874	Changed `TestBedChild` to avoid hang if the call fails (#1875 ) Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.	2025-08-23 00:17:34 -05:00
Nilesh M Negi	bf6660ee4e	[BUILD] Populate host_table entries only for 1 unroll (#1871 )	2025-08-23 00:15:38 -05:00
awelling2801	a1a65c65c4	Added new tests for rccl_wrap - rcclUpdateThreadThreshold (#1855 ) * Added tests for rccl_wrap - rcclUpdateThreadThreshold * Skipped tests gtest_skip added * Added tests for new functions rcclSetP2pNetChunkSize and rcclSetPxn --------- Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>	2025-08-21 16:39:53 -05:00
Marius Brehler	5ae5eb9440	Add a badge for TheRock CI (#1874 ) Adds a badge for TheRock CI and moves the existing badge to the top.	2025-08-21 21:54:37 +02:00
Geo Min	f9a957bbab	[TheRock CI] Adding TheRock RCCL tests (#1873 ) * First commit for rccl multi node test workflow * Adding workflow dispatch * Added branch based pull trigger * Changed typo in branch name * Add input variables to push * Removed input variables to push * Added self hosted runner for Vultr cloud * Skipping build and only running test * Changed test runner label name * Made changes to executable paths in test script * Made changes to run * Made changes to cd into cvs dir * This is a dummy commit * Added cmake options * Modified build options * Commiting build changes * Adding rccl and rccl-tests * Re-ordering rccl and rccl-tests * adding --global command * modified cmake command * modified script paths * Testing OIDC for rccl repo * Testing OIDC for rccl repo * Testing build and upload workflow * use default env variable for AMDGPU families on push workflow trigger * Adding cleanup and correct role * Adding additional yml files * Fixing typo'; * Adding new sha * Adding correct gpu target * Adding back venv bin activate * Adding workflow dispatch for tests * Testing * Adding cat * Adding cat * Adding rocm dir change * Adding checkout * cat with sudo * rccl checkout * correcting branch * removing sudo * trying to adjust correct path' * Adding output dir path * Use docker container with pre-installed MPI * Adding back build steps * Fixing SHA * Adding exclusion logic: * Adding test * Adding CI check * Removing testing * Limit to build only rccl, rccl-tests and required dependencies * Adding test * Removing test * Removing quote * Reverting test * PR comments --------- Co-authored-by: arravikum <arravikum@amd.com> Co-authored-by: Marius Brehler <marius.brehler@amd.com>	2025-08-20 15:07:23 -07:00
Arm Patinyasakdikul	28a83c3ea6	Removing "Could not find any local path from gpu X to net." warning (#1866 ) * Removing "Could not find any local path from gpu X to net." warning to avoid confusion.	2025-08-20 16:52:35 -05:00
Arm Patinyasakdikul	9d3acffa5f	Test: delete child object to address memory leak. (#1863 )	2025-08-20 10:15:03 -05:00
Arm Patinyasakdikul	fb882e80f6	Remove noinline attribute from reduceCopyPacks and (#1864 ) reduceCopyPacksWithBias.	2025-08-19 20:24:31 -05:00
Atul Kulkarni	231449c896	Added new code owners (#1869 )	2025-08-19 16:32:25 -05:00
Mustafa Abduljabbar	c1b3cd8911	Have ncclDevFuncId use 64-Bit keyed map with field packing (#1857 ) - Updated ncclDevFuncId to use a hash-based lookup with std::unordered_map. - Keys are now 64-bit integers, which pack coll, algo, proto, devRedOp, and type fields. - Improved flexibility and maintainability by moving away from row-based indexing. - Added error handling for missing keys in the hash map. - Aligned key generation logic with generate.py and updated generate.py.	2025-08-19 16:41:19 -04:00
Nusrat Islam	6ade5065b4	device: optimize threadfence for ll64 protocol (#1858 ) * device: optimize threadfence for ll64 protocol * device: use __atomic_signal_fence() --------- Co-authored-by: Nusrat Islam <nusislam@useocpslog-003.amd.com>	2025-08-18 09:16:41 -05:00
ishkool	876f985e0f	Code Coverage: Proxy.cc tests (#1818 ) * Proxy.cc tests * Update ProxyTest.cpp Cleaned up the code. * Update ProxyTests.cpp Bring back deleting dynamically allocated memory	2025-08-15 19:06:32 -05:00
Atul Kulkarni	84f3cc6a02	Added new unit tests for src/enqueue.cc (#1853 )	2025-08-15 18:26:26 -05:00
ishkool	6453273aa6	Code Coverage Unit Tests for comm.h (#1783 ) * File containing test for comm.h * Update CommTest.cpp Added gtest API for assert * Update CommTest.cpp Adding copyright * Update CommTest.cpp Removing info and tested as not required. * Update and rename CommTest.cpp to CommTests.cpp * Update CMakeLists.txt	2025-08-15 17:44:24 -05:00
Nilesh M Negi	c3b8de4ec8	[DEVICE] Use noinline for LLGenericOp only on gfx950 (#1849 )	2025-08-15 15:15:02 -05:00
isaki001	44121db890	[TUNING] gfx950 16N tuning (#1835 ) * change gfx950 algo/proto selection for multinode allreduce, allgather, reduceScatter * gfx950 tuning: enable tuning for broadcast, allreduce starts LL128 earlier and switches to ring earlier, change LL128 start for allgather and reduceScatter * lower LL128 threshold * update reduceScatter LL128 min to match LL max for consistency * enable multinode PXN and increase chunksize for gfx950 * change LL128 start to 128KB, adjust ring-start according to node-count * disable code-path for fused-AR on LL128 for gfx950 * use LL128 starting from 1KB for multinode allgather on gfx950 * start LL128 earlier for multinode reduceScatter on gfx950 * start LL128 earlier for multinode broadcast on gfx950 * set multinode allreduce to start simple on 64MB for gfx950 * start LL128 from 1KB for multinode broadcast on gfx950 * setting multinode AR to use tree instead of ring at 16MB, 64MB, 128MB * set multinode broadcast to use LL for up to 256KB depending on node-count for gfx950 * adjust algo for 32MB multinode allreduce on gfx950 * make 32MB tree LL128 for multinode AR on gfx950 * make sure ring is not picked on 2N allreduce on small sizes	2025-08-15 15:12:45 -05:00
alex-breslow-amd	1aa2570b48	Disable the __threadfence on the sender side of the simple protocol when possible. (#1830 ) Leverages the traits of extended-scope fine-grain memory to get rid of a device-scope acquire-release fence. This improves throughput for single node workloads on gfx942 and gfx950 for some input sizes (e.g., ~32 MiB to about 256 MiB) when using the simple protocol. Multinode workloads on MI300X see a smaller but statistically significant uplift for some message sizes. Runtime disablement is supported via setting the environment variable RCCL_GFX942_CHEAP_FENCE_ON to 0.	2025-08-15 07:54:54 -07:00
mberenjk	c61152baa4	Added useAcc as a template parameter to address the performance regression (#1856 ) * Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>	2025-08-14 15:58:54 -05:00
Adel Johar	aaf8613b76	Docs: Add environment variables reference page	2025-08-14 09:55:28 +02:00
Karthikeyan Arumugam	6d41e5ba99	Add cstring header explicitly as it is removed from HIP (#1859 )	2025-08-13 15:14:22 -07:00
Rahul Vaidya	ee9ed3ef87	[BUILD] Fix UT packaging on Debian family OS (#1854 ) * Fix UT packaging on Debian family OSes Signed-off-by: ravaidya <ravaidya@amd.com> * Split OR condition when performing Debian checks Signed-off-by: ravaidya <ravaidya@amd.com> --------- Signed-off-by: ravaidya <ravaidya@amd.com>	2025-08-11 17:03:16 -05:00
Chris Sosa	53977821b5	Add CI Badge for tracking CI status in prep for gating changes (#1851 ) This PR is intended to move RCCL to gating changes on CI failures. Right now, only build/unittests run per PR consistently. We should eventually add all single and multi-node test status badges once those tests are running in presubmit and continuously on develop	2025-08-11 14:02:46 -07:00
Nilesh M Negi	5036d0e713	[BUILD] Fix UT packaging on Debian OS (#1848 )	2025-08-11 09:43:26 -05:00
Rahul Vaidya	cbbc713b03	Fix rccl-UnitTests packaging on Debian systems (#1846 ) Signed-off-by: ravaidya <ravaidya@amd.com>	2025-08-08 12:28:56 -05:00
isaki001	74d82a8145	enable more events for LL128 NPKIT trace collection (#1827 )	2025-08-07 11:19:36 -05:00
awelling2801	82bea39280	Created coverage tests for rccl_wrap (#1694 ) * Created coverage tests for rccl_wrap RCCL_EXPOSE_STATIC off by default Coverage tests for rccl_wrap.cc * Remove RCCL_EXPOSE_STATIC dependency * Removed Rcclwrap.RcclGetAlgoInfoTest * Remove comments * Corrected RCCL_EXPOSE_STATIC definition logic --------- Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com> Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>	2025-08-06 14:48:00 -05:00
Avinash	3f8cac388e	Compiler warnings fix 2 (#1801 ) * Changes to device code * Changes to src/misc * Changes to graph * src/include changes * src/transport changes * changes in init, enqueue, proxy * Changes to CMakeLists.txt * Additional changes to device code * Additional changes to net.cc * adding 'compiler warning' tag to ease upstream merge' * typo correction * Addessing comments * Additional changes for new commits	2025-08-05 17:36:23 -05:00
Arm Patinyasakdikul	6fc228e247	Disable context tracking for the current version. (#1839 )	2025-08-04 10:48:00 -05:00
Atul Kulkarni	0e7d7da55d	Add unit tests for graph/xml.cc & graph/xml.h (#1833 ) * Added new binary for executing unit tests Added new unit tests for argcheck.cc and alt_rsmi.cc files Modified the method to execute unit tests to cover static methods by using a bash script to convert static to non-static functions and variables on the fly restricted to debug build type. * Added new unit tests for src/transport/shm.cc * Added new unit tests for graph/xml.cc	2025-08-01 14:20:27 -05:00
Atul Kulkarni	e2c9f2feab	Update help text in README (#1837 )	2025-08-01 14:19:27 -05:00
awelling2801	5ecc1b7ede	Added tests for coll_reg (#1700 ) Changes to coll_reg Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>	2025-07-31 13:49:23 -05:00
dependabot[bot]	32e95963dc	Bump urllib3 from 2.2.2 to 2.5.0 in /docs/sphinx (#1751 ) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.2 to 2.5.0. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.2.2...2.5.0) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.5.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-31 11:25:45 -06:00
dependabot[bot]	1acc3eb6c1	Bump rocm-docs-core from 1.18.2 to 1.22.0 in /docs/sphinx (#1836 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.2 to 1.22.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.2...v1.22.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-version: 1.22.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-31 11:15:01 -06:00
awelling2801	7320752bf3	Added tests for transport.cc (#1725 ) Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>	2025-07-31 11:04:28 -05:00
Rahul Vaidya	0adc5edc74	Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831 ) Signed-off-by: ravaidya <ravaidya@amd.com>	2025-07-31 11:00:49 -05:00
Nilesh M Negi	bd55f876e9	[DEVICE] Add unroll=2 for gfx950 multi-node (#1824 )	2025-07-31 02:35:26 -05:00

1885 Commits