rocm-systems

Author	SHA1	Message	Date
Nusrat Islam	83f8b191ff	ext-src: fix mscclpp allreduce for non-multiple of 128 message sizes (#1556 )	2025-02-21 11:58:10 -06:00
gilbertlee-amd	ddc5d58b93	Rail optimized trees (#1540 ) * Allow disabling rail-optimized trees via RCCL_DISABLE_RAIL_TREES, Graphviz-friendly output via RCCL_OUTPUT_TREES	2025-02-20 15:18:29 -07:00
Nilesh M Negi	159587be5c	[TOOLS] Update rcclDiagnostics script (#1557 ) * [TOOLS] Update rcclDiagnostics script Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> * Fix typo in valid_marketing_names list Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> --------- Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>	2025-02-20 16:11:05 -06:00
akolliasAMD	aedbc95735	reverted the syncLDS back to syncthreads (#1554 )	2025-02-19 10:44:32 -07:00
Wenkai Du	baaa2ac64d	Insert barrier after loading work items to LDS (#1551 )	2025-02-18 10:17:27 -08:00
Wenkai Du	32dc7ef47c	Enable GDRCopy only on gfx94x (#1550 ) * Enable GDRCopy only on gfx94x * Use cudaFree instead of hipFree * Add warning if failed to get device property * Remove extra return	2025-02-17 13:28:19 -08:00
Sohaib Nadeem	2f1c0bb213	Remove COMPILING_TARGETS from CMakeLists.txt (#1533 ) COMPILING_TARGETS is not actually used for --offload-arch option, instead GPU_TARGETS is being used implicitly when we call find_package(hip REQUIRED) (See hip-config-amd.cmake).	2025-02-16 21:46:37 -06:00
Nikhil-Nunna	4ba94d6662	Env conf debug (#1534 ) * Initial Script ready for review * Added RCCL-tests and RCCL versions * Added output folder and README * Base format built * Added ROCm version * Added function to center titles and Vram information * Added HIP version * Cleaned formatting * UCX version and MPI version * Added NUMA balancing * Added rocminfo * Removed notes * Changed regex for broadcom Nic * Removed note by the ACS info * Added Hostname to summary and details * Print summary to terminal * Added argparse * Added flags and readme * Added GPU ID * fixed spelling * renamed script again * Added file descriptor and locked mem checks * Added file descriptor and locked mem checks * Removed extra spaces from summary table * printing output file location * Removed sudo in code and ACS flag	2025-02-14 17:31:18 -06:00
Pedram Alizadeh	0e5f4d0662	reverting the (Reduce NPKit latency overhead in MSCCL kernel) PR #893 (#1525 )	2025-02-14 11:03:43 -05:00
corey-derochie-amd	824b81c034	Revert "replacing rccl_float8 with hip_fp8 and address compatibility issue (#…" (#1545 ) This reverts commit `d437d6e41c`.	2025-02-13 10:00:22 -07:00
mberenjk	d437d6e41c	replacing rccl_float8 with hip_fp8 and address compatibility issue (#1538 ) * replacing rccl_float8 with hip_fp8 and address compatibility issue with gfx942 --------- Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>	2025-02-13 10:34:17 -06:00
Wenkai Du	ebf7e2305e	Print KL/CL/KE events for all warps (#1544 ) * Print KL/CL/KE events for all warps * Fix count off-by-one issue * Fix opCount in KE and restore CPU thread option * Simplify count calculation	2025-02-12 13:36:31 -08:00
Wenkai Du	f5b15f27a9	Move collective trace to HBM and fix log issue (#1542 )	2025-02-11 11:40:14 -08:00
rahulc1984	92ac136db5	Make rccl version detection robust. (#1517 ) * Accept an EXPLICIT_ROCM_VERSION and use that vs inspecting the environment if provided. * Use CMake's built in file reading support vs execute_process (without error checking) to avoid silent but deadly later failures. * Properly quote some comparisons to avoid syntax errors if they happen to have an empty string. * Guard against ROCM_PATH being an empty string, avoiding stray path extensions to root directories, etc. Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>	2025-02-11 10:48:22 -07:00
corey-derochie-amd	42ab425037	Switched from `cmake_host_system_information` feature to a manual parse (#1518 ) * Switched cmake_host_system_information feature to a manual parse to remain cmake 3.5 compliant. * Updating minimum cmake to 3.16 to conform with the rest of ROCm. This change still applies.	2025-02-11 08:51:39 -07:00
Nilesh M Negi	4e406acc43	[UT] Include iomanip if not defined (#1510 ) * [UT] Include iomanip if not defined Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> * Remove include guards `iomanip.h` has pre-defined include guards. These are not needed. Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> --------- Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>	2025-02-11 08:48:47 -07:00
Dingming Wu	e8fb1335fd	Replace atomicAdd with __hip_atomic_fetch_add in getting colltrace tail position (#1539 )	2025-02-10 08:53:25 -08:00
gilbertlee-amd	6cb0599e38	Updating topology explorer (#1536 )	2025-02-07 08:44:04 -07:00
Vijay Srinivasan	3494f52d40	Adding AINIC Network Plugin check (#1528 ) - Adding AINIC network plugin check to pass unused parameter to pass the channelId to the network plugin layer	2025-02-06 23:37:53 -06:00
Nikhil-Nunna	8abc729f9e	Merge pull request #1535 from Nikhil-Nunna/add-codeowner Added @Nikhil-Nunna as a code owner	2025-02-05 14:41:46 -06:00
Nikhil-Nunna	fd3422afdb	Added Nikhil-Nunna to codeowners	2025-02-05 14:28:00 -06:00
AbandiGa	e92a103bad	Adding @AbandiGa (myself) as code owner (#1532 ) Signed-off-by: AbandiGa <galaband@amd.com>	2025-02-05 13:23:25 -06:00
Wenkai Du	a12bf32475	Reset barrier and make barrier_next thread local (#1531 )	2025-02-05 09:06:48 -08:00
Wenkai Du	d00e903d72	Revert "Remove unused code path (#1527 )" (#1530 ) This reverts commit `091bf899a1`.	2025-02-04 13:14:43 -08:00
Edgar Gabriel	3646b1de43	update CODEOWNERS (#1529 ) * update CODEOWNERS Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>	2025-02-04 11:54:42 -07:00
Wenkai Du	091bf899a1	Remove unused code path (#1527 )	2025-02-04 10:24:56 -08:00
Bertan Dogancay	387c973b5d	[P2P] Have connIdx for both send and recv (#1524 )	2025-02-04 11:53:20 -05:00
isaki001	3398fa78fe	non-hipGraph MSCCL++ tests for allReduce and allGather (#1503 ) * working tests for a single message size * move call_RCCL routine StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests * simplify test invocation * remove unnecessary logs and exit from ncclCommRegister * set expected results for allGather * skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process * fix improper size of expected-results vector * Removing unused changes. * Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions. * Apply suggestions from code review Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> --------- Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> Co-authored-by: Corey Derochie <corey.derochie@amd.com>	2025-02-04 09:11:32 -06:00
isaki001	19105206f6	Update MSCCL++ register/deregister (#1523 ) * erase handle key from mscclpp communicator during deregistration * remove check on buffer size being a multiple of 32 from registration/deregistration routines since these checks are applied during enqueue * add check for greater than zero buffer size in mscclpp registration	2025-02-04 09:09:56 -06:00
Bertan Dogancay	5804603632	[BUILD] Fix unsupported arguments in generator (#1519 ) * Fix unsupported arguments in generator * Get ROCM_PATH as env variable	2025-02-03 14:51:55 -05:00
Wenkai Du	a5c6b547a2	Add back opCount and channel ID to debug trace (#1520 )	2025-02-03 08:55:27 -08:00
Jeffrey E Erickson	7af21dd996	modify max memory to use free (#1513 )	2025-02-03 09:35:02 -06:00
Jeffrey Novotny	134f736882	Fix broken link to install instructions (#1515 )	2025-02-03 10:14:40 -05:00
Mustafa Abduljabbar	dc75209dd7	Add IB verbs logging and enable traces through install.sh (#1511 ) * Add IB Verbs logging * Simplify tracing and undo debug.h changes * Update debug.h * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Exchange remote comm device index	2025-01-31 12:35:39 -05:00
Wenkai Du	caba0bc049	Add HDP flush for gfx940 (#1434 ) * Fix collective trace * Use nontemporal for st_global * Fix previous commit * Add HDP flush to data receive path * Fix previous commit * Control flushing by NCCL_NET_FORCE_FLUSH and RCCL_NET_HDP_FLUSH * Introduce RCCL_NET_HDP_FLUSH and RCCL_NET_GDR_FLUSH Both are on by default. Turn both off will skip all flush will likely result in data error. * Enable GDR copy by default * Remove GDR flush env var because it is disabled by GDC flush * Output kernel collective trace at comm destroy by default * Limit kernel timeout messages to 100 * Use system relaxed atomic for loadInt * Refine timeout messages and use atomic for setting offset from CPU * Add kernel trace for barrier timeout * Add backup barrier to avoid race in atomicAdd * Use different counters for different warps * Rework barrier implementation * Fix for other GFX * Use __hip_atomic_store and __hip_atomic_load * Fix bug in previous commit * Don't reset barrier values in running kernel * Update trace format * Fix typo * Switch back to hip_atomic_fetch_add * Use same barrier implementation for all GFX * Remove extra threadfence * Turn off HDP flush by default Please use RCCL_NET_HDP_FLUSH=1 to switch on HDP flush * Remove unnecessary changes from alterative barrier implementation * Added back __threadfence_block * Revert back to threadfence for gfx other than gfx94x	2025-01-31 07:51:10 -08:00
dependabot[bot]	ad8012f2fc	Bump rocm-docs-core from 1.14.1 to 1.15.0 in /docs/sphinx (#1514 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.14.1 to 1.15.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.14.1...v1.15.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-30 17:15:17 -07:00
Bertan Dogancay	ecf31da14f	Add ncclDataType_t as type to ROCTX (#1512 )	2025-01-30 13:46:48 -05:00
Arm Patinyasakdikul	6b2b87c9f8	Make proxy dump print out meaningful information. (#1504 ) * Make proxy dump print out meaningful information. fixed: HPEXA-63 * printout raw data instead.	2025-01-29 16:48:49 -06:00
Bertan Dogancay	35fe9e06f3	[Profiler] Enable ROCTX during build by default (#1506 ) * Enable ROCTX during build by default * Check for roctx support in cmake	2025-01-29 11:29:46 -05:00
corey-derochie-amd	bd0f5cccbe	Disabled MSCCL++ feature except when building on Ubuntu or CentOS host systems (#1505 ) * Added condition for MSCCL++ to only build on an Ubuntu host system. * Added CentOS to the supported OS list	2025-01-29 08:54:09 -07:00
Nusrat Islam	7ac82248de	Tune allreduce performance in CPX mode (single OAM) (#1508 )	2025-01-29 08:58:48 -06:00
dependabot[bot]	f84625a1cc	Bump rocm-docs-core from 1.13.0 to 1.14.1 in /docs/sphinx (#1496 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.14.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.13.0...v1.14.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-23 17:10:36 -07:00
Bertan Dogancay	dd185f26d2	Fix ROCTX call for MSCCL (#1502 )	2025-01-23 16:00:07 -07:00
Bertan Dogancay	27b3921ab0	Merge pull request #1426 from BertanDogancay/nccl-2.22-sync [SYNC] 2.22.3-1	2025-01-23 13:14:05 -05:00
BertanDogancay	36343be84f	Merge remote-tracking branch 'nccl/master' into develop	2025-01-23 12:08:46 -06:00
corey-derochie-amd	b6377e0b8c	Changed working dir for the submodule command and extended it to the json repo (#1495 ) This allows it to work when the sub repos don't exist.	2025-01-23 09:34:25 -07:00
corey-derochie-amd	f77308a2fe	Removing duplicate definitions of `INC_COLL_TRACE` and `traceData` macros (#1500 ) They are nearly identical, except the common.h definition sets `collTrace->channelId`.	2025-01-22 16:50:27 -07:00
Bertan Dogancay	5afe900efd	Only look for librccl .co files in StackSize test (#1499 ) Co-authored-by: BertanDogancay <bertan.dogancay>	2025-01-22 16:48:10 -07:00
isaki001	ff130cce7a	fix scatter_perf crash (#1493 ) * fix scatter_perf crash * Update src/misc/msccl/msccl_lifecycle.cc Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> * Update src/misc/msccl/msccl_lifecycle.cc Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> * More buffsRegisteredNonGraphMode spelling fixes. --------- Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>	2025-01-21 09:24:32 -06:00
isaki001	d89432e8c8	update mscclpp (#1488 ) * update commit hash for mscclpp submodule * update mscclpp submodule * remove print messages in cmake * add back some print messages, update MSCLPP CMAKE_ARGS * enable MSCCL++ patches regardless of finding mscclpp_nccl package	2025-01-20 08:06:43 -06:00

1644 commits