76333 Commits

Author SHA1 Message Date
Dingming Wu b00ee4c83c Increment opCount for intra-node comms as well (#2024)
* Enhance logging in NCCL initialization
It's convenient to log comms obj and default channels together for debugging

* Add opCount to collDevWork and update increment logic
Added opCount to collDevWork and incremented it when proxyOpQueue is empty (e.g., for intra-node comms)

* Clarify opCount increment logic in enqueue.cc
Updated comment to clarify incrementing opCount for intranode communications.

* Refactor NCCL_INIT logging format
Updated logging format for NCCL_INIT to improve clarity.

* Remove duplicate INFO logging in init.cc
2025-11-10 11:23:49 -06:00
jamessiddeley-amd 42cc721a4b [rocprof-compute] remove references to --kernel-names (#1543)
* remove references to --kernel-names

* ruff format

* remove redundant comments

* update docs and roofline image

* added two output lines to docs
2025-11-10 11:47:39 -05:00
Mark Meserve 60b81681c0 rocprofiler-sdk: attach: rocprofv3-attach py improvements (#1365)
* attach: rocprofv3-attach py improvements

- Handle error status during detachment
- Add detection and error for changing rocprofv3 configuration on reattachment
- Add and improve console messages during attachment and detachment
- Documentation update pass
2025-11-10 09:43:00 -06:00
Julia Jiang 68c2a2b86b SWDEV-565694 - Fix config errors while building HIP documentation (#1767) 2025-11-10 10:30:36 -05:00
Mark Meserve f6b7019470 rocprofiler-sdk: fix formatting from 9f940c7 (#1599) (#1763) 2025-11-10 09:17:48 -06:00
Mark Meserve 11d12a82fb rocprofiler-sdk: attach: fix test permissions (#1528)
* attach: fix test permissions

- Test is now skipped if insufficient permissions detected
- Should fix test (for now) in Azure CI pipeline
- Add more extensive permission checking for the tests
- Add default parameters to prevent running rm -rf on a root directory
- Add use for unused LOG_LEVEL parameter
2025-11-10 09:15:50 -06:00
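The "default parameters to prevent running rm -rf on a root directory" item in the entry above can be illustrated with a small guard. This is only a sketch in Python, not the code from the actual test scripts: `DEFAULT_SCRATCH`, `resolve_scratch_dir`, and `remove_scratch_dir` are hypothetical names, and the fallback location is an assumption.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical fallback location; not taken from the actual test scripts.
DEFAULT_SCRATCH = Path(tempfile.gettempdir()) / "attach-test-scratch"


def resolve_scratch_dir(configured):
    """Pick the directory to clean, falling back to a safe default when unset."""
    # An empty setting must never turn into the current directory or "/".
    target = Path(configured).resolve() if configured else DEFAULT_SCRATCH.resolve()
    # Path.anchor is the filesystem root of the resolved path ("/" on POSIX).
    if target == Path(target.anchor):
        raise ValueError(f"refusing to remove filesystem root: {target}")
    return target


def remove_scratch_dir(configured=""):
    """Remove the scratch directory and return the path that was targeted."""
    target = resolve_scratch_dir(configured)
    shutil.rmtree(target, ignore_errors=True)
    return target
```

The point of the default is that an unset or empty variable can no longer expand into a delete of the root directory; the explicit root check covers the case where the root is passed in directly.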
usrihari123 5feec0513d Fix clang format (#1715) 2025-11-10 09:15:42 -06:00
Rakesh Roy 9cac2e46e4 SWDEV-565668 - Bump minor version for ROCm 7.2 (#1762)
Additionally remove cmake option HIP_OFFICIAL_BUILD
2025-11-10 18:55:52 +05:30
Junhua Shen 9da1572c42 libhsakmt: Refactor for Multi-KFD Context Support (Multiple KFD FDs per Process) (#1701)
* Introduce HsaKFDContext structure and infrastructure for multiple KFD contexts, enabling
   independent contexts within a single process.
* Refactor core components (queue, event, FMM, topology) to be context-aware,
   using explicit HsaKFDContext parameters instead of global state.
* Replace global hsakmt_kfd_fd with context-specific file descriptors, ensuring full context isolation.
* Maintain backward compatibility by redirecting legacy APIs to use the primary context.

This refactoring establishes a foundation for multi-context support while preserving existing functionality.

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2025-11-10 11:19:58 +08:00
Jin Jung 324a5519b9 SWDEV-563842 - Fix Memory Address Offset Bug (#1749)
* SWDEV-563842 - Fix Memory Address Offset Bug

* Revert "SWDEV-563842 - Fix Memory Address Offset Bug"

This reverts commit 477958dc48300ee1fe0166aa6f0d3d8125b91f5e.

* SWDEV-563842 - Fix Memcpy Address Offset Bug

* SWDEV-563842 - Find Memcpy Device Address Offset

* Revert "SWDEV-563842 - Find Memcpy Device Address Offset"

This reverts commit 6c75a9e5b58b7dfabb9e3f91fa3dd892d42639cc.

* Revert "SWDEV-563842 - Fix Memcpy Address Offset Bug"

This reverts commit 0b89072a988074aa4da4e8fc7ba04c554f31ed44.

* SWDEV-563842 - MemObjMap_ Offset Support

This patch fixes the buffer offset handling bug.

* Revert "SWDEV-563842 - MemObjMap_ Offset Support"

This reverts commit 37fce3382465e3420721e5277377f943ec2b30a1.

* SWDEV-563842 - External Memory Buffer View
2025-11-09 12:52:35 -08:00
Dana Robinson 237f64065f Fix typo in CONTRIBUTING.md (#315)
[ROCm/rocshmem commit: 65790c1b4f]
2025-11-09 12:58:19 -06:00
Dana Robinson 65790c1b4f Fix typo in CONTRIBUTING.md (#315) 2025-11-09 12:58:19 -06:00
Victor Zhang 7580052878 SWDEV-564318 - Add support for allocating uncached device memory (#1670) 2025-11-09 12:51:41 -05:00
Gerardo Hernandez 99cab3500d SWDEV-561284 - Fix use of uninitialized memory in Unit_hipMemVmm_Basic and Unit_hipMemVmm_Uncached (#1677) 2025-11-09 12:12:24 +00:00
SaleelK 738bb19835 clr: Increase kernelArg/managedBuffer size (#1586)
* Increase the buffer to 4MB. That can help kernel launches limited by a deep kernel pipeline

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-08 18:32:43 -08:00
ajanicijamd 2f9017f706 Fix build failure with Clang 20. (#1667)
* Modified for Clang

* Updated timemory version so it compiles with Clang 20

* Using TBB version 2018.6 for both GCC and Clang builds
2025-11-08 11:36:12 -05:00
Pengda Xie 93947241d0 SWDEV-556684 - HSAIL cleanup (#1657) 2025-11-08 02:22:03 -08:00
Pengda Xie 5dd15e22ca SWDEV-559514 - Add queue validation to submitMarker sync path (#1308) 2025-11-08 02:21:36 -08:00
lancesix f7ffcd1402 clr: SWDEV-547890 - Bump PAL API version to 954 (#1680)
* clr: Adjust call to ICmdBuffer::CmdCopyMemoryToImage for PAL >= 955

Starting with version 955, PAL adds a new argument to
ICmdBuffer::CmdCopyMemoryToImage.  Adjust the callsite to account
for this.

* clr: Handle new GpuUtil::TraceSessionState cases for PAL >= 939

Starting PAL API version 939, GpuUtil::TraceSessionState changes its
possible values.  Adjust for it.

* clr: require PAL version 954

Bump the required PAL version to 954, as this is required for proper
debugger support.
2025-11-08 00:52:04 +00:00
Pratik Basyal 0325de6538 [ROCm Systems Profiler] Path issue note added to Profiling python script (#1766)
* Note added to Profiling python script

* Doxygen reverted

* Update projects/rocprofiler-systems/docs/how-to/profiling-python-scripts.rst

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-07 18:49:23 -05:00
Jin Jung 291ff6c468 SWDEV-558855 - Enable Interop Map Buffer on Windows (#1748)
* Support Windows HANDLE in interop_map_buffer

* Refactored Windows HANDLE in interop_map_buffer

* ROCr System Dependent Handle Type

* Fix for ROCr Handle Conversion Bug

* Remove Windows Header
2025-11-07 12:47:01 -08:00
Jimbo 2006a411e5 SWDEV-561611 - fix codeql errors by increasing printf buffer sizes (#1507)
* SWDEV-561611 - fix codeql errors by increasing printf buffer sizes

* Replace sprintf with snprintf to prevent potential buffer overflow

---------

Co-authored-by: cadolphe-amd <chris.adolphe@amd.com>
2025-11-07 15:42:56 -05:00
Bertan Dogancay b955a7df40 [GEN/BUILD] Refactor generator script and reduce build time for old archs. (#2030)
[ROCm/rccl commit: b1e680adc0]
2025-11-07 15:15:25 -05:00
Bertan Dogancay b1e680adc0 [GEN/BUILD] Refactor generator script and reduce build time for old archs. (#2030) 2025-11-07 15:15:25 -05:00
David Yat Sin de3b7322f2 rocr/hsakmt: Fix asan compile errors - KFDQMTest (#1638)
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-07 14:52:36 -05:00
David Yat Sin 48cb61f378 rocr: Separate Linux coredump implementation (#1588)
Remove libamdhsacode/win32/elf.h due to license restrictions.

Separate Linux coredump implementation because we do not have the ELF
definitions on Windows.

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-07 14:52:08 -05:00
Bertan Dogancay 524453baea [Launch] Enable Implicit order launch with serial mode (#2033)
[ROCm/rccl commit: a9bb7e9807]
2025-11-07 13:29:53 -05:00
Bertan Dogancay a9bb7e9807 [Launch] Enable Implicit order launch with serial mode (#2033) 2025-11-07 13:29:53 -05:00
Avinash 5ca67dc803 Empty kernel test enhancements [tools] (#1999)
* Initial commit

* Improvements-1

* Initial commit for PR

* Updates warning, run.sh, decoupled loops

* Forcing seq cst for CPU timing

[ROCm/rccl commit: 85baa0d113]
2025-11-07 12:28:06 -06:00
Avinash 85baa0d113 Empty kernel test enhancements [tools] (#1999)
* Initial commit

* Improvements-1

* Initial commit for PR

* Updates warning, run.sh, decoupled loops

* Forcing seq cst for CPU timing
2025-11-07 12:28:06 -06:00
Larry Meadows e6fc009b28 SWDEV-552584 fix racy null pointer exception for ompt_callback_task_schedule for ompt-task_early_fulfill tasks (#980)
* Fix for SWDEV-552584
    Two calls to ompt_callback_task_schedule were issued for the same
    prior task. One of them was ompt_task_complete, which causes
    internal storage to be released and a pointer zeroed. The other
    was ompt_task_early_fulfill, which attempted to reference the
    pointer. The callbacks could come in any order as they were
    from different threads, thus causing a null pointer
    dereference on occasion.  The code was changed to do nothing
    for the early_fulfill. Additional null pointer checks were
    added.

* formatting

* Update ompt.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-07 12:15:48 -06:00
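The race described in the entry above, and the shape of the fix (ignore the early-fulfill notification, tolerate an already released record), can be sketched as follows. This is a simplified Python model, not the code in ompt.cpp: `TaskRecords`, its methods, and the string status values are hypothetical, and the lock is an assumption added to keep the model self-consistent.

```python
import threading


class TaskRecords:
    """Hypothetical stand-in for the per-task storage described above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def create(self, task_id, payload):
        with self._lock:
            self._records[task_id] = payload

    def on_task_schedule(self, task_id, status):
        # Early-fulfill notifications are ignored entirely, so they can never
        # touch a record that a concurrent "complete" has already released.
        if status == "early_fulfill":
            return None
        with self._lock:
            if status == "complete":
                # Release the record; a repeated release finds nothing
                # and returns None instead of failing.
                return self._records.pop(task_id, None)
            # Any other status only reads the record and tolerates a missing one.
            return self._records.get(task_id)
```

Because the two notifications may arrive in either order from different threads, the handler has to give the same safe result whichever one runs first; here both orders leave the early-fulfill call as a no-op.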
Milan Radosavljevic d9b00da102 Add clean up of buffered_storage files (#1738)
* Add clean up of buffered_storage files

* Add step to workflows to test for remaining temp files after tests

* Applied suggestions from code review

* add deletion of all cache files

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-07 11:51:09 -05:00
Ethan Trinh 6b73f6ab5c fix texture operator error (#1719) 2025-11-07 11:34:58 -05:00
Yiannis Papadopoulos 30785f8d18 rocr: Assume KFD in hsa_amd_interop functions (#1138) 2025-11-07 09:38:06 -06:00
Yiltan 740cbe6098 Use dlopen for libnuma (#312)
[ROCm/rocshmem commit: 80f0a39866]
2025-11-07 10:12:11 -05:00
Yiltan 80f0a39866 Use dlopen for libnuma (#312) 2025-11-07 10:12:11 -05:00
Edgar Gabriel 3c25349ec1 initial commit for gfx12 support (#305)
[ROCm/rocshmem commit: d185fe3555]
2025-11-07 08:54:03 -06:00
Edgar Gabriel d185fe3555 initial commit for gfx12 support (#305) 2025-11-07 08:54:03 -06:00
Milan Radosavljevic a9082a7158 ROCpd schema fetching from rocprofiler-sdk (#1501)
- Integrate rocprofiler-systems with rocprofiler-sdk-rocpd to fetch schema
- If rocprofiler-sdk-rocpd is not available, use embedded schema files. This provides rocpd format support even if ROCm is not available
- Include detection in CMake if rocprofiler-sdk-rocpd package is available (and valid), and build database class upon that
- Update embedded schema that is used as a fallback.
- Update some validation tests to account for schema changes.
2025-11-07 09:45:29 -05:00
Ben Richard b299eece9b Fix bug in rocprof-compute parsing (#1664)
We were not handling the case where the eval result is None, e.g. some
columns have a peak value, but it is unused, so we use 'None', which
evaluates to the None object.

Return empty string in this case.
2025-11-07 09:33:43 -05:00
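The behaviour described in the entry above (an expression string of 'None' evaluates to the None object, which should be rendered as an empty string) can be sketched like this. It is a minimal illustration, not the rocprof-compute parser: `format_eval_result` and its parameters are hypothetical names.

```python
def format_eval_result(expression, variables=None):
    """Evaluate a metric expression and return it as display text.

    Returns an empty string when the expression evaluates to None,
    for example when a column's peak value is the literal 'None'.
    """
    # Builtins are disabled so only the supplied variables are visible.
    result = eval(expression, {"__builtins__": {}}, dict(variables or {}))
    if result is None:
        return ""
    return str(result)
```

For instance, `format_eval_result("None")` returns an empty string, while `format_eval_result("peak * 2", {"peak": 21})` returns the text of the computed value.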
Edgar Gabriel 5e6a4e15f6 disable memory tests (#310)
disable fine-grain and coarse-grain memory tests until a fix is
available in ROCm 7.1 and/or our CI image. Otherwise we might miss other
errors due to constant CI failures.

[ROCm/rocshmem commit: 4fc5541d78]
2025-11-07 08:04:31 -06:00
Edgar Gabriel 4fc5541d78 disable memory tests (#310)
disable fine-grain and coarse-grain memory tests until a fix is
available in ROCm 7.1 and/or our CI image. Otherwise we might miss other
errors due to constant CI failures.
2025-11-07 08:04:31 -06:00
Gopesh Bhardwaj fabdab7aa4 [aqlprofile] Adding Strix Halo support (#1477)
* Adding Strix Halo support

* copilot review feedback

* Addressing feedback
2025-11-07 00:46:17 -06:00
Gopesh Bhardwaj 06bf110c84 Adding counters support for strix halo (#1358)
* Adding counters support for strix halo

* Updated counters list

* Added missing counter info

* Updated arch support
2025-11-07 00:45:03 -06:00
Jason Bonnell 6e195ded9b Update rocprofiler_config_interfaces.cmake to use different elf naming (#1722)
* Update rocprofiler_config_interfaces.cmake to use different elf naming

* try out conditional for libelf

* run cmake-format to fix formatting issue

* Remove libelf.patch file from therock-ci-windows.yml

* Remove libelf patch from therock-ci-linux.yml as well
2025-11-06 23:50:02 -05:00
habajpai-amd 590c6c3b4f fix: null pointer after delete in get_stream_id (#1720) 2025-11-06 23:43:34 -05:00
Ghadeer Ahmed H Alabandi 5b66480595 [NET] Enable capping the number of QPs created for send/recv colls (#1998)
[ROCm/rccl commit: 45991fadad]
2025-11-07 00:47:01 +00:00
Ghadeer Ahmed H Alabandi 45991fadad [NET] Enable capping the number of QPs created for send/recv colls (#1998) 2025-11-07 00:47:01 +00:00
David Galiffi 89cf46eb55 Removing jlumbroso/free-disk-space action from workflows (#1700) 2025-11-06 18:11:09 -05:00
Allen Hubbe 5e82060ba0 gda: fix getmem_nbi_wg source and dest (#311)
A copy paste mistake in a previous commit caused source and dest to
be reversed.  Correct the source and dest params.

Fixes: e8a7371007

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>

[ROCm/rocshmem commit: e2dcf99456]
2025-11-06 16:21:20 -06:00