Commit Graph

68426 Commits

Author SHA1 Message Date
Scott Todd fa772be675 Reapply amdgpu-windows-interop revert. (#1893)
## Overview and rationale

This reverts https://github.com/ROCm/rocm-systems/pull/1886, which...
* Re-applies https://github.com/ROCm/rocm-systems/pull/1866
* Reverts https://github.com/ROCm/rocm-systems/pull/1728

(So it restores the [`amdgpu-windows-interop/`](https://github.com/ROCm/rocm-systems/tree/develop/shared/amdgpu-windows-interop) folder back to the state from a few weeks ago)

The rationale for this change is at https://github.com/ROCm/rocm-systems/pull/1866:
> Last PAL update broke applications on gfx12 Windows.

## Cross-repository change details

That PR failed to build but was merged with this explanation:

> TheRock CI Windows build fails as expected with this revert.
> 
> References to these PAL members need to be stripped out in a patch on TheRock.
> 
> ```
> 11.3	C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(152): error C2039: 'RegisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(195): error C2039: 'UnregisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> ```

The patch in TheRock was updated in https://github.com/ROCm/TheRock/pull/2154. This rolls forward by updating the ref for TheRock.

That original PR could have been sequenced differently to avoid a build break - perhaps by
* Pointing to a branch in TheRock with the patch rebased
* Deleting the patch in the workflows here but holding a local copy of the path to be applied in workflows
* Landing the patch as a normal commit instead of carrying it at all

## Test plan

1. Watch TheRock CI here (https://github.com/ROCm/rocm-systems/actions/runs/19447202693/job/55644411119?pr=1893)
2. Build locally:
    
    ```bash
    # In rocm-systems
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0001-Revert-SWDEV-543498-Some-compute-Ubertrace-profiles-.patch
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0003-Use-is_versioned-true-consistently-in-both-Comgr-Loa.patch
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0006-Explicitly-load-libamdhip64.so.7.patch
    # Note: the build fails with the observed errors if patch 0001 is not applied!
    
    # In TheRock
    cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER=cl.exe -DCMAKE_CXX_COMPILER=cl.exe \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DPython3_EXECUTABLE=d:/projects/TheRock/.venv/Scripts/python \
      -DTHEROCK_ROCM_SYSTEMS_SOURCE_DIR=d:/projects/TheRock/../rocm-systems \  # IMPORTANT
      -DTHEROCK_AMDGPU_FAMILIES=gfx110X-all \
      -DBUILD_TESTING=ON \
      -DTHEROCK_ENABLE_ALL=ON \
      -Damd-llvm_BUILD_TYPE=RelWithDebInfo \
      -S D:/projects/TheRock \
      -B D:/projects/TheRock/build \
      -G Ninja
    
    cmake --build D:/projects/TheRock/build --target hip-clr
    # [build] Build finished with exit code 0
    cmake --build D:/projects/TheRock/build --target ocl-clr+dist
    # [build] Build finished with exit code 0
    ```
2025-11-18 07:17:06 -08:00
vedithal-amd 44a32e23ac [rocprofiler-compute] Bump version and update changelog ahead of ROCm 7.2 release (#1908) 2025-11-18 10:04:28 -05:00
Ameya Keshava Mallya 0e28188fdc Merge remote-tracking branch 'origin/develop' into preserved/amdsmi 2025-11-18 01:16:00 +00:00
Ameya Keshava Mallya 8eceb6e5eb Merge commit 'a044536b8d690a9ae5962a93e7596d9eec2030b7' into develop 2025-11-18 01:14:31 +00:00
Sajina PK f6183e3563 [Rocprofiler-systems]: Documentation addition for xgmi and pcie metrics feature (#1798)
* Documentation addition for xgmi and pcie metrics feature

Add documentation to provide details about How to get collect XGMI and PCIe interconnect metrics.

* Apply suggestions from code review

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-17 18:34:28 -05:00
ammallya 53c9b9655d Adding amdsmi label (#1894) 2025-11-17 14:53:43 -08:00
Ameya Keshava Mallya ac9e029c3e Add 'projects/amdsmi/' from commit 'b4b3539631460b986dddc86a2303cef11cd38816'
git-subtree-dir: projects/amdsmi
git-subtree-mainline: 0633d8d8ce
git-subtree-split: b4b3539631
2025-11-17 22:28:37 +00:00
Scott Todd 0633d8d8ce Revert "Revert "Update amdgpu-windows-interop with latest changes 20251105 (#…" (#1886)
Reverts ROCm/rocm-systems#1866 (re-landing https://github.com/ROCm/rocm-systems/pull/1728)

This broke Windows builds at https://github.com/ROCm/rocm-systems/actions/workflows/therock-ci.yml?query=branch%3Adevelop+event%3Apush, I think intentionally? We need a plan for rolling out such changes without build breaks.

Sample logs: https://github.com/ROCm/rocm-systems/actions/runs/19371422209/job/55428130376#step:14:6597
```
[ocl-clr] [134/153] Building CXX object rocclr\CMakeFiles\rocclr.dir\device\pal\palubercapturemgr.cpp.obj
[ocl-clr] FAILED: rocclr/CMakeFiles/rocclr.dir/device/pal/palubercapturemgr.cpp.obj 
[ocl-clr] ccache "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64\cl.exe"  /nologo /TP -DATI_OS_WIN -DCL_TARGET_OPENCL_VERSION=220 -DCL_USE_DEPRECATED_OPENCL_1_0_APIS -DCL_USE_DEPRECATED_OPENCL_1_1_APIS -DCL_USE_DEPRECATED_OPENCL_1_2_APIS -DCL_USE_DEPRECATED_OPENCL_2_0_APIS -DCOMGR_DYN_DLL -DGPUOPEN_CLIENT_INTERFACE_MAJOR_VERSION=42 -DHAVE_CL2_HPP -DLITTLEENDIAN_CPU -DOPENCL_C_MAJOR=2 -DOPENCL_C_MINOR=0 -DOPENCL_MAJOR=2 -DOPENCL_MINOR=1 -DPAL_BUILD_RDF=1 -DPAL_CLIENT_INTERFACE_MAJOR_VERSION=932 -DPAL_DEVELOPER_BUILD=0 -DPAL_GPUOPEN_OCL -DPAL_KMT_BUILD=1 -DROCCLR_VERSION_GITHASH=\"38294ab\" -DWITH_PAL_DEVICE -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\.. -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\compiler\lib -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\compiler\lib\include -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\compiler\lib\backends\common -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\elf -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\include -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\opencl\khronos\headers\opencl2.2\CL -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\opencl\khronos\headers\opencl2.2\CL\.. -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\opencl\khronos\headers\opencl2.2\CL\..\.. -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\opencl\khronos\headers\opencl2.2\CL\..\..\.. -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\opencl\khronos\headers\opencl2.2\CL\..\..\..\.. -IC:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\opencl\khronos\headers\opencl2.2\CL\..\..\..\..\amdocl -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\core -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\util -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\shared\inc -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\shared\devdriver\shared\legacy\inc -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\shared\devdriver\third_party\dd_crc32\inc -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\shared\metrohash\src -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\sc\HSAIL\ext\loader -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\hsail-compiler\lib\loaders\elf\utils\libelf\..\..\..\..\..\lib\loaders\elf\utils\common -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\hsail-compiler\lib\loaders\elf\utils\libelf\..\..\..\..\..\lib\loaders\elf\utils\common\win32 -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\hsail-compiler\lib\loaders\elf\utils\libelf\..\..\..\..\..\lib\loaders\elf\utils\libelf -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\sc\HSAIL\ext\libamdhsacode -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\sc\HSAIL\ext\libamdhsacode\..\..\include -IC:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\sc\HSAIL\ext\libamdhsacode\..\..\hsail-tools\libHSAIL -external:IB:\build\compiler\amd-comgr\dist\include -external:W0 /DWIN32 /D_WINDOWS /EHsc /DWIN32 /D_WINDOWS  /EHsc /O2 /Ob2 /DNDEBUG -std:c++20 -MD /wd4267 /wd4244 /wd4996 /MT /showIncludes /Forocclr\CMakeFiles\rocclr.dir\device\pal\palubercapturemgr.cpp.obj /Fdrocclr\CMakeFiles\rocclr.dir\rocclr.pdb /FS -c C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp
[ocl-clr] cl : Command line warning D9025 : overriding '/MD' with '/MT'
[ocl-clr] C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(152): error C2039: 'RegisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
[ocl-clr] C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
[ocl-clr] C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(195): error C2039: 'UnregisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
[ocl-clr] C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
[ocl-clr] [135/153] Building CXX object rocclr\CMakeFiles\rocclr.dir\device\pal\paldevicegl.cpp.obj
```
2025-11-17 14:27:09 -08:00
Gavini, Sumanth c20a8a9a84 [SWDEV-563788] - Fix: amdsmitst crash from kernel in xcp_metrics read (#826)
Use fork/waitpid to isolate API call and detect SIGKILL from kernel

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>

[ROCm/amdsmi commit: a044536b8d]
2025-11-17 16:24:45 -06:00
Gavini, Sumanth a044536b8d [SWDEV-563788] - Fix: amdsmitst crash from kernel in xcp_metrics read (#826)
Use fork/waitpid to isolate API call and detect SIGKILL from kernel

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
2025-11-17 16:24:45 -06:00
randyh62 92b3629b25 Update environment.yml (#1884)
Update path to requirements.txt
2025-11-17 12:10:56 -08:00
Ramalingam, Muthusamy b4b3539631 [SWDEV-560044]: [AMDSMI][CPU] Update AMDSMI as per latest ESMI Driver (#763)
[AMDSMI][CPU] Update AMDSMI as per latest ESMI Driver,
1) hsmp_acpi
2) amd_hsmp
3) hsmp_common

Signed-off-by: Muthusamy Ramalingam <muthusamy.ramalingam@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: ssaka_amdeng <SitharamMurthy.Saka@amd.com>
2025-11-17 13:45:43 -06:00
Ramalingam, Muthusamy 3659db6f21 [SWDEV-560044]: [AMDSMI][CPU] Update AMDSMI as per latest ESMI Driver (#763)
[AMDSMI][CPU] Update AMDSMI as per latest ESMI Driver,
1) hsmp_acpi
2) amd_hsmp
3) hsmp_common

Signed-off-by: Muthusamy Ramalingam <muthusamy.ramalingam@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: ssaka_amdeng <SitharamMurthy.Saka@amd.com>

[ROCm/amdsmi commit: b4b3539631]
2025-11-17 13:45:43 -06:00
Saeed, Oosman 05ea00dcc4 [SWDEV_562432] update inband CPER meta data to be more consistent with OOB (#824)
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------

Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-11-17 13:25:56 -06:00
Saeed, Oosman 6ba1f066ad [SWDEV_562432] update inband CPER meta data to be more consistent with OOB (#824)
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------

Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 05ea00dcc4]
2025-11-17 13:25:56 -06:00
Billakanti, Koushik 23f68555db [SWDEV-563828] Fix incorrect help text for --perf-determinism (#812)
* [SWDEV-563828] Fix incorrect help text for --perf-determinism flag to indicate it expects GFXCLK frequency in MHz

---------

Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
2025-11-17 12:55:14 -06:00
Billakanti, Koushik 72d9e8a607 [SWDEV-563828] Fix incorrect help text for --perf-determinism (#812)
* [SWDEV-563828] Fix incorrect help text for --perf-determinism flag to indicate it expects GFXCLK frequency in MHz

---------

Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>

[ROCm/amdsmi commit: 23f68555db]
2025-11-17 12:55:14 -06:00
Milan Radosavljevic db111129ab [rocprof-sys] Add test to check perfetto files have been merged (#1863) 2025-11-17 11:50:40 -05:00
David Galiffi 828921c616 [rocprofiler-systems] Update CHANGELOG.md with 7.1.1 notes (#1844)
* Update CHANGELOG.md

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-11-17 11:47:08 -05:00
jamessiddeley-amd d49e2e35fd [rocprof-compute] Automate ctest coverage and test cases on runners with CDash (#1481)
* Add nightly coverage workflow

* ruff formatting

* temp workflow testing

* restore workflow file

* add workflow condition

* update workflow file

* update workflow file

* fix typo in run-ci.py

* edit run-ci.py

* add python deps install

* add python deps install

* add python deps install

* add python deps install

* check if enable coverage is on when using workflow

* remove github CI breakdown and fix enable coverage

* set cache variables must be set before dashboard starts

* Update run-ci.py

* Update run-ci.py to fix ctest cache

* Update rocprofiler-compute-code-coverage.yml to install tests

* Update rocprofiler-compute-code-coverage.yml

* Restore workflow file

* Update run-ci.py

* Simplify workflow build command

* Update run-ci.py to build tests

* edited run-ci script

* edit ctest configure commands

* edit ctest configure commands to be on one line

* edit ctest configure command to include path to amdclang++

* update clang check in tests/cmakelists.txt

* update rocm

* update rocm

* update rocm version 7.0.2

* update tests/CMakeLists.txt

* use tarball instead for rocm install

* apt install rocm-dev instead for 7.0.0 release

* workflow tweaks

* update to use new 'tools' dir

* install rocm-dev

* add CMAKE_CXX_COMPILER as clang

* update tests/cmakelists.txt

* update cdasg site and build names

* remove run automatically on pull requests

* ruff format

* increased timeouts for tests

* add back reruns for workflow testing

* fix typo

* rename workflow "nightly" -> "code"

* added tracks to keep track of gpu (325 vs 355)

* remove test_db_connector.py

* revert build names and tracking

* update workflow pushes

* CMake format

* changed parallel level back to 1
2025-11-17 09:24:24 -05:00
Gopesh Bhardwaj 75ad45d5f1 Added missing license (#1861) 2025-11-17 11:16:09 +05:30
Sajina PK 09b8342e22 [Rocprofiler-systems] : Add XGMI and PCIe metrics to the profiling data (#1628)
* Add XGMI and PCIe metrics to the profiling data

Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
  * XGMI link width in bits
  * XGMI link speed in GT/s
  * Per-link read bandwidth (KB)
  * Per-link write bandwidth (KB)

- Add new categories for PCIe metrics:
  * PCIe link width
  * PCIe link speed in GT/s
  * Accumulated bandwidth (MB)
  * Instantaneous bandwidth (MB/s)

* Fix VCN/JPEG insert logic

* Modify the gpu_metrics struct to accomodate XCP structure

* Add ctest automation for gpu interconnect metrics

* Refactor to move gpu_metrics struct and serialization to another file

* Possible fix for timeout in CI

Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.

* Change2: Address review comments

Change ctest sampling to avoid timeout
Change variable name and code structuring

* Add option in ctest to run rocprof-sys-run without rewrite

Run transferbench with rocprof-sys-run without sampling

* Change3: Fix sample insert bug and address review comments

xgmi and pci support check
renaming variables
additional hip_api validation in rocpd

* Reduce the load from the trnasferBench sample

The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...
2025-11-14 19:42:33 -05:00
Bindhiya Kanangot Balakrishnan e1b3d5f02e [SWDEV-538483] Fix C++17 fs linking for GCC<9.0
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-11-14 17:59:07 -06:00
Bindhiya Kanangot Balakrishnan 40aa184c2a [SWDEV-538483] Fix C++17 fs linking for GCC<9.0
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: e1b3d5f02e]
2025-11-14 17:59:07 -06:00
JC c9dd49c48a Fix SignatureDoesNotMatch by aws credential option (#1867)
Replicating https://github.com/ROCm/TheRock/pull/2147#discussion_r2528008441
## Motivation

Fixes https://github.com/ROCm/TheRock/issues/875 which is the issue where Windows builds would fail randomly when uploading to s3 with the `SignatureDoesNotMatch` error as a result of special characters existing in the AWS Access Keys generated by the `configure-aws-credentials` action that is passed through Windows environment variables to `aws-cli`. More details below.

## Technical Details

https://github.com/ROCm/TheRock/issues/875#issuecomment-3530851762
In summary, in Windows workflows, the `special-characters-workaround` option is set to true for the `configure-aws-credentials` action which will regenerate access keys until there are no special characters that may not be passable through windows environment variables correctly.

## Test Plan

Observe CI.

## Test Result

TBD.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2025-11-14 14:04:54 -08:00
David Yat Sin 9535b7fcbe rocr: Fix exception on AsyncEventControl init (#1852)
* rocr: Fix exception on AsyncEventControl init

Fix exception on init when compiling with in release mode.

* rocr: Fix crash when interrupts are disabled

Fix segfault due to assert for signal->EopEvent() being false when
HSA_ENABLE_INTERRUPT=0. Use Signal::WaitMultiple(..) when interrupt is
disabled.

---------

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-14 12:45:34 -08:00
Alysa Liu 2327cd35c8 rocr: Fix VMM cpu mapping clean up (#1831)
Remove CPU mapping before calling RemoveAccess().
2025-11-14 13:52:45 -05:00
German Andryeyev ff4782620e SWDEV-547108 - Fix PAL build with HSA backend (#1850)
When hip is built with HSA backend then the headers from ROCR will be used, but
scratch_backing_memory_byte_size is a part of amd_queue_v2_t structure
2025-11-14 12:28:03 -05:00
Joseph Macaranas 598ca70861 Revert "Update amdgpu-windows-interop with latest changes 20251105 (#1728)" (#1866)
- Reverts #1728
- Last PAL update broke applications on gfx12 Windows.
- Will need to reapply a patch to ubertrace when bumping submodule on TheRock.
2025-11-14 11:48:10 -05:00
marandje 5616a255e2 SWDEV-515530 - Re-enable passing tests (#1013) 2025-11-14 16:36:44 +01:00
Swati Rawat cb257ab9f7 [rdc] Replace readme link rdc -> rocm-systems/projects/rdc (#1758)
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-11-14 13:19:26 +01:00
amilanov-amd 738bf16008 [hip-tests] Tag multigpu tests with Catch2 tags (#1315) 2025-11-14 13:00:30 +01:00
venkatesh-amd f7249e092b SWDEV-533237 : Added test cases for hipOccupancyAvailableDynamicSMemPer… (#716)
* SWDEV-533237 Added test cases for hipOccupancyAvailableDynamicSMemPerBlock API

* SWDEV-533237 : Added test cases for hipOccupancyAvailableDynamicSMemPerBlock

* SWDEV-533237 : Addressed review comments for hipOccupancyAvailableDynamicSMemPerBlock aip test cases

---------

Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
2025-11-14 15:41:45 +05:30
Bindhiya Kanangot Balakrishnan 1b027d15bd [SWDEV-566930] Fix numa CSV output
Handled numa data - including cpu and socket list, bitmask,
and affinity for csv format.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-11-13 21:52:49 -06:00
Bindhiya Kanangot Balakrishnan 5e56281eb0 [SWDEV-566930] Fix numa CSV output
Handled numa data - including cpu and socket list, bitmask,
and affinity for csv format.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 1b027d15bd]
2025-11-13 21:52:49 -06:00
Kanangot Balakrishnan, Bindhiya f8e4771363 [SWDEV-538483] Add NPM API's and CLI (#817)
* Added Python & C API's for new node devices. Currently these are functional for node 0 only.
 - amdsmi_get_node_handle
 - amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-11-13 21:51:31 -06:00
Kanangot Balakrishnan, Bindhiya 072daa28d5 [SWDEV-538483] Add NPM API's and CLI (#817)
* Added Python & C API's for new node devices. Currently these are functional for node 0 only.
 - amdsmi_get_node_handle
 - amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: f8e4771363]
2025-11-13 21:51:31 -06:00
Milan Radosavljevic a77be32660 Prevent duplicated sdk events (#1826) 2025-11-13 22:36:36 -05:00
David Galiffi 540eda3865 [rocprof-sys] Forward ctest labels from the execution test to the validation test. (#1697)
* Forward ctest labels from the execution test to the validation test.

* Adjust test validation parameters for amid_smi samples

The actual number of samples will vary depending on the GPU. This test
is just to validate the presence of the samples
2025-11-13 21:49:07 -05:00
Milan Radosavljevic 833c250c27 Add clean up fixture for trace cache temporary files (#1836)
* Add clean up fixture for trace cache tmp files

* Switch to bash instead of cmake running command
2025-11-13 21:01:04 -05:00
Matt Arsenault 4830979f0e SWDEV-548892 - Stop using ocml fma wrappers (#1702)
Directly use elementwise builtin
2025-11-13 16:20:27 -08:00
Matt Arsenault 42e91b8934 SWDEV-548892 - Stop using ocml sqrt wrappers (#1716) 2025-11-13 16:19:44 -08:00
Charis Poag 00a893d299 Fix PPT - reset calls, unit format, and get_power_cap()
Changes:
  - Simplified reset calls
  - Updated static limit N/A values to all possible data
  (helps csv format be consistent)
  - Unit format was broken on static
  - get_power_cap() had min/max values swapped, and the return
    was missing two fields
  - Updated changelog to reflect all changes

Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-11-13 15:40:13 -06:00
Charis Poag e53645e871 Fix PPT - reset calls, unit format, and get_power_cap()
Changes:
  - Simplified reset calls
  - Updated static limit N/A values to all possible data
  (helps csv format be consistent)
  - Unit format was broken on static
  - get_power_cap() had min/max values swapped, and the return
    was missing two fields
  - Updated changelog to reflect all changes

Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 00a893d299]
2025-11-13 15:40:13 -06:00
Kian Cossettini 65b607b0bd [rocprofiler-systems] Add rocprof-sys-build to gitignore (#1829)
* Add rocprof-sys-build to gitignore

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-13 16:22:19 -05:00
Julia Jiang 5599e8b1de SWDEV-561500 - Update change log and port 7.1.1 to develop branch (#1688)
* SWDEV-561500 - Porting changelog(up to 7.1.1) to develop branch

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md
2025-11-13 12:22:34 -08:00
gabrpham_amdeng 46e5f157f0 Fix PCIe Levels converting to csv
Change-Id: Id69b8bbc167887673a88e13eb497bdeac6dd0425
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-11-13 13:08:12 -06:00
gabrpham_amdeng 3527ecf8dd Fix PCIe Levels converting to csv
Change-Id: Id69b8bbc167887673a88e13eb497bdeac6dd0425
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>


[ROCm/amdsmi commit: 46e5f157f0]
2025-11-13 13:08:12 -06:00
gabrpham_amdeng 18faddf6f3 Added support for configuring PPT1 power cap
- Updated python integration test to account for PPT1 support changes
  - Updated set/reset power-cap input format
  - Adjusted python API and updated C++ API test

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
2025-11-13 13:08:12 -06:00
gabrpham_amdeng 351b6f96ae Added support for configuring PPT1 power cap
- Updated python integration test to account for PPT1 support changes
  - Updated set/reset power-cap input format
  - Adjusted python API and updated C++ API test

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f


[ROCm/amdsmi commit: 18faddf6f3]
2025-11-13 13:08:12 -06:00