Граф коммитов

1911 Коммитов

Автор SHA1 Сообщение Дата
Milan Radosavljevic 940488ed58 [rocprofiler-systems] Fix naming and description of process_page category (#2606) 2026-01-15 16:10:50 +01:00
Milan Radosavljevic 318d13870f [rocprofiler-systems] Update logging to use spdlog library (#2428)
## Motivation

- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging capabilities

## Technical Details

- Added spdlog as a submodule and integrated it into CMake build system
- Created new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and std::abort()
2026-01-14 15:27:51 -05:00
Joseph Narlo 499127c0b9 [SWDEV-553434] No direct way to get the BASEBOARD temperature info (#2502)
* [SWDEV-553434] No direct way to get the BASEBOARD temperature info. Need to iterate all gpus

Signed-off-by: amd-josnarlo <josnarlo.amd.com>

---------

Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2026-01-14 13:52:58 -06:00
David Yat Sin a3b445118d SWDEV-519413 - Ignore ROCr shutdown events (#1616)
ROCr now reports a shutdown event, but this is not a fatal error. Ignore
this event.
2026-01-14 11:28:03 -08:00
xuchen-amd 71b9ea6ba0 [rocprofiler-compute] improve config management system (#2359) 2026-01-14 13:20:27 -05:00
Luca Bruni d7ff927690 [clr] Fix device printf pointer advancement issue with string format specifiers (#1313) 2026-01-14 13:05:25 -05:00
habajpai-amd bad8d915c3 Fix: Add visibility hidden to devInfoTypesStrings to prevent symbol interposition (#2575) 2026-01-14 09:48:49 -08:00
pghoshamd d2a1fc945e SWDEV-569319 Fix dangling reference warning (#2509)
* SWDEV-569319 Fix dangling reference warning

* fix nullptr warning

* use emplace

* return regular pointer
2026-01-13 15:39:03 -06:00
hongkzha-amd 9dc2488b6b rocrtst: Add test cases for interrupt disabled mode (#2385)
Add explicit test cases to verify ROCr functionality with interrupts
disabled (HSA_ENABLE_INTERRUPT=0). This ensures compatibility with
virtio, dtif, and WSL configurations which require interrupt-disabled
mode.

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
2026-01-13 12:10:11 -06:00
hongkzha-amd b3c4e94e70 rocr: Improve memory protection and WSL compatibility (#2274)
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.

Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: reduce ifdef linux
v2: Fix IsDXG check logic

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-13 12:08:20 -06:00
Jan Stephan 2e8c863341 Use doxysphinx includes for enums, macros and global types (#2273)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
2026-01-13 17:33:49 +01:00
Jan Stephan 88584f3c0d Fix wrong call to executable (#2290)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
2026-01-13 17:31:10 +01:00
Adam Pryor 9425a2f687 [SWDEV-569427] Fix segfault calling bad page info (#2547) 2026-01-13 09:44:49 -06:00
AidanBeltonS 607d66e87c Add messages to static asserts to prevent warnings (#1011) 2026-01-13 14:02:36 +00:00
Fábio Mestre 09a01ee11c Replace usages of __ockl_clz with builtins (#2234) 2026-01-13 11:15:46 +01:00
Fábio Mestre 61325db1c8 Fix AMD_LOG_LEVEL_SIZE env variable (#2463)
AMD_LOG_LEVEL_SIZE is being used in a global variable.
This always uses the default value of 2048 because the
HIP runtime doesn't have the opportunity to load
environment variables at the point where global variables
are initialized.

The solution is to use AMD_LOG_LEVEL_SIZE inside
truncate_log_file() function.
2026-01-13 09:57:49 +00:00
Jan Stephan 35a5274b84 CSS: Don't reference images that aren't generated by Doxygen (#2295)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
2026-01-13 10:11:57 +01:00
David Galiffi 2daec0e4d0 Revert 63713f01e0 (#2585)
## Motivation

<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
Remove Fortran example due to Palamida scan violation.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->
Revert 63713f01e0.
New test to be added later.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-12 23:44:26 -05:00
randyh62 21b6021848 Restore Lane masks bit shift content (#2411)
Co-authored-by: Christophe Paquot <35546540+chrispaquot@users.noreply.github.com>
2026-01-12 19:01:19 -05:00
Jin Jung d4758bc29e SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323) 2026-01-12 13:10:46 -06:00
SaleelK e6e0378acd clr: Always query new engine for intergpu copies (#2559) 2026-01-12 11:01:02 -08:00
Mythreya Kuricheti 36d9d33d90 Users/mkuriche/rocprofiler sdk fmt build fix memory header (#2537)
* [rocprofiler-sdk] Fix fmt::join build errors

- remedy use of fmt::join without include <fmt/ranges.h>

* include memory header

* Disable FMT build for SDK CI

* Add -DROCPROFILER_BUILD_FMT=OFF to sanitizer steps

* Add temporary workaround for rccl.h issue

* Add ROCPROFILER_INTERNAL_RCCL_API_TRACE to SDK CI builds

* disable clang-tidy for vendored includes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
2026-01-12 12:59:47 -05:00
Andrei Kochin 5e15839611 Revert "SWDEV-566854 - Improve memory object handling (#1939)" (#2572)
This reverts commit 39d8432893.

rocprim failures were introduced with the commit.

Based on the @erman-gurses investigation:

Based on the list here:
2789ea4...050e88e

https://github.com/ROCm/TheRock/actions/runs/20864279671 -> e005f84 (FAILED)
https://github.com/ROCm/TheRock/actions/runs/20867580342 -> 39d8432 (FAILED)
https://github.com/ROCm/TheRock/actions/runs/20870979894 -> 88f4bb1 (PASSED)
https://github.com/ROCm/TheRock/actions/runs/20872795557 -> 11d9472 (PASSED)
So the issue comes from this commit SWDEV-566854 - Improve memory object handling (#1939) SHA: 39d8432
2026-01-12 12:09:16 -05:00
Istvan Kiss 11c294d586 Update HIP definition (#2134)
* Update what is hip

* Update HIP runtime page

* Update images

* Remove omnitrace

* Quick fix

* Feedback fixes

* Minor  fixes

* Update SAXPY tutorial

Signed-off-by: Jan Stephan <jan.stephan@amd.com>

---------

Signed-off-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Adel Johar <adel.johar@amd.com>
Co-authored-by: Jan Stephan <jan.stephan@amd.com>
2026-01-12 14:44:21 +01:00
Honglei Huang 054bf836f1 [rocr/libhskamt/virtio] Add some apis into libhsakmt virtio (#2457)
* libhsakmt/virtio: Add alloc memory align api

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Rename CLGL BO to AMDGPU BO

Rename VHSA_BO_CLGL to VHSA_BO_AMDGPU to support generic AMDGPU buffer objects, not just CL/GL interop.

* libhsakmt/virtio: Add atomic helpers and node lookup

Add vhsakmt_atomic_inc/dec macros and vhsakmt_get_node_by_id helper function.

* libhsakmt/virtio: Add AMDGPU device initialization support

Add vamdgpu_device_initialize and vamdgpu_device_deinitialize functions.

* libhsakmt/virtio: Add AMDGPU device handle and DRM command support

Add vamdgpu_device_get_fd, vdrmCommandWriteRead and update vhsaKmtGetAMDGPUDeviceHandle.

* libhsakmt/virtio: Add AMDGPU BO free and CPU map support

Add vamdgpu_bo_free and vamdgpu_bo_cpu_map functions.

* libhsakmt/virtio: Add AMDGPU BO import and export support

Add vamdgpu_bo_import, vamdgpu_bo_export and vhsakmt_bo_from_resid functions.

* libhsakmt/virtio: Add AMDGPU BO VA operation support

Add vamdgpu_bo_va_op function.

* libhsakmt/virtio: Add dma buf export support

Add vhsaKmtExportDMABufHandle API in virtio driver to support export
feature.

* libhsakmt/virtio: Fix potential deadlock in userptr deregistration

Refactor vhsakmt_deregister_userptr_non_svm to avoid calling
vhsakmt_destroy_userptr while holding the bo_handles_mutex lock.
Previously, destroying userptrs directly while iterating the tree
could cause deadlock issues due to nested locking.

- Move interval tree removal from vhsakmt_destroy_userptr to caller
- Collect BOs to free in a temporary array during tree traversal
- Destroy BOs after releasing the mutex to avoid lock contention
- Use dynamic array with realloc to handle arbitrary number of BOs

Signed-off-by: Honglei Huang <honghuan@amd.com>

* rocr: driver/virtio: Implement DMA-BUF import/export and memory mapping APIs

Implement the missing DMA-BUF handling and memory mapping functions
in the virtio KFD driver to enable cross-process memory sharing:

- ExportDMABuf: Export HSA memory as DMA-BUF file descriptor
- ImportDMABuf: Import DMA-BUF fd as shareable buffer object
- Map: Map imported buffer into virtual address space with permissions
- Unmap: Unmap buffer from virtual address space
- ReleaseShareableHandle: Free imported buffer object

Also add drm_perm() helper to convert HSA access permissions to
AMDGPU VM page flags (READABLE/WRITEABLE).

These APIs enable IPC memory sharing between HSA processes through
DMA-BUF mechanism in virtualized environments.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add register memory APIs

Add two new memory registration functions to the virtio HSA KMT library:

1. vhsaKmtRegisterMemory: A simplified wrapper for vhsaKmtRegisterMemoryWithFlags
   that uses default CoarseGrain memory flags.

2. vhsaKmtRegisterMemoryToNodes: A stub implementation for registering memory
   to specific nodes. Returns HSAKMT_STATUS_NOT_IMPLEMENTED as it's currently
   not used in ROCR.

Changes:
- Added function declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Exported symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add graphics handle registration and mapping APIs

- Add vhsaKmtRegisterGraphicsHandleToNodesExt() with flags support
- Add vhsaKmtMapGraphicHandle() and vhsaKmtUnmapGraphicHandle() stubs
- Refactor existing registration API to use extended version

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio support for queue APIs

Implement vhsaKmtUpdateQueue, vhsaKmtSetQueueCUMask,
vhsaKmtAllocQueueGWS and vhsaKmtGetQueueInfo functions
with virtio protocol extensions and symbol exports.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add new virtio API support for model, SMI, and XNACK mode

Add three new API functions to the virtio backend:
- vhsaKmtModelEnabled: Check if pre-silicon model is enabled (returns false for virtio)
- vhsaKmtOpenSMI: Open SMI interface for a node (not yet supported in virtio)
- vhsaKmtSetXNACKMode: Set XNACK mode via virtio control command

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add shared memory support for virtio backend

Implement shared memory APIs for the virtio backend to enable
memory sharing between processes:

- Add vhsaKmtShareMemory() to share memory regions and create
  shared memory handles
- Add vhsaKmtRegisterSharedHandle() to register shared memory
  handles in the current process
- Add vhsaKmtRegisterSharedHandleToNodes() for node-specific
  shared memory registration

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add memory management APIs for virtio

Add the following new memory management APIs to virtio implementation:
- vhsaKmtSetMemoryUserData: Set user data for memory pointer
- vhsaKmtSetMemoryPolicy: Configure memory policy for nodes
- vhsaKmtSVMGetAttr: Get SVM (Shared Virtual Memory) attributes
- vhsaKmtSVMSetAttr: Set SVM attributes
- vhsaKmtReplaceAsanHeaderPage: ASAN header page replacement (stub)
- vhsaKmtReturnAsanHeaderPage: ASAN header page return (stub)

Changes include:
- Added API declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Extended protocol definitions in hsakmt_virtio_proto.h
- Added user_data field to vhsakmt_bo structure
- Exported new symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add SPM APIs

Add three new SPM-related APIs to the virtio interface:
- vhsaKmtSPMAcquire: Acquire SPM resources on a preferred node
- vhsaKmtSPMRelease: Release SPM resources on a preferred node
- vhsaKmtSPMSetDestBuffer: Set destination buffer for SPM data with
  optional userptr support and data loss detection

These APIs extend the virtio command protocol with new query types:
- VHSAKMT_CCMD_QUERY_SPM_ACQUIRE
- VHSAKMT_CCMD_QUERY_SPM_RELEASE
- VHSAKMT_CCMD_QUERY_SPM_SET_DST_BUFFER

The implementation includes proper buffer management for both
direct BO access and userptr fallback for smaller buffers.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio stub for hsaKmtAisReadWriteFile API

Add vhsaKmtAisReadWriteFile stub implementation for the virtio backend
to support AIS (Accelerated I/O Service) file read/write operations.
This stub currently returns HSAKMT_STATUS_NOT_IMPLEMENTED.

Changes include:
- Add vhsaKmtAisReadWriteFile declaration in hsakmt_virtio.h
- Add stub implementation in hsakmt_virtio_memory.c
- Export the symbol in libhsakmt_virtio.ver

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

* libhsakmt/virtio: Add vamdgpu_bo_query_info and vamdgpu_bo_set_metadata APIs

Implement two new virtio wrapper functions for AMDGPU buffer object operations:

1. vamdgpu_bo_query_info: Query buffer object information including
   allocation parameters, memory usage, and metadata.

2. vamdgpu_bo_set_metadata: Set metadata for a buffer object, allowing
   applications to attach custom data to GPU memory allocations.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add ProcessVMRead/Write stub implementations for virtio

Add vhsaKmtProcessVMRead and vhsaKmtProcessVMWrite stub functions
to the virtio interface. These APIs return HSAKMT_STATUS_NOT_IMPLEMENTED
since they are not supported in the baremetal implementation, matching
the behavior of the deprecated hsaKmtProcessVMRead/Write APIs.

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
Co-authored-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
2026-01-09 18:18:53 -08:00
vedithal-amd c5bfb37289 Improve documentation for standalone binary creation (#2446)
* Add cmake based instructions to create standalone binary

* Specify standalone binary extraction path in doc.

* Add documentation to explain how to specify self-extraction path
  when building the standalone binary where contents of the binary
  are extracted during execution

* Pin Nuitka to version 2.6 for consistency in building standalone binary
2026-01-09 17:40:47 -05:00
vedithal-amd f073f1adf2 Fix test for data imputation for iteration mulitplexing (#2564) 2026-01-09 12:42:14 -05:00
Xie, AlexBin 5279150964 SWDEV-574457 - hiptest vm fault (#2497) 2026-01-09 12:39:12 -05:00
Jason Bonnell 788bcdddd0 Update SDK Dockerfile.ci for Ubuntu (#2539)
* Add verbose output for submods step

* Remove git config setting

* Determine git version

* Try different git install

* Update Dockerfile.ci

* Revert git location in Ubuntu jobs

* Update RHEL and SLES sections to use 2.52 as well

* Add git --version to each step, fix typo in SLES Docker
2026-01-09 12:10:36 -05:00
AidanBeltonS 3309d7176b SWDEV-557148 - Set primary context when device set (#1161)
* SWDEV-557148 - Set activate context when device set

* clang-format

* Check for active status

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-09 17:09:40 +00:00
Sajina PK b3f59a37e4 [Rocprofiler-system]: Fix GPU event enumeration for rocprof-sys-avail and CLI option for parsing GPU HW Counters (#2476)
## Motivation

The `rocprof-sys-avail -H -c GPU` command is returning blank output which is expected to display a list of available GPU hardware counters instead.
The `rocprof-sys-sample` and `rocprof-sys-run` is missing the `--gpu-events` option for specifying GPU counter events during profiling.

## Technical Details

The initialize_event_info() function had a logic bug where it only called set_agents() if the agent_manager was empty, but the actual issue was that the gpu_agents and cpu_agents vectors were empty even when agents were discovered.
Fixed the conditional logic to properly call set_agents() when gpu_agents and cpu_agents are empty, regardless of the agent_manager state.

Added the `--gpu-events (-G)` option which sets the `ROCPROFSYS_ROCM_EVENTS` environment variable to the specified values.

Fixes an issue where unsupported GPU/APU arch is being skipped gracefully - more details about this issue in the below comment.
2026-01-09 11:59:45 -05:00
vedithal-amd ebe22b5907 Add pre-processor guards for rocflop (#2534) 2026-01-09 09:06:52 -05:00
vedithal-amd d65de0a203 Performance optimization of analysis database (#2557)
* Replace O(n^2²) nested loop with O(1) dictionary lookup when associating
metric values with metrics. Pre-group values by (metric_id, kernel_name)
to eliminate redundant iteration over entire values dataframe for each
metric-kernel combination.

* This optimization significantly improves database write performance for
workloads with large numbers of metrics and kernels.
2026-01-09 09:06:33 -05:00
vedithal-amd 51ba3c3a53 [rocprofiler-compute] Standalone roofline should create HTML instead of PDF (#2535)
* Standalone roofline should create HTML instead of PDF

* Eiminate the dependency on kaleido and plotly_get_chrome by moving
  towards plotly native HTML image roofline chart generation

* Address review comments
2026-01-09 09:05:49 -05:00
Apurv Mishra be375c2dbf rocr: Add support for Mipmapped Array (#1847)
SWDEV-539526 - Add support for Mipmapped Array in Rocr

Add support for Mipmapped Array functionality in Rocr Runtimeenabling GPU applications to work with multi-level texture mipmaps. The implementation introduces new public APIs for creating, querying, and managing mipmapped arrays across different GPU architectures.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
Co-authored-by: taosang2 <tao.sang@amd.com>
2026-01-08 17:14:39 -06:00
Mario Limonciello 8b529e7b29 Run pre-commit's whitespace related hooks on projects/rocr-runtime/samples (#2126)
In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2026-01-08 15:36:57 -05:00
cfallows-amd ae1abe4254 [rocprofiler-compute] Update .config_hashes.json (#2530)
config_hashes json had mismatched md5s for the delta_hash values, regenerated the file with the existing files in develop branch.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2026-01-08 14:33:36 -05:00
Yiannis Papadopoulos e8fef02e5a rocr/aie: Use util/os to get system memory (#2520) 2026-01-08 12:40:22 -06:00
Yazen AL Musaffar d8a914d8cc comment update for wrong units associated with RDC (#2299)
* comment update for wrong units associted with RDC_FI_GPU_MEMORY_CUR_BANDWIDTH

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Update rdc.h

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-08 12:14:51 -06:00
vedithal-amd 769d3dd67a [rocprofiler-compute] Data imputation strategy for iteration multiplexing (#2468)
* Data imputation strategy for iteration multiplexing

* Implement data imputation methodology to handle missing counter values
  in case of iteration multiplexing

* Enable dispatch filtering with iteration multiplexing since we are no
  longer merging dispatches

* Bugfix to prevent check for missing counter values when using csv
  format when profiling with iteration multiplexing

* Move warning and info message in case of iteration multiplexing to
  sanitize function which comes earlier in analyze mode

* Address review comments

* Fix typo in documentation

* Move profiling config init. after path check in sanitize()

* Graceful handling of dispatches with all counters empty within data
  imputation logic

* Improve info message for iteration multiplexing based analysis

* Ensure proper error message when trying to run iteration multiplexing with attach/detach

* fix test case
2026-01-08 12:01:51 -05:00
systems-assistant[bot] 53c56fca5f [SWDEV-558534] AMD-SMI bad pages add flag to convert to hex (#1900)
* Simplify hex flag check for bad page info
* moved the hex help text up with the other help text

---------

Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>
Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>
2026-01-08 10:21:10 -06:00
Bindhiya Kanangot Balakrishnan 8326c33d33 [SWDEV-573540] Add DRM-based wake for suspended AMD GPUs (#2510)
Implements automatic device wake using getDRMDeviceId() DRM call when GPUs
are detected in low-power state. This ensures rocm-smi can access device
information on suspended GPUs.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-08 10:19:45 -06:00
koushikbillakanti-amd ac1fa8dccb [SWDEV-567284] AMDSMI conceptual documentation for setting perf determinism (#2529)
Authored-by: Koushik Billakanti <kbillaka@amd.com>
2026-01-08 08:04:23 -06:00
Alexandra Sidorova 38a359f5f3 [CLR] prevent compilation errors for non-HIP compilers in amd_hip_mx_common.h and amd_hip_ocp_types.h (#2448)
Co-authored-by: Andrei Kochin <andrei.kochin@amd.com>
2026-01-08 17:49:13 +04:00
SaleelK 6b28faa532 clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480)
Problem:
The existing SDMA engine selection logic had several issues:
1. Same VirtualGPU/stream could use different SDMA engines for consecutive
   async copies since copy_engine_status may report engines as busy
2. Busy and Preferred engine check for every copy
3. No global tracking of which VirtualGPU uses which engine, leading to
   suboptimal resource allocation

Solution:
Implemented a global SDMA engine allocator with per-stream affinity:

- Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments
  * Maintains global map of active assignments
  * Enforces exclusivity: different streams use different engines (except
    inter-GPU copies where preferred engines are prioritized for optimal
    hardware paths like XGMI links)
  * Thread-safe allocation/release with Monitor lock

- Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_)
  for fast lookup without map access on hot path

- Refactored rocrCopyBuffer() to:
  1. Check local cached engine first → use if assigned
  2. Call AllocateSdmaEngine() if not assigned → cache result

- Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine)
  into AllocateEngine() for cleaner separation of concerns

- Engine release on HostQueue::finish() instead of only VirtualGPU destruction
  * Improves engine utilization by releasing earlier
  * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice

- Added future path for simple round-robin allocation (kUseSimpleRR) for
  next-gen GPUs with uniform SDMA bandwidth (disabled by default)

Cleanup:
- Removed selectSdmaEngine() helper (logic moved to allocator)
- Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly)
- Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager

Benefits:
- Ensures consistent per-stream SDMA engine usage
- Prevents cross-stream contention and engine thrashing
- Prioritizes hardware-optimal paths for inter-GPU transfers
- Better resource utilization through earlier release
- Cleaner, more maintainable code structure
2026-01-07 19:37:45 -08:00
Flora Cui be04fa8250 rocr: reorder HsaNodeProperties to improve compatibility (#2447)
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-08 09:56:39 +08:00
David Galiffi cb17e59a57 [rocprofiler-systems] Improve build time by refactoring RCCL test cmake (#1656)
Improve cmake configuration time by making sure the rccl-tests are built during the build phase rather than the configuration phase.
2026-01-07 19:51:54 -05:00
anujshuk-amd c35a7dd8cb [rocprofiler-systems] Update timemory submodule (#2440)
- Fixes SWDEV-559349 
- Fix build failure caused by correct libunwind not being found in some environments.
- Updated the `timemory` submodule to commit `24407d37ab85c46ba6c18fba9498320f825ee4e4 `.
2026-01-07 19:35:23 -05:00
Ajay GunaShekar 95ab459a4c Use static catch2.lib instead of catch2.dll (#2419)
* Use static catch2.lib instead of catch2.dll

Using catch2.dll incraeses execution time by 12x

* handle debug option for static catch2

* SWDEV-573539 - skip atomics on windows since its taking a very long time to execute

mlsejenkins needs newer cmake but compiler breaks with newer versions
so skipping on windows can be a workaround for now

---------

Co-authored-by: Joseph Macaranas <145489236+jayhawk-commits@users.noreply.github.com>
2026-01-07 14:35:25 -08:00
Alysa Liu 5be4fddf06 kfdtest: Support blit kernel copy (#677)
Add support for blit kernel copy.
Add GpuMemCopyTest test for KFDQMTest.
2026-01-07 16:48:11 -05:00