* byteswap<T> returns by value
* replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics
* new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction
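A minimal sketch of how a value-returning byteswap<T> built on the __builtin_bswap<N> intrinsics, plus direction-named helpers, might look. The `sketch` namespace, the exact signatures, and the use of the compiler's `__BYTE_ORDER__` macro are assumptions for illustration; the actual rocSHMEM interface may differ. It assumes C++17 and a Clang- or GCC-compatible compiler.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Returns the byte-reversed value instead of modifying the argument in place.
template <typename T>
T byteswap(T value) {
  static_assert(std::is_integral<T>::value, "byteswap expects an integral type");
  if constexpr (sizeof(T) == 1) {
    return value;  // a single byte has nothing to swap
  } else if constexpr (sizeof(T) == 2) {
    return static_cast<T>(__builtin_bswap16(static_cast<std::uint16_t>(value)));
  } else if constexpr (sizeof(T) == 4) {
    return static_cast<T>(__builtin_bswap32(static_cast<std::uint32_t>(value)));
  } else {
    static_assert(sizeof(T) == 8, "unsupported integer width");
    return static_cast<T>(__builtin_bswap64(static_cast<std::uint64_t>(value)));
  }
}

namespace endian {

// Host byte order as reported by the compiler (assumed macro, GCC/Clang).
constexpr bool host_is_little = (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);

// Host order -> big-endian. The name states the conversion direction.
template <typename T>
T to_be(T host_value) {
  return host_is_little ? byteswap(host_value) : host_value;
}

// Big-endian -> host order. Same operation, opposite intent.
template <typename T>
T from_be(T be_value) {
  return host_is_little ? byteswap(be_value) : be_value;
}

}  // namespace endian
}  // namespace sketch
```

Because to_be and from_be perform the same swap, the separate names exist only to make call sites self-documenting.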
[ROCm/rocshmem commit: cf8b72a047]
* attach: Formalize ROCAttach API
- Make ROCAttach public with public headers
- Change detach to take a PID
- attach and detach are now reentrant
- Cleanup of states and signal handling in ptrace session
- Fixes mixed up definition of ROCPROF_ATTACH_TOOL_LIBRARY
- ROCPROF_ATTACH_TOOL_LIBRARY now always means the tool library loaded by the attachment target
- ROCPROF_ATTACH_LIBRARY refers to the library used to perform attachment
- Add direct call of rocprof-attach
- Fix python library call of rocprof-attach
- Function now named attach(), changed from main()
* attach: rocprof-compute ROCAttach updates
- Update to new library names
- Correct usage of C lib detach
* attach: add test for rocattach
- Disable ASan, TSan, and UBSan for the new parallel-attach test
- Lower log level for LSan tests, matching the existing behavior of other tests
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Motivation
We want to avoid triggering full Jenkins runs for docs-only PRs, since they consume testing resources and slow development. rocm_ci_caller.yml already excludes some docs-only changes, but it can be extended to exclude more paths.
Technical Details
The checks that rocm_ci_caller.yml uses to decide whether a changed file in a PR warrants a Jenkins run have been extended to exclude more paths and more file suffixes.
JIRA ID
AIROCDOC-78, AIROCDOC-424
Test Plan
1. Created a test branch users/dsclear/shorten_workflows_test_root with the changes in this PR, branched from develop.
2. Branched users/dsclear/shorten_workflows_test_bin_3 and users/dsclear/shorten_workflows_test_text_3 from users/dsclear/shorten_workflows_test_root.
3. Modified users/dsclear/shorten_workflows_test_bin_3 to add two .h files, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications. #2613).
4. Modified users/dsclear/shorten_workflows_test_text_3 to add a new .txt file, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications (docs only). #2614).
Test Result
The test PR in step 3 caused rocm_ci_caller.yml to attempt to trigger Jenkins, as this is a 'non-docs' change.
The test PR in step 4 had the attempt to trigger Jenkins skipped, as this is a 'docs-only' change.
* Fix the amdgpu version string comparison
The intention behind it was to avoid showing the string when no
version information is available.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
* Display the kernel version in amd-smi output
This is an interesting debugging point, especially in the case of
not having a DKMS package installed.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Moving os_kernel_version to static --driver
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
---------
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Increased threshold from 2100 µs to 3100 µs to accommodate
gpu_metric read time variation across Navi systems.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Problem:
When TheRock-based PyTorch package is installed along with amdsmi, importing
torch causes a double-free crash on exit (GitHub issue ROCm/TheRock#2269).
Root cause:
Both librocm_smi64.so and libamd_smi.so export the C++ static member
'amd::smi::Device::devInfoTypesStrings'. When libraries are loaded with
RTLD_GLOBAL, the dynamic linker resolves libamd_smi.so's reference to this
symbol to the one in librocm_smi64.so. This causes:
1. librocm_smi64.so registers its destructor for devInfoTypesStrings
2. libamd_smi.so also registers a destructor, but for the SAME address
3. On exit, both destructors run on the same object -> double-free
Fix:
Change devInfoTypesStrings from a class static member to a file-local static
variable. This ensures the symbol has internal linkage and is not exported,
preventing the symbol collision.
Changes:
- rocm_smi_device.h: Remove static member declaration
- rocm_smi_device.cc: Change from 'Device::devInfoTypesStrings' to file-local
'static const std::map<...> devInfoTypesStrings'
- rocm_smi.cc: Remove the global alias to the (now removed) class member
Tested on gfx1151. `import torch` crashed on exit before the fix, and doesn't crash after the fix.
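The linkage change described above can be sketched as follows. This is an illustration only: the enum values, the map contents, and the `info_type_name` accessor are hypothetical and are not taken from rocm_smi_device.cc.

```cpp
#include <map>
#include <string>

namespace sketch {

// Hypothetical subset of info types, for illustration only.
enum DevInfoType { kDevVendorId, kDevDeviceId };

// Before (shown as a comment for contrast): a class static member has
// external linkage, so every shared library built from this source exports
// the same symbol, and the dynamic linker may bind both to one object.
//
//   class Device {
//     static const std::map<DevInfoType, const char*> devInfoTypesStrings;
//   };

// After: a file-local static has internal linkage. The symbol is not
// exported, so each library keeps and destroys its own private copy.
static const std::map<DevInfoType, const char*> devInfoTypesStrings = {
    {kDevVendorId, "vendor_id"},
    {kDevDeviceId, "device_id"},
};

class Device {
 public:
  // Looks up the file-local table; callers never touch the map directly.
  static std::string info_type_name(DevInfoType type) {
    auto it = devInfoTypesStrings.find(type);
    return it == devInfoTypesStrings.end() ? std::string("unknown")
                                           : std::string(it->second);
  }
};

}  // namespace sketch
```

With internal linkage there is exactly one destructor registration per object, which removes the shared-address condition behind the double-free.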
* Add hipDeviceAttributeExpertSchedMode
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
* Update hipDeviceAttributeExpertSchedMode unit test
* Move check to ROCr from thunk interface
* Revert unrelated whitespace changes
* Revert version bump
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
## Motivation
- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging capabilities
## Technical Details
- Added spdlog as a submodule and integrated it into CMake build system
- Created new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and std::abort()
* [SWDEV-553434] No direct way to get the BASEBOARD temperature info. Need to iterate all gpus
Signed-off-by: amd-josnarlo <josnarlo.amd.com>
---------
Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
Add explicit test cases to verify ROCr functionality with interrupts
disabled (HSA_ENABLE_INTERRUPT=0). This ensures compatibility with
virtio, dtif, and WSL configurations which require interrupt-disabled
mode.
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.
Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
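As a rough sketch of the mprotect approach described in the ProtectMemory commit above: access is revoked by changing page protections in place, so the address range stays mapped and no other allocation can claim it in between. The function names here are hypothetical and this is not the ROCr ProtectMemory implementation; it assumes a POSIX system and a page-aligned address.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

namespace sketch {

// Revokes CPU access to [addr, addr + size) with a single call. The mapping
// itself is kept, unlike munmap followed by mmap, which briefly leaves the
// range unmapped. Returns true on success.
inline bool remove_access(void* addr, std::size_t size) {
  return ::mprotect(addr, size, PROT_NONE) == 0;
}

// Restores read/write access to the same range. Returns true on success.
inline bool restore_access(void* addr, std::size_t size) {
  return ::mprotect(addr, size, PROT_READ | PROT_WRITE) == 0;
}

}  // namespace sketch
```

Since the pages are never unmapped, data written before remove_access is still present after restore_access.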
* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
* rocr: reduce ifdef linux
v2: Fix IsDXG check logic
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
* Adding working single node tests
* Revert to old docker sha
* adding back no perf tests
---------
Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
[ROCm/rccl commit: 4b295c9893]
AMD_LOG_LEVEL_SIZE is being used in a global variable.
This always uses the default value of 2048 because the
HIP runtime doesn't have the opportunity to load
environment variables at the point where global variables
are initialized.
The solution is to read AMD_LOG_LEVEL_SIZE inside the
truncate_log_file() function.
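A minimal sketch of the idea, assuming the value is read with std::getenv at call time. The HIP runtime uses its own flag-loading mechanism, so `log_size_limit` and `should_truncate` are hypothetical stand-ins rather than the actual code; only the variable name and the default of 2048 come from the description above.

```cpp
#include <cstddef>
#include <cstdlib>

namespace sketch {

// Default taken from the commit description.
constexpr std::size_t kDefaultLogSize = 2048;

// Reads the limit each time it is called. A global initialised from the
// environment would be evaluated during static initialisation, before the
// runtime has had a chance to load its settings.
inline std::size_t log_size_limit() {
  const char* raw = std::getenv("AMD_LOG_LEVEL_SIZE");
  if (raw == nullptr || *raw == '\0') return kDefaultLogSize;
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(raw, &end, 10);
  if (end == raw || *end != '\0') return kDefaultLogSize;  // not a plain number
  return static_cast<std::size_t>(parsed);
}

// Hypothetical stand-in for the check inside truncate_log_file().
inline bool should_truncate(std::size_t current_size) {
  return current_size > log_size_limit();
}

}  // namespace sketch
```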
## Motivation
Remove Fortran example due to Palamida scan violation.
## Technical Details
Revert 63713f01e0.
New test to be added later.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Sets heavy GitHub CI workflows to not trigger on docs-only changes.
Specifically, sets azure-ci-dispatcher.yml and therock-ci.yml, as well as many rocprofiler workflows, to not trigger when the change consists entirely of docs-only files.
* Fix typo in matrix definition for aqlprofile-continuous_integration.yml
* Update ROCM_VERSION to 7.1.1
* Minor changes to core-rpm step
* Add working-directory to test steps
* Revert changes
* Add set -v to rpm test step
* Remove Python venv line from rpm test step
* [rocprofiler-sdk] Fix fmt::join build errors
- remedy use of fmt::join without include <fmt/ranges.h>
* include memory header
* Disable FMT build for SDK CI
* Add -DROCPROFILER_BUILD_FMT=OFF to sanitizer steps
* Add temporary workaround for rccl.h issue
* Add ROCPROFILER_INTERNAL_RCCL_API_TRACE to SDK CI builds
* disable clang-tidy for vendored includes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
* Update what is hip
* Update HIP runtime page
* Update images
* Remove omnitrace
* Quick fix
* Feedback fixes
* Minor fixes
* Update SAXPY tutorial
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
---------
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Adel Johar <adel.johar@amd.com>
Co-authored-by: Jan Stephan <jan.stephan@amd.com>
* libhsakmt/virtio: Add alloc memory align api
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Rename CLGL BO to AMDGPU BO
Rename VHSA_BO_CLGL to VHSA_BO_AMDGPU to support generic AMDGPU buffer objects, not just CL/GL interop.
* libhsakmt/virtio: Add atomic helpers and node lookup
Add vhsakmt_atomic_inc/dec macros and vhsakmt_get_node_by_id helper function.
* libhsakmt/virtio: Add AMDGPU device initialization support
Add vamdgpu_device_initialize and vamdgpu_device_deinitialize functions.
* libhsakmt/virtio: Add AMDGPU device handle and DRM command support
Add vamdgpu_device_get_fd, vdrmCommandWriteRead and update vhsaKmtGetAMDGPUDeviceHandle.
* libhsakmt/virtio: Add AMDGPU BO free and CPU map support
Add vamdgpu_bo_free and vamdgpu_bo_cpu_map functions.
* libhsakmt/virtio: Add AMDGPU BO import and export support
Add vamdgpu_bo_import, vamdgpu_bo_export and vhsakmt_bo_from_resid functions.
* libhsakmt/virtio: Add AMDGPU BO VA operation support
Add vamdgpu_bo_va_op function.
* libhsakmt/virtio: Add dma buf export support
Add vhsaKmtExportDMABufHandle API in virtio driver to support export
feature.
* libhsakmt/virtio: Fix potential deadlock in userptr deregistration
Refactor vhsakmt_deregister_userptr_non_svm to avoid calling
vhsakmt_destroy_userptr while holding the bo_handles_mutex lock.
Previously, destroying userptrs directly while iterating the tree
could cause deadlock issues due to nested locking.
- Move interval tree removal from vhsakmt_destroy_userptr to caller
- Collect BOs to free in a temporary array during tree traversal
- Destroy BOs after releasing the mutex to avoid lock contention
- Use dynamic array with realloc to handle arbitrary number of BOs
Signed-off-by: Honglei Huang <honghuan@amd.com>
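The locking pattern from the userptr deregistration fix above can be sketched like this. It is an illustration in C++ with hypothetical types (`Registry`, `Bo`) and a std::map standing in for the interval tree; the actual libhsakmt code is C and collects the BOs in a realloc-grown array.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

namespace sketch {

struct Bo { int id; };  // hypothetical buffer object

class Registry {
 public:
  void add(std::size_t addr, Bo* bo) {
    std::lock_guard<std::mutex> guard(mutex_);
    tree_[addr] = bo;
  }

  // Removes every BO whose address lies in [start, end) and returns how many
  // were destroyed.
  std::size_t deregister_range(std::size_t start, std::size_t end) {
    std::vector<Bo*> to_free;
    {
      // Phase 1: under the lock, only unlink entries and remember them.
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = tree_.lower_bound(start);
      while (it != tree_.end() && it->first < end) {
        to_free.push_back(it->second);
        it = tree_.erase(it);
      }
    }
    // Phase 2: the lock is released, so destroy() may take it again safely.
    for (Bo* bo : to_free) destroy(bo);
    return to_free.size();
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return tree_.size();
  }

 private:
  // Takes the mutex itself, modelling the nested-locking hazard: calling this
  // while mutex_ is already held would deadlock.
  void destroy(Bo* bo) {
    std::lock_guard<std::mutex> guard(mutex_);
    delete bo;
  }

  std::mutex mutex_;
  std::map<std::size_t, Bo*> tree_;
};

}  // namespace sketch
```

Splitting the work into an unlink phase and a destroy phase keeps the critical section short and means the destructor never runs while the tree is being traversed.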
* rocr: driver/virtio: Implement DMA-BUF import/export and memory mapping APIs
Implement the missing DMA-BUF handling and memory mapping functions
in the virtio KFD driver to enable cross-process memory sharing:
- ExportDMABuf: Export HSA memory as DMA-BUF file descriptor
- ImportDMABuf: Import DMA-BUF fd as shareable buffer object
- Map: Map imported buffer into virtual address space with permissions
- Unmap: Unmap buffer from virtual address space
- ReleaseShareableHandle: Free imported buffer object
Also add drm_perm() helper to convert HSA access permissions to
AMDGPU VM page flags (READABLE/WRITEABLE).
These APIs enable IPC memory sharing between HSA processes through
DMA-BUF mechanism in virtualized environments.
Signed-off-by: Honglei Huang <honghuan@amd.com>
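For illustration, a permission-to-flag conversion like the drm_perm() helper mentioned above might look as follows. The enum and constants below are local stand-ins assumed to mirror hsa_access_permission_t and the AMDGPU_VM_PAGE_READABLE/WRITEABLE bits; the real helper includes the actual headers and may map values differently.

```cpp
#include <cstdint>

namespace sketch {

// Local stand-in for the HSA access permission values (assumed layout:
// bit 0 = read, bit 1 = write).
enum AccessPermission : std::uint32_t {
  kNone = 0,
  kReadOnly = 1,
  kWriteOnly = 2,
  kReadWrite = 3,
};

// Local stand-ins for the AMDGPU VM page flag bits (assumed values).
constexpr std::uint64_t kVmPageReadable = 1u << 1;
constexpr std::uint64_t kVmPageWriteable = 1u << 2;

// Translates an access permission into the corresponding VM page flags.
constexpr std::uint64_t drm_perm(AccessPermission perm) {
  std::uint64_t flags = 0;
  if (perm & kReadOnly) flags |= kVmPageReadable;
  if (perm & kWriteOnly) flags |= kVmPageWriteable;
  return flags;
}

}  // namespace sketch
```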
* libhsakmt/virtio: Add register memory APIs
Add two new memory registration functions to the virtio HSA KMT library:
1. vhsaKmtRegisterMemory: A simplified wrapper for vhsaKmtRegisterMemoryWithFlags
that uses default CoarseGrain memory flags.
2. vhsaKmtRegisterMemoryToNodes: A stub implementation for registering memory
to specific nodes. Returns HSAKMT_STATUS_NOT_IMPLEMENTED as it's currently
not used in ROCR.
Changes:
- Added function declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Exported symbols in libhsakmt_virtio.ver
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add graphics handle registration and mapping APIs
- Add vhsaKmtRegisterGraphicsHandleToNodesExt() with flags support
- Add vhsaKmtMapGraphicHandle() and vhsaKmtUnmapGraphicHandle() stubs
- Refactor existing registration API to use extended version
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add virtio support for queue APIs
Implement vhsaKmtUpdateQueue, vhsaKmtSetQueueCUMask,
vhsaKmtAllocQueueGWS and vhsaKmtGetQueueInfo functions
with virtio protocol extensions and symbol exports.
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add new virtio API support for model, SMI, and XNACK mode
Add three new API functions to the virtio backend:
- vhsaKmtModelEnabled: Check if pre-silicon model is enabled (returns false for virtio)
- vhsaKmtOpenSMI: Open SMI interface for a node (not yet supported in virtio)
- vhsaKmtSetXNACKMode: Set XNACK mode via virtio control command
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add shared memory support for virtio backend
Implement shared memory APIs for the virtio backend to enable
memory sharing between processes:
- Add vhsaKmtShareMemory() to share memory regions and create
shared memory handles
- Add vhsaKmtRegisterSharedHandle() to register shared memory
handles in the current process
- Add vhsaKmtRegisterSharedHandleToNodes() for node-specific
shared memory registration
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add memory management APIs for virtio
Add the following new memory management APIs to virtio implementation:
- vhsaKmtSetMemoryUserData: Set user data for memory pointer
- vhsaKmtSetMemoryPolicy: Configure memory policy for nodes
- vhsaKmtSVMGetAttr: Get SVM (Shared Virtual Memory) attributes
- vhsaKmtSVMSetAttr: Set SVM attributes
- vhsaKmtReplaceAsanHeaderPage: ASAN header page replacement (stub)
- vhsaKmtReturnAsanHeaderPage: ASAN header page return (stub)
Changes include:
- Added API declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Extended protocol definitions in hsakmt_virtio_proto.h
- Added user_data field to vhsakmt_bo structure
- Exported new symbols in libhsakmt_virtio.ver
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add SPM APIs
Add three new SPM-related APIs to the virtio interface:
- vhsaKmtSPMAcquire: Acquire SPM resources on a preferred node
- vhsaKmtSPMRelease: Release SPM resources on a preferred node
- vhsaKmtSPMSetDestBuffer: Set destination buffer for SPM data with
optional userptr support and data loss detection
These APIs extend the virtio command protocol with new query types:
- VHSAKMT_CCMD_QUERY_SPM_ACQUIRE
- VHSAKMT_CCMD_QUERY_SPM_RELEASE
- VHSAKMT_CCMD_QUERY_SPM_SET_DST_BUFFER
The implementation includes proper buffer management for both
direct BO access and userptr fallback for smaller buffers.
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add virtio stub for hsaKmtAisReadWriteFile API
Add vhsaKmtAisReadWriteFile stub implementation for the virtio backend
to support AIS (Accelerated I/O Service) file read/write operations.
This stub currently returns HSAKMT_STATUS_NOT_IMPLEMENTED.
Changes include:
- Add vhsaKmtAisReadWriteFile declaration in hsakmt_virtio.h
- Add stub implementation in hsakmt_virtio_memory.c
- Export the symbol in libhsakmt_virtio.ver
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
* libhsakmt/virtio: Add vamdgpu_bo_query_info and vamdgpu_bo_set_metadata APIs
Implement two new virtio wrapper functions for AMDGPU buffer object operations:
1. vamdgpu_bo_query_info: Query buffer object information including
allocation parameters, memory usage, and metadata.
2. vamdgpu_bo_set_metadata: Set metadata for a buffer object, allowing
applications to attach custom data to GPU memory allocations.
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add ProcessVMRead/Write stub implementations for virtio
Add vhsaKmtProcessVMRead and vhsaKmtProcessVMWrite stub functions
to the virtio interface. These APIs return HSAKMT_STATUS_NOT_IMPLEMENTED
since they are not supported in the baremetal implementation, matching
the behavior of the deprecated hsaKmtProcessVMRead/Write APIs.
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
---------
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
Co-authored-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
* Add cmake based instructions to create standalone binary
* Specify standalone binary extraction path in doc.
* Add documentation to explain how to specify self-extraction path
when building the standalone binary where contents of the binary
are extracted during execution
* Pin Nuitka to version 2.6 for consistency in building standalone binary
* ROCSHMEM linking/building to match MSCCL++ style
* add rocSHMEM as a submodule
* Move rocSHMEM submodule to ext-src/rocSHMEM
* Adding submodule support proper, as well as a patch for rocshmem
* Cleaning up INCLUDE_DIR vs INCLUDE_DIRS mixup
* updating patch file
* Pointing rocshmem submodule to edgars fixup patch
* Adding IBVERBS link to the submodule build
* More IBVERBS patching
* pin rocshmem submodule to b534423
* Adding IPC support in rocSHMEM build
* updating rocshmem submodule to resolve CQ errors
* Updating submodule to include recent a2a optimizations
* invoke rocshmem alltoall from rccl
* Updating submodule to CQ error number hang
* Updating submodule to include a2a improvements and bug fixes
* Updating submodule to point to Yiltan's fork and doorbell ring removal commit
* Updating hash to correspond with submodule change
* Updating to no-ctx wg call and updating submodule
* copy-in/copy-out using multiple CUs
* Updating rocSHMEM submodule to include doorbell improvs
* updating gitmodule to point to upstream
* code cleanup and adjust threshold
* guard rocshmem a2a invocation
* Only build with rocshmem when specified
* code cleanup
* address review comments
* Removing debugging failure case
Signed-off-by: Thomas Huber <thomas.huber@amd.com>
* whitespace fix
* Adding rocshmem compile guard
* Removing unnecessary comment
Signed-off-by: Thomas Huber <thomas.huber@amd.com>
* remove commented lines
* address review comments
* cleanup
---------
Signed-off-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k12-27.cs-aus.dcgpu>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-03.cs-aus.dcgpu>
[ROCm/rccl commit: 27648b0900]