Although the value is correct, there is no single source of truth shared
between kernel and userspace. This leads to problems if the kernel has strict
restrictions (such as kernel 6.17 or earlier). The restrictions were
lifted in 6.17.9 and 6.18, but there is no guarantee that userspace is
running on a kernel that includes this change.
So in the short term this value will be wrong. On newer kernels, however,
the kernel will communicate the right size and rocr-runtime will be
adjusted to use that.
Link: https://github.com/ROCm/TheRock/pull/2505
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* When writing bulk packets, always invalidate packet headers. It is
possible for the CP fetcher to have multiple packets in flight. In
such cases we may end up with a malformed packet because the CP finds a
valid header before the writes are complete.
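The ordering described above can be sketched as follows. This is a minimal illustration only, not the runtime's code: the ring layout, the `INVALID`/`VALID` header values, and the `write_bulk` helper are all hypothetical, and real code would need a release fence before publishing headers, which plain Python cannot express.

```python
INVALID = 0  # hypothetical header value: slot must not be consumed
VALID = 1    # hypothetical header value: packet is complete

def write_bulk(ring, start, payloads):
    """Write several packets so a reader never sees a valid header on a partial packet."""
    slots = [(start + i) % len(ring) for i in range(len(payloads))]
    # Step 1: invalidate every header in the range before touching any body,
    # so stale valid headers from a previous pass over the ring cannot be picked up.
    for s in slots:
        ring[s]["header"] = INVALID
    # Step 2: fill in the bodies while all headers are still invalid.
    for s, body in zip(slots, payloads):
        ring[s]["body"] = body
    # Step 3: publish the headers last (a release fence would go here in real code).
    for s in slots:
        ring[s]["header"] = VALID
    return slots

ring = [{"header": VALID, "body": "old"} for _ in range(4)]
slots = write_bulk(ring, 3, ["a", "b"])  # wraps around the end of the ring
```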
* [hip] Docs: Overhaul HW implementation page
* Update hardware implementation and glossary
* Update programming model
* Add performance optimization
* Split into how-to and understanding
---------
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Julia Jiang <julia.jiang@amd.com>
* rocrtst: Update CMake files to use find_package instead of hardcoded paths
This is to support the TheRock build environment
* rocrtst: Fix CMake to use find_package() instead of hardcoded ENV paths
Fix CMake style issues raised in the code review of the first commit
* rocrtst: Fix rocrtst NUMA dependency detection to use find_package
Also added handling of missing headers
* rocrtst: Fix NUMA and hwloc detection for cross-platform builds
---------
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
* [hip-tests] Update API coverage report generator
Updates the HIP API coverage tool. It now takes
extra arguments for the location of the catch test folder
and for the working directory. This avoids issues where the output
of the executable depends on the directory it is executed from.
Also updates CMakeLists.txt to integrate seamlessly with the
hip-tests project and avoid using commands which rely on
relative paths.
* Remove double new line
* Remove CMake option to generate coverage
Removes the CMake option to generate coverage. Instead, explicitly removes
the gen_coverage target from all (this is already the default, but
doing it explicitly prevents confusion).
* SWDEV-558848 - vmm api support for rocr on windows
* Fixes to VMM handle Map/Unmap Set/Get Access
* Fix GetShareableHandle to use pointer for shareable handle
* Update os specific map/unmap memory calls
* clang format update
* Minor syntax fixes from code review
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
* Remove TCP_TCP_LATENCY_sum counter for MI300
* Remove TCP_TCP_LATENCY_sum counter which is unsupported for MI300 per register specification
* Remove VL1 Lat metric from memory chart section (block 3) for MI300
since it uses the TCP_TCP_LATENCY_sum counter, which is unsupported
* Remove references to TCP_TCP_LATENCY_sum
* Update CHANGELOG
* reword changelog
* Read the ids_flags when fetching GPU info
The ids_flags contains the flags that can help identify if a GPU
is a dGPU or an APU.
* Show correct memory pool for APUs
The kernel policy for APUs will be to choose the bigger pool of
memory (GTT or VRAM) for KFD work. Adjust the policy for the monitor
and default commands to show the right memory pool when using an APU.
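A minimal sketch of the selection policy described above. The constant name and value are an assumption (the "fusion" bit from the amdgpu UAPI header), and `is_apu` and `pick_memory_pool` are hypothetical helper names, not functions from the tool.

```python
# Assumed to match the amdgpu UAPI "fusion" (APU) bit in ids_flags.
AMDGPU_IDS_FLAGS_FUSION = 0x1

def is_apu(ids_flags):
    """Return True if the flags mark the GPU as an APU rather than a dGPU."""
    return bool(ids_flags & AMDGPU_IDS_FLAGS_FUSION)

def pick_memory_pool(ids_flags, vram_bytes, gtt_bytes):
    """Pick the pool to display: the bigger of GTT and VRAM on APUs, VRAM on dGPUs."""
    if is_apu(ids_flags) and gtt_bytes > vram_bytes:
        return "GTT"
    return "VRAM"

# An APU with a small VRAM carve-out and a large GTT reports GTT.
apu_pool = pick_memory_pool(0x1, 512 << 20, 16 << 30)
# A dGPU always reports VRAM, regardless of GTT size.
dgpu_pool = pick_memory_pool(0x0, 8 << 30, 32 << 30)
```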
* Add ls statement for debugging /opt directory file naming
* Update ROCM_VERSION from 7.0.0 to 7.1.1 in SDK CI
* Update amdgpu debian package for Ubuntu in Dockerfile.ci
* disable HIP/CLR build in codeql (#2242)
---------
Co-authored-by: Venkateshwar Reddy Kandula <Venkateshwarreddy.Kandula@amd.com>
* Change rocprofiler-sdk CMake compatibility to AnyNewerVersion
Update CMake package version compatibility from SameMinorVersion to
AnyNewerVersion to allow downstream packages (like RDC) to use newer
versions of rocprofiler-sdk without requiring exact minor version match.
This fixes compatibility issues where RDC requests 1.0.0 but finds 1.1.0.
* Update projects/rocprofiler-sdk/cmake/rocprofiler_config_install.cmake
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Change rocpd and roctx CMake compatibility to SameMajorVersion
Update the COMPATIBILITY setting from SameMinorVersion to SameMajorVersion
for both rocpd and roctx packages to allow compatibility across minor
versions within the same major version.
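To illustrate the difference between the compatibility modes mentioned in these commits, here is a small Python model of how CMake's package version files treat a requested version. It is a simplified sketch (no version ranges, no EXACT handling), and `parse` and `is_compatible` are hypothetical names used only for this example.

```python
def parse(version):
    """Turn '1.1' or '1.1.0' into a tuple padded to at least three components."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def is_compatible(mode, installed, requested):
    """Model find_package() acceptance for a given COMPATIBILITY mode."""
    inst, req = parse(installed), parse(requested)
    if inst < req:
        return False  # an older installed version is never accepted
    if mode == "AnyNewerVersion":
        return True
    if mode == "SameMajorVersion":
        return inst[0] == req[0]
    if mode == "SameMinorVersion":
        return inst[:2] == req[:2]
    raise ValueError(f"unknown mode: {mode}")

# The RDC case from the commit message: 1.0.0 requested, 1.1.0 installed.
old_result = is_compatible("SameMinorVersion", "1.1.0", "1.0.0")
new_result = is_compatible("AnyNewerVersion", "1.1.0", "1.0.0")
```

Under this model, SameMajorVersion also accepts the 1.0.0/1.1.0 pair but rejects 2.0.0 when 1.0.0 is requested, whereas AnyNewerVersion accepts both.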
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* SWDEV-554626 - return hipErrorInvalidDeviceFunction when we cannot load module
Return the correct error code when modules are empty
* Match the error codes
* Revert the error code
* Don't require powercap support
APUs don't necessarily support setting a power cap from sysfs.
Ignore failures caused by the file being missing.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Show edge temperature in default output if hotspot is missing
APUs don't have a hotspot temperature, but they do have an edge
temperature. Use that.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Format all "power" keys as watts
There will be more power keys when APU support is added, so format
them properly.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Don't show power limit in output if it's invalid
APUs can't set a power limit using the power_cap1 interface. The limit
will be 0, and thus the UX looks weird in default output.
Only add the `/power_limit` if it's valid.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
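The behaviour described in the power cap and power limit commits above could look roughly like the sketch below. The helper names and the exact output format are hypothetical, chosen only to illustrate tolerating a missing sysfs file and omitting an invalid limit; they are not taken from the tool.

```python
def read_optional_int(path):
    """Read an integer from a sysfs-style file, or return None if the file is missing."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None  # e.g. an APU that exposes no power cap file

def format_power(usage_w, limit_w):
    """Format power as watts, appending '/limit' only when the limit is valid."""
    if limit_w:  # None (file missing) and 0 (reported by APUs) are both invalid
        return f"{usage_w}/{limit_w} W"
    return f"{usage_w} W"

missing = read_optional_int("/nonexistent/hwmon/power_cap_file")
apu_line = format_power(15, 0)
dgpu_line = format_power(50, 203)
```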
* Unify sizes of `amdsmi_power_info_t`
Sizes are used inconsistently. This causes tools to not show
N/A when they should. Make them unified.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
These have an external source of truth.
Also drop the non-existent hipblaslt which isn't in rocm-systems.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* While reusing signals, it's possible to come across a timestamp
that contains several signals, such as when profiling a graph. Reading
timestamps from all signals can make the call severely CPU bound.
Instead, cache only that signal to avoid the overhead on the critical
path.
* Run pre-commit's whitespace-related hooks on projects/rocr-runtime
In order for pre-commit to be useful, everything needs to meet a common
baseline.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Add missing semicolon which would block compilation on big endian CPUs
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>