Allow amdsmi to find libamd_smi.so and librocm-core.so relative to
the amdsmi_wrapper.py location.
The amdsmi_wrapper.py file is located in
_rocm_sdk_core/share/amd_smi/amdsmi and the libraries are at
_rocm_sdk_core/lib/libamd_smi.so.26 and
_rocm_sdk_core/lib/librocm-core.so.1.
[ROCm/amdsmi commit: ad20d57162]
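A minimal sketch of the relative lookup described above, assuming the directory layout given in the message. The helper name `library_paths_relative_to` and the `/opt` prefix are hypothetical and only illustrate how the `lib` directory can be reached from the wrapper's location; the actual wrapper code may differ.

```python
from pathlib import Path

# Library file names as given in the commit message above.
LIBRARY_NAMES = ("libamd_smi.so.26", "librocm-core.so.1")


def library_paths_relative_to(wrapper_file):
    """Return the expected library paths for a given amdsmi_wrapper.py path.

    Hypothetical helper, not part of amdsmi.
    """
    wrapper = Path(wrapper_file)
    # parents[0] = .../share/amd_smi/amdsmi
    # parents[1] = .../share/amd_smi
    # parents[2] = .../share
    # parents[3] = the _rocm_sdk_core root
    sdk_root = wrapper.parents[3]
    return [sdk_root / "lib" / name for name in LIBRARY_NAMES]


# Illustrative install prefix; only the part below _rocm_sdk_core is from the message.
example = library_paths_relative_to(
    "/opt/_rocm_sdk_core/share/amd_smi/amdsmi/amdsmi_wrapper.py"
)
```

Each returned path could then be passed to a loader such as `ctypes.CDLL` after checking that the file exists.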
The out-of-bounds writes corrupted the next field,
which was weight. Fixed by reading into a temporary and then
assigning safely.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: a2aae5e8a9]
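The pattern behind this fix can be illustrated with a small `ctypes` sketch. The struct layout, the `hops` field, and both helper functions are hypothetical; only the `weight` field name comes from the message. The sketch shows a too-wide write spilling into the adjacent field, and the "read into a temporary, then assign" alternative.

```python
import ctypes


class LinkInfo(ctypes.Structure):
    # Hypothetical layout: a 32-bit field directly followed by `weight`.
    _fields_ = [("hops", ctypes.c_uint32), ("weight", ctypes.c_uint32)]


def buggy_store(info, value):
    # Writes 8 bytes starting at `hops` (offset 0), so the write spills
    # past the 4-byte field and overwrites `weight` as well.
    ctypes.c_uint64.from_address(ctypes.addressof(info)).value = value


def safe_store(info, value):
    # Read the wide value into a temporary first, then assign only to `hops`.
    temp = ctypes.c_uint64(value)
    info.hops = temp.value & 0xFFFFFFFF


broken = LinkInfo(hops=0, weight=15)
buggy_store(broken, 7)   # `weight` no longer holds 15

fixed = LinkInfo(hops=0, weight=15)
safe_store(fixed, 7)     # `weight` is untouched
```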
* Added the `amd-smi set --pcie` command
* Removed the current PCIe level because it is not static
* Added PCIe information to `static --bus`
---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 9e3537d778]
- **Added evicted_time metric for KFD processes**.
  - Time that queues are evicted on a GPU, in milliseconds
  - Added to CLI in `amd-smi monitor -q` and `amd-smi process`
  - Added to C API and Python API:
    - amdsmi_get_gpu_process_list()
    - amdsmi_get_gpu_compute_process_info()
    - amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
[ROCm/amdsmi commit: 2144cfbba4]
- `amdgpu-install` is no longer recommended. Link to separate driver
installation docs.
- add verify step
- update readme
- add package info
Signed-off-by: Park, Peter <Peter.Park@amd.com>
[ROCm/amdsmi commit: 12fb58c30b]
* Updates:
- [ASAN] GCC does not support the `-shared-libsan` flag, so removed it
- [Clang] Fixed reference-to-local-binding errors (name collision)
and other strict scope/structure/lambda binding errors
- [Clang] Fixed rsmi_wrapper error: "error: missing default argument on parameter 'args'"
- [ASAN] Fixed stack-buffer-overflow found in
`amdsmi_get_gpu_accelerator_partition_profile()`
Change-Id: I854007efb75d828dbb8088c0d56dbc125081f0f2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 00a04f5810]
* Solved error: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/argcomplete/bash_completion.d/python-argcomplete.sh'
[ROCm/amdsmi commit: 013d6cb511]
* [SWDEV-542718] Correct socket_affinity
Updated socket affinity to show the bitmask and the expanded CPU list.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Update per-device local_cpulist for socket_affinity
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Added amdsmi_get_cpu_affinity_from_local_cpulist API.
Updated the wrapper.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Revert "Added amdsmi_get_cpu_affinity_from_local_cpulist API."
This reverts commit 9a2ef934b1787f8aa09d3e4efe02f897b4295215.
* Moved the changes to C API.
For SOCKET_SCOPE, use local_cpulist first.
If it is unavailable or not readable, fall back to
NUMA.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Addressed review comments
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 09a97f02ed]
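The socket-affinity behavior described above (expanded CPU list, bitmask, and local_cpulist with a NUMA fallback) can be sketched as follows. This is an illustration in Python, not the C API change itself; all three function names are hypothetical, and the input strings are assumed to use the usual sysfs CPU-list notation such as `0-3,8`.

```python
def expand_cpulist(cpulist):
    """Expand a sysfs-style CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_to_bitmask(cpus):
    """Set one bit per CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def socket_affinity(local_cpulist, numa_cpulist):
    """Prefer local_cpulist; fall back to the NUMA list when it is missing or empty."""
    source = local_cpulist if local_cpulist else numa_cpulist
    cpus = expand_cpulist(source)
    return cpus, hex(cpus_to_bitmask(cpus))


preferred = socket_affinity("0-3,8\n", "0-15")
fallback = socket_affinity(None, "0-1")
```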
* Update readme doc: amdsmi_get_afids_from_cper() input argument is only bytes, not a list of dicts each with keys “bytes” (List[int]) and “size” (int)
---------
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
[ROCm/amdsmi commit: f7c9fe3011]
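For callers that still hold data in the dict shape mentioned above, a conversion to plain bytes might look like the sketch below. The helper `cper_record_to_bytes` is hypothetical and does not call amdsmi; it only assumes a record with a "bytes" list of integers and a "size" count, as described in the message.

```python
def cper_record_to_bytes(record):
    """Convert a {"bytes": [...], "size": n} style record into a bytes object.

    Hypothetical helper: keeps only the first `size` entries of the list.
    """
    return bytes(record["bytes"][: record["size"]])


# Illustrative record; the values are made up for the example.
legacy_record = {"bytes": [0x43, 0x50, 0x45, 0x52, 0x00, 0x00], "size": 4}
cper_bytes = cper_record_to_bytes(legacy_record)
```

The resulting `bytes` object is the kind of value the updated readme describes as the input argument.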
* Clarified comments regarding power limit retrieval and its support on virtualized systems.
* Change unsupported comment to UINT32_MAX
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 460cfcba1f]
* Updated groups printing
* added parameters to check_required_groups
* two device groups since kfd and render require the same group
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: ee1445e2cc]
Added `GPU LINK PORT STATUS` table to the `amd-smi xgmi` command.
Running `amd-smi xgmi -s` or `amd-smi xgmi --source-status` shows the `GPU LINK PORT STATUS` table.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 7ddd91653e]
Added the following APIs to amdsmi_interface.py:
amdsmi_get_cpu_handle()
amdsmi_get_esmi_err_msg()
amdsmi_get_gpu_event_notification()
amdsmi_get_processor_count_from_handles()
amdsmi_get_processor_handles_by_type()
amdsmi_gpu_validate_ras_eeprom()
amdsmi_init_gpu_event_notification()
amdsmi_set_gpu_event_notification_mask()
amdsmi_stop_gpu_event_notification()
amdsmi_get_gpu_busy_percent()
Added an additional return value to the amdsmi_get_xgmi_plpd() API.
The `policies` entry is added to the end of the dictionary to match the API definition.
The `plpds` entry is marked for deprecation because it holds the same information as `policies`.
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 7decbc67a1]
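Callers that need to work across both dictionary shapes could prefer `policies` and fall back to `plpds`. The sketch below is a hypothetical helper operating on a plain dict; it does not call amdsmi, and the contents of each policy entry are made up for illustration.

```python
def get_plpd_policies(plpd_info):
    """Return the policy list, preferring "policies" over the deprecated "plpds"."""
    if "policies" in plpd_info:
        return plpd_info["policies"]
    # Older dictionaries only carry the entry that is now marked for deprecation.
    return plpd_info.get("plpds", [])


# Illustrative dictionaries; the inner fields are assumptions, not the real schema.
new_style = {"plpds": [{"id": 0}], "policies": [{"id": 0}]}
old_style = {"plpds": [{"id": 1}]}

from_new = get_plpd_policies(new_style)
from_old = get_plpd_policies(old_style)
```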
* Used KFD to determine the links between GPUs and PIDs rather than depending on the per-PID, single-GPU BDF info from fdinfo.
Signed-off-by: adapryor <Adam.pryor@amd.com>
---------
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: c967aead58]
* Added ability to format gpu_metrics v1_9
* The new gpu_metrics format from the driver should allow amd-smi to parse it in a forward-compatible way
---------
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Signed-off-by: adapryor <Adam.pryor@amd.com>
Co-authored-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 5ef0b3c34d]
* conf: update RTD config to ub24.04 (doxygen 1.9.8) and py3.12
* update generate-docs workflow
* Update "modules" to "topics" due to Doxygen 1.9.8
* bump rocm-docs-core to 1.25.0 and pip-compile requirements.txt
* doxygen: fill in version string in Doxyfile from conf.py
* remove unneeded rocm-smi-lib tutorials
* remove wikipedia references in doxyfile to satisfy ci check
Signed-off-by: Park, Peter <Peter.Park@amd.com>
[ROCm/amdsmi commit: 311eade5b1]
This unbreaks having sources on one mount point and builds at another.
Signed-off-by: Marius Brehler <marius.brehler@amd.com>
Change-Id: I68363112382a95baaa867cad91e09bdec2b30d90
[ROCm/amdsmi commit: bd3579a1ac]
Having the SOVERSION derived from the git tags doesn't scale well
for distributions that don't have the git history while building
(such as a tarball).
As part of 8b96ee5 the strings are parsed from a header. Re-use
those.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[ROCm/amdsmi commit: ccfdb65b6f]
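The idea of taking the version from a header instead of git tags can be sketched in Python. The real change lives in the build scripts, so this is only an illustration: the macro prefix `AMDSMI_LIB_VERSION_`, the sample header text, and the helper name are assumptions, and the numbers are placeholders.

```python
import re


def parse_version_macros(header_text, prefix="AMDSMI_LIB_VERSION_"):
    """Collect `#define <prefix><NAME> <number>` lines into a dict of ints."""
    pattern = re.compile(
        r"^\s*#define\s+" + re.escape(prefix) + r"(\w+)\s+(\d+)\s*$",
        re.MULTILINE,
    )
    return {name: int(value) for name, value in pattern.findall(header_text)}


# Stand-in for the header contents; macro names and values are assumed.
sample_header = """
#define AMDSMI_LIB_VERSION_MAJOR 26
#define AMDSMI_LIB_VERSION_MINOR 0
#define AMDSMI_LIB_VERSION_RELEASE 0
"""

version = parse_version_macros(sample_header)
soversion = version["MAJOR"]  # no git history needed, works from a tarball
```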
Added back the temp-type map initialization to
RSMI_TEMP_TYPE_INVALID before probing hwmon files. This
prevents std::out_of_range for unsupported or absent
temperature sensor types.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 3e7e4ab1ac]