- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
[ROCm/amdsmi commit: 18faddf6f3]
The argparse 'choices' parameter was receiving a comma-separated string
instead of a list, causing it to treat individual characters as valid
choices rather than complete tokens like 'SPX', 'DPX', etc.
Fixed by removing the unnecessary join() operation in
get_accelerator_choices_types_indices() to return the list directly.
This matches the pattern used by get_memory_partition_types().
Now 'amd-smi set -C DPX' and other partition commands work correctly.
[ROCm/amdsmi commit: f6b1cb9024]
* Changes:
- Modified attempting to open files to check
permissions -> check read access only.
Do not try to open all paths, may cause driver issues.
Read access is sufficient to check permissions.
Reason: GPUs which support partitioning (memory/compute),
logical devices will not be valid until configured.
See `sudo amd-smi set -h` or applicable APIs
to configure on supported hardware.
Example error dmesg output:
[965358.883112] amdgpu 0000:15:00.0: amdgpu: renderD153 partition 1 not valid!
[965358.883283] amdgpu 0000:15:00.0: amdgpu: renderD154 partition 2 not valid!
[965358.883438] amdgpu 0000:15:00.0: amdgpu: renderD155 partition 3 not valid!
[965358.883594] amdgpu 0000:15:00.0: amdgpu: renderD156 partition 4 not valid!
[965358.883749] amdgpu 0000:15:00.0: amdgpu: renderD157 partition 5 not valid!
[965358.883904] amdgpu 0000:15:00.0: amdgpu: renderD158 partition 6 not valid!
[965358.884060] amdgpu 0000:15:00.0: amdgpu: renderD159 partition 7 not valid!
---------
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: d73726698b]
Allow amdsmi to find libamd_smi.so and librocm-core.so relative to
amdsmi_wrapper.py location.
The amdsmi_wrapper.py file is located in
_rocm_sdk_core/share/amd_smi/amdsmi and the libraries are in
_rocm_sdk_core/lib/libamd_smi.so.26.
_rocm_sdk_core/lib/librocm-core.so.1.
[ROCm/amdsmi commit: ad20d57162]
The out of bound writes caused corruption in next field,
which was weight. Fixed by reading to a temp and then assigning
safely.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: a2aae5e8a9]
* Added amd-smi set --pcie command
* Removed current pcie level due to it not being static
* Added pcie information to static --bus
---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 9e3537d778]
- **Added evicted_time metric for kfd processes**.
- Time that queues are evicted on a GPU in milliseconds
- Added to CLI in `amd-smi monitor -q` and `amd-smi process`
- Added to C API and Python API:
- amdsmi_get_gpu_process_list()
- amdsmi_get_gpu_compute_process_info()
- amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
[ROCm/amdsmi commit: 2144cfbba4]
- `amdgpu-install` is no longer recommended. Link to separate driver
installation docs.
- add verify step
- update readme
- add package info
Signed-off-by: Park, Peter <Peter.Park@amd.com>
[ROCm/amdsmi commit: 12fb58c30b]
* Updates:
- [ASAN] GCC does not support `-shared-libsan flags`, so removed this one
- [Clang] Fixed refernces to local binding errors (name collision)
& other strict scope/structure/lamda binding errors
- [Clang] Fix rsmi_wrapper error: \"error: missing default argument on parameter \'args\'\"
- [ASAN] Fixed stack-buffer-overflow found in
`amdsmi_get_gpu_accelerator_partition_profile()`
Change-Id: I854007efb75d828dbb8088c0d56dbc125081f0f2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 00a04f5810]
* Solved error: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/argcomplete/bash_completion.d/python-argcomplete.sh'
[ROCm/amdsmi commit: 013d6cb511]
* [SWDEV-542718] Correct socket_affinity
Updated Socket affinity to show bitmask and expanded cpu list.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Update per-device local_cpulist for socket_affinity
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Added amdsmi_get_cpu_affinity_from_local_cpulist API.
Updated the wrapper.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Revert "Added amdsmi_get_cpu_affinity_from_local_cpulist API."
This reverts commit 9a2ef934b1787f8aa09d3e4efe02f897b4295215.
* Moved the changes to C API.
In case of SOCKET_SCOPE, use local_cpulist first.
If it is unavailable or not readable, fallback to
numa.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Addressed review comments
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 09a97f02ed]
* Update readme doc: amdsmi_get_afids_from_cper() input argument is only bytes, not a list of dicts each with keys “bytes” (List[int]) and “size” (int)
---------
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
[ROCm/amdsmi commit: f7c9fe3011]
* Clarified comments regarding power limit retrieval and its support on virtualized systems.
* Change unsupported comment to UINT32_MAX
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 460cfcba1f]
* Updated groups printing
* added parameters to check_required_groups
* two device groups since kfd and render require the same group
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: ee1445e2cc]
Added `GPU LINK PORT STATUS` table to `amd-smi xgmi` command
The `amd-smi xgmi -s` or `amd-smi xgmi --source-status` will show `GPU LINK PORT STATUS` table.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 7ddd91653e]
Added the following API's to amdsmi_interface.py.
amdsmi_get_cpu_handle()
amdsmi_get_esmi_err_msg()
amdsmi_get_gpu_event_notification()
amdsmi_get_processor_count_from_handles()
amdsmi_get_processor_handles_by_type()
amdsmi_gpu_validate_ras_eeprom()
amdsmi_init_gpu_event_notification()
amdsmi_set_gpu_event_notification_mask()
amdsmi_stop_gpu_event_notification()
amdsmi_get_gpu_busy_percent()
Added additional return value to API amdsmi_get_xgmi_plpd().
The entry policies is added to the end of the dictionary to match API definition.
The entry plpds is marked for deprecation as it has the same information as policies.
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 7decbc67a1]
* Used KFD to determine linking between GPUs and PIDs rather than depend on fdinfo's per pid single gpu bdf info that we were getting.
Signed-off-by: adapryor <Adam.pryor@amd.com>
---------
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: c967aead58]