* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested
---------
Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
fix: Add gpu_metrics 1.0 support which is still used by some hardware
Code changes related to the following:
* APIs
* Unit tests
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Run pre-commit's whitespace related hooks on projects/amdsmi
In order for pre-commit to be useful, everything needs to meet a common
baseline.
* Add whitespace back to Changelog for formatting
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Use fork/waitpid to isolate API call and detect SIGKILL from kernel
Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
[ROCm/amdsmi commit: a044536b8d]
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/amdsmi commit: e1b3d5f02e]
- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
[ROCm/amdsmi commit: 18faddf6f3]
- **Added evicted_time metric for kfd processes**.
- Time that queues are evicted on a GPU in milliseconds
- Added to CLI in `amd-smi monitor -q` and `amd-smi process`
- Added to C API and Python API:
- amdsmi_get_gpu_process_list()
- amdsmi_get_gpu_compute_process_info()
- amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
[ROCm/amdsmi commit: 2144cfbba4]
* Updates:
- [ASAN] GCC does not support `-shared-libsan flags`, so removed this one
- [Clang] Fixed refernces to local binding errors (name collision)
& other strict scope/structure/lamda binding errors
- [Clang] Fix rsmi_wrapper error: \"error: missing default argument on parameter \'args\'\"
- [ASAN] Fixed stack-buffer-overflow found in
`amdsmi_get_gpu_accelerator_partition_profile()`
Change-Id: I854007efb75d828dbb8088c0d56dbc125081f0f2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 00a04f5810]
Added the following API's to amdsmi_interface.py.
amdsmi_get_cpu_handle()
amdsmi_get_esmi_err_msg()
amdsmi_get_gpu_event_notification()
amdsmi_get_processor_count_from_handles()
amdsmi_get_processor_handles_by_type()
amdsmi_gpu_validate_ras_eeprom()
amdsmi_init_gpu_event_notification()
amdsmi_set_gpu_event_notification_mask()
amdsmi_stop_gpu_event_notification()
amdsmi_get_gpu_busy_percent()
Added additional return value to API amdsmi_get_xgmi_plpd().
The entry policies is added to the end of the dictionary to match API definition.
The entry plpds is marked for deprecation as it has the same information as policies.
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 7decbc67a1]
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a
[ROCm/amdsmi commit: cd21b5edcc]
* [SWDEV-531904] Unit and Integ Test Updates
Updated: unit_tests.py
- Removed redundant self.setUp() and self.tearDown() calls.
- Removed test_free_name_value_pairs() since is internal only.
Updated: integration_test.py
- Added logic to set AMDSMI_CLI_PATH from environment or default.
- Raise FileNotFoundError if path does not exist.
- Append CLI path to sys.path and handle ImportError with a clear message.
- Removed redundant @handle_exceptions function decorator.
- Removed redundant self.setUp() and self.tearDown() calls.
Updated: amdsmi_interface.py
- Removed POINTER conversion in amdsmi_get_gpu_pm_metrics_info() and amdsmi_get_gpu_reg_table_info()
All tests pass/skip
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
* Update tests/python_unittest/integration_test.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
* Review Update 1
Modified: integration_test.py
- Added logic to properly loop through firmware list and display each name and version
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
* Skip xgmi_err tests + improve running output
Changes:
1. Now check for elevated permissions
2. Skip xgmi_error related SYSFS tests, refer to xgmi_read_write.cc
(both are skipped)
3. Added list of tests and provided a summary of additional output
provided
Change-Id: Iefc85c270faad89c625e2bd7af397d24faed2437
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
---------
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Co-authored-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 67eb541c15]
* Adjusted amdsmitst and reset command to account for separation of power profile and perf level behavior
* Updated test to reset power profile to previous user setting
* Removed performance level from reset_profile_results in reset --profile command
* Updated Changelog with change to reset profile behavior
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
[ROCm/amdsmi commit: 954d4860c1]
amdsmi/tests/amd_smi_test/functional/memorypartition_read_write.cc:453:32: warning: the address of ‘orig_memory_partition’ will never be NULL [-Waddress]
453 | if ((orig_memory_partition == nullptr) ||
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
[ROCm/amdsmi commit: 66eb189396]
Adding a check to see if we're in guest -> allowing equal XCD values.
This is because in mVF configurations, we may not be able to read the gfx clock values.
Change-Id: I8e5d9627e061e98ec854734a91624c8077644a2a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: e12d270693]
Temporary solution until CQE can update how their containers are ran.
This is because the driver reload requires:
1) Containers must run serially
(i.e. no parallel containers running at the same time)
2) Containers must run with extra parameters:
`--cap-add=SYS_ADMIN -v /lib/modules:/lib/modules`
Change-Id: If6364c9e82da8404b73ac6a9688833f4d18693b0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 425b05cb18]
Description:
- Added a new API `amdsmi_gpu_driver_reload()` to reload the AMD GPU driver independently.
- Updated CLI (`sudo amd-smi reset -r`) and Python bindings to support driver reload functionality.
- Removed automatic driver reload from `amdsmi_set_gpu_memory_partition()` and `amdsmi_set_gpu_memory_partition_mode()`.
- Enhanced CLI and test cases to allow users to control when the driver reload occurs.
- Updated documentation and changelog to reflect the new driver reload process.
- Improved error handling and logging for driver reload operations.
- Added progress bar and user confirmation prompts for driver reload commands.
* Update build/test strategy to only allow one test execution at a time
* Modify API verbage + modify systemctl error output
- Systemctl is typically not enabled on docker.
- And is an edge case for gpu being active process/etc for display devices.
* Remove AMDSMI_STATUS_AMDGPU_RESTART_ERR from the return values
* Move driver reload to after we save original compute partitions
---------
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: d24dc7ef89]
Add support to Query UBB/OAM temperature.
* Updated Python API with new temperature metrics enum
---------
Co-authored-by: Bill Liu <shuzhliu@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
[ROCm/amdsmi commit: abd3c02a3c]
Changes:
- Modified outputputs for amd-smi set/reset when in partitions
to display error codes
- Provided some general cleanup for the above ^
----------------------------------------------------
- Updated `amd-smi set -o <value>` / `amd-smi set --power-cap <value>` command to
allow setting power cap to values other than 0, provided the current power cap is not 0.
- Modified power_cap_read_write.cc:
- Added a check to ensure that the power cap can only be set to non-zero values if the current
power cap is not 0.
- Reset the power cap to the original value after the test to maintain state consistency.
Change-Id: If489bb35812ba4fc4cc34723b0dc39c99926e5d7
---------
Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
[ROCm/amdsmi commit: ec055f2c2d]
New:
- gpu_cache_read.h and gpu_cache_read.cc
- Test reads GPU cache info and asserts valid structure
Updated:
- integration_test.py
- Added test_gpu_cache_info() and asserts valid structure
- test_get_gpu_compute_partition() to loop through all devices when test fail/pass
Added:
- test_get_gpu_compute_partition_returns_string() to integration_test.py
- This test displays the current compute partition for each bdf
---------
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 470c62f887]
* Add the API and CLI to show the board voltage.
---------
Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
[ROCm/amdsmi commit: 970560fc7c]