- Updates:
- Adding the following metrics to allow new calculations for violation status:
- Per XCP metrics gfx_below_host_limit_ppt_acc
- Per XCP metrics gfx_below_host_limit_thm_acc
- Per XCP metrics gfx_low_utilization_acc
- Per XCP metrics gfx_below_host_limit_total_acc
- Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.
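As a rough illustration of the sentinel convention described above, the Python sketch below maps unsupported JPEG engine slots (reported as UINT16_MAX) to "N/A" for display. The helper name `format_jpeg_activity` and the sample data are hypothetical and are not part of the AMD SMI API or CLI.

```python
UINT16_MAX = 0xFFFF    # sentinel reported for engine slots the ASIC does not expose
MAX_JPEG_ENGINES = 40  # engine slot count stated in this change


def format_jpeg_activity(raw_values):
    """Return display values, replacing the UINT16_MAX sentinel with "N/A"."""
    return ["N/A" if value == UINT16_MAX else value for value in raw_values]


# Hypothetical sample: 32 supported engines followed by 8 unsupported slots.
sample = [0] * 32 + [UINT16_MAX] * (MAX_JPEG_ENGINES - 32)
display = format_jpeg_activity(sample)
```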
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
Change Versioning Scheme to match https://semver.org/
Dropping the year enum and API fields in a future release.
Should not impact library versioning since we are now starting from 25.2.0
---------
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Change-Id: Id090e23f156926d08f9c0b781447388adf268cf6
Blacklisted TestVoltCurvRead for devices with gfx_target_version
90400, 90401 and 90402 as it is not supported on these systems.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* [SWDEV-508173] Updates include:
- Updating py-interface to import amdsmi_get_gpu_reg_table_info and amdsmi_get_gpu_pm_metrics_info.
- Updating the ctypes from byref to pointer.
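For context on the byref-to-pointer change, this standard-library sketch contrasts the two ctypes calls. The `Info` structure is a hypothetical stand-in and not an amdsmi type.

```python
import ctypes


class Info(ctypes.Structure):
    """Hypothetical structure used only to show the two calls."""
    _fields_ = [("value", ctypes.c_uint32)]


info = Info(7)

# byref() builds a lightweight reference that is only usable as an
# argument to a foreign function call.
ref = ctypes.byref(info)

# pointer() builds a real POINTER(Info) instance that can be stored,
# passed around, and dereferenced from Python.
ptr = ctypes.pointer(info)
ptr.contents.value = 42  # writes through to the original object
```

Both forms refer to the same underlying memory; `pointer()` costs slightly more to construct but yields an object that can be inspected via `.contents`.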
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram
- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
This reverts commit 6e01df00ca.
Reason for revert: The amdsmi gtest differs from the one used by other components, so it was installed in a share/amdsmi/lib folder. It cannot be installed in a common folder such as /usr/local/bin or /usr/bin because all other components search those folders first.
This is breaking ROCmValidationSuite and other tools. Per Wang, Yanyao this should be reverted.
Change-Id: Id61bc6056fe41800e738616f39293e9b8762a377
TEST FAILURE message for `amdsmi_get_gpu_pci_throughput` and
`amdsmi_get_gpu_pci_bandwidth` changed to WARNING to indicate that
pcie_bw and/or pp_dpm_pcie sysfs files may not be supported on the
respective devices.
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1ad6e15eceacb5a00b022458ee5fb21df9d845c7
Changes:
- amdsmi_violation_status_t now includes current accumulated/counter
values
- Tests/wrapper now include added values
- Removed ASIC references in header for host/bm alignment
- Fix violation_status->per_hbm_thrm /
violation_status->active_hbm_thrm
calculations.
Change-Id: Ic86a7cbad5198a41018f82f6b588b83158d9ba0b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Header Unification Change: "/amdsmi/+/1122408"
Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>
Updates:
- Added tests for these API calls:
amdsmi_get_socket_handles
amdsmi_get_processor_type
amdsmi_get_clk_freq
amdsmi_get_gpu_process_info
amdsmi_get_gpu_ras_block_features_enabled
amdsmi_get_gpu_ecc_count
amdsmi_get_gpu_memory_usage
amdsmi_get_gpu_vendor_name
amdsmi_get_utilization_count
- Added amdsmi_init() and amdsmi_shut_down() before and after each test.
- Updated README and removed all pytest references.
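The per-test init/shutdown pattern described above could look roughly like the following `unittest` sketch. `FakeSmi` is a hypothetical stand-in that only records calls so the example runs without a GPU or the amdsmi package; it reuses three API names from this log but does not reproduce their real signatures or behavior.

```python
import unittest


class FakeSmi:
    """Hypothetical stand-in for the library; records the call order."""

    def __init__(self):
        self.calls = []

    def amdsmi_init(self):
        self.calls.append("init")

    def amdsmi_shut_down(self):
        self.calls.append("shut_down")

    def amdsmi_get_socket_handles(self):
        self.calls.append("get_socket_handles")
        return []


smi = FakeSmi()


class TestWithInitPerTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method.
        smi.amdsmi_init()

    def tearDown(self):
        # Runs after every test method, even when the test fails.
        smi.amdsmi_shut_down()

    def test_get_socket_handles(self):
        self.assertIsInstance(smi.amdsmi_get_socket_handles(), list)


def run_suite():
    """Run the test case with the standard unittest runner."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWithInitPerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```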
Change-Id: Ida0c165a466571b1df36c413161bd95c070f6ff1
Signed-off-by: Ryo Ficano <Ryo.Ficano@amd.com>
partition_id has also been removed from the `amdsmi_asic_info_t` struct, and
a supporting API has been added for querying partition information.
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081
Moving to TESTS_COMPONENT allows files to be placed
within the amd-smi-lib-test package.
Previously, the files were put within the amd-smi-lib package,
which will never be triggered for installation with the
latest changes.
Change-Id: Id49dbe69bfc7d5bd1af403c28b946fe1edf64d8e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
- Updates to amdsmi_asic_info_t structure to include:
target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display new
amdsmi_asic_info_t fields
- Updates to gpu enumeration during amdsmi_init()
to discover all logical GPUs when in a non-SPX mode
(ex. DPX, TPX, QPX, or CPX)
- Updates to amdsmi_get_gpu_bdf_id(..) to include
partition_id details when in BDF or optional bits.
- bits [63:32] = domain
- bits [31:28] or bits [2:0] = partition id
- bits [27:16] = reserved
- bits [15:8] = Bus
- bits [7:3] = Device
- bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non SPX modes
- C++/Python tests updated to reflect these outputs
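The bit layout listed above can be unpacked as in the following Python sketch. `decode_bdf_id` is a hypothetical helper, not a library function, and it reads the partition id from bits [31:28] only; it does not decide when the bits [2:0] fallback applies.

```python
def decode_bdf_id(bdf_id):
    """Split a 64-bit BDF ID into its fields per the layout in this change."""
    return {
        "domain": (bdf_id >> 32) & 0xFFFFFFFF,  # bits [63:32]
        "partition_id": (bdf_id >> 28) & 0xF,   # bits [31:28]
        "bus": (bdf_id >> 8) & 0xFF,            # bits [15:8]
        "device": (bdf_id >> 3) & 0x1F,         # bits [7:3]
        "function": bdf_id & 0x7,               # bits [2:0]
    }


# Made-up value: domain 0, partition 2, bus 0x45, device 0, function 0.
example = (2 << 28) | (0x45 << 8)
fields = decode_bdf_id(example)
```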
Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
1. Add an API interface amdsmi_topo_get_p2p_status to retrieve
connection type and P2P capabilities between 2 GPUs.
2. Add a P2P status test to hw_topology_read
to print P2P capability information.
3. Add below tables for cli topology sub commands:
- CACHE COHERANCY TABLE
- ATOMICS TABLE
- DMA TABLE
- BI-DIRECTIONAL TABLE
Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
Driver info `amdgpu_gpu_info.vram_bit_width` is exposed through amdsmi_get_gpu_vram_info().
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Change-Id: I8abd8db7a603078b2b1c008b2685cecf35caf3d2
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
GPU Metrics info `gpu_metrics.vcn_activity` is exposed through amdsmi_get_utilization_count().
Code changes related to the following:
* API
* CLI
* Unit tests
Change-Id: I831b2a81bdc0e090a6698dcb689d10f91ed87dd9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Validation requires running tests for MI300 systems; this update
removes the exclusion for these systems.
Change-Id: Idacf3e8bf0bd569f1cfa6192af47993eb5440ee6
Updates:
* CLI - Added AMDSMIHelpers.convert_SI_unit() to help
conversion of units
* API - Reverted to uW for power cap limits
* CLI - amd-smi static --limit now includes MIN_POWER
* Tests now are all using uW units to keep W conversion
to only happen in CLI
* Python API now reflects same units as uW (what is seen
in amdgpu driver)
* CLI - amd-smi metric --power:
Fixed power seen on gpu_metrics v1.3
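To make the unit split above concrete (the API and tests stay in uW, and only the CLI converts to W), here is a minimal Python sketch. `microwatts_to_watts` and the sample value are hypothetical; this is not the signature or implementation of AMDSMIHelpers.convert_SI_unit().

```python
MICROWATTS_PER_WATT = 1_000_000


def microwatts_to_watts(microwatts):
    """Convert a power value in uW (as the API reports it) to W for display."""
    return microwatts / MICROWATTS_PER_WATT


# Made-up power cap of 750,000,000 uW as an API-level value.
power_cap_uw = 750_000_000
power_cap_w = microwatts_to_watts(power_cap_uw)
```

Keeping the conversion at the display layer means API consumers see the same units as the amdgpu driver, as the change notes.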
Change-Id: I32d9ba78d0d8806772f0860f9a803a885b3f316a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Checks the error returned by get_gpu_pci_bandwidth() before asserting
Code changes related to the following:
* Unit tests
Change-Id: I950eee5d92607eea08722af7d7c84e8457cd4e60
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Fixes asserts in unit tests, and 'pp_dpm_pcie' condition
Code changes related to the following:
* rsmi_dev_pci_bandwidth_set()
* Functional tests
Change-Id: Id5e6851393fa3b51bb8cad87daca1efaf500a7e0
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Checks and forces rereading gpu metrics unconditionally
Code changes related to the following:
* Device::dev_log_gpu_metrics()
* amdsmi_get_gpu_metrics_header_info()
Removed unintentionally during work on 'header cleanup Remove non-unified headers'
* Examples
* Unit tests
Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards
Code changes related to the following:
* '_get_gpu_metrics_' APIs
* Functional tests
Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc
Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
- CLI: Added average_power to display if current_power is empty
- CLI: fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
-> commented out setting avg clocks to current clock value
(leave as max uint value, not re-assign; these are not same values)
-> commented out setting current_socket_power = average_power
(leave as max uint value, not re-assign; these are not same values)
-> For all non-array clocks, placed value in first
array[0] to keep outputs consistent
(helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
- AMD SMI API: wrapper -> now displays amdsmi return status as
string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
display all array output
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960