20 Коммитов

Автор SHA1 Сообщение Дата
Bindhiya Kanangot Balakrishnan e8c3b22734 [SWDEV-556483] Fix runtime PM suspend causing test failures (#1931)
Added runtime PM detection and DRM ioctl-based device wake
to handle GPUs in BACO state. Modified tests to wake
suspended devices before reading sysfs files.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-11-25 13:36:45 -06:00
Peter Park b4e336aef3 [rocm-smi-lib] docs: fix changelog heading ROCm 6.5.0 -> ROCm 7.0.0 (#363) 2025-08-18 11:46:02 -04:00
Charis Poag 5e31509711 Removed backwards compatibility for jpeg_activity/vcn_activity
Updated:
- Removed backwards compatibility for jpeg_activity/vcn_activity
- On supported ASICs users can use XCP (partition) stat values:
  jpeg_busy and vcn_busy

Change-Id: I78c403f8462668738ec57cac12b107f6a3989b18
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 1c6b2adae7]
2025-05-29 13:47:56 -05:00
Charis Poag aacf23778d [SWDEV-518325/SWDEV-518320/SWDEV-443309] Fix Partition Enumeration
* Changes:
  - Updates to DRM renderD* / card* pathing for partition devices
  - Now use KFD to discover AMD devices and populate accordingly
    Device MUST have an accessible KFD node (via cgroups)
  - Updated several ROCm SMI CLI outputs to handle SYSFS files
    which are not accessible on partition nodes
  - Added a new method to help get card/drm info
    (rsmi_dev_device_identifiers_get) from ROCm SMI

Change-Id: If844f27ffc595942272abe9c8167ed90a0b0e225
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: a0df877fdf]
2025-04-14 16:03:24 -05:00
Castillo, Juan 3aa80ec0e4 SWDEV-518214: GPU Metrics 1.8 (#31)
* SWDEV-518214: GPU Metrics 1.8 (#31)

- Updates:
    - Adding the following metrics to allow new calculations for violation status:
        - Per XCP metrics gfx_below_host_limit_ppt_acc
        - Per XCP metrics gfx_below_host_limit_thm_acc
        - Per XCP metrics gfx_low_utilization_acc
        - Per XCP metrics gfx_below_host_limit_total_acc
    - Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/rocm_smi_lib commit: f69e65f7bd]
2025-03-20 18:07:32 -05:00
Kanangot Balakrishnan, Bindhiya d2461a186b [SWDEV-481004] Fix for incorrect gfx_version number (#8)
The target_graphics_version was not formatted properly and was
showing incorrect Target Name. Corrected this by fomatting
major, minor and revision numbers.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/rocm_smi_lib commit: 94dca7073b]
2025-01-21 15:46:56 -06:00
Charis Poag 7b3c814501 [SWDEV-496693] GPU metrics 1.7
Changes:
    - Added new GPU metrics:
      1) XGMI link status - Up/Down; 1 = up; 0 = down
      2) Graphics clocks below host limit (per XCP)
         accumulators -> used to help calculate a violation status
      3) VRAM max bandwidth at max memory clock
    - Updated rocm-smi --showmetrics to include new metrics.
    Units/values reflect as indicated by driver, may differ
    from AMD SMI or other ROCm SMI interfaces which
    use these fields.
    - N/A fields means the device does not support providing this
    data.

Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4de2168866]
2024-12-15 16:48:13 -05:00
Charis Poag 2258c26c53 [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - Added warning screen to ROCm SMI users
    setting memory partition
  - Added new API (rsmi_dev_memory_partition_capabilities_get)
    to retrieve memory partition capabilities
    (What users can set memory partition modes to)
  - Increased time-bar for CLI sets display to 40 seconds
  - API now waits until the driver reloads with SYSFS files active
  - [SWDEV-475712] [CLI/API] Fixed target_graphics_version field
    not properly displaying for MI2x or Navi 3x ASICs.
  - Updated tests

Change-Id: Iaf89d1b7ad9ceb449b289bc82ea198fe3b23992e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 46902274b6]
2024-11-12 12:18:44 -04:00
Peter Park 8af04d3531 Update changelog fmt to internal standard
Change-Id: Icdb7eb59c6770f46ddae401f23b84cd06e6d3b09


[ROCm/rocm_smi_lib commit: 568cc6e7c7]
2024-11-08 16:20:49 -05:00
Oliveira, Daniel d41fbc88ca [SWDEV-490187 / SWDEV-491215] Remove reset gpu partition + NPS test disabled
The reset gpu partition support for both compute and memory were removed

Code changes related to the following:
  * rsmi_dev_compute_partition_reset()
  * rsmi_dev_memory_partition_reset()
  * CLI
  * Unit tests
  * Documentation

Change-Id: I3fb8570dbf9e755ae70369587ef44bbf64e17fe8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: a1295714f2]
2024-10-21 14:22:57 -05:00
Charis Poag 0b40a73798 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6 + --showmetrics
Changes:
- Added new GPU metrics:
  1) Violation status' (ex. PVIOL/TVIOL) accumulators
  2) XCP (Graphics Compute Partitions) statistics
  3) pcie other end recovery counter
- Added rocm-smi --showmetrics
Units/values reflect as indicated by driver, may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields means the device does not support providing this
data.

Change-Id: Ia2cd3bb65c4f474ebdb39db8062ea716f2b4d8ee
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 0609cbf1d0]
2024-09-27 13:18:05 -04:00
Maisam Arif 02d2ceddaf Bump version tool:2.3.1+hash
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic67456d7484c2f5a0ce0e086e56b29e20d9d9745


[ROCm/rocm_smi_lib commit: 055b023d2e]
2024-08-08 01:40:55 -05:00
Charis Poag 0d5c46fe52 [SWDEV-475552/SWDEV-475351] Fix segfault TestComputePartitionReadWrite
In order to check partition id's we must continue to check # of devices.
Since this fluctuates with partition updates
and there are drm minor limitations.

For the drm minor limitation of 64, user must remove other drivers
using PCIe space. You can see these by:
ls /sys/class/drm

Recommend: rmmod unneeded driver and reload amdgpu. In order to
ensure CPX can enumerate with all XCP (Graphic Cluster Partitions).

Change-Id: Ib663503f0b7264dce163f6ac2d50795fc8dc5eba
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: c11209f618]
2024-07-27 17:47:54 -05:00
Charis Poag 33eb3fa429 [SWDEV-463213] Add partition ID fallback + new API
Changes:
- Added rsmi_dev_partition_id_get() -> uses fallback described
  below for devices which support partition updates.
- Updated/added to tests for partitions to reflect these changes.

Due to driver changes in KFD, some devices may report bits [31:28] or [2:0].
bits [63:32] = domain
bits [31:28] = partition id
bits [27:16] = reserved
bits [15:8]  = Bus
bits [7:3] = Device
bits [2:0] = Function (partition id maybe in bits [2:0]) <-- Fallback for non SPX modes

Change-Id: Ia5641cfb8dbe2d1bff52f8eb81d5a159954528d3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 323ab1105d]
2024-06-27 17:27:01 -05:00
Oliveira, Daniel af02873dfb fix: [SWDEV-458862] [rocm/rocm_smi_lib]
Fixes reading pp_od_clk_voltage new variable format and size.

Code changes related to the following:
  * get_od_clk_volt_info()
  * get_od_clk_volt_curve_regions()
  * Unit tests
  * CLI options restored: --showclkvolt, --showvc, --showvoltagerange, --setvc
    * Rework: 162d1d24
  * Bump CLI version
  * CHANGELOG.md

Change-Id: I817ca224de923fdaa992df84592d63b4d5a12b22
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 8e6d66e15b]
2024-05-07 20:47:26 -05:00
Ori Messinger 02a862bde1 ROCm SMI LIB: Fix rsmiBindings.py.in Mismatch
This commit aligns the rsmiBindings.py.in file's
"notification_type_names" & "rsmi_evt_notification_type_t" with
those found in the rsmiBindings.py file.

Change-Id: I67f36606c505992fb98495651310bd70a1755033
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 0c48cd9122]
2024-05-02 23:22:44 -05:00
Charis Poag 0025ebafca Add ROCm 6.1.1 changelog, ROCm SMI deprication, vbios fix
* Updates:
    - Add ROCm 6.1.1 Changelog updates
    - Add planned ROCm SMI deprication notice
    - Fix rocm-smi --showvbios showing extra errors
      for GPUs which do not have a VBIOS (MI300a ASICs)

Change-Id: I0e5ccfe2677f9c7909ca13863a920e323e82b439
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: f5c32b5415]
2024-03-30 00:11:09 -05:00
Charis Poag e9608d8963 Update ROCm 6.0/6.1 CHANGELOG.md & README.md
* Updates:
    - [CHANGELOG.md] Provide 6.1 and 6.0 changes
    - [README.md] Update readme with relavant changes
    - [CLI] Updated --showpower to expand on types of power provided to users

Change-Id: Ic653cc81f80b7973654e2c23e1ab70567b930aa7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: c5acd4ee88]
2024-03-20 00:17:33 -05:00
Sam Wu c785a58e99 [ROCDOC-95] Standardize documentation for ReadtheDocs
Apply the following changes to project documentation for ReadtheDocs:

add version number to documentation left navigation bar and page title
add an "About" section with a license page
enable htmlzip, pdf, epub formats when publishing on Read the Docs
set pdf title, author, copyright, and version
rename .sphinx/.doxygen to sphinx/doxygen
remove docBin from URL
update rocm-docs-core dependency
update dependabot config

Change-Id: Ife8c89a2e9323f436b3e54ef2a9e013c19b3b228


[ROCm/rocm_smi_lib commit: 67dc4b0f2a]
2024-01-11 17:47:58 -05:00
Bill(Shuzhou) Liu 84882db8fc Use Unified Changelog Template
The CHANGELOG.md is added to track changes.

Change-Id: I33547cb7f1596b4b8abf206aebdd664649d4d19f


[ROCm/rocm_smi_lib commit: b40933b895]
2023-02-21 14:27:55 -06:00