Commit Graph

130 Commits

Author SHA1 Message Date
Juan Castillo f8b8347627 [SWDEV-496693] GPU Metrics 1.7
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram

- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
2024-12-18 10:57:06 -06:00
Joe Narlo d0a7332d32 SWDEV-492272 [AMDSMI] Build/Compiler warnings messages
Fix compiler warnings

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I10657b8f3ef18a9b45311e8f6509958297a57823
2024-12-13 00:38:07 -05:00
Joe Narlo 3052ad4220 SWDEV-495787 [AMDSMI] Different license headers
Change copyrights to MIT and remove date

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
2024-11-22 08:55:28 -05:00
gabrpham 00b3184e9f SWDEV-478748 Changed TestPciReadWrite Test Failure message to Warning
TEST FAILURE message for `amdsmi_get_gpu_pci_throughput` and
`amdsmi_get_gpu_pci_bandwidth` changed to WARNING to indicate that
pcie_bw and/or pp_dpm_pcie sysfs files may not be supported on the
respective devices.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1ad6e15eceacb5a00b022458ee5fb21df9d845c7
2024-10-18 16:32:57 -05:00
Charis Poag 3a4abbd8c0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status (e.g. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) PCIe other end recovery counter
    - CLI/API/tests changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-27 12:04:21 -05:00
Lang Yu 7a557b1c50 SWDEV-463405: Add amdsmi_get_link_topology_nearest support
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.

Code changes related to the following:
    * API
    * CLI
    * Unit tests
    * Examples

Header Unification Change: "/amdsmi/+/1122408"

Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>
2024-09-26 16:43:27 -05:00
gabrpham 8bc4abc88b Corrected partition changes in header and wrapper
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Iafd7de8f08924873da841ee6eca62100a17b2b6c
2024-09-20 17:01:55 -05:00
gabrpham c9a489d437 Moved partition_id from static --asic-info to static --partition.
partition_id was also removed from the `amdsmi_asic_info_t` struct, and a
supporting API was added for querying partition information.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081
2024-09-20 03:48:42 -04:00
Maisam Arif 3b7f661e71 Moved KFD information to separate structure and API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: If6eaea589edc704cf408d6391b5f2154134035e7
2024-09-20 03:48:42 -04:00
Eisuke Kawashima 1b6ec8df07 chore: unset executable permission
Change-Id: I06727774f3b1657a7955b172a40d0dfc9c76d6b9
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-16 17:34:39 -04:00
Maisam Arif 105db1afcd Updated License Dates
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8ca199c129c06508bc3e23745ab5ac2d20dce928
2024-09-16 16:14:47 -04:00
Charis Poag a33e4c9e14 [SWDEV-483526] Fix MI3x partitions not showing all logical nodes
Changes:
- Updates to amdsmi_asic_info_t structure to include:
  target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display new
  amdsmi_asic_info_t fields
- Updates to gpu enumeration during amdsmi_init()
  to discover all logical GPUs when in a non-SPX mode
  (ex. DPX, TPX, QPX, or CPX)
- Updates to amdsmi_get_gpu_bdf_id(..) to include
  partition_id details, carried either in the BDF or in optional bits:
    - bits [63:32] = Domain
    - bits [31:28] or bits [2:0] = Partition ID
    - bits [27:16] = Reserved
    - bits [15:8]  = Bus
    - bits [7:3]   = Device
    - bits [2:0]   = Function (partition ID may be in bits [2:0]) <-- Fallback for non-SPX modes

- C++/Python tests updated to reflect these outputs

Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-11 16:35:17 -05:00
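
The bit layout listed in the a33e4c9e14 commit message above can be unpacked with plain shifts and masks. The following is a minimal sketch, assuming the primary layout in which the partition ID sits in bits [31:28]; the fallback placement in bits [2:0] is not handled. `decode_bdf_id` and `encode_bdf_id` are hypothetical helper names used for illustration and are not part of the AMD SMI API.

```python
def decode_bdf_id(bdf_id: int) -> dict:
    """Split a 64-bit BDF ID into its fields (hypothetical helper).

    Assumes the layout from the commit message:
      [63:32] domain, [31:28] partition id, [27:16] reserved,
      [15:8] bus, [7:3] device, [2:0] function.
    """
    return {
        "domain": (bdf_id >> 32) & 0xFFFFFFFF,
        "partition_id": (bdf_id >> 28) & 0xF,
        "bus": (bdf_id >> 8) & 0xFF,
        "device": (bdf_id >> 3) & 0x1F,
        "function": bdf_id & 0x7,
    }


def encode_bdf_id(domain: int, partition_id: int, bus: int,
                  device: int, function: int) -> int:
    """Inverse of decode_bdf_id; reserved bits [27:16] are left as zero."""
    return (
        ((domain & 0xFFFFFFFF) << 32)
        | ((partition_id & 0xF) << 28)
        | ((bus & 0xFF) << 8)
        | ((device & 0x1F) << 3)
        | (function & 0x7)
    )


if __name__ == "__main__":
    # Illustrative values only: partition 2 on bus 0x43, device 0, function 0.
    value = encode_bdf_id(domain=0, partition_id=2, bus=0x43, device=0, function=0)
    print(hex(value), decode_bdf_id(value))
```

Masking each field after the shift keeps neighbouring fields (including the reserved bits) from leaking into the result.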
Tim Huang 260edaa752 [SWDEV-463402] - Support retrieving connection type and P2P capabilities between two GPUs
1. Add an API interface, amdsmi_topo_get_p2p_status, to retrieve the
connection type and P2P capabilities between 2 GPUs.

2. Add a P2P status test in hw_topology_read
to print P2P capability information.

3. Add the following tables to the CLI topology subcommands:
  - CACHE COHERANCY TABLE
  - ATOMICS TABLE
  - DMA TABLE
  - BI-DIRECTIONAL TABLE

Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
2024-09-06 09:42:34 -04:00
Maisam Arif 97c487372f Clean up unused files & Update License info
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5b58e8fe3d9eeac207b07ce0fe4134dd717dbd90
2024-09-05 09:52:48 -04:00
gabrpham 95ca2b83a1 Changed power parameter in amdsmi_get_energy_count() to energy_accumulator
Issue linked here: https://github.com/ROCm/amdsmi/issues/38

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I622236eb3f0144aefeb6c82d2713b4822bfeeb11
2024-09-04 09:38:08 -04:00
Oliveira, Daniel 893f13ab98 SWDEV-463399: amdsmi_get_gpu_vram_info() adds bit-width
Driver info `amdgpu_gpu_info.vram_bit_width` is exposed through amdsmi_get_gpu_vram_info().

Code changes related to the following:
  * API
  * CLI
  * Unit tests
  * Examples

Change-Id: I8abd8db7a603078b2b1c008b2685cecf35caf3d2
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-08-27 18:22:50 -04:00
Oliveira, Daniel af3670d758 SWDEV-463372: amdsmi_get_utilization_count() adds decoder_activity
GPU Metrics info `gpu_metrics.vcn_activity` is exposed through amdsmi_get_utilization_count().

Code changes related to the following:
  * API
  * CLI
  * Unit tests

Change-Id: I831b2a81bdc0e090a6698dcb689d10f91ed87dd9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-08-27 16:58:34 -05:00
Dalibor Stanisavljevic 7b2463abe0 SWDEV-457337 - Fix header alignment
Change-Id: I9f25f6c4f0d00c76b66d13162f30be11368f5b59
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2024-05-23 04:41:57 -04:00
Maisam Arif 7d999aa34c SWDEV-458102 - Updates to pp_od_clk_voltage parsing
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I650dae1a99856dcde914fe66917cf9111f3ce0e2
2024-05-15 03:18:24 -05:00
Maisam Arif 11c72946eb Revert "SWDEV-458102 - Deprecated Voltage Curve API"
This reverts commit 1423fb632e.

Change-Id: I8a3eaf0a9f28200e09fb35d5260fbc070fe8a4a9
2024-05-02 15:27:16 -05:00
Charis Poag c24d66740e SWDEV-450580 - Fix powercap set
Updates:
     * CLI - Added AMDSMIHelpers.convert_SI_unit() to help
       conversion of units
     * API - Reverted to uW for power cap limits
     * CLI - amd-smi static --limit now includes MIN_POWER
     * Tests now all use uW units so that conversion to W
       happens only in the CLI
     * Python API now reports the same uW units (as seen
       in the amdgpu driver)
     * CLI - amd-smi metric --power:
       Fixed power seen on gpu_metrics v1.3

Change-Id: I32d9ba78d0d8806772f0860f9a803a885b3f316a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-05-02 10:13:39 -05:00
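
The unit split described in c24d66740e above (power cap values stay in uW in the API and tests, and are converted to W only for display) can be sketched as follows. This is a minimal illustration, not the CLI's `AMDSMIHelpers.convert_SI_unit()` implementation, whose signature is not shown in this log; `microwatts_to_watts` and `watts_to_microwatts` are hypothetical names.

```python
MICROWATTS_PER_WATT = 1_000_000


def microwatts_to_watts(microwatts: int) -> float:
    """Convert a driver-style power value in uW to W for display."""
    return microwatts / MICROWATTS_PER_WATT


def watts_to_microwatts(watts: float) -> int:
    """Convert a user-supplied value in W back to the uW unit kept in the API."""
    return round(watts * MICROWATTS_PER_WATT)


if __name__ == "__main__":
    # Illustrative value only: a 203 W cap expressed in uW.
    cap_uw = 203_000_000
    print(f"{microwatts_to_watts(cap_uw):.0f} W")
```

Keeping a single unit below the display layer avoids rounding differences between the API, the tests, and the driver values.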
Maisam Arif 1423fb632e SWDEV-458102 - Deprecated Voltage Curve API
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I111c3ce26d2ab66d5e755432f4b8a9bfa631f805
2024-05-02 02:53:29 -04:00
Oliveira, Daniel 08e2e21bab fix: [SWDEV-442525] [rocm/amd_smi_lib]
Fixes gpu_process_list

Code changes related to the following:
  * amdsmi_get_gpu_process_list()
  * CLI
  * Examples
  * Unit tests
  * Changelog
  * Readme
  * rocm_smi_lib commit: 677433b367

Change-Id: I9210fbca7a5da92d0a8b472b72ca82597c8e4fb5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-03-27 16:48:24 -05:00
Oliveira, Daniel c6208c0db0 fix: [rocm/amd_smi_lib] Navi3X/Navi2X/MI100 amdsmitst 2 test cases fail when running
Checks the error returned by get_gpu_pci_bandwidth() before asserting

Code changes related to the following:
  * Unit tests

Change-Id: I950eee5d92607eea08722af7d7c84e8457cd4e60
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-28 15:11:22 -06:00
Oliveira, Daniel 475424525e fix: [rocm/amd_smi_lib] TestFrequenciesRead & TestPciReadWrite test cases failed
Fixes asserts in unit tests, and 'pp_dpm_pcie' condition

Code changes related to the following:
  * rsmi_dev_pci_bandwidth_set()
  * Functional tests

Change-Id: Id5e6851393fa3b51bb8cad87daca1efaf500a7e0
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-22 03:40:50 -05:00
Oliveira, Daniel 78074d7d77 fix: [rocm/amd_smi_lib] amdsmi_get_gpu_activity gfx/memory activity does not update
Checks and forces rereading gpu metrics unconditionally

Code changes related to the following:
  * Device::dev_log_gpu_metrics()
  * amdsmi_get_gpu_metrics_header_info()
    Removed unintentionally during work on 'header cleanup Remove non-unified headers'
  * Examples
  * Unit tests

Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-13 10:15:17 -06:00
Oliveira, Daniel 55734d2d7a fix: [rocm/amd_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * '_get_gpu_metrics_' APIs
  * Functional tests

Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-01-26 10:38:48 -05:00
Charis Poag 34bd26c68e Fix metric type error output + re-align with ROCm SMI metrics
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc

Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-01-24 21:23:40 -06:00
Charis Poag 5ff5af0b5a Fix GPU metric tests & cleanup test output
- CLI: Added average_power to display if current_power is empty
- CLI: Fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
  -> Commented out setting avg clocks to current clock value
     (leave as max uint value, not re-assign; these are not same values)
  -> Commented out setting current_socket_power = average_power
     (leave as max uint value, not re-assign; these are not same values)
  -> For all non-array clocks, placed value in first
     array[0] to keep outputs consistent (helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
  XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
  freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
  1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
- AMD SMI API: wrapper -> now displays amdsmi return status as
  string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
  compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
  display all array output

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
2023-12-19 14:18:15 -05:00
Maisam Arif d790ebc62b Refactor gpu_metrics usage in libraries
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I763638d4b546bf49b234e823df81028c357e8f49
2023-11-22 03:32:15 -06:00
Maisam Arif 5dba2f3120 Updated License Dates
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Id6fd66b03c602232ecc1a063a534a15fe3a03f56
2023-11-07 03:57:08 -05:00
Galantsev, Dmitrii 513dd8a445 Merge rocmsmi/amd-staging into amd-dev 20231103
Change-Id: Ie70ab54a63b25649b6b9d30620c5546dc66cd766
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-11-03 02:55:02 -05:00
Galantsev, Dmitrii 88d5e011e6 SWDEV-424983 - Fix supported metrics api checks
Change-Id: I5c95bb3057dd7546036cbd87bbf7025469d2b3d5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-30 17:28:59 -04:00
Galantsev, Dmitrii df4f5e8bf8 Merge rocmsmi/amd-staging into amd-dev 20231016
Change-Id: I137171162a64af4960d82336cc517c1b34a870f3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-16 14:31:13 -05:00
Galantsev, Dmitrii 6d72d65c48 Merge rocmsmi/amd-staging into amd-dev 20231010
Change-Id: I492562094a004eb78b2cc2b52d14d013d9f97112
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-11 18:58:12 -05:00
Galantsev, Dmitrii 3d3759061a Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I6037383a3efa777cc281a992fd9aa11d8e9ced28
2023-10-04 19:11:59 -05:00
Galantsev, Dmitrii 31cc2eecfb Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I0661926c10eef2bc32b83d9a63a3a6eb6991e781
2023-09-25 04:35:53 -05:00
Galantsev, Dmitrii 5c41319c83 Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I65ed7f3a0d1b6e58bc8377932d7c39db21d1b422
2023-09-21 23:43:20 -05:00
Maisam Arif d2ef113457 SWDEV-412847 - Changed junction to hotspot
Change-Id: I7f6c1a0a77e6a09d2a3e831463cf03e35266bf40
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 17:43:26 -05:00
Bill(Shuzhou) Liu b52034fed8 Add API for the memory type
Get the memory type from libdrm and add a new API.

Change-Id: I89327bca2ef860f2e3f4f6ca20def2331eba66c0
2023-09-07 13:05:58 -05:00
Bill(Shuzhou) Liu 9021ef96dc Support PCIe vendor name
Add the support for PCIe vendor name.

Change-Id: Ibc1d289a08731e4c5a14f992f3b0d31b51482396
2023-08-28 16:46:43 -05:00
Galantsev, Dmitrii 14190c5a94 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I7a35220a2283b92c5b4825ee99d6693401ef8e1e
2023-08-28 16:01:19 -05:00
Galantsev, Dmitrii 936719eeb6 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I9c38b4facd472b877d1ad133f3176a023c890955
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-23 16:04:15 -05:00
Maisam Arif ca59a60a9a Updated Versioning
Corrected to amd-smi version from rocm-smi version
Added newline characters in the gpu choices
Updated CLI versioning to 23.2.1.0 to match amd-smi

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ia6db3a281e2349e05a09209bdcfdfa5ac48e3a86
2023-08-01 14:28:27 -04:00
Bill(Shuzhou) Liu 4307330cb0 Fix unit test errors
Add unit test error handling for set freq and volt.

Change-Id: I5877f8300b942caac8f38e6efc03264bfc432def
2023-07-12 09:39:39 -04:00
Bill(Shuzhou) Liu 9e2fcd0e40 Fix fan write unit test failure
Even if fan speed can be read, sometimes the set is not supported.

Change-Id: I8584e6fe170c34144800af78d76f04234def11c8
2023-06-29 07:58:23 -05:00
Maisam Arif 9cebc93cee Cleaned up APIs
Change-Id: I93487e01d7126bdfa77439b571df927a6af3bb70
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2023-06-07 10:48:37 -04:00
Bill(Shuzhou) Liu 62ce965409 Clean up the APIs
Remove and rename APIs after review.

Change-Id: I5464f200eb605b366673f8abca95183c3837843b
2023-05-30 16:08:54 -04:00
Dalibor Stanisavljevic 1bc1d431d8 SWDEV-384793 - Clean up API
Change-Id: I441b315d32df59a454e06d521e5ca8b2c229451a
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-05-19 16:40:26 +02:00
Bill(Shuzhou) Liu dc4ba12e00 Return NOT_SUPPORT for set function in VM guest
Fix the unit tests that fail in a VM guest environment.

Change-Id: Id7c58887692bbdecba54f5d2d8463b292e19b4ad
2023-05-11 10:42:55 -05:00