İşleme Grafiği

146 İşleme

Yazar SHA1 Mesaj Tarih
Juan Castillo f8b8347627 [SWDEV-496693]GPU Metrics 1.7
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram

- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
2024-12-18 10:57:06 -06:00
Joe Narlo d0a7332d32 SWDEV-492272 [AMDSMI] Build/Compiler warnings messages
Fix compiler warnings

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I10657b8f3ef18a9b45311e8f6509958297a57823
2024-12-13 00:38:07 -05:00
Joe Narlo 3052ad4220 SWDEV-495787 [AMDSMI] Different license headers
Change copyrights to MIT and remove date

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
2024-11-22 08:55:28 -05:00
Adam Pryor b7789d4699 Revert "[SWDEV-446215] Update cmake to put test libs in proper lib dir"
This reverts commit 6e01df00ca.

Reason for revert: Because the gtest of amdsmi is different to other components so it was installed in a share/amdsmi/lib folder. It cannot be installed in a common folder such as /usr/local/bin or /usr/bin because all other components try to search those folder first.

 

This is breaking ROCmValidationSuite and other tools. Per Wang, Yanyao this should be reverted.

Change-Id: Id61bc6056fe41800e738616f39293e9b8762a377
2024-11-15 15:08:12 -05:00
adapryor 6e01df00ca [SWDEV-446215] Update cmake to put test libs in proper lib dir
Change-Id: I2e91b904b3f869cdba717d872c10d799d0260c30
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-10-29 16:07:58 -04:00
gabrpham 00b3184e9f SWDEV-478748 Changed TestPciReadWrite Test Failure message to Warning
TEST FAILURE message for `amdsmi_get_gpu_cpi_throughput` and
`amdsmi_get_gpu_pci_bandwidth` changed to WARNING to indicate that
pcie_bw and/or pp_dpm_pcie sysfs files may not be supported on respetive
devices.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1ad6e15eceacb5a00b022458ee5fb21df9d845c7
2024-10-18 16:32:57 -05:00
Charis Poag 3a4abbd8c0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status' (ex. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) pcie other end recovery counter
    - CLI/API/tests changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-27 12:04:21 -05:00
Lang Yu 7a557b1c50 SWDEV-463405: Add amdsmi_get_link_topology_nearest support
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.

Code changes related to the following:
    * API
    * CLI
    * Unit tests
    * Examples

Header Unification Change: "/amdsmi/+/1122408"

Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>
2024-09-26 16:43:27 -05:00
gabrpham 8bc4abc88b Corrected partition changes in header and wrapper
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Iafd7de8f08924873da841ee6eca62100a17b2b6c
2024-09-20 17:01:55 -05:00
gabrpham c9a489d437 Moved partition_id from static --asic-info to static --partition.
partition_id also removed from the `amdsmi_asic_info_t` struct and
supporting API has been added for querying partition information.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081
2024-09-20 03:48:42 -04:00
Maisam Arif 3b7f661e71 Moved KFD information to separate structure and API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: If6eaea589edc704cf408d6391b5f2154134035e7
2024-09-20 03:48:42 -04:00
Eisuke Kawashima 1b6ec8df07 chore: unset executable permission
Change-Id: I06727774f3b1657a7955b172a40d0dfc9c76d6b9
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-16 17:34:39 -04:00
Maisam Arif 105db1afcd Udpated License Dates
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8ca199c129c06508bc3e23745ab5ac2d20dce928
2024-09-16 16:14:47 -04:00
Charis Poag a33e4c9e14 [SWDEV-483526] Fix MI3x partitions not showing all logical nodes
Changes:
- Updates to amdsmi_asic_info_t structure to include:
  target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display new
  samdsmi_asic_info_t fields
- Updates to gpu enumeration during amdsmi_init()
  to discover all logical GPUs when in a non-SPX mode
  (ex. DPX, TPX, QPX, or CPX)
 - Updates to amdsmi_get_gpu_bdf_id(..) to include
   partition_id details when in BDF or optional bits.
     - bits [63:32] = domain
     - bits [31:28] or bits [2:0] = partition id
     - bits [27:16] = reserved
     - bits [15:8]  = Bus
     - bits [7:3] = Device
     - bits [2:0] = Function (partition id maybe in bits [2:0]) <-- Fallback for non SPX modes

- C++/Python tests updated to reflect these outputs

Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-11 16:35:17 -05:00
Tim Huang 260edaa752 [SWDEV-463402] - Support retrieving connection type and P2P capabilities between two GPUs
1. Add a API interface amdsmi_topo_get_p2p_status to retrieve
connection type and P2P capabilities between 2 GPUs.

2. Add getting p2p status test in hw_topology_read
to print P2P capability information.

3. Add below tables for cli topology sub commands:
  - CACHE COHERANCY TABLE
  - ATOMICS TABLE
  - DMA TABLE
  - BI-DIRECTIONAL TABLE

Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
2024-09-06 09:42:34 -04:00
Maisam Arif 97c487372f Clean up unused files & Update License info
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5b58e8fe3d9eeac207b07ce0fe4134dd717dbd90
2024-09-05 09:52:48 -04:00
gabrpham 95ca2b83a1 Changed power parameter in amdsmi_get_energy_count() to energy_accumulator
Issue linked here: https://github.com/ROCm/amdsmi/issues/38

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I622236eb3f0144aefeb6c82d2713b4822bfeeb11
2024-09-04 09:38:08 -04:00
Oliveira, Daniel 893f13ab98 SWDEV-463399: amdsmi_get_gpu_vram_info() adds bit-width
Driver info `amdgpu_gpu_info.vram_bit_width` is exposed through amdsmi_get_gpu_vram_info().

Code changes related to the following:
  * API
  * CLI
  * Unit tests
  * Examples

Change-Id: I8abd8db7a603078b2b1c008b2685cecf35caf3d2
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-08-27 18:22:50 -04:00
Oliveira, Daniel af3670d758 SWDEV-463372: amdsmi_get_utilization_count() adds decoder_activity
GPU Metrics info `gpu_metrics.vcn_activity` is exposed through amdsmi_get_utilization_count().

Code changes related to the following:
  * API
  * CLI
  * Unit tests

Change-Id: I831b2a81bdc0e090a6698dcb689d10f91ed87dd9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-08-27 16:58:34 -05:00
Maisam Arif 2388ff7e3c Whitespace
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8335e617670a471a97bf54886b3221b6222e507f
2024-07-10 19:22:02 -05:00
Charis Poag 7194aaebf3 [SWDEV-455442/SWDEV-464645] Add back voltage curve testing for MI300
Validation requires running tests for MI300 systems, this update
removes the exclusion for these systems.

Change-Id: Idacf3e8bf0bd569f1cfa6192af47993eb5440ee6
2024-07-08 14:24:26 -05:00
Dalibor Stanisavljevic 7b2463abe0 SWDEV-457337 - Fix header alignment
Change-Id: I9f25f6c4f0d00c76b66d13162f30be11368f5b59
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2024-05-23 04:41:57 -04:00
Maisam Arif 7d999aa34c SWDEV-458102 - Updates to pp_od_clk_voltage parsing
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I650dae1a99856dcde914fe66917cf9111f3ce0e2
2024-05-15 03:18:24 -05:00
Maisam Arif 52843152a5 SWDEV-444567 - Added Ring Hang Event
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I2e73ba08ee0004f6f30660b2fa425ea94bafceca
2024-05-03 17:21:28 -04:00
Maisam Arif 11c72946eb Revert "SWDEV-458102 - Deprecated Voltage Curve API"
This reverts commit 1423fb632e.

Change-Id: I8a3eaf0a9f28200e09fb35d5260fbc070fe8a4a9
2024-05-02 15:27:16 -05:00
Charis Poag c24d66740e SWDEV-450580 - Fix powercap set
Updates:
     * CLI - Added AMDSMIHelpers.convert_SI_unit() to help
       conversion of units
     * API - Reverted to uW for power cap limits
     * CLI - amd-smi static --limit now includes MIN_POWER
     * Tests now are all using uW units to keep W conversion
       to only happen in CLI
     * Python API now reflects same units as uW (what is seen
       in amdgpu driver)
     * CLI - amd-smi metric --power:
       Fixed power seen on gpu_metrics v1.3

Change-Id: I32d9ba78d0d8806772f0860f9a803a885b3f316a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-05-02 10:13:39 -05:00
Maisam Arif 1423fb632e SWDEV-458102 - Deprecated Voltage Curve API
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I111c3ce26d2ab66d5e755432f4b8a9bfa631f805
2024-05-02 02:53:29 -04:00
Maisam Arif 1bd18c1a65 Added new ecc blocks and adjusted metric --ecc-block filtering
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ib2f69c7d59ee5108024794434fb202b5e4f58738
2024-04-18 15:01:41 -04:00
Oliveira, Daniel 08e2e21bab fix: [SWDEV-442525] [rocm/amd_smi_lib]
Fixes gpu_process_list

Code changes related to the following:
  * amdsmi_get_gpu_process_list()
  * CLI
  * Examples
  * Unit tests
  * Changelog
  * Readme
  * rocm_smi_lib commit: 677433b367

Change-Id: I9210fbca7a5da92d0a8b472b72ca82597c8e4fb5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-03-27 16:48:24 -05:00
Oliveira, Daniel c6208c0db0 fix: [rocm/amd_smi_lib] Navi3X/Navi2X/MI100 amdsmitst 2 test cases fail when running
Checks returned error by get_gpu_pci_bandwith() before assert

Code changes related to the following:
  * Unit tests

Change-Id: I950eee5d92607eea08722af7d7c84e8457cd4e60
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-28 15:11:22 -06:00
Oliveira, Daniel 475424525e fix: [rocm/amd_smi_lib] TestFrequenciesRead & TestPciReadWrite test cases failed
Fixes asserts in unit tests, and 'pp_dpm_pcie' condition

Code changes related to the following:
  * rsmi_dev_pci_bandwidth_set()
  * Functional tests

Change-Id: Id5e6851393fa3b51bb8cad87daca1efaf500a7e0
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-22 03:40:50 -05:00
Oliveira, Daniel 78074d7d77 fix: [rocm/amd_smi_lib] amdsmi_get_gpu_activity gfx/memory activity does not update
Checks and forces rereading gpu metrics unconditionally

Code changes related to the following:
  * Device::dev_log_gpu_metrics()
  * amdsmi_get_gpu_metrics_header_info()
    Removed unintentionally during work on 'header cleanup Remove non-unified headers'
  * Examples
  * Unit tests

Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-13 10:15:17 -06:00
Oliveira, Daniel 55734d2d7a fix: [rocm/amd_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * '_get_gpu_metrics_' APIs
  * Functional tests

Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-01-26 10:38:48 -05:00
Charis Poag 34bd26c68e Fix metric type error output + re-align with ROCm SMI metrics
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc

Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-01-24 21:23:40 -06:00
Galantsev, Dmitrii a60f5d2d4c SWDEV-409184 - Exclude some tests in VM
Change-Id: Ic196a113426fc63a0b2aadfa04ab4b10ed6434e3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-11 01:38:15 -06:00
Charis Poag 5ff5af0b5a Fix GPU metric tests & cleanup test output
- CLI: Added average_power to display if current_power is empty
    - CLI: fixed PCIe current_speed not displaying GT/s
    - ROCm API: 1.3 & 1.4
                -> commented out setting avg clocks to current clock value
(leave as max uint value, not re-assign; these are not same values)
                    -> commented out setting current_socket_power = average_power
(leave as max uint value, not re-assign; these are not same values)
                    -> For all non-array clocks, placed value in first
                        array[0] to keep outputs consistent
                    (helps xcd calc)
      - ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
        XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
      - ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
        freq
      - AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
        1.5 metric values for all lower metric tables
      - AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
      - AMD SMI API: wrapper -> now displays amdsmi return status as
        string in logs
      - gpu_metrics_read.cc -> now has better overview of backwards
        compatible output
      - gpu_metrics_read.cc -> Cleaned up output, added units, and
        display all array output

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
2023-12-19 14:18:15 -05:00
Maisam Arif d790ebc62b Refactor gpu_metrics usage in libraries
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I763638d4b546bf49b234e823df81028c357e8f49
2023-11-22 03:32:15 -06:00
Maisam Arif 545e57d3e3 SWDEV-426130 - Updated firmware subcommand output
Corrected truncation
	corrected xgmi to ta_xgmi
	remapped smc(system management controller) to pm(power
management)

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I404cefa7b90a454d4f4b08f6490448b47cf32107
2023-11-14 11:56:43 -05:00
Maisam Arif 5dba2f3120 Updated License Dates
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Id6fd66b03c602232ecc1a063a534a15fe3a03f56
2023-11-07 03:57:08 -05:00
Galantsev, Dmitrii 513dd8a445 Merge rocmsmi/amd-staging into amd-dev 20231103
Change-Id: Ie70ab54a63b25649b6b9d30620c5546dc66cd766
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-11-03 02:55:02 -05:00
Galantsev, Dmitrii 88d5e011e6 SWDEV-424983 - Fix supported metrics api checks
Change-Id: I5c95bb3057dd7546036cbd87bbf7025469d2b3d5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-30 17:28:59 -04:00
Galantsev, Dmitrii df4f5e8bf8 Merge rocmsmi/amd-staging into amd-dev 20231016
Change-Id: I137171162a64af4960d82336cc517c1b34a870f3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-16 14:31:13 -05:00
Galantsev, Dmitrii 6d72d65c48 Merge rocmsmi/amd-staging into amd-dev 20231010
Change-Id: I492562094a004eb78b2cc2b52d14d013d9f97112
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-11 18:58:12 -05:00
Galantsev, Dmitrii 3d3759061a Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I6037383a3efa777cc281a992fd9aa11d8e9ced28
2023-10-04 19:11:59 -05:00
Galantsev, Dmitrii 6c8767a69a TESTS - Disable same tests as in rocm-smi
Change-Id: I2587baf8a76e4e3a54880e73941b1d973440e7d3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-03 09:58:58 -04:00
Galantsev, Dmitrii 871fae8b25 Upgrade to CXX-17 gtest-1.14 and cmake-3.14
Change-Id: I3bceb90f79235a9c0616c5d7ef9e37e458ffdce6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-29 13:18:48 -04:00
Galantsev, Dmitrii 31cc2eecfb Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I0661926c10eef2bc32b83d9a63a3a6eb6991e781
2023-09-25 04:35:53 -05:00
Galantsev, Dmitrii 5c41319c83 Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I65ed7f3a0d1b6e58bc8377932d7c39db21d1b422
2023-09-21 23:43:20 -05:00
Maisam Arif d2ef113457 SWDEV-412847 - Changed junction to hotspot
Change-Id: I7f6c1a0a77e6a09d2a3e831463cf03e35266bf40
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 17:43:26 -05:00
Bill(Shuzhou) Liu b52034fed8 Add API for the memory type
Get the memory type from libdrm and add a new API.

Change-Id: I89327bca2ef860f2e3f4f6ca20def2331eba66c0
2023-09-07 13:05:58 -05:00