Commit Graph

88 Commits

Author SHA1 Message Date
Pham, Gabriel b779ce2831 [SWDEV-493207] Added amdgpu version to version command
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2025-01-22 18:05:25 -06:00
Kanangot Balakrishnan, Bindhiya 4b74badb00 [SWDEV-481004] Update Changelog for gfx_version number fix (#54)
Updated changelog with an example showing correct gfx version.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-22 08:49:41 -06:00
Kanangot Balakrishnan, Bindhiya 6fa991c39c [SWDEV-481004] Fix for incorrect gfx_version number (#52)
The target_graphics_version was not formatted properly and was
showing an incorrect Target Name. Corrected this by formatting the
major, minor and revision numbers.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-21 15:42:05 -06:00
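
A minimal sketch of the kind of formatting the commit above describes. It assumes the version arrives as one packed integer (major * 10000 + minor * 100 + revision) and that the target name uses a decimal major followed by hexadecimal minor and revision digits. Both points are assumptions rather than details taken from this log, and `format_gfx_target` is a hypothetical helper, not an AMD SMI function.

```python
def format_gfx_target(target_version: int) -> str:
    """Build a target name such as 'gfx942' from a packed version integer.

    Assumed encoding (not taken from this log):
    major * 10000 + minor * 100 + revision.
    """
    major = target_version // 10000
    minor = (target_version // 100) % 100
    revision = target_version % 100
    # Assumption: minor and revision are rendered as hexadecimal digits.
    return f"gfx{major}{minor:x}{revision:x}"


print(format_gfx_target(90402))  # gfx942
print(format_gfx_target(90010))  # gfx90a
```

Splitting the number into three fields before formatting avoids printing the raw packed integer as if it were the name.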
Scaffidi, Salvatore 3793be7735 [SWDEV-463406] Update API with fields for gfx_clock_below_host_limit and low_utilization violations
Updated API with fields for gfx_clock_below_host_limit and low_utilization violations
Change-Id: I25647bae6e7b785f44dab024272767658688bcad

---------
Signed-off-by: Scaffidi, Salvatore <Salvatore.Scaffidi@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-01-08 22:07:23 -06:00
Kanangot Balakrishnan, Bindhiya d0e770ffbc SWDEV-504130 Add temperature violation status to amd-smi monitor (#2)
Added boolean temperature violation status to amd-smi monitor.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-08 16:35:53 -06:00
Pham, Gabriel 129ad8ffad [SWDEV-502523] Made amd-smi reset command arguments mutually exclusive
Made reset arguments mutually exclusive so that users can select
only one option at a time, which prevents errors from being thrown.

---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-01-08 16:24:05 -06:00
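
One common way to get the behavior described in the commit above is argparse's mutually exclusive groups. The sketch below is illustrative only: the option names are hypothetical placeholders and the real amd-smi parser may be organized differently.

```python
import argparse


def build_reset_parser() -> argparse.ArgumentParser:
    """Return a parser whose reset options cannot be combined."""
    parser = argparse.ArgumentParser(prog="reset-sketch")
    # Hypothetical option names, chosen only for illustration.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--gpureset", action="store_true")
    group.add_argument("--clocks", action="store_true")
    group.add_argument("--fans", action="store_true")
    return parser


args = build_reset_parser().parse_args(["--clocks"])
print(args.clocks, args.gpureset)  # True False
```

Passing two options from the group (for example `--clocks --fans`) makes argparse print a usage error and exit before any reset logic runs.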
Pham, Gabriel 5ed340c08b [SWDEV-502523] made set gpu arguments mutually exclusive (#31)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-01-07 16:48:01 -06:00
Maisam Arif 8ca2c6e247 Deprecated amdsmi_get_energy_count() power field
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I1b5fe8e278b797458e57dff689e692347901bbfd
2025-01-07 12:45:55 -06:00
Pham, Gabriel 93a027ec95 [SWDEV-476303] Exposed valid values for set command (#8)
Updated amd-smi set help text
---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2024-12-20 15:32:10 -06:00
Arif, Maisam 34f9edd2fc Update CHANGELOG.md
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2024-12-19 16:47:31 -06:00
Juan Castillo f8b8347627 [SWDEV-496693]GPU Metrics 1.7
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram

- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
2024-12-18 10:57:06 -06:00
gabrpham fe290a2056 [SWDEV-484382] Added fclk and socclk to amd-smi metric -c
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ie7e19c757b05455693c0d26eeb5e8b6c1e238375
2024-12-13 00:33:12 -05:00
gabrpham 5f9c2db6f3 [SWDEV-484382] Added new command amd-smi set -c/--clk-level
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: If45152e3a3c94f65b6a8a960601b9ed16fa3d0d7
2024-12-13 00:32:19 -05:00
gabrpham bc16e1a5da [SWDEV-484382] Added new command amd-smi static --clock
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I49e1aa2e699734d81c40c76c62da1cecc5bd3c0e
2024-12-13 00:30:29 -05:00
Charis Poag bc0015fd36 [SWDEV-488288] Remove GFX_BUSY_ACC from amd-smi metric --usage
Output is not helpful to users.

Change-Id: I12a60e28b8eab2fc3ffca4ea88f03018bf0ef3ce
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-10 13:37:36 -05:00
Charis Poag b911a0606a [SWDEV-495824] AMD SMI reporting CPX partitions incorrectly
Updated changelog to provide options to users on how to fix.

Change-Id: I4fd04b1e65ff9d678b2d13109599f57a03c84d41
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-10 11:20:03 -05:00
Bindhiya Kanangot Balakrishnan 288b11df37 [SWDEV-496639] Align amd-smi xgmi statistics
The xgmi read and write values were displayed in KB, and the numbers
became unreadable due to misalignment. Converted the read and write
values to readable units using a helper function. Updated Changelog.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: I4c90a1de8a58c29cbdf43fe3480a1546f3946673
2024-12-09 12:57:45 -05:00
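
A helper of the kind mentioned in the commit above could look roughly like this. The function name, the 1024 step between units, and the three-decimal output are all assumptions made for illustration; the log does not show the actual helper.

```python
def to_readable_units(kilobytes: float) -> str:
    """Scale a KB value to the largest unit that keeps it below 1024."""
    units = ["KB", "MB", "GB", "TB", "PB"]
    value = float(kilobytes)
    index = 0
    # Assumption: binary steps of 1024 between units.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.3f} {units[index]}"


print(to_readable_units(512))          # 512.000 KB
print(to_readable_units(1536))         # 1.500 MB
print(to_readable_units(3 * 1024**2))  # 3.000 GB
```

A fixed number of decimals keeps column widths predictable, which is the alignment problem the commit set out to fix.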
Charis Poag d323ecff97 [SWDEV-502744] Fix "amd-smi monitor" shows VCN ENC utilization & clock but not VCN DEC
Reason for this fix:
Navi products use vclk and dclk for both encode and decode.
On MI products, only decode is supported.
Navi products cannot support displaying ENC_UTIL % at this time.

Change-Id: I107bb761794ae4724949ac21c110b23a4f616700
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-07 12:11:10 -05:00
Bindhiya Kanangot Balakrishnan fc7e1ddb4a [SWDEV-498507] Tool amd-smi could be more case insensitive
Modified amdsmi_cli to accept case-insensitive arguments when
the argument does not start with a single dash (-).

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: I1b6320db0afaad0900d5a2049206002c3899fa71
2024-12-02 18:09:45 -05:00
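
The rule in the commit above can be sketched as a pre-processing pass over the argument list. This reading assumes "single dash" means short options such as `-g`, which keep their case, while subcommands, long options and values are lowercased. `normalize_args` is a hypothetical name and not the function used in amdsmi_cli.

```python
from typing import List


def normalize_args(argv: List[str]) -> List[str]:
    """Lowercase every argument except single-dash (short) options."""
    normalized = []
    for arg in argv:
        # Assumption: short options like -g may be case sensitive, so skip them.
        is_short_option = arg.startswith("-") and not arg.startswith("--")
        normalized.append(arg if is_short_option else arg.lower())
    return normalized


print(normalize_args(["STATIC", "--ASIC", "-g", "ALL"]))
# ['static', '--asic', '-g', 'all']
```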
Charis Poag 3ea4a42a6e [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - [CLI] Added warning screen to AMD SMI users
    setting memory partition
  - [CLI] Added a progress bar (time bar) to CLI set operations, displayed for 40 seconds
  - [API] Updated to wait until the driver reloads with SYSFS files active
  - [CLI] Users can now set or reset without providing -g all. Instead of
    amd-smi set -g all <set arguments>
    or amd-smi reset -g all <set arguments>
    users can directly call -> sudo amd-smi set <set arguments>
    or sudo amd-smi reset <set arguments>
  - [SWDEV-475712][CLI/API] Fixed target_graphics_version field
    not properly displaying for older MI or Navi ASICs.
  - [All APIs] Added a catch for the driver reporting invalid arguments;
    these APIs now return AMDSMI_STATUS_INVAL
    (ex. changing to NPS8 if the device does not support it)
  - [Install] Modified paths for Python install commands to support
    multi-ROCm installs

Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-12 16:50:32 -04:00
Maisam Arif abee26d4ab Added ras and ecc counting back to Linux VMs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ie981f7fe8f481f2137e95dda2e200d00ab4d92c8
2024-11-08 11:05:15 -06:00
Peter Park 31821cb585 Mod changelog to fit internal standard
Change-Id: Id90136f16f15a30b2791ed0634a408a7eb73f96f
2024-11-08 11:57:14 -05:00
Charis Poag 7fc4b853d4 [SWDEV-495305] Fix AttributeError: 'Namespace' object has no attribute 'compute_partition'
Changes:
   - [CLI] Earlier we removed compute & memory partition resets;
     this fix restores the correct spacing for
     reset commands

Change-Id: I707ff197baf7a32bfb7ef20f2b26a63acd13f08a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-05 18:49:08 -05:00
Charis Poag 0ceca28f41 [SWDEV-463406] Update sample rate + align metric output
Changes:
- Corrected the fastest rate at which users can sample from
  FW/driver to 100 ms
- Added a warning to the amdsmi_get_violation_status()
  call about the 100 ms delay required to sample
- Removed guest support, this API will not be supported
- Updated CLI `amd-smi metric --throttle` outputs from
    XXX_active -> XXX_status
    XXX_percent -> XXX_activity
  to align with host
- Changelog updated

Change-Id: Ib30dd35dcc04ff67904ca82c86a55a16689df226
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-10-23 17:36:35 -04:00
gabrpham f5b7761ac7 [SWDEV-490187] reset gpu partition were removed
Reset GPU partition support for both compute and memory was removed

Code changes related to the following:
  * amdsmi_reset_gpu_compute_partition()
  * amdsmi_reset_gpu_memory_partition()
  * CLI

Change-Id: I372589074b4da172bedd39223edde18939e373ae
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-10-18 16:22:26 -05:00
gabrpham 27b5a35d65 [SWDEV-488846] Removed '--ecc' option from 'amd-smi monitor' when platform is VM
Change-Id: I8f5d7771cbfac3fe5f52dbccbd9f28020adb5f6f
2024-10-16 10:34:19 -04:00
gabrpham eb9116e8c2 [SWDEV-486872] Removed '--ras' from static command when platform is VM
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I0b03f168d7011428cfea3ab303865f4eaeea78ac
2024-10-16 09:29:24 -05:00
gabrpham 4e2fc2d604 Added amd-smi partition as preliminary command.
new command includes following arguments:
  - current - display the current partition information for the selected
    gpu(s)
  - memory - display memory partition information for the selected
    gpu(s)
  - accelerator - display accelerator partition information for the
    selected gpu(s)
additional functionality will be added as more partition APIs are added.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ica86160139002ef5213d6d4b0e390670aeef01c8
2024-09-27 17:05:04 -05:00
Charis Poag 3a4abbd8c0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status' (ex. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) pcie other end recovery counter
    - CLI/API/tests changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-27 12:04:21 -05:00
Lang Yu 7a557b1c50 SWDEV-463405: Add amdsmi_get_link_topology_nearest support
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.

Code changes related to the following:
    * API
    * CLI
    * Unit tests
    * Examples

Header Unification Change: "/amdsmi/+/1122408"

Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>
2024-09-26 16:43:27 -05:00
Maisam Arif 09c9574454 [SWDEV-469278] - Lowered PyYAML dependency
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Icfee09b84cf1071ec82b65fc2877be69e0283489
2024-09-20 18:03:00 -04:00
gabrpham 8bc4abc88b Corrected partition changes in header and wrapper
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Iafd7de8f08924873da841ee6eca62100a17b2b6c
2024-09-20 17:01:55 -05:00
Dmitrii Galantsev 9924574cbe Revert "[SWDEV-482058] Updated Packaging for offline installs"
Revert submission 1125402

Reason for revert: Packaging a tar archive of 3rd party sources
Reverted Changes:
I8908451c0:[SWDEV-482058] Updated Packaging for offline insta...
I764c8bf01:[SWDEV-469278] Lowered PyYAML post install script ...

Change-Id: Ib32fa5b9351b1cfc2a8d453e744c0d00209359eb
2024-09-20 16:42:29 -04:00
gabrpham c9a489d437 Moved partition_id from static --asic-info to static --partition.
partition_id also removed from the `amdsmi_asic_info_t` struct and
supporting API has been added for querying partition information.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081
2024-09-20 03:48:42 -04:00
Maisam Arif 3b7f661e71 Moved KFD information to separate structure and API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: If6eaea589edc704cf408d6391b5f2154134035e7
2024-09-20 03:48:42 -04:00
Maisam Arif 2cfae06560 [SWDEV-482058] Updated Packaging for offline installs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8908451c013fc944645b5b5df3104a2ff73e72bd
2024-09-20 00:55:48 -04:00
gabrpham b7f779182d [SWDEV-448738] Added rocmsmi extremum command as 'set -L'
Change-Id: I997c630bd20cc61673813a2301eb5e3002619a32
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>

Change-Id: Ifa884303f9a0fa058af093a23f5be449bba54f29
2024-09-18 14:51:01 -04:00
Juan Castillo ac593f9fa0 [SWDEV-482966/ SWDEV-482967] Removing pytest dependency + install path change
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I7aace93fcad18d67443e6849c10a1fbbc65d0fa8
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
2024-09-18 00:27:00 -04:00
gabrpham 0d4b332fe4 Removed _validate_positive function and replaced with _positive_int or _not_negative_int as appropriate
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I01effcdf9bae31fd8bc926c5d4bdf58274838618
2024-09-17 18:37:16 -04:00
Maisam Arif 639daa3d90 Fixed amdsmi_get_utilization_count() wrapper generation
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ifd59fca042c4b3b0fc53e100b6892c6b4f7b3e95
2024-09-17 16:34:42 -04:00
Maisam Arif 787d4462fa [SWDEV-482412] Optimized PCIe Bandwidth gpu_metrics calls
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib37d232b94a080e9b490dd065628d2567aaf4642
2024-09-11 23:26:30 -05:00
Charis Poag a33e4c9e14 [SWDEV-483526] Fix MI3x partitions not showing all logical nodes
Changes:
- Updates to amdsmi_asic_info_t structure to include:
  target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display new
  amdsmi_asic_info_t fields
- Updates to gpu enumeration during amdsmi_init()
  to discover all logical GPUs when in a non-SPX mode
  (ex. DPX, TPX, QPX, or CPX)
- Updates to amdsmi_get_gpu_bdf_id(..) to include
  partition_id details, either in the BDF or in the optional bits:
    - bits [63:32] = Domain
    - bits [31:28] or bits [2:0] = Partition ID
    - bits [27:16] = Reserved
    - bits [15:8]  = Bus
    - bits [7:3]   = Device
    - bits [2:0]   = Function (partition ID may be in bits [2:0]) <-- Fallback for non-SPX modes

- C++/Python tests updated to reflect these outputs

Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-11 16:35:17 -05:00
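
The bit layout listed in the commit above can be unpacked with shifts and masks. This is a sketch that treats the BDF ID as a plain 64-bit integer and reads the partition ID from bits [31:28] only. `BdfFields` and `decode_bdf_id` are hypothetical names, and the fallback where the partition ID sits in bits [2:0] is left to the caller.

```python
from typing import NamedTuple


class BdfFields(NamedTuple):
    domain: int
    partition_id: int
    bus: int
    device: int
    function: int


def decode_bdf_id(bdf_id: int) -> BdfFields:
    """Split a 64-bit BDF ID into the fields listed in the commit message."""
    return BdfFields(
        domain=(bdf_id >> 32) & 0xFFFFFFFF,  # bits [63:32]
        partition_id=(bdf_id >> 28) & 0xF,   # bits [31:28]
        bus=(bdf_id >> 8) & 0xFF,            # bits [15:8]
        device=(bdf_id >> 3) & 0x1F,         # bits [7:3]
        function=bdf_id & 0x7,               # bits [2:0]
    )


# Made-up sample value: domain 0, partition 3, bus 0x46, device 0, function 0.
sample = (0x0 << 32) | (0x3 << 28) | (0x46 << 8) | (0x0 << 3) | 0x0
print(decode_bdf_id(sample))
# BdfFields(domain=0, partition_id=3, bus=70, device=0, function=0)
```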
Tim Huang 260edaa752 [SWDEV-463402] - Support retrieving connection type and P2P capabilities between two GPUs
1. Add an API interface, amdsmi_topo_get_p2p_status, to retrieve the
connection type and P2P capabilities between two GPUs.

2. Add a P2P status test to hw_topology_read
that prints P2P capability information.

3. Add the following tables to the CLI topology subcommands:
  - CACHE COHERANCY TABLE
  - ATOMICS TABLE
  - DMA TABLE
  - BI-DIRECTIONAL TABLE

Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
2024-09-06 09:42:34 -04:00
Maisam Arif bc4ca45862 [SWDEV-450553] Added Subsystem Device ID to amd-smi static --asic
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I428b4993cca027a6eb1bb9c617fe715118a59407
2024-09-04 12:51:02 -05:00
gabrpham 7d8e54d0e1 [SWDEV-450553] Added gpu memory overdrive to metric function
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: If7bd6865d641a5a83c594a4d3c57938b1b6dc18e
2024-09-04 12:54:14 -04:00
gabrpham 95ca2b83a1 Changed power parameter in amdsmi_get_energy_count() to energy_accumulator
Issue linked here: https://github.com/ROCm/amdsmi/issues/38

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I622236eb3f0144aefeb6c82d2713b4822bfeeb11
2024-09-04 09:38:08 -04:00
Charis Poag d9d6637cb7 [SWDEV-451960] [WIP] Add Pytest
Updates:
- Added pytest to shared/pytest folder
- User can execute tests:

[pytest]
python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -s -v
python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -s -v

[unittest]
/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v
/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v

- Automatically installs pytest

Change-Id: Ia3281a9608aeeb803b91f8b83f87ff84b01037f4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-08-29 10:09:29 -04:00
Charis Poag d7c583d422 [SWDEV-478807] Fix incorrect firmware versions and names
- Fix updates the API to use the correct enum names (PM->SMU)
- Python API/CLI now reports correct versions and names for
  SMC/TA_XGMI/TA_RAS

Change-Id: Icbe115b3070b9f252ef15b09b781b9b3f5861e50
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-08-23 18:03:13 -05:00
gabrpham 0143041262 Fixed cli issue with empty cpu/core parameter
Change-Id: Id0fee74357a56baaec59ca5359eb00a65cfd6185
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2024-08-05 16:37:36 -05:00
Maisam Arif 3a9c93bfa6 Updated Changelog with Mutex Fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0aee284ce7600efc66b0ad5392c11bb6a502a929
2024-07-19 11:18:09 -04:00