830 Commits

Author SHA1 Message Date
Mallya, Ameya Keshava 0fc5d97026 Added !verify trigger
[ROCm/rocm_smi_lib commit: 2764498bbf]
2025-01-28 20:11:45 -08:00
Kanangot Balakrishnan, Bindhiya d2461a186b [SWDEV-481004] Fix for incorrect gfx_version number (#8)
The target_graphics_version was not formatted properly and was
showing incorrect Target Name. Corrected this by formatting
major, minor and revision numbers.
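
As a rough illustration of formatting a target name from major, minor and revision numbers, the sketch below assumes the raw value packs them decimally as major*10000 + minor*100 + revision, and that minor and revision are printed as hexadecimal digits. Both points are assumptions made for this example and are not stated in the commit; `format_gfx_target` is a hypothetical helper name.

```python
def format_gfx_target(version: int) -> str:
    """Build a 'gfx...' target name from a packed version number.

    Assumption (not from the commit): version = major*10000 + minor*100 + revision.
    """
    major = version // 10000          # e.g. 9 for 90010
    minor = (version // 100) % 100    # e.g. 0 for 90010
    revision = version % 100          # e.g. 10 for 90010
    # Minor and revision are rendered as hex digits (assumed convention).
    return f"gfx{major}{minor:x}{revision:x}"


if __name__ == "__main__":
    for raw in (90010, 90402, 110000):
        print(raw, "->", format_gfx_target(raw))
```

Splitting the value into three fields before formatting avoids treating the packed number as a single string, which is one plausible way an incorrect Target Name could arise.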

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/rocm_smi_lib commit: 94dca7073b]
2025-01-21 15:46:56 -06:00
Mallya, Ameya Keshava e2520ad547 Fixed Workflow for updated KWS structure
[ROCm/rocm_smi_lib commit: aefc865bf4]
2025-01-17 08:26:52 -08:00
Mallya, Ameya Keshava 49b96ca627 Create kws-caller.yml
[ROCm/rocm_smi_lib commit: c6ce9a5aa0]
2025-01-15 11:12:25 -08:00
Galantsev, Dmitrii bc13dfe3c8 [SWDEV-495169] Update ROCm SMI CLI and Error handling (#3)
Issues include:

Update ROCm SMI to display N/A instead of None or Not Supported
Update ROCm SMI to log error messages instead of displaying them
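
The two changes above could look roughly like the following sketch. The helper names `display_value` and `report_error` and the logger name are hypothetical, chosen only for illustration; the actual CLI code may be structured differently.

```python
import logging

logger = logging.getLogger("rocm_smi_example")  # hypothetical logger name


def display_value(value) -> str:
    """Return 'N/A' for missing or unsupported values, else the value as text."""
    if value is None or str(value).strip().lower() in {"none", "not supported"}:
        return "N/A"
    return str(value)


def report_error(message: str) -> None:
    """Send an error to the log instead of printing it to the user."""
    logger.error(message)


if __name__ == "__main__":
    print(display_value(None))             # missing value
    print(display_value("Not Supported"))  # unsupported value
    print(display_value(42))               # regular value
    report_error("example failure")
```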

Signed-off-by: Juan Castillo juan.castillo@amd.com
Change-Id: I1a2ce6e4f329666b5666664a7d7b4475d6c1cbc7

[ROCm/rocm_smi_lib commit: 55ee3cc442]
2025-01-14 17:15:18 -06:00
James Xu 84400150b4 [SWDEV-501108] Update Doxygen note on rsmi_dev_pci_id_get
- To address https://github.com/ROCm/rocm_smi_lib/issues/208
where use of fake BDFs for partitions can cause confusion. This note
is already in the comments of the function definition, but was not
updated in the function declaration.
- Fix broken formatting for the location table for PCIE coordinate fields
- Tracked in SWDEV-501108

Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d


[ROCm/rocm_smi_lib commit: 67a0de4279]
2025-01-14 15:49:55 -06:00
Choudhary, Rahul d748d8b247 Create rocm_ci_caller.yml init file to call shared workflow
[ROCm/rocm_smi_lib commit: 3c01c13dfd]
2025-01-07 11:53:58 -08:00
gabrpham a62f424b90 Fixed reset event issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Icf12211e4b136f26fce18f09a7bf8b7e9cd20691


[ROCm/rocm_smi_lib commit: 6f51cd651e]
2024-12-30 13:12:46 -05:00
Charis Poag 7b3c814501 [SWDEV-496693] GPU metrics 1.7
Changes:
    - Added new GPU metrics:
      1) XGMI link status - Up/Down; 1 = up; 0 = down
      2) Graphics clocks below host limit (per XCP)
         accumulators -> used to help calculate a violation status
      3) VRAM max bandwidth at max memory clock
    - Updated rocm-smi --showmetrics to include new metrics.
    Units/values reflect as indicated by driver, may differ
    from AMD SMI or other ROCm SMI interfaces which
    use these fields.
    - N/A fields mean the device does not support providing this
    data.

Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4de2168866]
2024-12-15 16:48:13 -05:00
Charis Poag 5d7555e586 [SWDEV-475712] Fix MI2x target_graphics_version
Removed the correction of target_graphics_version by
product name. Instead, detect a target_graphics_version that
needs to be corrected and populate it accordingly.

Change-Id: I90765c87e0629daea5c732dace8acfd17e8c62c7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 8488518b1c]
2024-12-08 22:01:43 -06:00
Charis Poag 72fd1821b0 Merge amd-staging into amd-master 20241125
Change-Id: I801dcda853066d8d2e19a8727b2b07dcafc253b4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 562575c73d]
2024-11-25 08:39:32 -06:00
Charis Poag 06fead7e41 [SWDEV-499029] Fix unable to change memory partition modes
Changes:
  * [API] Removed checking board name, fixes for other MI ASICs
  * [CLI] Increased progress bar to change memory partition modes
    to 140 seconds, since driver reload is variable per system

Change-Id: Ifcaf40d28b4adf5eaa800c9e3748d33749dc414a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: d04cec7f1d]
2024-11-22 20:19:29 -05:00
Zhang Ava c0c3e61cfb Merge amd-staging into amd-master 20241114
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I97175c160d157e6f8ad0d94b65d2b6f2a2384949


[ROCm/rocm_smi_lib commit: c827c54093]
2024-11-15 11:38:47 +08:00
gabrpham 21d3a831d7 [SWDEV-478748] Changing PCIE Read/Write message TEST FAILURE to WARNING
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I534a94b358f7fddbe3c11d249c6e090cf3fa121e


[ROCm/rocm_smi_lib commit: 5428d29b19]
2024-11-13 15:05:26 -06:00
Peter Park 23190fd3d9 [SWDEV-479054] update doc for rsmi_compute_process_info_get to note 2-step usage
Change-Id: I81608f807ab679a27be12be591193712d81232bd
Signed-off-by: Peter Park <peter.park@amd.com>


[ROCm/rocm_smi_lib commit: c3f1d2baf1]
2024-11-13 12:52:18 -05:00
Charis Poag ff46ab4258 Merge amd-staging into amd-master 20241112
Change-Id: I3fba6fb940aa19532037e2125fd1837de4d3f282
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 99c1b5a0df]
2024-11-12 16:43:50 -06:00
Charis Poag 2258c26c53 [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - Added warning screen to ROCm SMI users
    setting memory partition
  - Added new API (rsmi_dev_memory_partition_capabilities_get)
    to retrieve memory partition capabilities
    (What users can set memory partition modes to)
  - Increased time-bar for CLI sets display to 40 seconds
  - API now waits until the driver reloads with SYSFS files active
  - [SWDEV-475712] [CLI/API] Fixed target_graphics_version field
    not properly displaying for MI2x or Navi 3x ASICs.
  - Updated tests

Change-Id: Iaf89d1b7ad9ceb449b289bc82ea198fe3b23992e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 46902274b6]
2024-11-12 12:18:44 -04:00
Peter Park 8af04d3531 Update changelog fmt to internal standard
Change-Id: Icdb7eb59c6770f46ddae401f23b84cd06e6d3b09


[ROCm/rocm_smi_lib commit: 568cc6e7c7]
2024-11-08 16:20:49 -05:00
Zhang Ava 2a5e1a98bb Merge amd-staging into amd-master 20241106
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: Ib125dee62e5a893871f5c6df7715177973361a02


[ROCm/rocm_smi_lib commit: fa2c9180d7]
2024-11-08 08:42:13 +08:00
Jorge López d51bd18649 Updates driverInitialized() to support amdgpu built as module as well as kernel built-in. Fixes ROCm/rocm_smi_lib#102 and is an updated version of ROCm/rocm_smi_lib#104
Change-Id: Icb3abe820bc67035b822358a1c04bd09a7c22b6b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 35c1d00f5a]
2024-11-05 16:30:37 -05:00
adapryor 4e399dc383 [SWDEV-412505] Handle mclk permission errors as not supported
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I25c9af42ed62697f87c70ecaeb153abe53401091


[ROCm/rocm_smi_lib commit: 61ed9e13f4]
2024-10-31 15:18:03 -04:00
Charis Poag 7f4ec4f82f Merge amd-staging into amd-master 20241022
Change-Id: I823ffdba9f1db614542658a2af61df917a44c07a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 7504cd04eb]
2024-10-22 18:23:12 -05:00
Oliveira, Daniel d41fbc88ca [SWDEV-490187 / SWDEV-491215] Remove reset gpu partition + NPS test disabled
The reset gpu partition support for both compute and memory were removed

Code changes related to the following:
  * rsmi_dev_compute_partition_reset()
  * rsmi_dev_memory_partition_reset()
  * CLI
  * Unit tests
  * Documentation

Change-Id: I3fb8570dbf9e755ae70369587ef44bbf64e17fe8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: a1295714f2]
2024-10-21 14:22:57 -05:00
Charis Poag 986c8e09ef Merge amd-staging into amd-master 20240930
Change-Id: I814a16d5e1f9371e00dbbb3623dc975ab2359f44
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 28c2cc3298]
2024-09-30 10:56:18 -04:00
Charis Poag 0b40a73798 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6 + --showmetrics
Changes:
- Added new GPU metrics:
  1) Violation status' (ex. PVIOL/TVIOL) accumulators
  2) XCP (Graphics Compute Partitions) statistics
  3) pcie other end recovery counter
- Added rocm-smi --showmetrics
Units/values reflect as indicated by driver, may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.

Change-Id: Ia2cd3bb65c4f474ebdb39db8062ea716f2b4d8ee
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 0609cbf1d0]
2024-09-27 13:18:05 -04:00
Peter Park 459929b5c0 Add LD_LIBRARY_PATH note to rocm.docs pages
https://github.com/ROCm/rocm_smi_lib/pull/197
https://advanced-micro-devices-demo--197.com.readthedocs.build/projects/rocm_smi_lib/en/197/

Change-Id: I64386a4f364e40ce61ad9963376d2686db2aa36d


[ROCm/rocm_smi_lib commit: b7221c64b0]
2024-09-26 11:18:44 -04:00
Zhang Ava 7a75c65b38 Merge amd-staging into amd-master 20240924
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I8a3bc054f4fa4ca3d3de789db78fd34954bb0d88


[ROCm/rocm_smi_lib commit: fa5312dacc]
2024-09-26 18:36:12 +08:00
Harkirat Gill 8596afd618 Add cstdint header for gcc-15 compatibility
Common C++ headers (like <memory>) in GCC 15.0.0 (combined with libstdc++) don't transitively include uint64_t anymore.

Minimal reproducer: https://godbolt.org/z/dqGbnG8bY

Porting: https://github.com/ROCm/rocm_smi_lib/pull/198
Closes: https://github.com/ROCm/rocm_smi_lib/issues/191

Change-Id: I2786e968c107a78104c43c4c474b7f65eaf88c0a


[ROCm/rocm_smi_lib commit: c61ab4fa28]
2024-09-23 15:05:07 -04:00
James Xu be18b69b33 Skip missing vram_str_path and sdma_str_path if sysfs files not created when passing some, but not all, GPUs to a docker image.
- This fix addresses SWDEV-456049 and probably SWDEV-442181 which
	have the same apparent root cause of an early exiting
	loop while enumerating GPU stats

Change-Id: I517329e06fa2c53205d8b6e002895e648ebf521c


[ROCm/rocm_smi_lib commit: 35496cabc4]
2024-09-19 16:53:37 -04:00
Zhang Ava da7bf7e366 Merge amd-staging into amd-master 20240917
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I198c849530508a90eee8ae5454035b9c610b3f5a


[ROCm/rocm_smi_lib commit: af2507807f]
2024-09-19 18:44:19 +08:00
James Xu ddba959395 SWDEV-478077 - logging.warn used instead of logging.warning
- logging.warn() is deprecated in favour of logging.warning()
- for some reason, this is the only place in all of rocm_smi.py
	that uses logging.warn() as pointed out on github
	https://github.com/ROCm/rocm_smi_lib/issues/187
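
A minimal sketch of the replacement described above, using only the standard `logging` module. The logger name and the helper `report_unsupported` are hypothetical and not taken from rocm_smi.py.

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
logger = logging.getLogger("rocm_smi_example")  # hypothetical logger name


def report_unsupported(device_id: int) -> str:
    """Log a warning with logging.warning(), the non-deprecated spelling."""
    message = f"GPU[{device_id}]: value not supported"
    # logger.warn(message) would be the deprecated form.
    logger.warning(message)
    return message


if __name__ == "__main__":
    report_unsupported(0)
```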

Change-Id: Ie1e4a0ea16b996fbed2e902c8edfe68087a5a5fa


[ROCm/rocm_smi_lib commit: fe6a49d186]
2024-09-16 13:50:26 -04:00
Zhang Ava 9316bae9d8 Merge amd-staging into amd-master 20240911
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: Iaa1be5b9c6eb4c205ced9d610feada93ad28aa50


[ROCm/rocm_smi_lib commit: 743bd50aa5]
2024-09-13 18:31:57 +08:00
Oliveira, Daniel 902bd6b1ae [SWDEV-483822] rocm-smi shows 'warning' for unsupported curves
Options '--showvoltagerange' and '--showvc' show 'warning' instead of 'error' for unsupported voltage curves

Code changes related to the following:
  * CLI

Change-Id: Ide662c98202c32ad01ccaf3c47a61f2543f82ebb
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 72b112f8f3]
2024-09-10 11:36:36 -05:00
Zhang Ava 454af241e3 Merge amd-staging into amd-master 20240828
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I13187d5772ee1e5e74d9daf4268b90819b4198d0


[ROCm/rocm_smi_lib commit: ad511e9b0d]
2024-08-29 20:09:31 +08:00
Charis Poag 16ce520533 Fix rocm-smi --showfw displaying error fw prints
Updates:
  - [CLI] Previously --showfw displayed fw that
    does not exist on systems. This change removes
    that extra output.

Change-Id: If8b063001b80b03579ea1378dfd890c60f62ccd7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 6b8db74578]
2024-08-27 15:43:16 -04:00
Maisam Arif ef2b4bf482 Merge amd-staging into amd-master 20240823
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ifae5d76d62ab50794486b08ea06d009bd870f9ea


[ROCm/rocm_smi_lib commit: 2acc7f2447]
2024-08-23 19:19:47 -05:00
Ranjith Ramakrishnan c2692ad0a9 SWDEV-480347 - Don't terminate build for cpack bytecompile errors
On CentOS 7, python2 is used for the cpack bytecompile step. Using f-strings in code results in a syntax error.
Setting _python_bytecompile_errors_terminate_build to 0 makes the build ignore these errors.

Change-Id: I43ecc99ae16627f4f5f91d0cca0398f6a003fa3c


[ROCm/rocm_smi_lib commit: 4ceffdca68]
2024-08-23 13:43:32 -07:00
Galantsev, Dmitrii 4b21bb4b29 Merge amd-staging into amd-master 20240808
Change-Id: I15b180364b79de72a74ae52fbce7009122a01415
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: ee3caa23ed]
2024-08-08 16:38:24 -05:00
Maisam Arif 02d2ceddaf Bump version tool:2.3.1+hash
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic67456d7484c2f5a0ce0e086e56b29e20d9d9745


[ROCm/rocm_smi_lib commit: 055b023d2e]
2024-08-08 01:40:55 -05:00
Ranjith Ramakrishnan 6908160870 SWDEV-476075 - Prevent the modification of interpreter directives
CPACK converts /usr/bin/env python3 to /usr/libexec/platform-python on RHEL8.
Undefining __brp_mangle_shebangs prevents this.

Change-Id: Id285e2cea1de583853cec17eccf0a3a794cca643


[ROCm/rocm_smi_lib commit: 1b828b735b]
2024-08-05 09:50:04 -07:00
Zhang Ava 86acd39029 Merge amd-staging into amd-master 20240801
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I8c9b1a2805e83e5de5873ef8fafaf38143c2ebd8


[ROCm/rocm_smi_lib commit: 481928965f]
2024-08-02 13:12:38 +08:00
Ranjith Ramakrishnan 70b29ed8c3 SWDEV-469004 - Append additional path to system path
rocm-smi is installed in /opt/rocm-ver/bin, but not as a soft link in the wheel package.
For rocm-smi to work from the bin directory, it needs the extra path to find rsmiBindings.py
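
One way to append such a path is sketched below. The helper name `add_bindings_path` and the relative directory used in the example are assumptions for illustration; the commit does not state which directory is appended.

```python
import os
import sys


def add_bindings_path(script_path: str, relative_dir: str) -> str:
    """Append a directory, resolved relative to the script, to sys.path."""
    base = os.path.dirname(os.path.realpath(script_path))
    candidate = os.path.normpath(os.path.join(base, relative_dir))
    if candidate not in sys.path:
        sys.path.append(candidate)  # appended last, so existing entries win
    return candidate


if __name__ == "__main__":
    # Hypothetical layout: bindings live in ../libexec/rocm_smi next to bin/.
    print(add_bindings_path("/opt/rocm-ver/bin/rocm-smi", "../libexec/rocm_smi"))
```

Appending rather than inserting at the front keeps any already configured module locations ahead of the extra path.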

Change-Id: I41388f680cb2ab9f11dc135639b0d30b66082392


[ROCm/rocm_smi_lib commit: c9201f7736]
2024-07-31 19:52:46 -04:00
Maisam Arif 1998b57059 [SWDEV-464799] Handle UnicodeEncodeError with non UTF-8 locales
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ifb8e6e3c7891c4f70faba5441fb87cc4ba2302f3


[ROCm/rocm_smi_lib commit: c2235eea35]
2024-07-31 17:01:01 -04:00
Charis Poag 0d5c46fe52 [SWDEV-475552/SWDEV-475351] Fix segfault TestComputePartitionReadWrite
To check partition IDs we must keep re-checking the number of devices,
since it fluctuates with partition updates
and is subject to drm minor limitations.

For the drm minor limitation of 64, user must remove other drivers
using PCIe space. You can see these by:
ls /sys/class/drm

Recommended: rmmod any unneeded driver and reload amdgpu, to
ensure CPX can enumerate with all XCP (Graphic Cluster Partitions).

Change-Id: Ib663503f0b7264dce163f6ac2d50795fc8dc5eba
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: c11209f618]
2024-07-27 17:47:54 -05:00
Galantsev, Dmitrii fdf002623e Azure - Switch to amd-staging branch
Change-Id: I9e69d0d4f8ece2dfc7b3327f8486f0d3d8bbeba0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: dd954be887]
2024-07-23 17:07:05 -05:00
Maisam Arif 39bada540d Merge amd-staging into amd-master 20240710
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I82f353d21279e2c1ee1788cb9e949b6d3b7e3270


[ROCm/rocm_smi_lib commit: 3cd677419e]
2024-07-10 19:57:39 -05:00
Maisam Arif 96fd0e1ea4 Bump version lib:7.3.0 tool:2.3.0+hash
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I637b34e03580d5b5efb1e12805a9cdeb7778de74


[ROCm/rocm_smi_lib commit: db4d81b944]
2024-07-10 19:55:15 -05:00
Galantsev, Dmitrii e0f68840a4 Fix return 0 on failed do_configureLogrotate
Fixes https://github.com/ROCm/rocm_smi_lib/issues/184

Change-Id: I206927835de8811df6813c7a9b0b92258d776894
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e2e65cc7ad]
2024-07-08 11:25:10 -05:00
Zhang Ava fed31f08b4 Merge amd-staging into amd-master 20240628
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I9493cdf35b64cfa0a99de017e2d6b521af71cf14


[ROCm/rocm_smi_lib commit: 4c0ce45912]
2024-07-04 14:19:02 +08:00
Charis Poag 33eb3fa429 [SWDEV-463213] Add partition ID fallback + new API
Changes:
- Added rsmi_dev_partition_id_get() -> uses fallback described
  below for devices which support partition updates.
- Updated/added to tests for partitions to reflect these changes.

Due to driver changes in KFD, some devices may report the partition id in bits [31:28] or in bits [2:0].
bits [63:32] = domain
bits [31:28] = partition id
bits [27:16] = reserved
bits [15:8]  = Bus
bits [7:3] = Device
bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non SPX modes
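
The bit layout above can be decoded with plain shifts and masks, as in this sketch. The function names are hypothetical, and the fallback rule shown (use the function bits when bits [31:28] are zero) is an assumed reading of the note above, not the library's confirmed logic.

```python
def decode_bdfid(bdfid: int) -> dict:
    """Split a 64-bit BDF id into the fields listed above."""
    return {
        "domain": (bdfid >> 32) & 0xFFFFFFFF,  # bits [63:32]
        "partition_id": (bdfid >> 28) & 0xF,   # bits [31:28]
        "bus": (bdfid >> 8) & 0xFF,            # bits [15:8]
        "device": (bdfid >> 3) & 0x1F,         # bits [7:3]
        "function": bdfid & 0x7,               # bits [2:0]
    }


def partition_id_with_fallback(bdfid: int) -> int:
    """Assumed fallback: prefer bits [31:28], else use the function bits."""
    fields = decode_bdfid(bdfid)
    return fields["partition_id"] or fields["function"]


if __name__ == "__main__":
    sample = (0x1 << 32) | (0x3 << 28) | (0x45 << 8) | (0x02 << 3) | 0x1
    print(decode_bdfid(sample))
    print(partition_id_with_fallback(sample))
```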

Change-Id: Ia5641cfb8dbe2d1bff52f8eb81d5a159954528d3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 323ab1105d]
2024-06-27 17:27:01 -05:00