- To address https://github.com/ROCm/rocm_smi_lib/issues/208
where use of fake BDFs for partitions can cause confusion. This note
is already in the comments of the function definition, but was not
updated in the function declaration.
- Fix broken formatting for the location table for PCIE coordinate fields
- Tracked in SWDEV-501108
Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d
[ROCm/rocm_smi_lib commit: 67a0de4279]
Changes:
- Added new GPU metrics:
1) XGMI link status - Up/Down; 1 = up; 0 = down
2) Graphics clocks below host limit (per XCP)
accumulators -> used to help calculate a violation status
3) VRAM max bandwidth at max memory clock
- Updated rocm-smi --showmetrics to include new metrics.
Units/values are shown as reported by the driver and may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.
Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 4de2168866]
Changes:
* [API] Removed checking board name, fixes for other MI ASICs
* [CLI] Increased progress bar to change memory partition modes
to 140 seconds, since driver reload is variable per system
Change-Id: Ifcaf40d28b4adf5eaa800c9e3748d33749dc414a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: d04cec7f1d]
Changes:
- Added warning screen to ROCm SMI users
setting memory partition
- Added new API (rsmi_dev_memory_partition_capabilities_get)
to retrieve memory partition capabilities
(What users can set memory partition modes to)
- Increased time-bar for CLI sets display to 40 seconds
- API now waits until the driver reloads with SYSFS files active
- [SWDEV-475712] [CLI/API] Fixed target_graphics_version field
not properly displaying for MI2x or Navi 3x ASICs.
- Updated tests
Change-Id: Iaf89d1b7ad9ceb449b289bc82ea198fe3b23992e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 46902274b6]
Reset GPU partition support for both compute and memory was removed.
Code changes related to the following:
* rsmi_dev_compute_partition_reset()
* rsmi_dev_memory_partition_reset()
* CLI
* Unit tests
* Documentation
Change-Id: I3fb8570dbf9e755ae70369587ef44bbf64e17fe8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: a1295714f2]
Changes:
- Added new GPU metrics:
1) Violation status (e.g. PVIOL/TVIOL) accumulators
2) XCP (Graphics Compute Partitions) statistics
3) PCIe other end recovery counter
- Added rocm-smi --showmetrics
Units/values are shown as reported by the driver and may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.
Change-Id: Ia2cd3bb65c4f474ebdb39db8062ea716f2b4d8ee
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 0609cbf1d0]
- This fix addresses SWDEV-456049 and probably SWDEV-442181 which
have the same apparent root cause of an early exiting
loop while enumerating GPU stats
Change-Id: I517329e06fa2c53205d8b6e002895e648ebf521c
[ROCm/rocm_smi_lib commit: 35496cabc4]
- logging.warn() is deprecated in favour of logging.warning()
- for some reason, this is the only place in all of rocm_smi.py
that uses logging.warn() as pointed out on github
https://github.com/ROCm/rocm_smi_lib/issues/187
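A minimal sketch of the replacement call, for illustration only. The logger name, message text, and helper function are hypothetical and are not taken from rocm_smi.py; the only point is that logging.warning() (or Logger.warning()) is the supported spelling.

```python
import io
import logging


def make_logger(stream):
    """Build a small logger that writes to the given stream.

    The logger name is a placeholder chosen for this example.
    """
    logger = logging.getLogger("rocm_smi_example")
    logger.handlers.clear()
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = make_logger(stream)

# Deprecated spelling: log.warn("...")
# Supported spelling:
log.warning("GPU[%s]: %s", 0, "value not supported")

print(stream.getvalue(), end="")
```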
Change-Id: Ie1e4a0ea16b996fbed2e902c8edfe68087a5a5fa
[ROCm/rocm_smi_lib commit: fe6a49d186]
Options '--showvoltagerange' and '--showvc' show 'warning' instead of 'error' for unsupported voltage curves
Code changes related to the following:
* CLI
Change-Id: Ide662c98202c32ad01ccaf3c47a61f2543f82ebb
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 72b112f8f3]
Updates:
- [CLI] Previously --showfw displayed firmware that
does not exist on the system. This change removes
that extra output.
Change-Id: If8b063001b80b03579ea1378dfd890c60f62ccd7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 6b8db74578]
On CentOS 7, python2 is used for the CPack byte-compile step, so f-strings in the code result in a syntax error.
Setting _python_bytecompile_errors_terminate_build to 0 makes the build ignore these errors.
Change-Id: I43ecc99ae16627f4f5f91d0cca0398f6a003fa3c
[ROCm/rocm_smi_lib commit: 4ceffdca68]
CPack converts /usr/bin/env python3 to /usr/libexec/platform-python on RHEL8.
Undefining __brp_mangle_shebangs prevents this.
Change-Id: Id285e2cea1de583853cec17eccf0a3a794cca643
[ROCm/rocm_smi_lib commit: 1b828b735b]
rocm-smi is installed in /opt/rocm-ver/bin, but not as a soft link in the wheel package.
For rocm-smi to work from the bin directory, it needs the extra path to find rsmiBindings.py.
Change-Id: I41388f680cb2ab9f11dc135639b0d30b66082392
[ROCm/rocm_smi_lib commit: c9201f7736]
To check partition IDs we must keep re-checking the number of devices,
since it fluctuates with partition updates
and there are drm minor limitations.
For the drm minor limit of 64, the user must remove other drivers
using PCIe space. You can see these with:
ls /sys/class/drm
Recommended: rmmod the unneeded driver and reload amdgpu, to
ensure CPX can enumerate with all XCP (Graphic Cluster Partitions).
Change-Id: Ib663503f0b7264dce163f6ac2d50795fc8dc5eba
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: c11209f618]
Changes:
- Added rsmi_dev_partition_id_get() -> uses fallback described
below for devices which support partition updates.
- Updated/added to tests for partitions to reflect these changes.
Due to driver changes in KFD, some devices may report the partition id in bits [31:28] or in bits [2:0].
bits [63:32] = domain
bits [31:28] = partition id
bits [27:16] = reserved
bits [15:8] = Bus
bits [7:3] = Device
bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non-SPX modes
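The bit layout above can be sketched as a small decoder. This is an illustration in Python, not the library's implementation: the function names, the spx_mode parameter, and the exact fallback rule (use bits [2:0] only when bits [31:28] are zero and the device is not in SPX mode) are assumptions made for the example.

```python
def decode_bdfid(bdfid):
    """Split a 64-bit BDF ID into the fields listed in the layout."""
    return {
        "domain": (bdfid >> 32) & 0xFFFFFFFF,   # bits [63:32]
        "partition_id": (bdfid >> 28) & 0xF,    # bits [31:28]
        "bus": (bdfid >> 8) & 0xFF,             # bits [15:8]
        "device": (bdfid >> 3) & 0x1F,          # bits [7:3]
        "function": bdfid & 0x7,                # bits [2:0]
    }


def partition_id_with_fallback(bdfid, spx_mode):
    """Return the partition id, trying bits [31:28] first.

    Assumed fallback: when bits [31:28] are zero and the device is not
    in SPX mode, read the partition id from the function bits [2:0].
    """
    fields = decode_bdfid(bdfid)
    if fields["partition_id"] != 0:
        return fields["partition_id"]
    if not spx_mode:
        return fields["function"]
    return 0


# Example value built by hand: domain 1, partition 2, bus 3, device 4, function 5.
example = (1 << 32) | (2 << 28) | (3 << 8) | (4 << 3) | 5
print(decode_bdfid(example))
```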
Change-Id: Ia5641cfb8dbe2d1bff52f8eb81d5a159954528d3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 323ab1105d]
The message now reports "not supported" instead of an error.
Change-Id: I28bd1e009770674389534be12519cc34673ba846
[ROCm/rocm_smi_lib commit: 57e8e72b79]
The provides tag is required when the package provides a virtual package.
Package name along with version will be provided by default and the provides tag is not required for this.
Using the tag for providing the name, but without version was resulting in package upgrade issues.
Change-Id: I74506d8c3bbd75d028bcdc03525c29541dce2b4c
[ROCm/rocm_smi_lib commit: d54bade574]
Updates the license to MIT
Code changes related to the following: None
Change-Id: I62d0a5f02a2d5e58c1952337dff54892793c16cf
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: e7d54946fb]
Checks the error returned by rsmi_dev_od_volt_info_get() before asserting
Code changes related to the following:
* Unit tests
Change-Id: Icc0f329e35992aae19f07243024521181467bcd3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 497ef4a7ef]
When discovering amdgpu devices, if the assigned card numbers are not
consecutive, not all GPUs can be discovered. The code is changed to
discover GPUs based on the max card number.
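A rough sketch of the difference, operating on a list of entry names such as those found under /sys/class/drm. The function names are hypothetical, and the count-based variant is only an assumed illustration of how non-consecutive numbering can hide a device; it is not a copy of the previous library code.

```python
import re

# Matches "card0", "card12", but not connector entries like "card1-DP-1".
CARD_RE = re.compile(r"^card(\d+)$")


def card_indices(entries):
    """Return the sorted card numbers present in a list of entry names."""
    return sorted(int(m.group(1)) for m in map(CARD_RE.match, entries) if m)


def discover_by_count(entries):
    """Assumed old behaviour: probe indices 0..count-1 only."""
    present = set(card_indices(entries))
    return [i for i in range(len(present)) if i in present]


def discover_by_max(entries):
    """Probe every index up to the highest card number that exists."""
    present = set(card_indices(entries))
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i in present]


entries = ["card0", "card1", "card3", "renderD128", "card1-DP-1"]
print(discover_by_count(entries))  # misses card3
print(discover_by_max(entries))    # finds card0, card1, card3
```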
Change-Id: I8b6a8b49594d6a54c7feb2645bedb83dc5c1b4cc
[ROCm/rocm_smi_lib commit: 8c44416410]
This commit aligns the rsmiBindings.py.in file's
"notification_type_names" & "rsmi_evt_notification_type_t" with
those found in the rsmiBindings.py file.
Change-Id: I67f36606c505992fb98495651310bd70a1755033
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/rocm_smi_lib commit: 0c48cd9122]
In multi-GPU environments, too many warning messages were generated,
so they have been removed.
Change-Id: I275de2397eb0e6b189e2e17e94335cb1e8f97815
[ROCm/rocm_smi_lib commit: 3d82f1799d]
Fixes reading the new variable format and size of pp_od_clk_voltage.
Code changes related to the following:
* get_od_clk_volt_info()
* get_od_clk_volt_curve_regions()
* Unit tests
* CLI options removed: --showclkvolt, --showvc, --showvoltagerange, --setvc
Change-Id: Ieedb845eeadcea2f2e447ec576c253ad2a814176
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 48ddd9abd7]