The target_graphics_version was not formatted properly and was
showing an incorrect Target Name. Corrected this by formatting the
major, minor and revision numbers.
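A minimal sketch of the formatting described above, not the actual
library code. It assumes the conventional "gfx" naming, where the major
number is printed in decimal and the minor and revision numbers as
single hex digits. The function name is hypothetical.

```python
def format_target_graphics_version(major: int, minor: int, revision: int) -> str:
    """Build a target name such as 'gfx90a' from its numeric parts.

    Assumption: major is decimal, minor and revision are hex digits.
    """
    return f"gfx{major}{minor:x}{revision:x}"


if __name__ == "__main__":
    print(format_target_graphics_version(9, 0, 0xA))
    print(format_target_graphics_version(11, 0, 0))
```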
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Changes:
- Added new GPU metrics:
1) XGMI link status - Up/Down; 1 = up; 0 = down
2) Graphics clocks below host limit (per XCP)
accumulators -> used to help calculate a violation status
3) VRAM max bandwidth at max memory clock
- Updated rocm-smi --showmetrics to include new metrics.
Units/values are reported as indicated by the driver and may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.
Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
* [API] Removed checking board name, fixes for other MI ASICs
* [CLI] Increased progress bar for changing memory partition modes
to 140 seconds, since driver reload time varies per system
Change-Id: Ifcaf40d28b4adf5eaa800c9e3748d33749dc414a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
- Added warning screen to ROCm SMI users
setting memory partition
- Added new API (rsmi_dev_memory_partition_capabilities_get)
to retrieve memory partition capabilities
(What users can set memory partition modes to)
- Increased time-bar for CLI sets display to 40 seconds
- API now waits until the driver reloads with SYSFS files active
- [SWDEV-475712] [CLI/API] Fixed target_graphics_version field
not properly displaying for MI2x or Navi 3x ASICs.
- Updated tests
Change-Id: Iaf89d1b7ad9ceb449b289bc82ea198fe3b23992e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Reset support for GPU partitions, both compute and memory, was removed.
Code changes related to the following:
* rsmi_dev_compute_partition_reset()
* rsmi_dev_memory_partition_reset()
* CLI
* Unit tests
* Documentation
Change-Id: I3fb8570dbf9e755ae70369587ef44bbf64e17fe8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Changes:
- Added new GPU metrics:
1) Violation status (e.g. PVIOL/TVIOL) accumulators
2) XCP (Graphics Compute Partitions) statistics
3) PCIe other end recovery counter
- Added rocm-smi --showmetrics
Units/values are reported as indicated by the driver and may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.
Change-Id: Ia2cd3bb65c4f474ebdb39db8062ea716f2b4d8ee
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
- logging.warn() is deprecated in favour of logging.warning()
- for some reason, this was the only place in all of rocm_smi.py
that used logging.warn(), as pointed out on GitHub:
https://github.com/ROCm/rocm_smi_lib/issues/187
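For illustration only, the replacement looks like the sketch below. The
helper name and message text are hypothetical and are not taken from
rocm_smi.py; only logging.warning() itself is the standard library call.

```python
import logging

logging.basicConfig(level=logging.WARNING)


def warn_unsupported(device: int, feature: str) -> str:
    """Log a warning with logging.warning() instead of the deprecated logging.warn()."""
    message = f"GPU[{device}]: {feature} is not supported"
    # Previously: logging.warn(message)
    logging.warning(message)
    return message


if __name__ == "__main__":
    warn_unsupported(0, "voltage curve")
```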
Change-Id: Ie1e4a0ea16b996fbed2e902c8edfe68087a5a5fa
Options '--showvoltagerange' and '--showvc' show 'warning' instead of 'error' for unsupported voltage curves
Code changes related to the following:
* CLI
Change-Id: Ide662c98202c32ad01ccaf3c47a61f2543f82ebb
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Updates:
- [CLI] Previously --showfw displayed fw that
does not exist on systems. This change removes
that extra output.
Change-Id: If8b063001b80b03579ea1378dfd890c60f62ccd7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
rocm-smi is installed in /opt/rocm-ver/bin, but not as a soft link in the wheel package.
For rocm-smi to work from the bin directory, it needs the extra path to find rsmiBindings.py
Change-Id: I41388f680cb2ab9f11dc135639b0d30b66082392
Changes:
- Added rsmi_dev_partition_id_get() -> uses fallback described
below for devices which support partition updates.
- Updated/added to tests for partitions to reflect these changes.
Due to driver changes in KFD, some devices may report the partition id
in bits [31:28] or in bits [2:0].
bits [63:32] = domain
bits [31:28] = partition id
bits [27:16] = reserved
bits [15:8] = Bus
bits [7:3] = Device
bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non SPX modes
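The bit layout above can be decoded roughly as follows. This is a
sketch, not the library implementation: the function name is
hypothetical, and the fallback rule (use bits [2:0] whenever bits
[31:28] are zero) is an assumption, since the real API may also check
the current partition mode.

```python
def decode_pci_id(pci_id: int) -> dict:
    """Split a 64-bit pci_id into the fields listed above."""
    domain = (pci_id >> 32) & 0xFFFFFFFF   # bits [63:32]
    partition_id = (pci_id >> 28) & 0xF    # bits [31:28]
    bus = (pci_id >> 8) & 0xFF             # bits [15:8]
    device = (pci_id >> 3) & 0x1F          # bits [7:3]
    function = pci_id & 0x7                # bits [2:0]

    # Assumed fallback: if the driver did not fill bits [31:28],
    # read the partition id from the function bits instead.
    if partition_id == 0:
        partition_id = function

    return {
        "domain": domain,
        "partition_id": partition_id,
        "bus": bus,
        "device": device,
        "function": function,
    }


if __name__ == "__main__":
    example = (0x1 << 32) | (0x3 << 28) | (0x45 << 8) | (0x2 << 3)
    print(decode_pci_id(example))
```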
Change-Id: Ia5641cfb8dbe2d1bff52f8eb81d5a159954528d3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Fixes reading pp_od_clk_voltage new variable format and size.
Code changes related to the following:
* get_od_clk_volt_info()
* get_od_clk_volt_curve_regions()
* Unit tests
* CLI options restored: --showclkvolt, --showvc, --showvoltagerange, --setvc
* Rework: 48ddd9ab
* Bump CLI version
* CHANGELOG.md
Change-Id: I817ca224de923fdaa992df84592d63b4d5a12b22
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
This commit aligns the rsmiBindings.py.in file's
"notification_type_names" & "rsmi_evt_notification_type_t" with
those found in the rsmiBindings.py file.
Change-Id: I67f36606c505992fb98495651310bd70a1755033
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Fixes reading pp_od_clk_voltage new variable format and size.
Code changes related to the following:
* get_od_clk_volt_info()
* get_od_clk_volt_curve_regions()
* Unit tests
* CLI options removed: --showclkvolt, --showvc, --showvoltagerange, --setvc
Change-Id: Ieedb845eeadcea2f2e447ec576c253ad2a814176
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
This patch adds 'ring hang' enums to ROCM SMI LIB.
This event type name is KFD_SMI_EVENT_RING_HANG.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I9b886eb1fc027f03bcca1e5d1a89a2a186b64bf5
* Updates:
- [CLI] Updated --showmemuse:
-> Add VRAM%, provide better context as "GPU Allocated Memory (VRAM%)"
-> Update "GPU memory use (%)" as
"GPU Memory Read/Write Activity(%)"
- [CLI] Updated --showmaxpower and rocm-smi (no arg)
-> Rounding was inconsistent for values past the decimal point.
This now provides the floor value for the device
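A small sketch of the floor behaviour described above. The input unit
(microwatts) and the function name are assumptions made for
illustration, not details taken from the CLI source.

```python
def max_power_watts(power_cap_microwatts: int) -> int:
    """Return the power cap in whole watts, always rounding down.

    Assumption: the raw value is an integer number of microwatts.
    Integer floor division avoids inconsistent float rounding.
    """
    return power_cap_microwatts // 1_000_000


if __name__ == "__main__":
    print(max_power_watts(299_999_999))
```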
Change-Id: Ib76dea2cb8483a1d7f53df675b0a94d8d01c81b9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
* Updates:
- [CHANGELOG.md] Provide 6.1 and 6.0 changes
- [README.md] Update readme with relevant changes
- [CLI] Updated --showpower to expand on types of power provided to users
Change-Id: Ic653cc81f80b7973654e2c23e1ab70567b930aa7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
* Updates:
- [CLI] rocm-smi (no arg) and --showhw:
Now displays 'ID'/'PARTITION ID' from the pcie_id identifier
Helps users identify which partition # the device is
Information provided by KFD
Note: a partition_id of 0 means a primary node (AKA root node),
e.g. ASICs which do not have partitioning support will show 0
- [API] Fix partition nodes which do not enumerate with domain:
Adding KFD's domain allows ASICs which have domains
to enumerate in proper order.
Full pcie_id / bdf propagates to all partition nodes.
- [API] Update rsmi_dev_pci_id_get() to allow users to extract
partition_id from device
- [CLI] Added fix for devices which have a modprobe failure where
DRM does not come up properly, even though the driver shows
initialization was successful.
- [API/Utils] Overloaded print_int_as_hex() template:
Now accepts bitsize, and prints in the smallest byte size
possible. Note: for a bitsize of < 8, please just print as decimal.
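The idea behind the bitsize overload can be sketched in Python as shown
below. This is not the C++ template itself: the function name is
hypothetical, and it assumes "smallest byte size" means padding to the
fewest whole bytes that can hold bitsize bits.

```python
def int_as_hex(value: int, bitsize: int) -> str:
    """Format value as hex, padded to the smallest whole number of bytes.

    Assumption: for a bitsize below 8 the value is returned as decimal,
    following the note above.
    """
    if bitsize < 8:
        return str(value)
    num_bytes = (bitsize + 7) // 8            # round up to whole bytes
    mask = (1 << (num_bytes * 8)) - 1         # drop bits beyond that width
    width = num_bytes * 2                     # two hex digits per byte
    return f"0x{value & mask:0{width}x}"


if __name__ == "__main__":
    print(int_as_hex(0x5, 8))
    print(int_as_hex(0xABC, 12))
```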
Change-Id: Ib0c6f73b2b9c9fea29442a39a669c432874382d8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>