Share the same mutex as the rocm-smi implementation. Handle the crash
that occurs when a user is not in the render group.
Change-Id: I486b26569f9b523b41bbdaf95d51f4a730978cfd
- CLI: Added average_power to display if current_power is empty
- CLI: Fixed PCIe current_speed not displaying GT/s
- ROCm API (gpu_metrics 1.3 & 1.4):
-> Commented out setting avg clocks to the current clock value
(left as max uint value, not re-assigned; these are not the same values)
-> Commented out setting current_socket_power = average_power
(left as max uint value, not re-assigned; these are not the same values)
-> For all non-array clocks, placed the value in
array[0] to keep outputs consistent
(helps the XCD calculation)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
XCDs using the backwards-compatible rsmi_dev_gpu_metrics_info_get.
- The fix above corrects the overall XCD count and the assignment of
clock[0] to the current frequency in 1.3
- AMD SMI API: amdsmi_get_gpu_metrics_info() now initializes all new
1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fixed to return the correct AMD SMI return status
- AMD SMI API: wrapper -> now displays the amdsmi return status as a
string in logs
- gpu_metrics_read.cc -> now gives a better overview of
backwards-compatible output
- gpu_metrics_read.cc -> cleaned up output, added units, and
now displays all array output
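The "leave as max uint value" and array[0] rules above can be illustrated with a small sketch. This is not the library's code: the names UINT16_MAX_SENTINEL, MAX_XCD_SLOTS, normalize_clock, and count_xcds are hypothetical, and the sketch assumes that an unsupported metric is reported as the maximum unsigned value for its width and that the XCD count equals the number of populated per-XCD clock slots.

```python
UINT16_MAX_SENTINEL = 0xFFFF  # assumed "not supported" marker for 16-bit metrics
MAX_XCD_SLOTS = 8             # hypothetical array length used only in this sketch


def normalize_clock(value):
    """Place a scalar (non-array) clock into slot 0 of a sentinel-filled array.

    Older metric tables report a single clock value; putting it in slot 0
    lets callers treat every table version as an array.
    """
    slots = [UINT16_MAX_SENTINEL] * MAX_XCD_SLOTS
    slots[0] = value
    return slots


def count_xcds(clock_slots):
    """Count populated slots; sentinel entries are treated as absent."""
    return sum(1 for clk in clock_slots if clk != UINT16_MAX_SENTINEL)
```

Under these assumptions, a scalar clock from an older table counts as one XCD, and a value that is itself the sentinel stays "unsupported" instead of being silently replaced by another metric.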
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
1. Provide prototype and documentation for the esmi-specific API.
Define structures and update classes as required.
2. Update CMake files as required and add the esmi API to the
amdsmi esmi integration example.
Change-Id: I753ec176f9b381e74c9646525dfd9075237bf8d9
Changes:
- Added new engine field vcn_activity (from 1.4/1.5
gpu_metrics)
- Updated log output to show gpu_metric
data as pretty-printed JSON
- Added new fields provided in 1.5
- Added unit overview in Python API; CLI is WIP
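As a rough illustration of pretty-printed metric output with units, the sketch below pairs each value with a unit and serializes the result as indented JSON. The function name format_metrics, the metric names other than vcn_activity, and the unit strings are assumptions made for this example, not the actual log format.

```python
import json


def format_metrics(metrics, units):
    """Return metrics as indented JSON, pairing each value with its unit.

    Metrics without a known unit get an empty unit string.
    """
    annotated = {
        name: {"value": value, "unit": units.get(name, "")}
        for name, value in metrics.items()
    }
    return json.dumps(annotated, indent=4, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = {"vcn_activity": [12, 0, 0, 0], "average_socket_power": 95}
    sample_units = {"vcn_activity": "%", "average_socket_power": "W"}
    print(format_metrics(sample, sample_units))
```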
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I7d9f29e7ecc35dcd0697814c222cdd02b0d5518e
Add above fields for cache info. Remove driver_date in CLI and
remove the disable properties of cache.
Change-Id: I80672490908d9e32a149076cc37459fa56b8b0bf
The socket represents a physical device, and the partition devices
belong to the socket. The partition devices differ only in the
function ID of their BDF, so use the BD (bus and device) part of the
BDF to identify a socket.
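A minimal sketch of this grouping follows. It assumes BDF strings in the common "domain:bus:device.function" form, and it keeps the domain in the key, which is an assumption beyond the text above. The helper names socket_key and group_by_socket are hypothetical.

```python
from collections import defaultdict


def socket_key(bdf):
    """Drop the function number from 'DDDD:BB:DD.F', leaving domain:bus:device."""
    bus_device, sep, _function = bdf.rpartition(".")
    if not sep:
        raise ValueError(f"not a BDF string: {bdf!r}")
    return bus_device.lower()


def group_by_socket(bdfs):
    """Group partition devices that share the same bus and device."""
    sockets = defaultdict(list)
    for bdf in bdfs:
        sockets[socket_key(bdf)].append(bdf)
    return dict(sockets)
```

With this keying, two partitions such as 0000:c1:00.0 and 0000:c1:00.1 land under the same socket, while 0000:e1:00.0 forms its own.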
Change-Id: I5d355a6f5db02faa7555b760a36c7351b8d8d835
that are called during amdsmi initialization:
amdsmi_get_cpu_family,
amdsmi_get_cpu_model,
amdsmi_get_cpu_threads_per_core,
amdsmi_get_number_of_cpu_cores,
amdsmi_get_number_of_cpu_sockets
Added amdsmi_get_cpucore_info to the amdsmi library
Change-Id: Ib88d580e1d85afdf578963247e585cfae05c58ad
Having an unnamed struct confuses our wrapper generator.
Adding a name solved it.
Change-Id: Iab3e73317fb21fb3667beef04878d4f3da96eadf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
* Updates:
- [API] Added rsmi_dev_power_get(uint32_t dv_ind, uint64_t *power,
RSMI_POWER_TYPE *type), which
provides a generic getter for average or
current power and provides backwards
compatibility
- Added utility functions to get MonitorTypes
(monitor_type_string(type)) and
RSMI_POWER_TYPE (power_type_string(type))
strings
- [Tests] Added rsmi_dev_power_get tests and
provided better verification of return values for
all power APIs
- [Tests] Updated power outputs to show correct
units
- [example] Now uses avg, current, and generic
power functions with type output response
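The idea of a generic power getter that also reports which kind of reading it returned can be sketched as follows. This is an illustration only, not the rsmi_dev_power_get implementation: the PowerType enum, the reader callables, and the "current first, then average" order are assumptions made for this example.

```python
from enum import Enum


class PowerType(Enum):
    """Hypothetical stand-in for the power type reported to the caller."""
    AVERAGE = "average"
    CURRENT = "current"
    INVALID = "invalid"


def generic_power_get(read_current, read_average):
    """Try current power first, then average; report which one was used.

    Each reader is a zero-argument callable that returns a power value,
    or None when the device does not expose that reading.
    """
    for power_type, reader in ((PowerType.CURRENT, read_current),
                               (PowerType.AVERAGE, read_average)):
        value = reader()
        if value is not None:
            return value, power_type
    return None, PowerType.INVALID
```

Returning the type alongside the value lets callers written against either the average or the current power API keep working, because they can tell which reading they received.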
Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Adds support for 'gpu_metrics_v1_4' and new counters
Code changes related to the following:
* rsmi gpu_metrics APIs
* rsmi gpu_metrics Logs
* The new gpu_metrics are now part of the Device
Build changes related to the following: None
Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
* Updates:
- rocm_smi_lib + CLI:
Rename all "NPS mode" -> "memory partition"
related files/functions/API/CLI to align with correct
technical naming
- rocm_smi_main: fixed identification of the primary card's unique ID;
now uses rsmi_dev_unique_id_get to map which
KFD nodes belong to it
- rsmi_dev_*_partition*: now have better logging output
- compute partition tests:
Added a 20-second delay as a workaround until GPU
busy is confirmed as the issue
- CPPLint fixes/formatting
- [Example] Replaced all endl with "\n" for efficiency
- [Example] Added Edge & Junction temperature examples
- [Example] Added rsmi_minmax_bandwidth_get() example - WIP
Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
* Updates:
- rocm_smi_logger:
General cleanup and
alignment to cpplint rules for usage
- rocm_smi_monitor:
Fixed MonitorTypes
not displaying properly in logs
and added socket power label + current
socket power MonitorTypes
- rocm_smi API:
Added rsmi_dev_current_socket_power_get API
- rocm_smi CLI:
General cleanup;
concise info now displays device data
in variable width (see printLogSpacer's
new field);
printLogSpacer now has an adjustable
variable that overrides appWidth;
added Socket Power to base rocm-smi +
--showpower CLI calls;
--showpower & base rocm-smi CLI default
to printing socket power (if not available,
they display average power)
- Cleaned up temp label references
- power_read gtests:
Added current socket power to testing
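The adjustable spacer width described above can be sketched like this. It is a simplified illustration rather than the CLI's printLogSpacer: the name format_log_spacer, the APP_WIDTH default of 80, and the choice to return the line instead of printing it are assumptions.

```python
APP_WIDTH = 80  # hypothetical default width for this sketch


def format_log_spacer(title=None, fill="=", width=None):
    """Build a spacer line; an explicit width overrides APP_WIDTH."""
    line_width = APP_WIDTH if width is None else width
    if not title:
        return fill * line_width
    # Center the title between fill characters, padded with one space per side.
    return (" " + title + " ").center(line_width, fill)
```

Passing a width computed from the widest device row is one way a caller could keep headers aligned with variable-width concise output.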
Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
This patch adds the following missing firmware
blocks to the SMI library:
- RSMI_FW_BLOCK_MES
- RSMI_FW_BLOCK_MES_KIQ
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630
Implements APIs for 'gpu_metrics_v1_3' utilization averages
Code changes related to the following:
* rsmi_dev_activity_metric_get()
* rsmi_dev_activity_avg_mm_get()
* CLI shows "Avg.Memory Bandwidth" under "--showmemuse"
Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
* Updates:
- Fixed infinite loop on systems
that do not have VRAM files
- Fixed concise info throwing an exception
when no amdgpu driver is loaded
- Fixed the ability to see all nodes
after switching partitions (mirrors
original card display/settings)
- Added build type, lib path,
and set environment variables to logs
Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>