- Updated header and source files
- Updated python interface
- Generated python wrapper for updated header
- Updated the CLI to have cpu family & cpu model
as part of metric table
Change-Id: Iea440251797270d5d29ffe883b0ad6db790be658
[ROCm/amdsmi commit: 6f7273fda5]
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards
Code changes related to the following:
* '_get_gpu_metrics_' APIs
* Functional tests
Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 55734d2d7a]
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h to align with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc
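
The TypeError quoted above is the kind raised when a string placeholder is compared against an integer. The commit does not say how the CLI was changed, so the following is only a minimal sketch of one way to guard such a comparison. The helper name `exceeds` and the "N/A" placeholder are assumptions, not taken from the amd-smi source.

```python
def exceeds(value, threshold: int) -> bool:
    """Return True only when value is numeric and greater than threshold.

    Hypothetical helper: non-numeric placeholders (for example "N/A")
    are treated as "not exceeding" instead of raising TypeError.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value > threshold


numeric_result = exceeds(5, 3)
placeholder_result = exceeds("N/A", 3)
```

With a guard of this shape, a metric that arrives as a string no longer aborts the whole table; it is simply skipped by the comparison.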
Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 34bd26c68e]
Share the same mutex as rocm-smi implementation. Handle the crash
when a user is not in render group.
Change-Id: I486b26569f9b523b41bbdaf95d51f4a730978cfd
[ROCm/amdsmi commit: 5a6b5d2a0a]
- CLI: Added average_power to display if current_power is empty
- CLI: fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
-> commented out setting avg clocks to current clock value
(leave as max uint value, not re-assign; these are not same values)
-> commented out setting current_socket_power = average_power
(leave as max uint value, not re-assign; these are not same values)
-> For all non-array clocks, placed the value in
array[0] to keep outputs consistent
(helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return status
- AMD SMI API: wrapper -> now displays amdsmi return status as
string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
display all array output
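
The array[0] placement and the "leave as max uint value" rule described above can be sketched as follows. This is an illustration only: the 16-bit width of the marker, the array length of 8, and the idea that valid entries are counted to derive the XCD count are assumptions, and both function names are hypothetical.

```python
UINT16_MAX = 0xFFFF  # assumption: unsupported 16-bit fields read as all ones
MAX_SLOTS = 8        # hypothetical array length used for this sketch


def to_clock_array(scalar_clock: int, length: int = MAX_SLOTS) -> list:
    """Place a single (non-array) clock in slot 0; mark the rest unavailable."""
    return [scalar_clock] + [UINT16_MAX] * (length - 1)


def count_valid(clocks) -> int:
    """Count entries that carry a real reading rather than the marker."""
    return sum(1 for clock in clocks if clock != UINT16_MAX)


legacy = to_clock_array(1700)   # e.g. a single gfxclk from an older table
valid_entries = count_valid(legacy)
```

Keeping unsupported fields at the marker value, instead of copying a neighbouring value into them, lets a consumer tell "not reported" apart from a genuine reading.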
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
[ROCm/amdsmi commit: 5ff5af0b5a]
1. provide prototype and documentation for esmi specific api.
define structures and update classes as required
2. update cmake files as required and add esmi api to the
amdsmi esmi integration example.
Change-Id: I753ec176f9b381e74c9646525dfd9075237bf8d9
[ROCm/amdsmi commit: 65eed73f4d]
Changes:
- Add new engine field vcn_activity (from 1.4/1.5
gpu_metrics)
- Updated log output to enhance view of gpu_metric
data as json pretty print
- Added new fields provided in 1.5
- Added unit overview in python API, CLI is WIP
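
As a rough illustration of the JSON pretty-print log output mentioned above, a metrics dictionary can be rendered with the standard `json` module. The function name `pretty` and the sample values are made up for this sketch; only the `vcn_activity` field name comes from the change description.

```python
import json


def pretty(metrics: dict) -> str:
    """Render a metrics dictionary as indented, key-sorted JSON for logs."""
    return json.dumps(metrics, indent=4, sort_keys=True)


# Sample data is invented for illustration only.
sample = {"vcn_activity": [12, 0, 0, 0], "average_socket_power": 150}
text = pretty(sample)
```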
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I7d9f29e7ecc35dcd0697814c222cdd02b0d5518e
[ROCm/amdsmi commit: 8f3861e1d9]
Add above fields for cache info. Remove driver_date in CLI and
remove the disable properties of cache.
Change-Id: I80672490908d9e32a149076cc37459fa56b8b0bf
[ROCm/amdsmi commit: 59b510de2b]
The socket represents a physical device, and the partition devices
should belong to the socket. The partition devices are only
different in function id in BDF. Use the BD part of the BDF to
identify a socket.
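
The grouping rule can be sketched as below, assuming BDF strings of the usual `domain:bus:device.function` form. The helper names `socket_key` and `group_by_socket` and the sample addresses are hypothetical; the library's own implementation may differ.

```python
from collections import defaultdict


def socket_key(bdf: str) -> str:
    """Return the part of a BDF before the function id, e.g. '0000:23:00'."""
    bus_device, sep, _function = bdf.rpartition(".")
    if not sep:
        raise ValueError(f"not a BDF string: {bdf!r}")
    return bus_device


def group_by_socket(bdfs):
    """Group partition devices that differ only in their function id."""
    sockets = defaultdict(list)
    for bdf in bdfs:
        sockets[socket_key(bdf)].append(bdf)
    return dict(sockets)


# Invented addresses: two partitions on one socket, one device on another.
grouped = group_by_socket(["0000:23:00.0", "0000:23:00.1", "0000:43:00.0"])
```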
Change-Id: I5d355a6f5db02faa7555b760a36c7351b8d8d835
[ROCm/amdsmi commit: de7e74f7db]
that are called during amdsmi initialization
amdsmi_get_cpu_family,
amdsmi_get_cpu_model,
amdsmi_get_cpu_threads_per_core,
amdsmi_get_number_of_cpu_cores,
amdsmi_get_number_of_cpu_sockets
Added amdsmi_get_cpucore_info to amdsmi library
Change-Id: Ib88d580e1d85afdf578963247e585cfae05c58ad
[ROCm/amdsmi commit: 28f6383639]
Having an unnamed struct confuses our wrapper generator.
Adding a name solved it.
Change-Id: Iab3e73317fb21fb3667beef04878d4f3da96eadf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/amdsmi commit: 1b606acf73]
* Updates:
- [API] Added rsmi_dev_power_get(uint32_t dv_ind, uint64_t *power,
RSMI_POWER_TYPE *type):
provides a generic getter for average or
current power & provides backwards
compatibility
- Added a utility function to get MonitorTypes
(monitor_type_string(type)) &
RSMI_POWER_TYPE (power_type_string(type))
strings
- [Tests] Added rsmi_dev_power_get tests and
provided better verification of return values for
all power APIs
- [Tests] Updated power outputs to show correct
units
- [example] Now uses avg, current, and generic
power functions with type output response
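
The idea of a generic power getter that also reports which kind of reading it returned can be modelled in a few lines. This is a sketch in Python, not the C API: the `PowerType` enum, the reader callables, and the "current first, then average" order are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class PowerType(Enum):
    """Hypothetical stand-in for the power type reported to the caller."""
    AVERAGE = "average"
    CURRENT = "current"
    INVALID = "invalid"


def power_get(
    read_current: Callable[[], Optional[int]],
    read_average: Callable[[], Optional[int]],
) -> Tuple[Optional[int], PowerType]:
    """Return (power, type) from the first reader that yields a value."""
    # Assumed order: prefer current power, fall back to average power.
    for ptype, reader in ((PowerType.CURRENT, read_current),
                          (PowerType.AVERAGE, read_average)):
        value = reader()
        if value is not None:
            return value, ptype
    return None, PowerType.INVALID


# A device that only exposes average power (values invented).
power, ptype = power_get(lambda: None, lambda: 95_000_000)
```

Returning the type alongside the value lets older callers keep a single call site while still knowing which measurement they received.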
Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 31a1fcce7d]
Adds support for 'gpu_metrics_v1_4' and new counters
Code changes related to the following:
* rsmi gpu_metrics APIs
* rsmi gpu_metrics Logs
* The new gpu_metrics are now part of the Device
Build changes related to the following: None
Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 4e4ebde640]
* Updates:
- rocm_smi_lib + CLI:
Rename all "NPS mode" -> "memory partition"
related files/functions/API/CLI to align with correct
technical naming
- rocm_smi_main: fixed identifying primary card's unique id
utilize rsmi_dev_unique_id_get to map which
KFD nodes belong to it
- rsmi_dev_*_partition*: now have better logging output
- compute partition tests:
Added 20 sec delay as a workaround until GPU
busy is confirmed as the issue
- CPPLint fixes/formatting
- [Example] Replaced all endl with "\n" for efficiency
- [Example] Added Edge & Junction temperature examples
- [Example] Added rsmi_minmax_bandwidth_get() example - WIP
Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: b251bb0c9f]
* Updates:
- rocm_smi_logger:
General cleanup &
Aligned to cpplint rules for usage
- rocm_smi_monitor:
Fixed MonitorTypes
not displaying properly in logs
& Added socket power label + current
socket power MonitorTypes
- rocm_smi API:
Added rsmi_dev_current_socket_power_get API
- rocm_smi CLI:
General cleanup,
Concise info now displays device data
in variable width (see printLogSpacer's
new field),
printLogSpacer now has an adjustable
variable that overrides appWidth,
Added Socket Power to base rocm-smi +
--showpower CLI calls,
--showpower & base rocm-smi CLI defaults
to printing socket power (if not available,
displays average power)
- Cleaned up temp label references
- power_read gtests:
Added current socket power to testing
Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: f078375350]
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630
[ROCm/amdsmi commit: d44a6ef523]