Checks and forces rereading gpu metrics unconditionally
Code changes related to the following:
* Device::dev_log_gpu_metrics()
* amdsmi_get_gpu_metrics_header_info()
Removed unintentionally during work on 'header cleanup Remove non-unified headers'
* Examples
* Unit tests
Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 78074d7d77]
Renamed structs to be more consistent with what they are calling
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I6f2be2fcb76f004aa592f0dad8545565700ccd4b
[ROCm/amdsmi commit: f831cf49f7]
- Updated header and source files
- Updated python interface
- Generated python wrapper for updated header
- Updated the CLI to have cpu family & cpu model
as part of metric table
Change-Id: Iea440251797270d5d29ffe883b0ad6db790be658
[ROCm/amdsmi commit: 6f7273fda5]
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards
Code changes related to the following:
* '_get_gpu_metrics_' APIs
* Functional tests
Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 55734d2d7a]
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc
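
The TypeError quoted in the [CLI] item is what Python raises when a string is compared against an integer with '>'. A minimal sketch of one way a caller could guard such a comparison, assuming a metric may arrive either as a number or as a placeholder string. The helper name and the "N/A" placeholder are hypothetical and not taken from the amd-smi sources:

```python
def exceeds_threshold(value, threshold):
    """Return True only when value is a real number greater than threshold.

    Hypothetical helper: value may be an int/float, or a placeholder
    string such as "N/A" when the metric is unavailable.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value > threshold
```

With this guard, `exceeds_threshold("N/A", 5)` returns False instead of raising, so the comparison site no longer depends on the exact type the API hands back.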
Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 34bd26c68e]
Share the same mutex as the rocm-smi implementation. Handle the crash
when a user is not in the render group.
Change-Id: I486b26569f9b523b41bbdaf95d51f4a730978cfd
[ROCm/amdsmi commit: 5a6b5d2a0a]
Issue: need to return on any failure.
Without the early return, the nullptr check test would segfault,
because values in the struct are left uninitialized.
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I4987fb73ba9bcb182de7a439a4286333a41bf7eb
[ROCm/amdsmi commit: d74be3120e]
- CLI: Added average_power to display if current_power is empty
- CLI: fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
-> commented out setting avg clocks to current clock value
(leave as max uint value, not re-assign; these are not same values)
-> commented out setting current_socket_power = average_power
(leave as max uint value, not re-assign; these are not same values)
-> For all non-array clocks, placed value in first
array[0] to keep outputs consistent
(helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
- AMD SMI API: wrapper -> now displays amdsmi return status as
string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
display all array output
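
The average_power fallback and the "leave as max uint value" notes above can be illustrated together. This is a rough sketch, assuming an unsupported metric is reported as the maximum value of its unsigned type. The 16-bit width, the dictionary keys, and the helper name are assumptions for illustration, not the CLI's actual code:

```python
UINT16_MAX = 0xFFFF  # assumed sentinel meaning "metric not supported"


def power_to_display(metrics):
    """Pick the power reading to show.

    Prefers current_power; falls back to average_power when
    current_power is missing or still holds the sentinel value.
    Returns a (field_name, value) tuple.
    """
    current = metrics.get("current_power", UINT16_MAX)
    average = metrics.get("average_power", UINT16_MAX)
    if current != UINT16_MAX:
        return ("current_power", current)
    if average != UINT16_MAX:
        return ("average_power", average)
    # Neither reading is available.
    return ("current_power", "N/A")
```

Because the sentinel is left in place rather than overwritten with the other field's value, the display layer can still tell which reading it is actually showing.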
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
[ROCm/amdsmi commit: 5ff5af0b5a]
1. provide prototype and documentation for esmi specific api.
define structures and update classes as required
2. update cmake files as required and add esmi api to the
amdsmi esmi integration example.
Change-Id: I753ec176f9b381e74c9646525dfd9075237bf8d9
[ROCm/amdsmi commit: 65eed73f4d]
Changes:
- Add new engine field vcn_activity (from 1.4/1.5
gpu_metrics)
- Updated log output to enhance view of gpu_metric
data as json pretty print
- Added new fields provided in 1.5
- Added unit overview in python API, CLI is WIP
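
The JSON pretty-print log output mentioned above can be approximated with Python's standard json module. A minimal sketch; the function name is hypothetical and this is not the library's logging code:

```python
import json


def format_metrics_for_log(metrics):
    """Render a metrics dictionary as indented, key-sorted JSON text."""
    return json.dumps(metrics, indent=4, sort_keys=True)
```

Sorting the keys keeps successive log entries easy to compare line by line.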
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I7d9f29e7ecc35dcd0697814c222cdd02b0d5518e
[ROCm/amdsmi commit: 8f3861e1d9]
Add above fields for cache info. Remove driver_date in CLI and
remove the disable properties of cache.
Change-Id: I80672490908d9e32a149076cc37459fa56b8b0bf
[ROCm/amdsmi commit: 59b510de2b]
The socket represents a physical device, and the partition devices
should belong to the socket. The partition devices are only
different in function id in BDF. Use the BD part of the BDF to
identify a socket.
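
The grouping described above can be sketched as follows, assuming BDF strings in the usual "domain:bus:device.function" form. Including the domain in the socket key is an assumption here, and the function names are hypothetical:

```python
from collections import defaultdict


def socket_key(bdf):
    """Return the BDF with its function id stripped, e.g. '0000:23:00'."""
    bus_device, sep, _function = bdf.rpartition(".")
    # If there is no '.function' suffix, treat the whole string as the key.
    return bus_device if sep else bdf


def group_by_socket(bdfs):
    """Map each socket key to the list of partition BDFs that share it."""
    sockets = defaultdict(list)
    for bdf in bdfs:
        sockets[socket_key(bdf)].append(bdf)
    return dict(sockets)
```

For example, "0000:23:00.0" and "0000:23:00.1" fall under the same key "0000:23:00", while "0000:43:00.0" forms a separate socket.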
Change-Id: I5d355a6f5db02faa7555b760a36c7351b8d8d835
[ROCm/amdsmi commit: de7e74f7db]
Changes AMDSmiDrm to use the versioned library for its dependency
Code changes related to the following:
* AMDSmiDrm::init()
Build changes related to the following: None
Change-Id: Ibd5b3dd88f679912acdfa292502003f58b28daf5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: e20fd12934]
that are called during amdsmi initialization
amdsmi_get_cpu_family,
amdsmi_get_cpu_model,
amdsmi_get_cpu_threads_per_core,
amdsmi_get_number_of_cpu_cores,
amdsmi_get_number_of_cpu_sockets
Added amdsmi_get_cpucore_info to amdsmi library
Change-Id: Ib88d580e1d85afdf578963247e585cfae05c58ad
[ROCm/amdsmi commit: 28f6383639]
1. Remove esmi (internal gerrit) repo as git submodule
2. Clone esmi (open-source) repo during cmake using "git clone"
3. Download amd_hsmp.h header file during cmake build
TODO:
We can switch amd_hsmp.h to the mainline Linux kernel repo after
the next Linux kernel release.
Change-Id: I763b5e287e24337c8e9e25f4e421cdb8698b9322
[ROCm/amdsmi commit: 597fb00bef]
This allows the lib version to change
before: libamd_smi.so.1.0
after: libamd_smi.so.23.4
Change-Id: Iaba991afac4e625d11df2bacdf6287c6f8bf5383
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/amdsmi commit: 69c35a4cff]