This commit aligns the rsmiBindings.py.in file's
"notification_type_names" & "rsmi_evt_notification_type_t" with
those found in the rsmiBindings.py file.
Change-Id: I67f36606c505992fb98495651310bd70a1755033
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/rocm_smi_lib commit: 0c48cd9122]
In multi-GPU environments, too many warning messages are generated,
so they are removed.
Change-Id: I275de2397eb0e6b189e2e17e94335cb1e8f97815
[ROCm/rocm_smi_lib commit: 3d82f1799d]
Fixes reading pp_od_clk_voltage new variable format and size.
Code changes related to the following:
* get_od_clk_volt_info()
* get_od_clk_volt_curve_regions()
* Unit tests
* CLI options removed: --showclkvolt, --showvc, --showvoltagerange, --setvc
Change-Id: Ieedb845eeadcea2f2e447ec576c253ad2a814176
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 48ddd9abd7]
This patch adds 'ring hang' enums to ROCM SMI LIB.
This event type name is KFD_SMI_EVENT_RING_HANG.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I9b886eb1fc027f03bcca1e5d1a89a2a186b64bf5
[ROCm/rocm_smi_lib commit: 3282aaa8de]
Set the environment variable RSMI_MUTEX_THREAD_ONLY=1 to enable the thread-only mutex.
RSMI_INIT_FLAG_THRAD_ONLY_MUTEX can also be passed to rsmi_init()
to enable the thread-only mutex.
Change-Id: I2d9844039b774e386f03bb9bb130d8c342504ea6
[ROCm/rocm_smi_lib commit: 6ff95e55da]
Drops checks that are invalid with the new pp_od_clk_voltage format
Code changes related to the following:
* get_od_clk_volt_info()
* get_od_clk_volt_curve_regions()
Change-Id: I5ebe23aa0ed4ea77d5ab5a94ce34ad9b1b51281f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: e95d80f7ef]
Fixes TestMeasureApiExecutionTime test failures
Code changes related to the following:
* Unit tests
Change-Id: I6223078f219448deb6bfbd78edae371a5a4cf03c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: adf5c1da67]
* Updates:
- [CLI] Updated --showmemuse:
-> Add VRAM%, provide better context as "GPU Allocated Memory (VRAM%)"
-> Update "GPU memory use (%)" as
"GPU Memory Read/Write Activity(%)"
- [CLI] Updated --showmaxpower and rocm-smi (no arg)
-> Rounding was inconsistent for values with a fractional part.
This now reports the floor of the device's value
Change-Id: Ib76dea2cb8483a1d7f53df675b0a94d8d01c81b9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: b86f92230d]
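As an illustration of the rounding note above: Python's built-in round() rounds halves to the nearest even integer, so neighbouring readings can move in different directions, whereas math.floor() always rounds down. Whether this was the actual cause of the inconsistency is not stated in the commit; the wattage values below are made up for the sketch.

```python
import math

# Hypothetical max-power readings in watts (not real device data).
samples = [224.5, 225.5, 225.9]

# round() uses round-half-to-even, so 224.5 goes down but 225.5 goes up.
rounded = [round(w) for w in samples]

# math.floor() always rounds toward negative infinity.
floored = [math.floor(w) for w in samples]

print("round():", rounded)
print("floor():", floored)
```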
After the dead process is detected, pthread_mutex_consistent() will
be called. After that, the pthread_mutex_unlock() should also be
called to unlock it: "It is the responsibility of the application to
recover the state so it can be reused."
Change-Id: I45d3e2e68c3b06779f3acb1e908dbec0c6a39297
[ROCm/rocm_smi_lib commit: 750704720b]
* Updates:
- [CHANGELOG.md] Provide 6.1 and 6.0 changes
- [README.md] Update readme with relevant changes
- [CLI] Updated --showpower to expand on types of power provided to users
Change-Id: Ic653cc81f80b7973654e2c23e1ab70567b930aa7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: c5acd4ee88]
* Updates:
- [CLI] rocm-smi (no arg) and --showhw:
Now displays 'ID'/'PARTITION ID' from the pcie_id identifier
Helps users identify which partition # the device is
Information provided by KFD
Note: a partition_id of 0 means a primary node (AKA root node),
e.g. ASICs which do not have partitioning support will show 0
- [API] Fix partitions nodes which do not enumerate with domain:
Adding kfd's domain, allows ASICs which have domains
to enumerate in proper order.
Full pcie_id / bdf propagates to all partition nodes.
- [API] Update rsmi_dev_pci_id_get() to allow users to extract
partition_id from device
- [CLI] Added fix for devices which hit a modprobe failure
where DRM does not come up properly, even though the driver
shows initialization was successful.
- [API/Utils] Overloaded print_int_as_hex() template:
Now accepts bitsize, and prints in smallest byte size
possible. Note: for a bitsize of < 8, please just print as decimal.
Change-Id: Ib0c6f73b2b9c9fea29442a39a669c432874382d8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: c2035fa1b9]
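A minimal sketch of how a caller might extract the partition_id from the 64-bit value returned by rsmi_dev_pci_id_get(). The bit layout used here (domain in bits 63:32, partition ID in 31:28, bus in 15:8, device in 7:3, function in 2:0) is an assumption and should be checked against rocm_smi.h for the installed version. PciId and decode_bdfid are hypothetical names for illustration, not part of the library.

```python
from typing import NamedTuple


class PciId(NamedTuple):
    domain: int
    partition_id: int
    bus: int
    device: int
    function: int


def decode_bdfid(bdfid: int) -> PciId:
    """Split a 64-bit PCI ID into fields, using the assumed layout above."""
    return PciId(
        domain=(bdfid >> 32) & 0xFFFFFFFF,
        partition_id=(bdfid >> 28) & 0xF,
        bus=(bdfid >> 8) & 0xFF,
        device=(bdfid >> 3) & 0x1F,
        function=bdfid & 0x7,
    )


# Made-up value: domain 0, partition 2, bus 0x83, device 0, function 0.
example = (2 << 28) | (0x83 << 8)
print(decode_bdfid(example))
```

Under this layout, a device without partitioning support decodes to partition_id 0, matching the note in the commit message.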
- [CLI] Rounded VRAM output on CLI, no difference in output
- [python API] Fixed initializing calls which reuse initializeRsmi()
calls - now we set a global reference to rocmsmi to use
throughout API calls (see error below)
Traceback (most recent call last):
File "/home/charpoag/rocmsmi_pythonapi.py", line 9, in <module>
rocm_smi.initializeRsmi()
File "/opt/rocm/libexec/rocm_smi/rocm_smi.py", line 3531, in initializeRsmi
ret_init = rocmsmi.rsmi_init(0)
NameError: name 'rocmsmi' is not defined
Change-Id: I0eff3b8a432abf6d4344a02b9f638e1191c51a19
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 90160a7c9c]
Checks the error returned by get_gpu_pci_bandwith() before asserting
Code changes related to the following:
* Unit tests
Change-Id: Ia0fe64f168711147c5e66c7917cf633be40dee9f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 35b561fd69]
Checks and forces rereading gpu metrics unconditionally
Code changes related to the following:
* Device::dev_log_gpu_metrics()
* Examples
* Unit tests
Change-Id: Ic1c4f34a39f2bf197263f80ddbb84da26345807d
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: b4d37caa70]
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards
Code changes related to the following:
* 'rsmi_dev_metrics_' APIs
* Functional tests
* Examples
Change-Id: I7d562a95889361ee6f8f7588f8a790f42c8eb262
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: ce36198cb1]
Updated:
* [CLI] Fixed vram % - printf style formatting causes many data errors
This fix updates to the recommended way of outputting formatted data.
https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
* [API/CLI] Added gpu_id / GUID from kfd (rsmi_dev_guid_get)
-> CLI name: "GUID"
-> ROCm SMI calls: no arg, -i, --showhw, --showproduct
* [API/CLI] Added node_id from kfd (rsmi_dev_node_get)
-> CLI name: "Node"
-> ROCm SMI calls: no arg, --showhw, --showproduct
* [CLI] Added target gfx version from kfd
-> CLI name: "GFX Version" or "GFX VER"
-> ROCm SMI calls: --showhw, --showproduct
* [CLI] Base ROCm CLI
-> Removed - stacked id formatting:
This is to simplify identifiers helpful to users.
More identifiers can be found on -i --showhw, --showproduct
* [CLI] Update -i, --showhw, --showproduct, w/out arg
-> Card ID/DID/Model/SKU/VBIOS:
All unsupported values now display "N/A" instead
of "unknown" or "unsupported"
* [CLI] Showhw now expands data based on content
Change-Id: Ifb8586f9f545892b8a5aa7903608273cdd77e075
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 4b5ccb57f0]
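To illustrate the vram % formatting change above, here is a small sketch contrasting printf-style formatting with str.format(). The function and variable names are hypothetical and are not taken from rocm_smi.py. The printf-style operator treats a tuple on its right-hand side as an argument list, which is one of the quirks the linked Python documentation warns about.

```python
def vram_percent(used_bytes: int, total_bytes: int) -> str:
    """Return VRAM usage as a whole-number percentage string."""
    if total_bytes <= 0:
        return "N/A"  # nothing sensible to report
    percent = 100 * used_bytes / total_bytes
    return "{:.0f}".format(percent)


value = (8, 16)

# printf-style: the tuple is unpacked as two arguments for one %s.
try:
    printf_text = "VRAM: %s" % value
except TypeError as err:
    printf_text = "error: " + str(err)

# str.format(): the tuple is formatted as a single value.
format_text = "VRAM: {}".format(value)

print(printf_text)
print(format_text)
print(vram_percent(8, 16))
```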
On some systems [rocm-smi --showpids] reports
get_compute_process_info_by_pid, Not supported on the given system
[PID] [PROCESS NAME] 1 UNKNOWN UNKNOWN UNKNOWN
get_compute_process_info_by_pid fails because cu_occupancy debugfs method
is not provided on some graphics cards and GFX revisions by design
Proposing a change to return success status when only cu_occupancy debugfs method
is not found and provide cu_occupancy invalidation value to mark only
this parameter as UNKNOWN
Change-Id: Iae37070d9bd19483b4e6c8ee24c7d9a4c92f00d7
Signed-off-by: Vladimir Stempen <Vladimir.Stempen@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/rocm_smi_lib commit: 677433b367]
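The behaviour proposed above can be sketched as follows. Every name here (ProcessInfo, CU_OCCUPANCY_INVALID, get_process_info, the reader callbacks) is a hypothetical stand-in, and the sentinel value is an assumption; the real change is in the C++ library and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed sentinel meaning "cu_occupancy is not available".
CU_OCCUPANCY_INVALID = 0xFFFFFFFF


@dataclass
class ProcessInfo:
    pid: int
    vram_usage: int
    cu_occupancy: int


def get_process_info(
    pid: int,
    read_vram: Callable[[int], int],
    read_cu_occupancy: Callable[[int], int],
) -> ProcessInfo:
    """Collect per-process data, tolerating a missing cu_occupancy source."""
    vram = read_vram(pid)  # required field: any error propagates
    try:
        cu = read_cu_occupancy(pid)
    except FileNotFoundError:
        # Only cu_occupancy is missing: mark it invalid and still succeed.
        cu = CU_OCCUPANCY_INVALID
    return ProcessInfo(pid, vram, cu)


def missing_cu_occupancy(pid: int) -> int:
    """Simulates a system where the debugfs entry does not exist."""
    raise FileNotFoundError("cu_occupancy")


info = get_process_info(1234, lambda pid: 4096, missing_cu_occupancy)
print(info)
```

A caller such as the CLI could then print UNKNOWN for the one field that equals the sentinel while still showing the rest of the row.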
In addition to being able to set a clock range, a new setextremum option
is added to set only the min or max clock, as sometimes one of them may
not be supported.
Change-Id: I7c91ba308f3fc6c78efc88117509c515d403a6cb
[ROCm/rocm_smi_lib commit: 4e0a7f2f67]
Updates:
- [CLI] Switching to use generic rsmi_dev_power_get()
this is a backwards compatible function to
retrieve power values. More consistent than
previous fixes.
- [API] Update API for rsmi_dev_power_get()
Now provides @deprecated for this function.
Added notes that newer ASICs only support
current socket power, whereas previous
ASICs only provided average power.
Change-Id: I34da0e925cf0b6c669bdd801b017f33f3b3ee86a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 51aec98edd]
Updates:
- [API] rsmi_dev_target_graphics_version_get, takes
reported value from KFD -> parses into human-readable
values. If device does not support, returns MAX UINT64
value and RSMI_STATUS_NOT_SUPPORTED.
Otherwise, puts into base10 format removing
extra 0's + putting in correct format. If user
provides nullptr, returning RSMI_STATUS_INVALID_ARGS.
- [Test/Example] sys_info_read updated to include
new rsmi_dev_target_graphics_version_get tests
Change-Id: I50f94e06b8733a5dec2eb08f284b44927f36abcd
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 5d2cd0c271]
The current code assumes the err_count sysfs file has only 2 lines, which
changed for umc_err_count with the addition of an extra line for deferred errors.
The code is changed to relax this check.
Change-Id: I1c469555a5d460d7bc4f4926245646c09c6a2056
[ROCm/rocm_smi_lib commit: 73c65b6bfe]
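A sketch of the relaxed err_count parsing described above, assuming each line of the file has the form `<label>: <count>` (for example `ue: 0` and `ce: 0`, plus an extra `de: 0` line for deferred errors). The labels and the helper name are assumptions for illustration; the library's actual parser is C++ and may differ.

```python
def parse_err_count(text: str) -> dict:
    """Parse '<label>: <count>' lines without requiring exactly two lines."""
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or unrecognized lines
        try:
            counts[label.strip()] = int(value.strip())
        except ValueError:
            continue  # skip lines whose value is not an integer
    # Still require the two counters callers depend on.
    if "ue" not in counts or "ce" not in counts:
        raise ValueError("missing ue/ce counters")
    return counts


old_format = "ue: 1\nce: 2\n"
new_format = "ue: 1\nce: 2\nde: 3\n"
print(parse_err_count(old_format))
print(parse_err_count(new_format))
```

The design choice is to validate only the fields that are needed rather than the line count, so additional lines do not break older readers.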
Change the python tool not to display above information if it is
not supported.
Change-Id: I48ffd95f07168219a629dfb391c1b4587308286d
[ROCm/rocm_smi_lib commit: 905c25e59b]
Apply the following changes to project documentation for ReadtheDocs:
- add version number to documentation left navigation bar and page title
- add an "About" section with a license page
- enable htmlzip, pdf, epub formats when publishing on Read the Docs
- set pdf title, author, copyright, and version
- rename .sphinx/.doxygen to sphinx/doxygen
- remove docBin from URL
- update rocm-docs-core dependency
- update dependabot config
Change-Id: Ife8c89a2e9323f436b3e54ef2a9e013c19b3b228
[ROCm/rocm_smi_lib commit: 67dc4b0f2a]
Adds support and implement APIs for 'gpu_metrics_v1_5'
Code changes related to the following:
* gpu metrics 1.5 support
* Unit tests
* Examples
Build changes related to the following: None
Change-Id: Ie8917dd63c1dd1a94467b100fa44b634cebe62b6
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 373621aed3]
Received EACCES return for file that does not have
write access (read only). Permissions would be an
issue, but we check for sudo/root permissions early on.
Change-Id: I98615b02e4acccc59facb42225887a6b7273716b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: c6b0c93e6f]
Code changes related to the following:
* Check smallest copy size for multi-valued metrics
* Unit tests: gpu_metric_read
* ROCMSMI examples
Build changes related to the following:
* CMakeLists.txt
Change-Id: Ieb2363020fa21c93fbacd0edcc1d394eed183051
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/rocm_smi_lib commit: 8e0d3d5a39]
MCM die check was inconsistent (using avg power).
By using only the energy counter, this provides
a consistent way of checking which die is the MCM node.
Change-Id: I532fa2047706d0f1e92e643ce1e6759e45b65ec0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 553d26ef3a]