830 Commits

Author SHA1 Message Date
David Galiffi f7497646cc Add Doc team to CODEOWNERS file
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Iad8eea0645b63bddb835ed22080facc7d25c1bc0


[ROCm/rocm_smi_lib commit: 020c7c3e3f]
2024-03-07 11:45:12 -05:00
Istvan Kiss f0c16aa8a0 Update documentation and add python API documentation
Change-Id: Ibccf5b6a5fba81cea42e04a022deac8a3207b9b8


[ROCm/rocm_smi_lib commit: 50a079af0f]
2024-03-06 22:01:30 -05:00
Charis Poag 078f678c30 Fix rocm_smi library calls
- [CLI] Rounded VRAM output on CLI, no difference in output
- [python API] Fixed initializing calls which reuse initializeRsmi()
  calls - now we set a global reference to rocmsmi to use
  throughout API calls (see error below)

Traceback (most recent call last):
  File "/home/charpoag/rocmsmi_pythonapi.py", line 9, in <module>
    rocm_smi.initializeRsmi()
  File "/opt/rocm/libexec/rocm_smi/rocm_smi.py", line 3531, in initializeRsmi
    ret_init = rocmsmi.rsmi_init(0)
NameError: name 'rocmsmi' is not defined

Change-Id: I0eff3b8a432abf6d4344a02b9f638e1191c51a19
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 90160a7c9c]
2024-03-04 21:08:08 -06:00
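Note on commit 078f678c30: the NameError in the traceback comes from an API function using a module-level name that was never bound. The sketch below shows the general pattern the message describes, a single global handle bound during initialization. It is only an illustration: `_FakeLib` and `show_version` are hypothetical stand-ins, and the real rocm_smi.py code is not reproduced here.

```python
# Hypothetical stand-in for the loaded library handle; not the real binding.
class _FakeLib:
    def rsmi_init(self, flags):
        return 0  # 0 stands in for a success status

    def rsmi_version(self):
        return "fake-1.0"


rocmsmi = None  # module-level handle shared by every API wrapper


def initializeRsmi():
    """Bind the shared handle once, then initialize the library."""
    global rocmsmi  # without this, the assignment below would create a local
    if rocmsmi is None:
        rocmsmi = _FakeLib()
    return rocmsmi.rsmi_init(0)


def show_version():
    """Example wrapper (hypothetical) that relies on the shared handle."""
    if rocmsmi is None:
        raise RuntimeError("call initializeRsmi() first")
    return rocmsmi.rsmi_version()
```

With this layout a script that imports the module and calls initializeRsmi() first can use the other wrappers without defining `rocmsmi` itself.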
Maisam Arif 66893cbfcc Merge amd-staging into amd-master 20240304
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I99e9de40a57539407ce06b1e7385830c662134e1


[ROCm/rocm_smi_lib commit: 3c64c32d99]
2024-03-04 15:28:58 -06:00
Oliveira, Daniel 729a26605b fix: [SWDEV-432974] [rocm/rocm_smi_lib]
Checks the error returned by get_gpu_pci_bandwith() before asserting

Code changes related to the following:
  * Unit tests

Change-Id: Ia0fe64f168711147c5e66c7917cf633be40dee9f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 35b561fd69]
2024-03-01 17:30:07 -06:00
Charis Poag 4db3a5b0cd Merge amd-staging into amd-master 20240226
Change-Id: I1d6db79aa35dabbfb4b837ffdb5dd63ff099cbd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: a74062cc30]
2024-02-26 13:50:16 -06:00
Charis Poag 2147501b1a Merge amd-staging into amd-master 20240216
Change-Id: Id3e41507ab6143d08cb052710aa19c6f2e402fed
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 93ed5205f9]
2024-02-16 20:03:19 -06:00
Oliveira, Daniel b86b8e165a fix: [rocm/rocm_smi_lib] rsmi_dev_activity_metric_get gfx/memory activity does not update with GPU activity
Checks and forces rereading gpu metrics unconditionally

Code changes related to the following:
  * Device::dev_log_gpu_metrics()
  * Examples
  * Unit tests

Change-Id: Ic1c4f34a39f2bf197263f80ddbb84da26345807d
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: b4d37caa70]
2024-02-16 09:47:45 -06:00
Oliveira, Daniel ea66076ea9 fix: [rocm/rocm_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * 'rsmi_dev_metrics_' APIs
  * Functional tests
  * Examples

Change-Id: I7d562a95889361ee6f8f7588f8a790f42c8eb262
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: ce36198cb1]
2024-02-14 17:50:26 -06:00
Charis Poag 059fd6260e [SWDEV-423481/SWDEV-423393] Align all device identifier details
Updated:
 * [CLI] Fixed vram % - printf style formatting causes many data errors
   This fix updates to the recommended way of outputting formatted data.
   https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
 * [API/CLI] Added gpu_id / GUID from kfd (rsmi_dev_guid_get)
       -> CLI name: "GUID"
       -> ROCm SMI calls: no arg, -i, --showhw, --showproduct
 * [API/CLI] Added node_id from kfd (rsmi_dev_node_get)
       -> CLI name: "Node"
       -> ROCm SMI calls: no arg, --showhw, --showproduct
 * [CLI] Added target gfx version from kfd
       -> CLI name: "GFX Version" or "GFX VER"
       -> ROCm SMI calls: --showhw, --showproduct
 * [CLI] Base ROCm CLI
       -> Removed - stacked id formatting:
          This is to simplify identifiers helpful to users.
          More identifiers can be found on -i, --showhw, --showproduct
 * [CLI] Update -i, --showhw, --showproduct, w/out arg
      -> Card ID/DID/Model/SKU/VBIOS:
            All unsupported values now display "N/A" instead
            of "unknown" or "unsupported"
 * [CLI] Showhw now expands data based on content

Change-Id: Ifb8586f9f545892b8a5aa7903608273cdd77e075
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4b5ccb57f0]
2024-02-13 19:52:29 -05:00
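Note on commit 059fd6260e: the message does not say which data errors the printf-style formatting caused. One pitfall described on the linked Python documentation page is that a tuple operand is unpacked as several arguments. The sketch below illustrates that general pitfall with hypothetical helper names; it is not the CLI's actual code.

```python
def format_percent_printf(value):
    """printf-style formatting: a tuple operand is unpacked as arguments."""
    return "%s%%" % value


def format_percent_str_format(value):
    """str.format treats the operand as one value, whatever its type."""
    return "{}%".format(value)


def printf_fails_on_tuple():
    """Return True if the printf-style helper rejects a 2-tuple."""
    try:
        format_percent_printf((1, 2))
    except TypeError:
        return True
    return False
```

Both helpers agree on plain numbers; only the str.format version accepts a tuple without raising.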
Vladimir Stempen dc98babe34 Fix [Not supported] status for get_compute_process_info_by_pid
On some systems [rocm-smi --showpids] reports
get_compute_process_info_by_pid, Not supported on the given system
[PID] [PROCESS NAME] 1 UNKNOWN UNKNOWN UNKNOWN

get_compute_process_info_by_pid fails because cu_occupancy debugfs method
is not provided on some graphics cards and GFX revisions by design

Proposing a change to return success status when only cu_occupancy debugfs method
is not found and provide cu_occupancy invalidation value to mark only
this parameter as UNKNOWN

Change-Id: Iae37070d9bd19483b4e6c8ee24c7d9a4c92f00d7
Signed-off-by: Vladimir Stempen <Vladimir.Stempen@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 677433b367]
2024-02-13 18:17:47 -05:00
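Note on commit dc98babe34: the proposed behaviour (report success and mark only cu_occupancy as unknown when its debugfs entry is missing) can be sketched as below. The field names, the `read_field` callback, and the sentinel value are assumptions made for illustration and are not taken from the library.

```python
CU_OCCUPANCY_INVALID = 0xFFFFFFFF  # hypothetical sentinel, not the library's constant


def read_process_info(read_field):
    """Collect per-process fields; read_field(name) returns an int or
    raises FileNotFoundError when the backing file does not exist."""
    info = {
        "vram_usage": read_field("vram_usage"),
        "sdma_usage": read_field("sdma_usage"),
    }
    try:
        info["cu_occupancy"] = read_field("cu_occupancy")
    except FileNotFoundError:
        # Only this field is marked unknown; the call as a whole still succeeds.
        info["cu_occupancy"] = CU_OCCUPANCY_INVALID
    return info


def fake_reader_without_cu(name):
    """Hypothetical reader for a system that lacks the cu_occupancy entry."""
    values = {"vram_usage": 1024, "sdma_usage": 7}
    if name not in values:
        raise FileNotFoundError(name)
    return values[name]
```

A caller can then print UNKNOWN for cu_occupancy alone while still showing the other fields.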
Galantsev, Dmitrii 3564c1a430 CMAKE - Default to lib instead of lib64
Change-Id: Ib21d41018b091d92c2ed408ff0c4d28e6a74c903
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: de9eaaac8c]
2024-02-12 20:16:28 -06:00
Galantsev, Dmitrii 2eccc6bc9f Merge amd-staging into amd-master 20240212
Change-Id: I662f2a470446550ba8c612aa1e5be911d7f7489f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: d03061823a]
2024-02-12 11:30:04 -06:00
Bill(Shuzhou) Liu 194f98b809 Support set min or max clock
In addition to being able to set a clock range, a new setextremum option
is added to set only the min or max clock, as sometimes one of them may
not be supported.

Change-Id: I7c91ba308f3fc6c78efc88117509c515d403a6cb


[ROCm/rocm_smi_lib commit: 4e0a7f2f67]
2024-02-09 09:24:26 -06:00
Galantsev, Dmitrii 9bc381ec40 Add lychee.toml for dead link checks
Use Lychee[1] to check dead links

[1] - https://github.com/lycheeverse/lychee

Change-Id: I741a2760283da8c21b95e5b516f78e39a9d9a0a1
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 1015cba489]
2024-02-08 18:18:39 -05:00
Charis Poag 998b883286 [SWDEV-437365] Fix --showpower
Updates:
  - [CLI] Switching to use generic rsmi_dev_power_get();
    this is a backwards-compatible function to
    retrieve power values. More consistent than
    previous fixes.
  - [API] Update API for rsmi_dev_power_get().
    Now provides @deprecated for this function.
    Providing notes that newer ASICs only support
    current socket power, whereas previous
    ASICs only provided average power.

Change-Id: I34da0e925cf0b6c669bdd801b017f33f3b3ee86a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
(cherry picked from commit 2e3cc74c79)


[ROCm/rocm_smi_lib commit: c18ec624af]
2024-02-02 19:30:46 -05:00
Charis Poag 2e3cc74c79 [SWDEV-437365] Fix --showpower
Updates:
  - [CLI] Switching to use generic rsmi_dev_power_get();
    this is a backwards-compatible function to
    retrieve power values. More consistent than
    previous fixes.
  - [API] Update API for rsmi_dev_power_get().
    Now provides @deprecated for this function.
    Providing notes that newer ASICs only support
    current socket power, whereas previous
    ASICs only provided average power.

Change-Id: I34da0e925cf0b6c669bdd801b017f33f3b3ee86a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 51aec98edd]
2024-02-02 00:00:38 -06:00
guanyu12 3a4f31f8a1 Merge amd-staging into amd-master 20240201
Signed-off-by: guanyu12 <guanyu12@amd.com>
Change-Id: I285bcd292990730ccad6b663ba6943211e6a5bba


[ROCm/rocm_smi_lib commit: 23b3376398]
2024-02-01 14:45:10 +08:00
Charis Poag 443d034d36 Add rsmi_dev_target_graphics_version_get
Updates:
   - [API] rsmi_dev_target_graphics_version_get takes the
     value reported by KFD and parses it into human-readable
     values. If the device does not support it, returns the MAX UINT64
     value and RSMI_STATUS_NOT_SUPPORTED.
     Otherwise, converts the value to base-10 format, removing
     extra 0's and putting it in the correct format. If the user
     provides nullptr, returns RSMI_STATUS_INVALID_ARGS.
   - [Test/Example] sys_info_read updated to include
     new rsmi_dev_target_graphics_version_get tests

Change-Id: I50f94e06b8733a5dec2eb08f284b44927f36abcd
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 5d2cd0c271]
2024-01-29 14:25:24 -06:00
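Note on commit 443d034d36: the message does not spell out how the KFD value is encoded or the exact output layout. The sketch below assumes the KFD gfx target version packs major * 10000 + minor * 100 + stepping; that encoding is an assumption here, not something stated in the commit, and the function name is hypothetical. It shows only the decoding step, not the API's final formatting.

```python
def decode_gfx_target_version(raw):
    """Split a packed version into (major, minor, stepping).

    Assumes raw == major * 10000 + minor * 100 + stepping.
    """
    major, rest = divmod(raw, 10000)
    minor, stepping = divmod(rest, 100)
    return major, minor, stepping
```

Under that assumption a raw value of 90010 decodes to major 9, minor 0, stepping 10.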
Galantsev, Dmitrii bf5b13ffb1 Merge amd-staging into amd-master 20240124
Change-Id: I358fde8bed15c8b2a240a0be8cf5411e21238b08
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 9386d60522]
2024-01-24 16:26:34 -06:00
Bill(Shuzhou) Liu c1e745f25c UMC ECC count return not supported
The current code assumes the err_count sysfs file has only 2 lines, which
changed for umc_err_count when an extra line was added for deferred errors.
The code is changed to relax this check.

Change-Id: I1c469555a5d460d7bc4f4926245646c09c6a2056


[ROCm/rocm_smi_lib commit: 73c65b6bfe]
2024-01-24 08:31:24 -06:00
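Note on commit c1e745f25c: a relaxed check of this kind can be sketched as a parser that requires the two expected counters but tolerates extra lines. The "key: value" layout and the `ue`, `ce`, and `de` labels are assumptions about the sysfs file contents, and `parse_err_count` is a hypothetical name; the library itself is C++ and is not reproduced here.

```python
def parse_err_count(text):
    """Parse 'name: count' lines, accepting any number of extra lines."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'name: count'
        try:
            counts[key.strip()] = int(value.strip())
        except ValueError:
            continue  # skip lines whose count is not an integer
    # Require only the two counters the caller needs, not an exact line count.
    if "ue" not in counts or "ce" not in counts:
        raise ValueError("missing ue/ce counters")
    return counts
```

A strict "exactly two lines" check would reject the three-line input that this version accepts.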
Bill(Shuzhou) Liu 3da1539121 Voltage clock display as 0 when overdrive and voltage not supported
Change the Python tool so that it does not display the above information
if it is not supported.

Change-Id: I48ffd95f07168219a629dfb391c1b4587308286d


[ROCm/rocm_smi_lib commit: 905c25e59b]
2024-01-19 17:11:08 -05:00
guanyu12 0645e00358 Merge amd-staging into amd-master 20240118
Signed-off-by: guanyu12 <guanyu12@amd.com>
Change-Id: I22971ade4774319930cb0a9bced2e3c3d7e91265


[ROCm/rocm_smi_lib commit: 68ba8fd4ff]
2024-01-18 10:29:57 +08:00
Bill(Shuzhou) Liu 0566bbc47a Return NOT_SUPPORT for set function in VM guest
Fix the unit tests which fail in a VM guest environment.

Change-Id: Id7c58887692bbdecba54f5d2d8463b292e19b4ad


[ROCm/rocm_smi_lib commit: a0ec98c30d]
2024-01-17 11:18:25 -06:00
Galantsev, Dmitrii 673087e6a0 Remove word 'error' from non-error message
This simplifies grep lookup

Change-Id: I46cd13e0ab414791655fd93e8dcf270a946a6687
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 147af192b5]
2024-01-12 15:18:55 -06:00
Sam Wu c785a58e99 [ROCDOC-95] Standardize documentation for ReadtheDocs
Apply the following changes to project documentation for ReadtheDocs:

- add version number to documentation left navigation bar and page title
- add an "About" section with a license page
- enable htmlzip, pdf, epub formats when publishing on Read the Docs
- set pdf title, author, copyright, and version
- rename .sphinx/.doxygen to sphinx/doxygen
- remove docBin from URL
- update rocm-docs-core dependency
- update dependabot config

Change-Id: Ife8c89a2e9323f436b3e54ef2a9e013c19b3b228


[ROCm/rocm_smi_lib commit: 67dc4b0f2a]
2024-01-11 17:47:58 -05:00
guanyu12 0e6c78e91a Merge amd-staging into amd-master 20240111
Signed-off-by: guanyu12 <guanyu12@amd.com>
Change-Id: Ia13a1669d77e91446362e2e0c19e84496046c488


[ROCm/rocm_smi_lib commit: 770a177077]
2024-01-11 11:30:59 +08:00
Oliveira, Daniel c0335b2695 rocm_smi_lib: Fix gpu_metrics_v1_5 support
Adds support and implement APIs for 'gpu_metrics_v1_5'

Code changes related to the following:
  * gpu metrics 1.5 support
  * Unit tests
  * Examples

Build changes related to the following: None

Change-Id: Ie8917dd63c1dd1a94467b100fa44b634cebe62b6
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 373621aed3]
2024-01-05 14:24:34 -06:00
guanyu12 cc5cb51278 Merge amd-staging into amd-master 20231214
Signed-off-by: guanyu12 <guanyu12@amd.com>
Change-Id: Iebd82680b7ed56abf84ad71a92a267a90a488aa6


[ROCm/rocm_smi_lib commit: 4b17a34716]
2023-12-14 19:31:42 +08:00
Galantsev, Dmitrii 3c068722f0 SWDEV-436561 - Add CODEOWNERS
Change-Id: I4201a0fa76f61dd56c84d644bca049f9846b27fe
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8615d096c3]
2023-12-12 11:18:23 -06:00
guanyu12 f7690cf57a Merge amd-staging into amd-master 20231207
Signed-off-by: guanyu12 <guanyu12@amd.com>
Change-Id: Ic67feea6e7d21338cc3bbd76220f03effec59cbf


[ROCm/rocm_smi_lib commit: 6793fda4ef]
2023-12-07 13:21:57 +08:00
Charis Poag 18fa660402 Memory partition permission denied fix
Received EACCES return for file that does not have
write access (read only). Permissions would be an
issue, but we check for sudo/root permissions early on.

Change-Id: I98615b02e4acccc59facb42225887a6b7273716b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: c6b0c93e6f]
2023-12-06 21:51:30 -05:00
Galantsev, Dmitrii f38b62abf5 TESTS - Temporarily disable overdrive tests
Change-Id: Ice06d31e874621abf3135548eedfe2158281891d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 48163b8d4f]
2023-12-06 19:33:17 -06:00
Galantsev, Dmitrii bb50cf42a2 TESTS - Fix overdrive error on not-supported
Change-Id: I47e7f499229b47b151f4ba4d5fa9c59ac04d6816
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 102c2c692a]
2023-12-06 02:43:04 -06:00
Galantsev, Dmitrii c00b9ee7df Merge amd-staging into amd-master 20231205
Change-Id: Ib8b0672f8993cfd995d567f582dd9b33d03ddac4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 1ae7164f20]
2023-12-05 10:28:15 -06:00
Oliveira, Daniel e2a833f347 rocm_smi_lib: Fix GPU Metrics Max Elements Read Exceeded
Code changes related to the following:
  * Check smallest copy size for multi-valued metrics
  * Unit tests: gpu_metric_read
  * ROCMSMI examples

Build changes related to the following:
  * CMakeLists.txt

Change-Id: Ieb2363020fa21c93fbacd0edcc1d394eed183051
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 8e0d3d5a39]
2023-12-04 17:01:08 -06:00
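Note on commit e2a833f347: "check smallest copy size" can be read as bounding a copy by the shorter of the source and destination arrays. The sketch below shows that idea with a hypothetical helper; it is not the library's C++ implementation.

```python
def copy_metric_values(source, destination):
    """Copy as many values as both sequences can hold; return the count."""
    count = min(len(source), len(destination))
    # Equal-length slice assignment leaves the destination length unchanged.
    destination[:count] = source[:count]
    return count
```

Bounding the copy this way avoids reading or writing past the end of either array when their element counts differ.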
Galantsev, Dmitrii 7fc67c88ce Fix ASAN for tests and log metrics better
Change-Id: Ib495cfc28c48a4d291a89673a3b6fc13313845c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: a128867497]
2023-11-30 15:39:05 -05:00
Galantsev, Dmitrii d734ec5aa6 Add linting via pre-commit and docker
Please see .pre-commit-config.yaml for details

- Add clang-format
- Add cpplint
- Add config for clang-tidy but don't enforce with pre-commit

Change-Id: Ica447c78e6fde94b43bfdc00f5b4efc338363e24
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 142fbac7ac]
2023-11-28 23:21:36 -05:00
Galantsev, Dmitrii fe2198b170 Merge amd-staging into amd-master 20231121
Change-Id: I400dfcdbf7fd1afcb020805342a4389038ce3917
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 0c5c46db6f]
2023-11-21 17:28:29 -06:00
Galantsev, Dmitrii a435423020 Bump version lib:7.0.0 tool:2.0.0+hash
Change-Id: I7f2fd5605a93d07f61b997a25e1fbcf2780ea5cb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e1c972a193]
2023-11-21 17:19:41 -06:00
Galantsev, Dmitrii a854fbe9f6 Add version hash
Change-Id: I6cf18b00a45ebd106f981e92681cab2ef25924e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: d61aaf44e1]
2023-11-21 17:14:38 -06:00
Charis Poag b6ae7c5775 Fix CLI checks for secondary die
MCM die check was inconsistent (using avg power).
By using only the energy counter, this provides
a consistent way of checking which die is the MCM node.

Change-Id: I532fa2047706d0f1e92e643ce1e6759e45b65ec0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 553d26ef3a]
2023-11-21 11:58:52 -05:00
Oliveira, Daniel 85670a59e6 rocm_smi_lib: Fix Refactoring gpu_metrics code
Uses new support for 'gpu_metrics_v1_4'

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * new data structure fields added in 1.4
  * added APIs for all other existing metrics before 1.4
  * added support to older metrics; 1.1, and 1.2
  * added support to dump_internal_metrics_table()
  * public APIs renamed to start with prefix 'rsmi_dev_metrics_'
  * Unit tests updated
  * Examples updated

Build changes related to the following: None

Change-Id: I23e59f99d3ed43318cd6bd43bd2f0c5387e9ccb9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 713d259f88]
2023-11-20 19:36:47 -06:00
Galantsev, Dmitrii 7dd5624e1e Merge amd-staging into amd-master 20231116
This merge skips Ibdaf031be9d916020b4049544dbd725858c7711d as that
change introduces a bug in gpu-metrics

Change-Id: Ied8447affd5ed3c847734d75517b04c073dc44b4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 9d456edcd6]
2023-11-16 11:18:33 -06:00
Oliveira, Daniel 83589929db rocm_smi_lib: Fix Refactoring gpu_metrics code
Uses new support for 'gpu_metrics_v1_4'

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * new data structure fields added in 1.4
  * added APIs for all other existing metrics before 1.4
  * added support to older metrics; 1.1, and 1.2
  * public APIs renamed to start with prefix 'rsmi_dev_metrics_'
  * Unit tests updated
  * Examples updated

Build changes related to the following: None

Change-Id: Ibdaf031be9d916020b4049544dbd725858c7711d
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 2c8ba4cae9]
2023-11-10 19:05:09 -06:00
Bill(Shuzhou) Liu b34e62d832 Sort GPU index using BDF
Sort GPU index based on BDF. Also add an API to get the XGMI
physical id.

Change-Id: I998876e435165c59d450ecd0b979315278b488a5


[ROCm/rocm_smi_lib commit: e5627d2bf1]
2023-11-06 20:51:25 -06:00
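Note on commit b34e62d832: ordering devices by BDF (bus, device, function) can be sketched as a sort on the parsed numeric fields. The "domain:bus:device.function" hexadecimal string layout and the helper names are assumptions made for illustration; the library's own sorting code is not shown in this log.

```python
def bdf_key(bdf):
    """Turn 'dddd:bb:dd.f' (hex fields) into a sortable tuple of ints."""
    domain, bus, dev_fn = bdf.split(":")
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def sort_by_bdf(bdfs):
    """Return the BDF strings ordered by domain, bus, device, function."""
    return sorted(bdfs, key=bdf_key)
```

Sorting on parsed integers rather than raw strings keeps the order stable even if the fields are not zero-padded consistently.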
Galantsev, Dmitrii 92e2857be9 Fix issues introduced in e89751e202
- std=c++.. is not required because CMAKE_CXX_STANDARD is set
- nullptr check breaks the test because we rely on nullptr as an api for
  checking feature availability.
- enum number setting is unnecessary

Change-Id: I393e6dd3f292b7fa4198302f140c0443ba5e50f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: a099f0682a]
2023-11-03 17:54:35 -05:00
Galantsev, Dmitrii 3126d1461c CMake - Bump version
Change-Id: Ibe62c0059262bcb9937ae856b796392b1fe362a0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 7d629c5959]
2023-11-02 18:26:00 -05:00
Galantsev, Dmitrii 3ac6b36a60 Merge amd-staging into amd-master 20231102
Change-Id: I7d1901564af875f2f9aa8879f24bff098ea30600
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8aa036ae08]
2023-11-02 18:24:10 -05:00
Charis Poag 521bd38bbd Fix GPU Metric content revision check
Change-Id: I94ff4732be01214591b635357d9a62eb7d5192a0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: b49e82a4f4]
2023-10-31 17:42:02 -05:00