Commit graph

517 Commits

Author SHA1 Message Date
Istvan Kiss f0c16aa8a0 Update documentation and add python API documentation
Change-Id: Ibccf5b6a5fba81cea42e04a022deac8a3207b9b8


[ROCm/rocm_smi_lib commit: 50a079af0f]
2024-03-06 22:01:30 -05:00
Charis Poag 078f678c30 Fix rocm_smi library calls
- [CLI] Rounded VRAM output on CLI, no difference in output
- [python API] Fixed initializing calls which reuse initializeRsmi()
  calls - now we set a global reference to rocmsmi to use
  throughout API calls (see error below)

Traceback (most recent call last):
  File "/home/charpoag/rocmsmi_pythonapi.py", line 9, in <module>
    rocm_smi.initializeRsmi()
  File "/opt/rocm/libexec/rocm_smi/rocm_smi.py", line 3531, in initializeRsmi
    ret_init = rocmsmi.rsmi_init(0)
NameError: name 'rocmsmi' is not defined

Change-Id: I0eff3b8a432abf6d4344a02b9f638e1191c51a19
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 90160a7c9c]
2024-03-04 21:08:08 -06:00
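Note on the NameError above: it is typical of a module whose library handle is bound in only one code path. A minimal sketch of the "global reference" approach follows. The names `FakeLib`, `load_library` and `initialize` are hypothetical stand-ins, not the actual rocm_smi.py code.

```python
# Hypothetical stand-in for the ctypes handle that rocm_smi.py calls `rocmsmi`.
class FakeLib:
    def rsmi_init(self, flags):
        return 0  # 0 stands in for a success status


rocmsmi = None  # module-level reference shared by every API call


def load_library():
    """Bind the module-level handle once; later calls reuse it."""
    global rocmsmi
    if rocmsmi is None:
        rocmsmi = FakeLib()
    return rocmsmi


def initialize(flags=0):
    """Safe to call repeatedly: the handle is always defined before use."""
    lib = load_library()
    return lib.rsmi_init(flags)
```

Calling `initialize()` more than once reuses the same handle instead of relying on a name that may not have been defined yet.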
Oliveira, Daniel 729a26605b fix: [SWDEV-432974] [rocm/rocm_smi_lib]
Checks the error returned by get_gpu_pci_bandwith() before asserting

Code changes related to the following:
  * Unit tests

Change-Id: Ia0fe64f168711147c5e66c7917cf633be40dee9f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 35b561fd69]
2024-03-01 17:30:07 -06:00
Oliveira, Daniel b86b8e165a fix: [rocm/rocm_smi_lib] rsmi_dev_activity_metric_get gfx/memory activity does not update with GPU activity
Checks and forces rereading gpu metrics unconditionally

Code changes related to the following:
  * Device::dev_log_gpu_metrics()
  * Examples
  * Unit tests

Change-Id: Ic1c4f34a39f2bf197263f80ddbb84da26345807d
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: b4d37caa70]
2024-02-16 09:47:45 -06:00
Oliveira, Daniel ea66076ea9 fix: [rocm/rocm_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * 'rsmi_dev_metrics_' APIs
  * Functional tests
  * Examples

Change-Id: I7d562a95889361ee6f8f7588f8a790f42c8eb262
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: ce36198cb1]
2024-02-14 17:50:26 -06:00
Charis Poag 059fd6260e [SWDEV-423481/SWDEV-423393] Align all device identifier details
Updated:
 * [CLI] Fixed vram % - printf style formatting causes many data errors
   This fix updates to the recommended way of outputting formatted data.
   https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
 * [API/CLI] Added gpu_id / GUID from kfd (rsmi_dev_guid_get)
       -> CLI name: "GUID"
       -> ROCm SMI calls: no arg, -i, --showhw, --showproduct
 * [API/CLI] Added node_id from kfd (rsmi_dev_node_get)
       -> CLI name: "Node"
       -> ROCm SMI calls: no arg, --showhw, --showproduct
 * [CLI] Added target gfx version from kfd
       -> CLI name: "GFX Version" or "GFX VER"
       -> ROCm SMI calls: --showhw, --showproduct
 * [CLI] Base ROCm CLI
       -> Removed - stacked id formatting:
          This is to simplify identifiers helpful to users.
          More identifiers can be found on -i --showhw, --showproduct
 * [CLI] Update -i, --showhw, --showproduct, w/out arg
      -> Card ID/DID/Model/SKU/VBIOS:
            All unsupported values now display "N/A" instead
            of "unknown" or "unsupported"
 * [CLI] Showhw now expands data based on content

Change-Id: Ifb8586f9f545892b8a5aa7903608273cdd77e075
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4b5ccb57f0]
2024-02-13 19:52:29 -05:00
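Note on the printf-style formatting fix above: one well-known pitfall of the `%` operator, which may or may not be the one hit here, is that a tuple operand is unpacked as several arguments. The sketch below contrasts it with `str.format`. The function names are illustrative only.

```python
def vram_percent_printf(value):
    # printf-style: a tuple operand is unpacked as multiple arguments
    return "VRAM%%: %s" % value


def vram_percent_format(value):
    # str.format treats the operand as one value regardless of its type
    return "VRAM%: {}".format(value)


if __name__ == "__main__":
    print(vram_percent_format((3, 100)))
    try:
        print(vram_percent_printf((3, 100)))
    except TypeError as exc:
        print("printf-style failed:", exc)
```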
Vladimir Stempen dc98babe34 Fix [Not supported] status for get_compute_process_info_by_pid
On some systems [rocm-smi --showpids] reports
get_compute_process_info_by_pid, Not supported on the given system
[PID] [PROCESS NAME] 1 UNKNOWN UNKNOWN UNKNOWN

get_compute_process_info_by_pid fails because cu_occupancy debugfs method
is not provided on some graphics cards and GFX revisions by design

Proposing a change to return success status when only cu_occupancy debugfs method
is not found and provide cu_occupancy invalidation value to mark only
this parameter as UNKNOWN

Change-Id: Iae37070d9bd19483b4e6c8ee24c7d9a4c92f00d7
Signed-off-by: Vladimir Stempen <Vladimir.Stempen@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 677433b367]
2024-02-13 18:17:47 -05:00
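Note on the cu_occupancy change above: the idea of reporting success while marking a single field as unknown can be sketched as follows. The sentinel value, the reader callables and the field names are assumptions made for illustration, not the library's actual definitions.

```python
CU_OCCUPANCY_INVALID = 0xFFFFFFFF  # hypothetical "invalid" marker


def process_info(read_vram, read_sdma, read_cu_occupancy):
    """Collect per-process fields; a missing cu_occupancy source is not fatal."""
    info = {"vram_usage": read_vram(), "sdma_usage": read_sdma()}
    try:
        info["cu_occupancy"] = read_cu_occupancy()
    except FileNotFoundError:
        # Only this field is invalidated; the overall call still succeeds.
        info["cu_occupancy"] = CU_OCCUPANCY_INVALID
    return info


def render(value):
    """Display helper: show the sentinel as UNKNOWN."""
    return "UNKNOWN" if value == CU_OCCUPANCY_INVALID else str(value)
```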
Galantsev, Dmitrii 3564c1a430 CMAKE - Default to lib instead of lib64
Change-Id: Ib21d41018b091d92c2ed408ff0c4d28e6a74c903
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: de9eaaac8c]
2024-02-12 20:16:28 -06:00
Bill(Shuzhou) Liu 194f98b809 Support set min or max clock
In addition to setting a clock range, a new setextremum option
is added to set only the min or max clock, as sometimes one of them
may not be supported.

Change-Id: I7c91ba308f3fc6c78efc88117509c515d403a6cb


[ROCm/rocm_smi_lib commit: 4e0a7f2f67]
2024-02-09 09:24:26 -06:00
Galantsev, Dmitrii 9bc381ec40 Add lychee.toml for dead link checks
Use Lychee[1] to check dead links

[1] - https://github.com/lycheeverse/lychee

Change-Id: I741a2760283da8c21b95e5b516f78e39a9d9a0a1
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 1015cba489]
2024-02-08 18:18:39 -05:00
Charis Poag 2e3cc74c79 [SWDEV-437365] Fix --showpower
Updates:
  - [CLI] Switching to use generic rsmi_dev_power_get();
    this is a backwards compatible function to
    retrieve power values. More consistent than
    previous fixes.
  - [API] Update API for rsmi_dev_power_get().
    Now provides @deprecated for this function.
    Providing notes on newer ASICs only supporting
    current socket power, whereas previous
    ASICs only provided average power.

Change-Id: I34da0e925cf0b6c669bdd801b017f33f3b3ee86a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 51aec98edd]
2024-02-02 00:00:38 -06:00
Charis Poag 443d034d36 Add rsmi_dev_target_graphics_version_get
Updates:
   - [API] rsmi_dev_target_graphics_version_get takes the
     value reported by KFD and parses it into a human-readable
     value. If the device does not support this, it returns the
     MAX UINT64 value and RSMI_STATUS_NOT_SUPPORTED.
     Otherwise, it converts the value into base10 format, removing
     extra 0's and putting it in the correct format. If the user
     provides nullptr, it returns RSMI_STATUS_INVALID_ARGS.
   - [Test/Example] sys_info_read updated to include
     new rsmi_dev_target_graphics_version_get tests

Change-Id: I50f94e06b8733a5dec2eb08f284b44927f36abcd
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 5d2cd0c271]
2024-01-29 14:25:24 -06:00
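Note on the graphics version parsing above: the sketch below only illustrates splitting a packed integer into components. It assumes the KFD value is encoded as major * 10000 + minor * 100 + stepping. Both that encoding and the function name are assumptions here, and the library's actual output format is not reproduced.

```python
def decode_gfx_target_version(raw):
    """Split an assumed major*10000 + minor*100 + stepping value."""
    if raw is None or raw == 0:
        return None  # treated as "not supported" in this sketch
    major, rest = divmod(raw, 10000)
    minor, stepping = divmod(rest, 100)
    return major, minor, stepping
```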
Bill(Shuzhou) Liu c1e745f25c UMC ECC count return not supported
The current code assumes the err_count sysfs file has only 2 lines, which
changed for umc_err_count when an extra line was added for deferred errors.
The code is changed to relax this check.

Change-Id: I1c469555a5d460d7bc4f4926245646c09c6a2056


[ROCm/rocm_smi_lib commit: 73c65b6bfe]
2024-01-24 08:31:24 -06:00
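Note on the relaxed err_count check above: a parser that looks fields up by label rather than requiring an exact line count tolerates added lines. The sketch below assumes a "label: count" layout with `ue`, `ce` and `de` labels. That layout is an assumption for illustration and should be checked against the real sysfs file.

```python
def parse_err_count(text):
    """Parse 'label: count' lines; extra labels are kept, not rejected."""
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        counts[label.strip()] = int(value.strip())
    # Require only the two fields the caller needs, not an exact line count.
    if "ue" not in counts or "ce" not in counts:
        raise ValueError("missing ue/ce counts")
    return counts
```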
Bill(Shuzhou) Liu 3da1539121 Voltage clock display as 0 when overdrive and voltage not supported
Change the Python tool so that it does not display the above information
when it is not supported.

Change-Id: I48ffd95f07168219a629dfb391c1b4587308286d


[ROCm/rocm_smi_lib commit: 905c25e59b]
2024-01-19 17:11:08 -05:00
Bill(Shuzhou) Liu 0566bbc47a Return NOT_SUPPORT for set function in VM guest
Fix the unit tests that fail in a VM guest environment.

Change-Id: Id7c58887692bbdecba54f5d2d8463b292e19b4ad


[ROCm/rocm_smi_lib commit: a0ec98c30d]
2024-01-17 11:18:25 -06:00
Galantsev, Dmitrii 673087e6a0 Remove word 'error' from non-error message
This simplifies grep lookup

Change-Id: I46cd13e0ab414791655fd93e8dcf270a946a6687
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 147af192b5]
2024-01-12 15:18:55 -06:00
Sam Wu c785a58e99 [ROCDOC-95] Standardize documentation for ReadtheDocs
Apply the following changes to project documentation for Read the Docs:

- add version number to documentation left navigation bar and page title
- add an "About" section with a license page
- enable htmlzip, pdf, epub formats when publishing on Read the Docs
- set pdf title, author, copyright, and version
- rename .sphinx/.doxygen to sphinx/doxygen
- remove docBin from URL
- update rocm-docs-core dependency
- update dependabot config

Change-Id: Ife8c89a2e9323f436b3e54ef2a9e013c19b3b228


[ROCm/rocm_smi_lib commit: 67dc4b0f2a]
2024-01-11 17:47:58 -05:00
Oliveira, Daniel c0335b2695 rocm_smi_lib: Fix gpu_metrics_v1_5 support
Adds support and implement APIs for 'gpu_metrics_v1_5'

Code changes related to the following:
  * gpu metrics 1.5 support
  * Unit tests
  * Examples

Build changes related to the following: None

Change-Id: Ie8917dd63c1dd1a94467b100fa44b634cebe62b6
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 373621aed3]
2024-01-05 14:24:34 -06:00
Galantsev, Dmitrii 3c068722f0 SWDEV-436561 - Add CODEOWNERS
Change-Id: I4201a0fa76f61dd56c84d644bca049f9846b27fe
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8615d096c3]
2023-12-12 11:18:23 -06:00
Charis Poag 18fa660402 Memory partition permission denied fix
Received EACCES return for file that does not have
write access (read only). Permissions would be an
issue, but we check for sudo/root permissions early on.

Change-Id: I98615b02e4acccc59facb42225887a6b7273716b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: c6b0c93e6f]
2023-12-06 21:51:30 -05:00
Galantsev, Dmitrii f38b62abf5 TESTS - Temporarily disable overdrive tests
Change-Id: Ice06d31e874621abf3135548eedfe2158281891d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 48163b8d4f]
2023-12-06 19:33:17 -06:00
Galantsev, Dmitrii bb50cf42a2 TESTS - Fix overdrive error on not-supported
Change-Id: I47e7f499229b47b151f4ba4d5fa9c59ac04d6816
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 102c2c692a]
2023-12-06 02:43:04 -06:00
Oliveira, Daniel e2a833f347 rocm_smi_lib: Fix GPU Metrics Max Elements Read Exceeded
Code changes related to the following:
  * Check smallest copy size for multi-valued metrics
  * Unit tests: gpu_metric_read
  * ROCMSMI examples

Build changes related to the following:
  * CMakeLists.txt

Change-Id: Ieb2363020fa21c93fbacd0edcc1d394eed183051
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 8e0d3d5a39]
2023-12-04 17:01:08 -06:00
Galantsev, Dmitrii 7fc67c88ce Fix ASAN for tests and log metrics better
Change-Id: Ib495cfc28c48a4d291a89673a3b6fc13313845c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: a128867497]
2023-11-30 15:39:05 -05:00
Galantsev, Dmitrii d734ec5aa6 Add linting via pre-commit and docker
Please see .pre-commit-config.yaml for details

- Add clang-format
- Add cpplint
- Add config for clang-tidy but don't enforce with pre-commit

Change-Id: Ica447c78e6fde94b43bfdc00f5b4efc338363e24
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 142fbac7ac]
2023-11-28 23:21:36 -05:00
Galantsev, Dmitrii a435423020 Bump version lib:7.0.0 tool:2.0.0+hash
Change-Id: I7f2fd5605a93d07f61b997a25e1fbcf2780ea5cb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e1c972a193]
2023-11-21 17:19:41 -06:00
Galantsev, Dmitrii a854fbe9f6 Add version hash
Change-Id: I6cf18b00a45ebd106f981e92681cab2ef25924e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: d61aaf44e1]
2023-11-21 17:14:38 -06:00
Charis Poag b6ae7c5775 Fix CLI checks for secondary die
MCM die check was inconsistent (using avg power).
By using only the energy counter, this provides
a consistent way of checking which die is the MCM node.

Change-Id: I532fa2047706d0f1e92e643ce1e6759e45b65ec0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 553d26ef3a]
2023-11-21 11:58:52 -05:00
Oliveira, Daniel 85670a59e6 rocm_smi_lib: Fix Refactoring gpu_metrics code
Uses new support for 'gpu_metrics_v1_4'

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * new data structure fields added in 1.4
  * added APIs for all other existing metrics before 1.4
  * added support to older metrics; 1.1, and 1.2
  * added support to dump_internal_metrics_table()
  * public APIs renamed to start with prefix 'rsmi_dev_metrics_'
  * Unit tests updated
  * Examples updated

Build changes related to the following: None

Change-Id: I23e59f99d3ed43318cd6bd43bd2f0c5387e9ccb9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 713d259f88]
2023-11-20 19:36:47 -06:00
Oliveira, Daniel 83589929db rocm_smi_lib: Fix Refactoring gpu_metrics code
Uses new support for 'gpu_metrics_v1_4'

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * new data structure fields added in 1.4
  * added APIs for all other existing metrics before 1.4
  * added support to older metrics; 1.1, and 1.2
  * public APIs renamed to start with prefix 'rsmi_dev_metrics_'
  * Unit tests updated
  * Examples updated

Build changes related to the following: None

Change-Id: Ibdaf031be9d916020b4049544dbd725858c7711d
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 2c8ba4cae9]
2023-11-10 19:05:09 -06:00
Bill(Shuzhou) Liu b34e62d832 Sort GPU index using BDF
Sort GPU index based on BDF. Also add an API to get the XGMI
physical id.

Change-Id: I998876e435165c59d450ecd0b979315278b488a5


[ROCm/rocm_smi_lib commit: e5627d2bf1]
2023-11-06 20:51:25 -06:00
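Note on BDF-based ordering above: sorting by PCI bus/device/function is usually done on the numeric fields rather than on the raw string. A minimal sketch follows, assuming BDF strings in the common `domain:bus:device.function` hexadecimal form. The function names are illustrative and not taken from the library.

```python
def bdf_key(bdf):
    """Turn 'dddd:bb:dd.f' into a tuple of integers for sorting."""
    domain, bus, dev_fn = bdf.split(":")
    device, function = dev_fn.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def sort_by_bdf(bdfs):
    """Return the BDF strings ordered by their numeric fields."""
    return sorted(bdfs, key=bdf_key)
```

Comparing the parsed integers avoids surprises from mixed-case hex digits, which a plain string sort would order differently.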
Galantsev, Dmitrii 92e2857be9 Fix issues introduced in e89751e202
- std=c++.. is not required because CMAKE_CXX_STANDARD is set
- nullptr check breaks the test because we rely on nullptr as an api for
  checking feature availability.
- enum number setting is unnecessary

Change-Id: I393e6dd3f292b7fa4198302f140c0443ba5e50f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: a099f0682a]
2023-11-03 17:54:35 -05:00
Galantsev, Dmitrii 3126d1461c CMake - Bump version
Change-Id: Ibe62c0059262bcb9937ae856b796392b1fe362a0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 7d629c5959]
2023-11-02 18:26:00 -05:00
Charis Poag 521bd38bbd Fix GPU Metric content revision check
Change-Id: I94ff4732be01214591b635357d9a62eb7d5192a0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: b49e82a4f4]
2023-10-31 17:42:02 -05:00
Bill(Shuzhou) Liu fee0c807ce Query the CPU and GPU link type
rsmi_topo_get_link_type() is extended to support querying the CPU
and GPU link type by passing dv_ind_dst as 0xFFFFFFFF.

Change-Id: I1f212a01e8120adb70a08ab772fa9faaaecefa29


[ROCm/rocm_smi_lib commit: de5bc164de]
2023-10-31 10:17:24 -04:00
Charis Poag e89751e202 Partition EBUSY with RSMI_STATUS_BUSY & invalid GPU Metrics check
* Updates:
   - [API/CLI] rsmi_dev_*_partition_set &
     rsmi_dev_*_partition_reset - exposed RSMI_STATUS_BUSY for
     EBUSY writes + cleaned up accidental map insertions
     (maplookup[] can insert values that are not in the map,
     map.at(key) fixes this potential issue)
   - [API] rsmi_dev_gpu_metrics_info_get() - returns
     RSMI_STATUS_NOT_SUPPORTED for unsupported metric tables
     outside of 1v1/1v2/1v3
   - [API] writeDevInfoStr() - exposes RSMI_STATUS_BUSY for
     EBUSY write errors; kept backward compatibility
     for other writes which do not care about these states
   - [API] rsmi_dev_od_volt_info_get()
      & rsmi_dev_od_volt_curve_regions_get() have better logging
     + Expose more details on why they are erroring
   - [Utils/logs/example] Expose AMD GPU gfx target version to aid in
     system troubleshooting
   - [Utils] Added test methods that look at od volt
     freq & regions into here - for easier access across
     several tests
   - [Utils] Updated getRSMIStatusString(new argument - fullstatus;
     default to true for backwards compatibility)
     -> true shows shortened RSMI STATUS response
   - [Utils] Added splitString to cut out noisy return responses
     (used in getRSMIStatusString(), when fullstatus = true)
   - [Utils] Added getFileCreationDate() to expose build date
     of the library - helpful for local builds or experimental builds
   - [Utils] Macro cleanup
   - [Example] Added a few gpu_metric checks - helpful for upcoming
     updates
   - [Device] SYSFS/DebugFS - now have better r/w displayed in logs
   - [LOGS] Expose library build date - see above for details
   - [Tests] Add more warnings/errors to test builds
   - [Tests] Moved up Partition tests for ordered test runs - helped
     identify issues with GPU BUSY writes
   - [Tests] compute_partition_read_write - handles RSMI_STATUS_BUSY
     with waits for busy status found & cleaned up how we checked
     for partition changes - with RSMI responses exposed more clearly
   - [Tests] perf_determinism - multi gpu now properly runs through
     with full resets as needed
   - [Tests] volt_freq_curv_read - better error handling with more
     verbose output

Change-Id: Ie94c6abb6a9aab95c345996d3ad3843cf6734977
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 57b6135e54]
2023-10-27 14:52:02 -04:00
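Note on the EBUSY handling above: the pattern of surfacing a busy write as a distinct status, and retrying in tests, can be sketched as below. The status labels and helper names are hypothetical. The real library returns rsmi_status_t codes such as RSMI_STATUS_BUSY from C++.

```python
import errno
import time

STATUS_SUCCESS = "SUCCESS"        # hypothetical status labels
STATUS_BUSY = "BUSY"
STATUS_PERMISSION = "PERMISSION"


def write_setting(write):
    """Run a write callable and translate OS errors into status labels."""
    try:
        write()
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return STATUS_BUSY
        if exc.errno == errno.EACCES:
            return STATUS_PERMISSION
        raise
    return STATUS_SUCCESS


def write_with_retry(write, attempts=3, delay_s=0.0):
    """Retry while the device reports busy, as a test might do."""
    status = STATUS_BUSY
    for _ in range(attempts):
        status = write_setting(write)
        if status != STATUS_BUSY:
            break
        time.sleep(delay_s)
    return status
```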
Étienne Mollier e142177077 CMake - Set rocm_smi64 soversion to 1
Upstream soversion has been at 5 for a while, but Debian's soversion has been set to
 1 since the beginning of the rocm-smi-lib package.  This is probably erroneous,
 and the library would probably be better off being synchronized with upstream
 so there is some kind of ABI compatibility between the two distributions.
 .
 FIXME: please use upstream soversion next time an ABI breakage justifies an
 SOVERSION bump, instead of just incrementing the present version by one.
Author: Étienne Mollier <emollier@debian.org>
Forwarded: not-needed
Last-Update: 2023-09-17

Change-Id: I6c4d28bd26889359c0b83c474d5ae58a81741cf4
Co-authored-by: Étienne Mollier <emollier@debian.org>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 1775ae4b8d]
2023-10-23 16:41:26 -05:00
Étienne Mollier dcaf4a292e CMake - do not enforce -fPIE.
When built with LTO enabled, the linking of liboam.so chokes on the
following error, which is somewhat similar to the Debian bug #1030876
affecting PA-RISC, although the symptoms subtly differ in that it
suggests building with -fPIC:

	/usr/bin/ld: /tmp/cc0wF8Kx.ltrans0.ltrans.o: relocation R_X86_64_PC32 against symbol `_ZTVSt9exception@@GLIBCXX_3.4' can not be used when making a shared object; recompile with -fPIC

The -fPIC argument is passed appropriately down to the build command;
however, it appears to be overridden by the late introduction of the -fPIE
flag by the upstream build system.  Removing this flag allows the build to
go through, both with LTO and on PA-RISC.

Bug: https://github.com/RadeonOpenCompute/rocm_smi_lib/issues/111
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015653
Change-Id: I8b35fd4b62cfa1a9ddb145362464df5dd276e2f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: c4c19e7917]
2023-10-23 16:37:37 -05:00
Galantsev, Dmitrii 86088ab63d CMake - Prevent failure to build on non-amd64 targets
Change-Id: Ifaa59fb672ea01c07cffea6cd2429bec15a5deaf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Co-authored-by: Étienne Mollier <emollier@debian.org>
Change-Id: Ia691ab1db0061f04662e10e112da4b9ef06c4256


[ROCm/rocm_smi_lib commit: 1cf05dd9c7]
2023-10-23 16:36:17 -05:00
Galantsev, Dmitrii 0152763a39 README - Clean-up cli readme
Change-Id: I665cc5a48a240f0d2289439a4877c9f667b19851
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 275108f5b9]
2023-10-23 13:17:04 -05:00
Maxime Chambonnet 24f5ea66e1 Updated README.md with standard Markdown tables, cleaned up header levels a bit.
Change-Id: Ibd6e382413d7667a5a823ac69620a2cfb7046bc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8cfcb51550]
2023-10-23 13:11:18 -05:00
Sam Wu 6dfdffe5a9 Update rocm-docs
Change-Id: I30633c9cd29bc58b0c48086d5f493204f3d6ffd8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 1de63ce506]
2023-10-18 14:09:26 -05:00
Charis Poag 73d4fbf53d bdfid fix for partition & xgmi nodes
* Updates:
    - [API] After discovering all amd gpus, we now properly
      map correct bdf (xgmi nodes). Especially important for
      partition changes - aka secondary nodes.
    - [API] While adding new secondary nodes we now have
      better grouping -> due to resorting based on
      kfd properties list & matching to primary uniqueid
    - [API] All secondary nodes are now AddToDeviceList
      with correct bdf (location id), provided by kfd
    - [API] Modified AddToDeviceList(..., uint64_t bdfid):
      providing an optional field - bdfid. This allows working
      around primary pcie cards with xgmi nodes
    - [API] Utils - cpplint minor fixes
    - [Example] Removed all endl references w/ newline, fixed
      spacing, and some incorrect values displaying as hex
      (needed dec representation)
    - [API] kfd node functions - now print full path of file
      for trace logs
    - [Tests] power_read.cc: Added a generic power test to
      confirm that specific return values are guaranteed

Change-Id: I143474e8d64c4915a966e789be6bcea4fa7f4472
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 6f1afd2678]
2023-10-13 20:14:39 -05:00
Galantsev, Dmitrii 2e5f5fd51a TESTS - Skip XGMI test
Change-Id: Idd9f505f36fac4a670e5129f835aa051b5c4c9fa
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 2a7589a065]
2023-10-12 21:27:55 -05:00
Galantsev, Dmitrii 02c4b477d1 Fix rocm_smi.cc
Change-Id: Ib074dd542d8d37a6a618e10bd3bd389ad0cef108
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 3f0071599d]
2023-10-11 11:46:49 -05:00
Charis Poag 3ea6946b31 Add rsmi_dev_power_get
* Updates:
  - [API] Added rsmi_dev_power_get(uint32_t dv_ind,
                                   uint64_t *power,
                                   RSMI_POWER_TYPE
                                   *type)
          provides generic get to average or
          current power & provides backwards
          compatibility
  - Added a utility function to get MonitorTypes
    (monitor_type_string(type)) &
    RSMI_POWER_TYPE (power_type_string(type))
    strings
  - [Tests] Added rsmi_dev_power_get tests and
    provided better verification of return values for
    all power APIs
  - [Tests] Updated power outputs to show correct
    units
  - [example] Now uses avg, current, and generic
    power functions with type output response

Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 31a1fcce7d]
2023-10-10 00:34:19 -05:00
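Note on the generic power getter above: the "average or current" behavior can be sketched as a fallback between two readers that also reports which kind of value was returned. The reader callables, the type labels and the order of preference are assumptions for illustration, not the library's implementation.

```python
def power_get(read_current, read_average):
    """Return (power, type) from whichever reader is supported.

    Each reader is a hypothetical callable returning an int, or None
    when the device does not expose that kind of reading.
    """
    current = read_current()
    if current is not None:
        return current, "CURRENT"
    average = read_average()
    if average is not None:
        return average, "AVERAGE"
    return None, "INVALID"
```

Returning the type alongside the value lets a caller keep one code path for devices that report only one of the two readings.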
Oliveira, Daniel 5e444f87ad rocm_smi_lib: Fix Modernize and refactor gpu_metrics
Adds support for 'gpu_metrics_v1_4' and new counters

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * The new gpu_metrics are now part of the Device

Build changes related to the following: None

Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 4e4ebde640]
2023-10-09 21:43:22 -05:00
Charis Poag d57d65a607 Rename NPS -> memory partition + compute partition node fix
* Updates:
        - rocm_smi_lib + CLI:
          Rename all "NPS mode" -> "memory partition"
          related files/functions/API/CLI to align with correct
          technical naming
        - rocm_smi_main: fixed identification of the primary card's
          unique id; uses rsmi_dev_unique_id_get to map which
          KFD nodes belong to it
        - rsmi_dev_*_partition*: now have better logging output
        - compute partition tests:
          Added 20 sec delay for workaround until GPU
          busy is confirmed as the issue
        - CPPLint fixes/formatting
        - [Example] Moved all endl to "\n" for efficiency
        - [Example] Added Edge & Junction temperature examples
        - [Example] Added rsmi_minmax_bandwidth_get() example - WIP

Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: b251bb0c9f]
2023-10-06 11:51:09 -04:00
Galantsev, Dmitrii fce4f5fa08 Update package version
Change-Id: Ie094f75d028a09f862729094815f8a2b6ea8ad78
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8244a677db]
2023-10-05 12:49:11 -05:00
Galantsev, Dmitrii 6e7555c5a3 TESTS - Don't fail on TestFrequenciesRead
- Return from freq_output function early if clock is unsupported
- Right-align frequencies

Change-Id: I799c9351dac8a5be161bc9243cd3816539728357
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e962d3b281]
2023-10-04 18:24:56 -05:00