Commit graph

464 commits

Author SHA1 Message Date
Bill(Shuzhou) Liu 20975db2be Do not print the library name if in default folder
The rocm-smi Python tool will not print the library name when the
library is in the default folder.

Change-Id: I203a872ebe2fc994766a2628049ca50c8bfa7120


[ROCm/rocm_smi_lib commit: 016dbf8aa3]
2023-09-27 12:14:33 -04:00
Galantsev, Dmitrii 9d07110891 Fix out-of-bounds array access for --showvc
get_od_clk_volt_info assumed the size of the file instead of checking
the length. This caused out-of-bounds array element access.

Change-Id: Ibda8f0c3a6d1623d48964641ae5ef610d2072e94
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8eb9f892d3]
2023-09-26 13:59:37 -05:00
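The idea behind the fix in commit 9d07110891 above, checking the length of the data before indexing into it, can be sketched in Python as follows. This is only an illustration: the helper name `field_at` and the sample lines are hypothetical, and the real change lives in the library's get_od_clk_volt_info, which is not reproduced here.

```python
def field_at(lines, index):
    """Return lines[index], or None when the file is shorter than expected."""
    if 0 <= index < len(lines):
        return lines[index]
    return None


# Hypothetical file contents: only two lines were read.
lines = ["first line", "second line"]
print(field_at(lines, 1))  # within bounds
print(field_at(lines, 5))  # out of bounds: reported as None instead of crashing
```

Returning a sentinel keeps the caller in control of how a short file is reported, rather than relying on an assumed file size.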
Charis Poag fd5066437b Add Current (Instant) Socket Power
* Updates:
    - rocm_smi_logger:
      General cleanup &
      Aligned to cpplint rules for usage
    - rocm_smi_monitor:
      Fixed MonitorTypes
      from not displaying properly in logs
      & Added socket power label + current
      socket power MonitorTypes
    - rocm_smi API:
      Added rsmi_dev_current_socket_power_get API
    - rocm_smi CLI:
      General cleanup,
      Concise info now displays device data
      in variable width (see printLogSpacer's
      new field),
      printLogSpacer now has an adjustable
      variable that overrides appWidth,
      Added Socket Power to base rocm-smi +
      --showpower CLI calls,
      --showpower & base rocm-smi CLI defaults
      to printing socket power (if not available,
      displays average power)
    - Cleaned up temp label references
    - power_read gtests:
      Added current socket power to testing

Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: f078375350]
2023-09-25 01:38:54 -04:00
Galantsev, Dmitrii 80c47e3c09 SWDEV-422836 - Add sleep frequency support
Change-Id: I0bde403b010bf036ce44ed0600cc7eb03742c6b6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 3d40c4bb2c]
2023-09-25 01:38:27 -04:00
Ori Messinger 9eaad9eaea ROCm SMI LIB: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630


[ROCm/rocm_smi_lib commit: d44a6ef523]
2023-09-25 01:37:38 -04:00
Bill(Shuzhou) Liu 93557c6e4e Change the python tool id output label
Change the label from GPU to Device as we call rsmi_dev_id_get().

Change-Id: I8ffe3673d434e5291ebd5cc909afb7d18154ecb6


[ROCm/rocm_smi_lib commit: 2247c4b46c]
2023-09-25 01:31:04 -04:00
Bill(Shuzhou) Liu ce2ca09d2c Handle the memory frequency with only one line
Change the code to handle the memory frequency if it is only one line.

Change-Id: I09e6ee78a2b9c12c861243dc89296e4e7862da49


[ROCm/rocm_smi_lib commit: 85df5676d4]
2023-09-25 01:30:56 -04:00
Galantsev, Dmitrii 164efd81af SWDEV-423672 - Always compile and install gtest
This commit makes sure GTest is always compiled with rocm_smi_lib_tests.

GTest installation was inconsistent outside of the AMD CI environment.
libgtest.so wouldn't get installed with rocm_smi_lib_tests if gtest
existed on the build machine, which is undesirable when packaging.

Change-Id: I607df6c67c81480e3b6487b28f14924e8bf56ad4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 0c662611e9]
2023-09-23 21:10:12 -04:00
Galantsev, Dmitrii ff072106c2 PY: Remove f-strings from rocm_smi.py
Change-Id: I0a422e8f66473af837460ecb2450e5be329163b0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 1683245ecf]
2023-09-22 19:15:59 -05:00
Oliveira, Daniel 1bf68ad1c9 rocm_smi_lib: Fix [linux BM] [AMDSMI] Memory Bandwidth
Implements APIs for 'gpu_metrics_v1_3' utilization averages

Code changes related to the following:
  * rsmi_dev_activity_metric_get()
  * rsmi_dev_activity_avg_mm_get()
  * CLI shows "Avg.Memory Bandwidth" under "--showmemuse"

Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: e0483f2ee2]
2023-09-21 11:00:29 -04:00
Galantsev, Dmitrii 3fceeef3f0 PACKAGE - Fix packaging
Allow for configureLogrotate to fail without failing configure

In the previous commit I forgot to invert the check when switching
the "IS_SYSTEMD" and "!IS_SYSTEMD" if-else statements.

Change-Id: I8eb8e7981c6353a2e60064eb3a6e35821ea2a0d0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: b99867eb80]
2023-09-20 10:37:35 -05:00
Galantsev, Dmitrii 73ec4e32e6 PACKAGE - Cleanup packaging
- Clean up packaging scripts to be more consistent with RDC.
- Remove all 'sudo' calls. All these scripts are to be run by root.
- Reduce scope of variables.
- Remove unnecessary functions

Change-Id: Ib90f8e66ef4eae24f73e940fff44f515e12233f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 431a7071a0]
2023-09-20 01:07:51 -04:00
Sam Wu a246645060 fix toc to point to correct doxysphinx output path
update doc requirements; rocm-docs-core to 0.24.1

Change-Id: I78257d476a8bc47fd1a4ee03aa3db1a430ed116f


[ROCm/rocm_smi_lib commit: 7b32ea614b]
2023-09-18 09:07:01 -06:00
Galantsev, Dmitrii 3397cadf11 rocm_smi.py: Fix pipe into head error
Piping rocm_smi into 'head' failed with a "Broken pipe" error. The
error can be safely ignored: head closes the pipe early, which causes
a SIGPIPE signal to be raised.

https://docs.python.org/3/library/signal.html#note-on-sigpipe

Change-Id: I4a589c6ed9a8c5b50de84b33e28115c6b510045f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 094c98a74f]
2023-09-15 16:37:10 -04:00
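The broken-pipe handling described in commit 3397cadf11 above follows the pattern in the Python documentation note it links to. A minimal sketch of that pattern is shown here; `write_report` and `silence_stdout` are hypothetical names chosen for illustration and are not taken from rocm_smi.py.

```python
import os
import sys


def write_report(lines, stream=sys.stdout):
    """Write lines to stream; return False if the reader closed the pipe early."""
    try:
        for line in lines:
            stream.write(line + "\n")
        # Flush inside the try block so a broken pipe surfaces here.
        stream.flush()
        return True
    except BrokenPipeError:
        return False


def silence_stdout():
    """Point stdout at the null device so the exit-time flush cannot fail again."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
```

A caller would typically run `write_report(...)` and, when it returns False, call `silence_stdout()` before exiting, so that a command such as `rocm-smi | head` ends quietly.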
Galantsev, Dmitrii 11679b8e05 rsmiBindings.py - Add initRsmiBindings()
Library path was printed at all times even with --json flag.
This commit adds a mandatory initRsmiBindings function which is a core
component of the rsmiBindings.py library.

It **MUST** be called on import.

Change-Id: Ic6ae1ec5d1fabba288910e6aed6c4706e53e5cd7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 3b95214fff]
2023-09-15 16:37:10 -04:00
Galantsev, Dmitrii f25177840e TESTS - Check power and frequency support
It is not guaranteed that power can be read or set for some GPUs
(MI300). It is also not guaranteed that frequencies can be set.

As this is not a tool issue - we simply skip the failing test.

Change-Id: I134e96a476040cef513cd924f00e30cd6dea42a5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 5c574ac79c]
2023-09-14 22:19:33 -04:00
Galantsev, Dmitrii 3707b84f81 README - shell -> bash
Change-Id: I3a50c38ae280747b4874cff443091f332980fe50
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 238c7f6dca]
2023-09-14 17:32:01 -04:00
Galantsev, Dmitrii e95c072536 README - Add a documentation link
Change-Id: Ia56994825e99e72829283f07bed7379d95d24498
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 26c4578ee2]
2023-09-14 16:31:42 -05:00
Galantsev, Dmitrii 6335bdb6e0 Add errors for existing but empty dev files
Change-Id: Iad9febc50f9b8e6085f8b605249ee884d2f134d6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: a4b470fe71]
2023-09-14 17:30:03 -04:00
Oliveira, Daniel 24edb0acab rocm_smi_lib: Fix rocm-smi --resetfans results in Permission Denied
For operations related to:
  --resetfans
  --setfan

We report 'Not supported' for these cases instead of 'Permission denied'

Code changes related to the following:
  * rocm_smi_properties
  * rocm_smi related APIs

Change-Id: I144646efc3804fabd45cc5a46351803950b4feb7
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 12f395e592]
2023-09-14 16:54:29 -04:00
Galantsev, Dmitrii d131ac8f32 Fix misspelling averge -> average
Change-Id: I3546348560acadb1e775e10ad24115de4ccfc800
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: d9381b6dae]
2023-09-13 19:49:46 -05:00
Galantsev, Dmitrii cc740d7d22 PY: Silence error output when printing concise info
Change-Id: I9ce4ad523b3fe2ec8afc5bea791810ec67558f11
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 4acfb00ad5]
2023-09-12 19:16:16 -04:00
Galantsev, Dmitrii 76aec0097a TESTS - re-enable frequency tests on aqua_vanjaram
Change-Id: I8fcd9418da5b973897ccfffc7d8a2f3ea833ea77
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: ff992e9b56]
2023-09-11 19:43:25 -05:00
Galantsev, Dmitrii ab1ca937c7 SWDEV-409184 - Fix erroneous 'not supported' when HWMON is absent
Change-Id: Ic5ff406977d962fadc709a03853dac61b5460a26
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 41ade41d84]
2023-09-11 19:34:30 -05:00
Charis Poag d975792f47 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info from throwing exception
      with no amdgpu driver loaded
    - Fixed the ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added to logs build type, lib path,
      and set env. variables

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: ed6777a8e7]
2023-09-07 22:17:54 -05:00
Galantsev, Dmitrii 9da052436a Cleanup rocm_smi.cc
Change-Id: Ia676c237222b0dd5d9e8a054a93776f3b11e2225
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 4aef767596]
2023-09-07 15:50:40 -04:00
Bill(Shuzhou) Liu ac4131905d Fix doxygen warning messages
Doxygen will enable treating warnings as errors.

Change-Id: Ie7a7c9a823388c4140f31489604d65ec43005772


[ROCm/rocm_smi_lib commit: fab0542ab1]
2023-09-07 08:48:38 -04:00
Oliveira, Daniel a044785231 rocm_smi_lib: Fix rocm-smi --showfan shows 'unable to detect fan'
Code changes related to the following:
  * Reverts earlier fix for the same issue
  * Check for existence of files before reading

Change-Id: I175b20c3343c414b12b79dc3fc404f53fbaabf3a
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 328ce0150b]
2023-08-30 14:45:05 -05:00
Galantsev, Dmitrii 0679c33b1a TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 84e90e55d5]
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 087a642570 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56


[ROCm/rocm_smi_lib commit: 471fbfddc1]
2023-08-25 09:01:08 -04:00
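The "large number" symptom in commit 087a642570 above is what happens when -1 is stored in an unsigned type. The short Python sketch below reproduces the effect with ctypes; the 32-bit width is an assumption made for illustration, since the commit message does not state the exact type used in the library.

```python
import ctypes

raw = -1  # sentinel value mentioned in the commit message

# Stored as unsigned, -1 wraps around to the largest representable value.
as_unsigned = ctypes.c_uint32(raw).value
# Stored as signed, the sentinel survives unchanged.
as_signed = ctypes.c_int32(raw).value

print(as_unsigned)  # 4294967295
print(as_signed)    # -1
```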
Oliveira, Daniel 2b199f14ec rocm_smi_lib/rocm_smi.py: Fix Add 'GPU name' in rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 3602447109]
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 3992aafc08 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 654f65118b]
2023-08-24 10:54:50 -05:00
Oliveira, Daniel c166a66017 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist, then "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: f9fd6b0a96]
2023-08-23 20:43:08 -05:00
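The two fan cases listed in commit c166a66017 above can be sketched as a small decision function. This is a hedged illustration only: `fan_status` is a hypothetical helper, the temporary file stands in for the real sysfs entry, and returning the raw reading for a non-zero value is an assumption not stated in the commit.

```python
import os
import tempfile


def fan_status(pwm_path):
    """Classify a fan reading from a pwm-style sysfs file (hypothetical helper)."""
    if not os.path.exists(pwm_path):
        return "Not supported"
    with open(pwm_path) as handle:
        value = int(handle.read().strip())  # raises ValueError on malformed content
    if value == 0:
        return "Unable to detect fan speed"
    return str(value)  # assumption: pass a non-zero reading through unchanged


# Demonstrate with a temporary file standing in for the sysfs entry.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pwm1")
    missing = fan_status(path)
    with open(path, "w") as handle:
        handle.write("0\n")
    zero = fan_status(path)
    with open(path, "w") as handle:
        handle.write("128\n")
    spinning = fan_status(path)

print(missing, zero, spinning, sep=" | ")
```

Empty or non-numeric file contents are deliberately left out of this sketch; the commit message does not describe how they are handled.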
Charis Poag ae5f3d6ceb Error handling for unset freqs
Sending RSMI_STATUS_UNEXPECTED_DATA for drivers
which do not set some clock freqs

Change-Id: I43a9515c2757dddd412bb25cfd54095e63367030
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: f191c2753c]
2023-08-23 10:44:57 -05:00
Galantsev, Dmitrii 8c557e16cc TESTS - Fix incorrect TestVoltCurvRead assert if not supported
Change-Id: I2242aa9be84543276c63f1f57fdc489754c9ee07
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 613bd8ad1d]
2023-08-22 16:51:42 -04:00
Galantsev, Dmitrii 7d96f95fb2 .editorconfig - Remove broken whitespace rule
Change-Id: I67260f1f1952609dc89834d0763acd732bf39860
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 548b68cb67]
2023-08-22 16:51:20 -04:00
Galantsev, Dmitrii 2b64402b90 TESTS - Use gpu version as a workaround for a missing name
Depends-On: Ifbd38f11fbde7ba28af4be1d611310dea1b5112a
Change-Id: Ia7b7975f03424854df0a470b2719cf2ff2cf8e40
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 62f01cb150]
2023-08-21 19:18:22 -04:00
Bill(Shuzhou) Liu c695336b6d Fallback to kfd node when VRAM sysfs not available
The driver may not expose VRAM sysfs on certain systems. Add a
fallback for that case.

Change-Id: Ib3be71b4f4d2c79318d5026b0a97f3657d8a97b6


[ROCm/rocm_smi_lib commit: a10f00bf57]
2023-08-17 14:36:03 -05:00
Charis Poag 47420111a8 [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices which do not have edge sensors, only junction
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 755e14dbad]
2023-08-10 19:53:38 -04:00
Oliveira, Daniel a75b7f741c Fix rsmitstReadWrite.TestPowerReadWrite test failure
Code changes related to the following:
  * All reinforcement work moved to their own files
  * Self contained changes only to support them
  * New files added to CMakeLists.txt

Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: cc5ab079df]
2023-08-09 21:51:05 -05:00
Ranjith Ramakrishnan 9d347e9e2f SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I1de06d0d6a30c8c862d768b58460ef1b49d15e29


[ROCm/rocm_smi_lib commit: 9406cdd832]
2023-08-07 09:21:19 -07:00
Charis Poag 4e39fe3e25 [lib] Enhance Logger: gpu_metrics + enable console out
* Updates:
    - Env variable RSMI_LOGGING=0 or any other value
        -> all logging off
    - Env variable RSMI_LOGGING=1 -> logs only
    - Env variable RSMI_LOGGING=2 -> console only
    - Env variable RSMI_LOGGING=3 -> both logs + console
    - Metrics output includes hexdump of current file
      and decoded metrics (functions: logHexDump
      and log_gpu_metrics)
    - System info gathered now includes the system's
      perceived endianness (little or big endian),
      helpful for viewing decoded hexdump or any
      binary translation
    - Added templates for printing unsigned hex
      (print_unsigned_hex_and_int), unsigned integers
      (print_unsigned_int), and printing both unsigned
      hex and int with an optional header
      (print_unsigned_hex_and_int)
    - Fixed some build compile warnings/errors -
      ex. doing strncpys for sku or board names
      this operation is expected and needed
      and for temp file writes if unsuccessful
      we now properly send RSMI_STATUS_FILE_ERROR
    - Fixed logrotate not initializing properly
      on RHEL 8.8/9.x

Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 9c7eed7edc]
2023-08-01 21:46:19 -05:00
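The RSMI_LOGGING values listed in commit 4e39fe3e25 above map onto two independent outputs, the log file and the console. A minimal Python sketch of that mapping follows; `logging_targets` is a hypothetical helper, and treating an unset variable the same as "any other value" (all logging off) is an assumption.

```python
import os


def logging_targets(environ=os.environ):
    """Return (to_log_file, to_console) for the RSMI_LOGGING setting."""
    table = {
        "1": (True, False),   # logs only
        "2": (False, True),   # console only
        "3": (True, True),    # both logs and console
    }
    raw = environ.get("RSMI_LOGGING", "")  # assumption: unset means off
    # "0" or any other value turns all logging off.
    return table.get(raw.strip(), (False, False))


print(logging_targets({"RSMI_LOGGING": "3"}))  # (True, True)
print(logging_targets({"RSMI_LOGGING": "0"}))  # (False, False)
```

Passing the environment as a parameter keeps the function easy to exercise without modifying the real process environment.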
Bill(Shuzhou) Liu 3bdcaa1deb Crash when ecc count sysfile cannot be read
Replace assert with error handling code.

Change-Id: I6500ae4d38a8caea87828aa7d76373d20c8354c7


[ROCm/rocm_smi_lib commit: 0522439ac2]
2023-07-31 08:36:53 -05:00
Bill(Shuzhou) Liu f9f936e3d2 Change reset power error message to logging
Since the reset will continue if the reset power and current power
are the same, an error may confuse the user.

Change-Id: I35b9ef17afd47b5af5bd2b8882a44f63991fe509


[ROCm/rocm_smi_lib commit: aeb6c61f54]
2023-07-27 15:18:28 -05:00
Bill(Shuzhou) Liu aca23ecb0b Handle csv output when the command is not based on the device
Fix the error where only one CSV line is printed when the output
is not based on a device.

Change-Id: Idacc5d98acc223e932fb3d46c888bfa04778b73c


[ROCm/rocm_smi_lib commit: 80d650b95a]
2023-07-26 15:28:18 -05:00
Maisam Arif 60a4b3cb19 SWDEV-394316 - Handle not applicable vbios
Change-Id: I3390078a63c9a5eff67024b84a3be1369c4b1460
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>


[ROCm/rocm_smi_lib commit: c78ec46671]
2023-07-25 16:33:22 -05:00
Charis Poag 9b5289aff7 Update logging and README for other project usage
Updates:
    * [rocm-smi] Logging now can update files on
      per-project-basis for install/remove
    * [rocm-smi] README now has latest build
      instructions, including test builds
    * [rocm-smi] Updated README to include
      revision dates

Change-Id: Ifb19a6f32ccf6938f47225db53fef88021909264
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4613e8dec3]
2023-07-20 19:09:11 -05:00
Oliveira, Daniel 4c063f4038 Add revision to --showhw
Code changes related to the following:
  * Added 'rsmi_dev_revision_get()' related code
  * Test code
  * Functional tests

Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 573620f586]
2023-07-18 16:17:33 -05:00
Galantsev, Dmitrii 9045bac955 Fix sys and id tests
The following read tests were failing:
*.TestIdInfoRead
*.TestSysInfoRead

1. *.TestIdInfoRead failed because rsmi_dev_brand_get did not specify
   dependency on vbios_version.

2. *.TestSysInfoRead failed because the test didn't expect vbios_version to
   be missing, which is new behavior in Aqua Vanjaram.

Change-Id: I9ee88a12fcf6cff2032049e2ecdfb2957efb03ab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8fe848d10e]
2023-07-17 15:52:23 -04:00
Galantsev, Dmitrii e2bee8b2f9 Add .cache to gitignore
Change-Id: Ida03bf1f50704bea44827d7578cd74c1896d4368
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: b0fe2fbd07]
2023-07-17 15:52:23 -04:00