Commit Graph

188 Commits

Author SHA1 Message Date
Galantsev, Dmitrii 2831a5addc Merge amd-staging into amd-master 20231005
Change-Id: Ie217f139f63aa10ec5e9ce48797b7cb94864736d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-05 16:22:45 -05:00
Galantsev, Dmitrii d862bee754 Add --version to CLI
Change-Id: Id2a8f10f544ed04e874db773820534eddd73f55d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-02 17:57:02 -05:00
Ori Messinger aa89f2e125 ROCm SMI CLI: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI CLI:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: If9cabdc60ffcf08f27c9e6bdc20e8a26b192a738
2023-09-29 18:13:16 -04:00
Bill(Shuzhou) Liu 016dbf8aa3 Do not print the library name if in default folder
The rocm-smi python tool will not print the library name when the
library is in the default folder.

Change-Id: I203a872ebe2fc994766a2628049ca50c8bfa7120
2023-09-27 12:14:33 -04:00
Hao Zhou 4ce4535450 Merge amd-staging into amd-master 20230926
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I6d152514b258cf7b5f0ab0e54e2539ab5f033f14
2023-09-26 09:40:23 +08:00
Charis Poag f078375350 Add Current (Instant) Socket Power
* Updates:
    - rocm_smi_logger:
      General cleanup &
      Aligned to cpplint rules for usage
    - rocm_smi_monitor:
      Fixed MonitorTypes
      from not displaying properly in logs
      & Added socket power label + current
      socket power MonitorTypes
    - rocm_smi API:
      Added rsmi_dev_current_socket_power_get API
    - rocm_smi CLI:
      General cleanup,
      Concise info now displays device data
      in variable width (see printLogSpacer's
      new field),
      printLogSpacer now has an adjustable
      variable that overrides appWidth,
      Added Socket Power to base rocm-smi +
      --showpower CLI calls,
      --showpower & base rocm-smi CLI defaults
      to printing socket power (if not available,
      displays average power)
    - Cleaned up temp label references
    - power_read gtests:
      Added current socket power to testing

Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-25 01:38:54 -04:00
Galantsev, Dmitrii 3d40c4bb2c SWDEV-422836 - Add sleep frequency support
Change-Id: I0bde403b010bf036ce44ed0600cc7eb03742c6b6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-25 01:38:27 -04:00
Bill(Shuzhou) Liu 2247c4b46c Change the python tool id output label
Change the label from GPU to Device as we call rsmi_dev_id_get().

Change-Id: I8ffe3673d434e5291ebd5cc909afb7d18154ecb6
2023-09-25 01:31:04 -04:00
Hao Zhou d417ea52f6 Merge amd-staging into amd-master 20230925
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Id510a95e3bea2ddddae7c417071bde599569930a
2023-09-25 09:38:13 +08:00
Galantsev, Dmitrii 3a4e428fd5 PY: Remove f-strings from rocm_smi.py
Change-Id: I0a422e8f66473af837460ecb2450e5be329163b0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-23 00:12:34 -05:00
Galantsev, Dmitrii 1683245ecf PY: Remove f-strings from rocm_smi.py
Change-Id: I0a422e8f66473af837460ecb2450e5be329163b0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-22 19:15:59 -05:00
Oliveira, Daniel e0483f2ee2 rocm_smi_lib: Fix [linux BM] [AMDSMI] Memory Bandwidth
Implements APIs for 'gpu_metrics_v1_3' utilization averages

Code changes related to the following:
  * rsmi_dev_activity_metric_get()
  * rsmi_dev_activity_avg_mm_get()
  * CLI shows "Avg.Memory Bandwidth" under "--showmemuse"

Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-09-21 11:00:29 -04:00
Hao Zhou 6d081cd1b1 Merge amd-staging into amd-master 20230921
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I08c5ae1cca4b03dbb3cfcbcbf61d4b1b633908c1
2023-09-21 13:40:25 +08:00
Galantsev, Dmitrii 094c98a74f rocm_smi.py: Fix pipe into head error
When piping rocm_smi into 'head' it failed with a "Broken pipe" error. The
error can be safely ignored. head closes the pipe early, which causes
a SIGPIPE signal to be raised.

https://docs.python.org/3/library/signal.html#note-on-sigpipe

Change-Id: I4a589c6ed9a8c5b50de84b33e28115c6b510045f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-15 16:37:10 -04:00
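
The broken-pipe behaviour described in the commit above can be sketched as follows. This is an illustration only, not the actual change made to rocm_smi.py: the helper name `write_ignoring_broken_pipe` is hypothetical, and catching `BrokenPipeError` is one of the approaches suggested by the linked Python documentation.

```python
import io


def write_ignoring_broken_pipe(stream, lines):
    """Write lines to stream and stop quietly if the reader has gone away.

    Hypothetical helper: returns the number of lines written before the
    pipe was closed (or all of them if it stayed open).
    """
    written = 0
    try:
        for line in lines:
            stream.write(line + "\n")
            written += 1
        stream.flush()
    except BrokenPipeError:
        # The consumer (for example 'head') closed its end early.
        # Nothing is wrong on our side, so the error is swallowed.
        pass
    return written


if __name__ == "__main__":
    buffer = io.StringIO()
    write_ignoring_broken_pipe(buffer, ["GPU[0]", "GPU[1]"])
    print(buffer.getvalue(), end="")
```

In a real command-line tool the stream would be `sys.stdout`; an in-memory buffer is used here so the sketch runs anywhere.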
Galantsev, Dmitrii 3b95214fff rsmiBindings.py - Add initRsmiBindings()
Library path was printed at all times even with --json flag.
This commit adds a mandatory initRsmiBindings function which is a core
component of the rsmiBindings.py library.

It **MUST** be called on import.

Change-Id: Ic6ae1ec5d1fabba288910e6aed6c4706e53e5cd7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-15 16:37:10 -04:00
Galantsev, Dmitrii a4b470fe71 Add errors for existing but empty dev files
Change-Id: Iad9febc50f9b8e6085f8b605249ee884d2f134d6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 17:30:03 -04:00
Oliveira, Daniel 12f395e592 rocm_smi_lib: Fix rocm-smi --resetfans results in Permission Denied
For operations related to:
  --resetfans
  --setfan

We report 'Not supported' for these cases instead of 'Permission denied'

Code changes related to the following:
  * rocm_smi_properties
  * rocm_smi related APIs

Change-Id: I144646efc3804fabd45cc5a46351803950b4feb7
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-09-14 16:54:29 -04:00
Hao Zhou 265341dd39 Merge amd-staging into amd-master 20230914
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I408a62826faff38d319b0d7ef08767223b3b327f
2023-09-14 10:23:32 +08:00
Galantsev, Dmitrii 4acfb00ad5 PY: Silence error output when printing concise info
Change-Id: I9ce4ad523b3fe2ec8afc5bea791810ec67558f11
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-12 19:16:16 -04:00
Charis Poag ed6777a8e7 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info from throwing exception
      with no amdgpu driver loaded
    - Fix for ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added to logs build type, lib path,
      and set env. variables

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-07 22:17:54 -05:00
Hao Zhou 26c6f96d71 Merge amd-staging into amd-master 20230907
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: If27857b677876229d45a9f686b1a7ec7c1316e15
2023-09-07 10:19:41 +08:00
Oliveira, Daniel 328ce0150b rocm_smi_lib: Fix rocm-smi --showfan shows 'unable to detect fan'
Code changes related to the following:
  * Reverts earlier fix for the same issue
  * Check for existence of files before reading

Change-Id: I175b20c3343c414b12b79dc3fc404f53fbaabf3a
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-30 14:45:05 -05:00
Hao Zhou a3eff9e2fd Merge amd-staging into amd-master 20230830
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I41c0e3ec76a43af30f25e140f25b521a6097d125
2023-08-30 18:03:51 +08:00
Bill(Shuzhou) Liu 471fbfddc1 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
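
Why an unsigned field "shows a large number" for -1 can be illustrated with ctypes. This is a sketch under an assumption: the commit does not state the integer width, so 32 bits is used here purely for illustration.

```python
import ctypes

# A NUMA affinity of -1 conventionally means "no affinity reported".
raw_affinity = -1

# Stored in an unsigned 32-bit field, -1 wraps around to a huge value.
as_unsigned = ctypes.c_uint32(raw_affinity).value

# Stored in a signed 32-bit field, the value survives unchanged.
as_signed = ctypes.c_int32(raw_affinity).value

print(as_unsigned, as_signed)
```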
Oliveira, Daniel 3602447109 rocm_smi_lib/rocm_smi.py: Fix Add 'GPU name' in rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 654f65118b rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 10:54:50 -05:00
Oliveira, Daniel f9fd6b0a96 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist, then "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-23 20:43:08 -05:00
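
The two fan cases listed in the commit above can be sketched as a small classifier. This is not the code from rocm_smi.py: the function name `fan_speed_status` is hypothetical, and treating an empty or non-numeric file like a zero reading is an assumption the commit message does not cover.

```python
import os


def fan_speed_status(pwm_path):
    """Classify a fan PWM sysfs file along the lines the commit describes."""
    if not os.path.isfile(pwm_path):
        # The sysfs file does not exist: no fan reading is available.
        return "Not supported"
    with open(pwm_path) as pwm_file:
        raw = pwm_file.read().strip()
    # Assumption: an empty or non-numeric file counts as a zero reading.
    value = int(raw) if raw.isdigit() else 0
    if value == 0:
        # The file exists but reports zero.
        return "Unable to detect fan speed"
    return str(value)
```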
Hao Zhou 3921278c4e Merge amd-staging into amd-master 20230817
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Idb8b837f8b65e3e820719f39a6f2c0c4ebc5b7f9
2023-08-17 13:27:13 +08:00
Charis Poag 755e14dbad [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices which do not have edge sensors, but junction
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-10 19:53:38 -04:00
Hao Zhou c4bc5550f7 Merge amd-staging into amd-master 20230731
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I606526255c6120c373fe25723c698ddf3d174b17
2023-07-31 13:51:06 +08:00
Bill(Shuzhou) Liu aeb6c61f54 Change reset power error message to logging
Since the reset will continue if the reset power and current power
are the same, an error may confuse the user.

Change-Id: I35b9ef17afd47b5af5bd2b8882a44f63991fe509
2023-07-27 15:18:28 -05:00
Hao Zhou d52e613adf Merge amd-staging into amd-master 20230727
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I476516b5ce8c7a6c05d48d06acd7736141bee0dd
2023-07-27 12:11:11 +08:00
Bill(Shuzhou) Liu 80d650b95a Handle csv output when the command is not based on the device
Fix the error where only one csv line is printed when the output
is not based on device.

Change-Id: Idacc5d98acc223e932fb3d46c888bfa04778b73c
2023-07-26 15:28:18 -05:00
Maisam Arif c78ec46671 SWDEV-394316 - Handle not applicable vbios
Change-Id: I3390078a63c9a5eff67024b84a3be1369c4b1460
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2023-07-25 16:33:22 -05:00
Hao Zhou b81fcc2c32 Merge amd-staging into amd-master 20230720
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ib2cb8cd0fa41f863e692dd5f92ab78ace776210c
2023-07-20 14:42:57 +08:00
Oliveira, Daniel 573620f586 Add revision to --showhw
Code changes related to the following:
  * Added 'rsmi_dev_revision_get()' related code
  * Test code
  * Functional tests

Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-07-18 16:17:33 -05:00
Bill(Shuzhou) Liu 0aeb6025bd rocm-smi --showevents shows wrong gpuID
Use the gpuid returned from the event data instead.

Change-Id: I7f286cc105f7ea12985223e603504f0ef3d9724e
2023-07-13 08:28:53 -05:00
Hao Zhou 18770368a7 Merge amd-staging into amd-master 20230710
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I932f7e92939b6244a600c4f1bb92580aec9947ca
2023-07-10 10:19:28 +08:00
Jeremy Newton 2d2c73a5e6 Fix python loading of librocm_smi64
The librocm_smi64.so is used for development, while
librocm_smi64.so.MAJOR is used for runtime, thus the python front end
should not be loading the .so binary, but rather the .so.MAJOR binary.

As well, it's good not to hardcode "lib" as some distros will change
this.

rsmiBindings.py is now generated with CMake

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I7cb745f8936fdf10d3ebd6c1e606031f713184ca
2023-07-06 09:52:56 -04:00
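
The naming rule in the commit above (load `librocm_smi64.so.MAJOR` and avoid hardcoding "lib") can be sketched as a path builder. The function name `runtime_library_path` and all argument values in the usage line are hypothetical; in the real project the values are filled in by CMake when rsmiBindings.py is generated.

```python
import os


def runtime_library_path(prefix, libdir, major):
    """Build the path of the versioned runtime library.

    The unversioned .so is the development symlink, so the
    .so.MAJOR name is used instead. All three arguments are
    placeholders for values a build system would supply.
    """
    name = "librocm_smi64.so.{}".format(major)
    return os.path.join(prefix, libdir, name)


if __name__ == "__main__":
    # Hypothetical install prefix, library directory, and major version.
    print(runtime_library_path("/opt/rocm", "lib64", 5))
```

The resulting path could then be passed to `ctypes.CDLL` to load the library at runtime.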
Hao Zhou 58a09f7063 Merge amd-staging into amd-master 20230625
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Id257f743e4a45881696c053b08a245b09b3e34ad
2023-06-25 11:20:46 +08:00
Bill(Shuzhou) Liu d9b6af7a09 Expand showpids to provide more details
Provide details of GPU usage by an application.

Change-Id: I0f36df7d358754c2c8a60432b736d98f667ee99c
2023-06-16 08:52:18 -04:00
Hao Zhou 741740b04b Merge amd-staging into amd-master 20230615
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I67e22b2d06c8a9cd21aa3034bc78627d2fd0c586
2023-06-15 10:15:40 +08:00
Galantsev, Dmitrii 713f85721b --showtempgraph - Show N/A when no temp found
If temp in hwmon was missing - rocm-smi crashed.
e.g. /sys/class/drm/card1/device/hwmon/hwmon5/temp1_input

This change displays "N/A" for temp instead of crashing.

Change-Id: I02f84a466bd3acfbd9b65e7e4ca0f18e76606c3b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-12 19:16:39 -05:00
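
The "N/A instead of crashing" behaviour from the commit above can be sketched as follows. The helper name `read_temp_celsius` is hypothetical and this is not the code from rocm-smi; the only outside fact relied on is that hwmon `temp*_input` files report millidegrees Celsius.

```python
import os


def read_temp_celsius(temp_input_path):
    """Return the temperature in degrees Celsius, or "N/A" if unavailable."""
    if not os.path.isfile(temp_input_path):
        # Missing hwmon file: report N/A rather than raising.
        return "N/A"
    try:
        with open(temp_input_path) as temp_file:
            millidegrees = int(temp_file.read().strip())
    except (OSError, ValueError):
        # Assumption: unreadable or malformed content is also shown as N/A.
        return "N/A"
    return millidegrees / 1000.0
```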
Maisam Arif 00e170c2f5 SWDEV-404157 - Fixed printLog delimiter parsing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3d8e22d185790f4325aeacc18e4bfcfe8777d356
2023-06-08 20:02:51 -05:00
Hao Zhou 255b4d122b Revert "Revert "Merge amd-staging into amd-master 20230602""
This reverts commit 5aa94c48d1.

Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I38b7d0ca4535503bf0b9ba491de0eb747f3dd966
2023-06-07 11:56:43 +08:00
Hao Zhou 5aa94c48d1 Revert "Merge amd-staging into amd-master 20230602"
Revert submission 869878

Reason for revert: <RPM package on RHEL9.x is broken>
Reverted Changes:
I4886ef2a6:Merge amd-staging into amd-master 20230602
I0f277acf3:Revert "Revert "Merge amd-staging into amd-master ...

Change-Id: Ie370327c8db0404c9cedde42c1376e3cec56fae0
2023-06-02 02:12:07 -04:00
Hao Zhou 8560d96c81 Merge amd-staging into amd-master 20230602
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I4886ef2a60b08e17bd9165ff0cd46b4297e15972
2023-06-02 10:11:48 +08:00
Galantsev, Dmitrii e8391c9d7c Clean-up python errors and warnings
Used pyright to show errors and warnings and resolved most

Change-Id: I0fdf7dcdf08db5c35dec80f6645e0a395fbe4197
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-01 17:37:57 -04:00
Hao Zhou ecb1303732 Merge amd-staging into amd-master 20230414
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I518af7182bb6537c9c03a30d53c44d2143f3064f
2023-04-14 12:17:58 +08:00
Charis Poag 6be92b9e26 [SWDEV-392571] Fix concise info when missing VRAM info
Updates:
    * [rocm-smi] Added larger app width size, which helps
      display missing device info
    * [rocm-smi] Added better context when rsmi_ret_ok
      does not return with RSMI_STATUS_SUCCESS
    * [rocm-smi] Removed all references to an
      undefined function (printLogNoDev())
    * [rocm-smi] Fixed not detecting non-int
      values when setting the voltage curve
    * [rocm-smi] Added better context on missing
      sysfs file when setting clock overdrive
      values
    * [rocm-smi] Fixed getMemInfo() calls not
      referencing tuple values (making it easier
      to read)
    * [rocm-smi] Silenced concise info spitting
      out errors for missing VRAM files, instead
      display which metric is "unsupported" if
      the files are missing
    * [rocm-smi] Updated function descriptions for
      rsmi_ret_ok & getMemInfo
    * [rocm-smi] Updated getMemInfo to provide a
      quiet call, to silence for concise info calls.
      This provides a way to keep the output clean.
    * [rocm-smi-lib] Added a statement, when using
      debug sysfs files, of which enums are enabled
      for debug

Change-Id: I0e9e0c97ccf71467ced0e1a1f71803327a8be2b7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-04-13 15:11:35 -04:00