Commit graph

545 commits

Author SHA1 Message Date
Étienne Mollier c4c19e7917 CMake - do not enforce -fPIE.
When built with LTO enabled, the linking of liboam.so chokes on the
following error, which is somewhat similar to the Debian bug #1030876
affecting PA-RISC, although the symptoms subtly differ in that it
suggests building with -fPIC:

	/usr/bin/ld: /tmp/cc0wF8Kx.ltrans0.ltrans.o: relocation R_X86_64_PC32 against symbol `_ZTVSt9exception@@GLIBCXX_3.4' can not be used when making a shared object; recompile with -fPIC

The -fPIC argument is passed down to the build command correctly,
but it appears to be overridden by the -fPIE flag that the upstream
build system adds late.  Removing -fPIE allows the build to go
through, both with LTO and on PA-RISC.

Bug: https://github.com/RadeonOpenCompute/rocm_smi_lib/issues/111
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015653
Change-Id: I8b35fd4b62cfa1a9ddb145362464df5dd276e2f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-23 16:37:37 -05:00
Galantsev, Dmitrii 1cf05dd9c7 CMake - Prevent failure to build on non-amd64 targets
Change-Id: Ifaa59fb672ea01c07cffea6cd2429bec15a5deaf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Co-authored-by: Étienne Mollier <emollier@debian.org>
Change-Id: Ia691ab1db0061f04662e10e112da4b9ef06c4256
2023-10-23 16:36:17 -05:00
Galantsev, Dmitrii 275108f5b9 README - Clean-up cli readme
Change-Id: I665cc5a48a240f0d2289439a4877c9f667b19851
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-23 13:17:04 -05:00
Maxime Chambonnet 8cfcb51550 Updated README.md with standard Markdown tables, cleaned up header levels a bit.
Change-Id: Ibd6e382413d7667a5a823ac69620a2cfb7046bc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-23 13:11:18 -05:00
Sam Wu 1de63ce506 Update rocm-docs
Change-Id: I30633c9cd29bc58b0c48086d5f493204f3d6ffd8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-18 14:09:26 -05:00
Charis Poag 6f1afd2678 bdfid fix for partition & xgmi nodes
* Updates:
    - [API] After discovering all amd gpus, we now properly
      map correct bdf (xgmi nodes). Especially important for
      partition changes - aka secondary nodes.
    - [API] While adding new secondary nodes we now have
      better grouping -> due to resorting based on
      kfd properties list & matching to primary uniqueid
    - [API] All secondary nodes are now AddToDeviceList
      with correct bdf (location id), provided by kfd
    - [API] Modified AddToDeviceList(..., uint64_t bdfid):
      providing an optional field - bdfid. This allows working
      around primary pcie cards with xgmi nodes
    - [API] Utils - cpplint minor fixes
    - [Example] Replaced all endl references with newline, fixed
      spacing, and fixed some values incorrectly displayed as hex
      (needed dec representation)
    - [API] kfd node functions - now print full path of file
      for trace logs
    - [Tests] power_read.cc: Added a generic power test to
      confirm that specific return values are guaranteed

Change-Id: I143474e8d64c4915a966e789be6bcea4fa7f4472
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-13 20:14:39 -05:00
Galantsev, Dmitrii 2a7589a065 TESTS - Skip XGMI test
Change-Id: Idd9f505f36fac4a670e5129f835aa051b5c4c9fa
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-12 21:27:55 -05:00
Galantsev, Dmitrii 3f0071599d Fix rocm_smi.cc
Change-Id: Ib074dd542d8d37a6a618e10bd3bd389ad0cef108
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-11 11:46:49 -05:00
Charis Poag 31a1fcce7d Add rsmi_dev_power_get
* Updates:
  - [API] Added rsmi_dev_power_get(uint32_t dv_ind,
                                   uint64_t *power,
                                   RSMI_POWER_TYPE
                                   *type)
          provides generic get to average or
          current power & provides backwards
          compatibility
  - Added a utility function to get MonitorTypes
    (monitor_type_string(type)) &
    RSMI_POWER_TYPE (power_type_string(type))
    strings
  - [Tests] Added rsmi_dev_power_get tests and
    provided better verification of return values for
    all power APIs
  - [Tests] Updated power outputs to show correct
    units
  - [example] Now uses avg, current, and generic
    power functions with type output response

Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-10 00:34:19 -05:00
Oliveira, Daniel 4e4ebde640 rocm_smi_lib: Fix Modernize and refactor gpu_metrics
Adds support for 'gpu_metrics_v1_4' and new counters

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * The new gpu_metrics are now part of the Device

Build changes related to the following: None

Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-10-09 21:43:22 -05:00
Charis Poag b251bb0c9f Rename NPS -> memory partition + compute partition node fix
* Updates:
        - rocm_smi_lib + CLI:
          Rename all "NPS mode" -> "memory partition"
          related files/functions/API/CLI to align with correct
          technical naming
        - rocm_smi_main: fixed identifying primary card's unique id
          utilize rsmi_dev_unique_id_get to map which
          KFD nodes belong to it
        - rsmi_dev_*_partition*: now have better logging output
        - compute partition tests:
          Added 20 sec delay for workaround until GPU
          busy is confirmed as the issue
        - CPPLint fixes/formatting
        - [Example] Moved all endl to "\n" for efficiency
        - [Example] Added Edge & Junction temperature examples
        - [Example] Added rsmi_minmax_bandwidth_get() example - WIP

Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-06 11:51:09 -04:00
Galantsev, Dmitrii 8244a677db Update package version
Change-Id: Ie094f75d028a09f862729094815f8a2b6ea8ad78
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-05 12:49:11 -05:00
Galantsev, Dmitrii e962d3b281 TESTS - Don't fail on TestFrequenciesRead
- Return from freq_output function early if clock is unsupported
- Right-align frequencies

Change-Id: I799c9351dac8a5be161bc9243cd3816539728357
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-04 18:24:56 -05:00
Galantsev, Dmitrii d862bee754 Add --version to CLI
Change-Id: Id2a8f10f544ed04e874db773820534eddd73f55d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-02 17:57:02 -05:00
Bill(Shuzhou) Liu d665157cd1 rocm-smi shows wrong fwinfo
Add new fw block into the rocm-smi tool.

Change-Id: Id5c7ccc2fc491f7e5d0390aeb4c6f81fd12fa644
2023-10-02 16:28:31 -04:00
Ori Messinger aa89f2e125 ROCm SMI CLI: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI CLI:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: If9cabdc60ffcf08f27c9e6bdc20e8a26b192a738
2023-09-29 18:13:16 -04:00
Galantsev, Dmitrii cf6bcbbb27 Upgrade to CXX-17 gtest-1.14 and cmake-3.14
Also change the TARGET from amd_smi_libraries to rocm_smi_libraries
This helps reduce confusion between rocm-smi and amd-smi

Change-Id: Ie54cedd831ba24bd9afc341ad15b7e8e20732059
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-28 12:44:51 -05:00
Bill(Shuzhou) Liu 016dbf8aa3 Do not print the library name if in default folder
The rocm-smi python tool does not print the library name when the
library is in the default folder.

Change-Id: I203a872ebe2fc994766a2628049ca50c8bfa7120
2023-09-27 12:14:33 -04:00
Galantsev, Dmitrii 8eb9f892d3 Fix out-of-bounds array access for --showvc
get_od_clk_volt_info assumed the size of the file instead of checking
the length. This caused out-of-bounds array element access.

Change-Id: Ibda8f0c3a6d1623d48964641ae5ef610d2072e94
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-26 13:59:37 -05:00
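The fix described above is implemented in the C++ library and is not shown in this log. As a rough illustration of the same idea only, the Python sketch below checks the number of lines actually read before indexing into them. The helper name `line_at` and the sample data are hypothetical and do not come from rocm_smi_lib.

```python
def line_at(lines, index):
    """Return lines[index], or None when the file had fewer lines than expected.

    Hypothetical helper: it checks the real length instead of assuming
    a fixed file layout.
    """
    if index < 0 or index >= len(lines):
        return None
    return lines[index]


# Made-up content standing in for a file that is shorter than expected.
sample = ["first", "second"]
present = line_at(sample, 1)   # within bounds
missing = line_at(sample, 5)   # beyond the data, reported as None
```

A caller can then report the missing field as unsupported rather than reading past the end of the data.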
Charis Poag f078375350 Add Current (Instant) Socket Power
* Updates:
    - rocm_smi_logger:
      General cleanup &
      Aligned to cpplint rules for usage
    - rocm_smi_monitor:
      Fixed MonitorTypes
      from not displaying properly in logs
      & Added socket power label + current
      socket power MonitorTypes
    - rocm_smi API:
      Added rsmi_dev_current_socket_power_get API
    - rocm_smi CLI:
      General cleanup,
      Concise info now displays device data
      in variable width (see printLogSpacer's
      new field),
      printLogSpacer now has an adjustable
      variable that overrides appWidth,
      Added Socket Power to base rocm-smi +
      --showpower CLI calls,
      --showpower & base rocm-smi CLI defaults
      to printing socket power (if not available,
      displays average power)
    - Cleaned up temp label references
    - power_read gtests:
      Added current socket power to testing

Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-25 01:38:54 -04:00
Galantsev, Dmitrii 3d40c4bb2c SWDEV-422836 - Add sleep frequency support
Change-Id: I0bde403b010bf036ce44ed0600cc7eb03742c6b6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-25 01:38:27 -04:00
Ori Messinger d44a6ef523 ROCm SMI LIB: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630
2023-09-25 01:37:38 -04:00
Bill(Shuzhou) Liu 2247c4b46c Change the python tool id output label
Change the label from GPU to Device as we call rsmi_dev_id_get().

Change-Id: I8ffe3673d434e5291ebd5cc909afb7d18154ecb6
2023-09-25 01:31:04 -04:00
Bill(Shuzhou) Liu 85df5676d4 Handle the memory frequency with only one line
Change the code to handle the memory frequency if it is only one line.

Change-Id: I09e6ee78a2b9c12c861243dc89296e4e7862da49
2023-09-25 01:30:56 -04:00
Galantsev, Dmitrii 0c662611e9 SWDEV-423672 - Always compile and install gtest
This commit makes sure GTest is always compiled with rocm_smi_lib_tests.

GTest installation was inconsistent outside of the AMD CI environment.
libgtest.so wouldn't get installed with rocm_smi_lib_tests if gtest
already existed on the build machine, which is undesirable when packaging.

Change-Id: I607df6c67c81480e3b6487b28f14924e8bf56ad4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-23 21:10:12 -04:00
Galantsev, Dmitrii 1683245ecf PY: Remove f-strings from rocm_smi.py
Change-Id: I0a422e8f66473af837460ecb2450e5be329163b0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-22 19:15:59 -05:00
Oliveira, Daniel e0483f2ee2 rocm_smi_lib: Fix [linux BM] [AMDSMI] Memory Bandwidth
Implements APIs for 'gpu_metrics_v1_3' utilization averages

Code changes related to the following:
  * rsmi_dev_activity_metric_get()
  * rsmi_dev_activity_avg_mm_get()
  * CLI shows "Avg.Memory Bandwidth" under "--showmemuse"

Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-09-21 11:00:29 -04:00
Galantsev, Dmitrii b99867eb80 PACKAGE - Fix packaging
Allow for configureLogrotate to fail without failing configure

In the previous commit I forgot to invert the check when switching the
"IS_SYSTEMD" and "!IS_SYSTEMD" if-else statements.

Change-Id: I8eb8e7981c6353a2e60064eb3a6e35821ea2a0d0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-20 10:37:35 -05:00
Galantsev, Dmitrii 431a7071a0 PACKAGE - Cleanup packaging
- Clean-up packaging scripts. More consistent with RDC.
- Remove all 'sudo' calls. All these scripts are to be run by root.
- Reduce scope of variables.
- Remove unnecessary functions

Change-Id: Ib90f8e66ef4eae24f73e940fff44f515e12233f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-20 01:07:51 -04:00
Sam Wu 7b32ea614b fix toc to point to correct doxysphinx output path
update doc requirements; rocm-docs-core to 0.24.1

Change-Id: I78257d476a8bc47fd1a4ee03aa3db1a430ed116f
2023-09-18 09:07:01 -06:00
Galantsev, Dmitrii 094c98a74f rocm_smi.py: Fix pipe into head error
When piping rocm_smi into 'head' it failed with a "Broken pipe" error. The
error can be safely ignored. head closes the pipe early, which causes
a SIGPIPE signal to be raised.

https://docs.python.org/3/library/signal.html#note-on-sigpipe

Change-Id: I4a589c6ed9a8c5b50de84b33e28115c6b510045f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-15 16:37:10 -04:00
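The actual change to rocm_smi.py is not shown in this log. The sketch below follows the pattern from the Python documentation note linked in the commit message: catch BrokenPipeError while writing, then point stdout at the null device so the interpreter's flush at exit does not raise again. The function names `emit` and `silence_stdout` are hypothetical.

```python
import os
import sys


def emit(lines, stream=None):
    """Write lines to stream; return False if the reader closed the pipe early."""
    if stream is None:
        stream = sys.stdout
    try:
        for line in lines:
            stream.write(line + "\n")
        # Flush inside the try block so a closed pipe is detected here.
        stream.flush()
        return True
    except BrokenPipeError:
        return False


def silence_stdout():
    """Redirect stdout to the null device to avoid a second error at exit."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
```

A script would call `emit(...)` for its output and, when it returns False, call `silence_stdout()` before exiting.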
Galantsev, Dmitrii 3b95214fff rsmiBindings.py - Add initRsmiBindings()
Library path was printed at all times even with --json flag.
This commit adds a mandatory initRsmiBindings function which is a core
component of the rsmiBindings.py library.

It **MUST** be called on import.

Change-Id: Ic6ae1ec5d1fabba288910e6aed6c4706e53e5cd7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-15 16:37:10 -04:00
Galantsev, Dmitrii 5c574ac79c TESTS - Check power and frequency support
It is not guaranteed that power can be read or set for some GPUs
(MI300). It is also not guaranteed that frequencies can be set.

As this is not a tool issue - we simply skip the failing test.

Change-Id: I134e96a476040cef513cd924f00e30cd6dea42a5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 22:19:33 -04:00
Galantsev, Dmitrii 238c7f6dca README - shell -> bash
Change-Id: I3a50c38ae280747b4874cff443091f332980fe50
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 17:32:01 -04:00
Galantsev, Dmitrii 26c4578ee2 README - Add a documentation link
Change-Id: Ia56994825e99e72829283f07bed7379d95d24498
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 16:31:42 -05:00
Galantsev, Dmitrii a4b470fe71 Add errors for existing but empty dev files
Change-Id: Iad9febc50f9b8e6085f8b605249ee884d2f134d6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 17:30:03 -04:00
Oliveira, Daniel 12f395e592 rocm_smi_lib: Fix rocm-smi --resetfans results in Permission Denied
For operations related to:
  --resetfans
  --setfan

We report 'Not supported' for these cases instead of 'Permission denied'

Code changes related to the following:
  * rocm_smi_properties
  * rocm_smi related APIs

Change-Id: I144646efc3804fabd45cc5a46351803950b4feb7
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-09-14 16:54:29 -04:00
Galantsev, Dmitrii d9381b6dae Fix misspelling averge -> average
Change-Id: I3546348560acadb1e775e10ad24115de4ccfc800
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-13 19:49:46 -05:00
Galantsev, Dmitrii 4acfb00ad5 PY: Silence error output when printing concise info
Change-Id: I9ce4ad523b3fe2ec8afc5bea791810ec67558f11
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-12 19:16:16 -04:00
Galantsev, Dmitrii ff992e9b56 TESTS - re-enable frequency tests on aqua_vanjaram
Change-Id: I8fcd9418da5b973897ccfffc7d8a2f3ea833ea77
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-11 19:43:25 -05:00
Galantsev, Dmitrii 41ade41d84 SWDEV-409184 - Fix erroneous 'not supported' when HWMON is absent
Change-Id: Ic5ff406977d962fadc709a03853dac61b5460a26
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-11 19:34:30 -05:00
Charis Poag ed6777a8e7 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info from throwing exception
      with no amdgpu driver loaded
    - Fixed the ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added to logs build type, lib path,
      and set env. variables

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-07 22:17:54 -05:00
Galantsev, Dmitrii 4aef767596 Cleanup rocm_smi.cc
Change-Id: Ia676c237222b0dd5d9e8a054a93776f3b11e2225
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-07 15:50:40 -04:00
Bill(Shuzhou) Liu fab0542ab1 Fix doxygen warning messages
Doxygen will now treat warnings as errors.

Change-Id: Ie7a7c9a823388c4140f31489604d65ec43005772
2023-09-07 08:48:38 -04:00
Oliveira, Daniel 328ce0150b rocm_smi_lib: Fix rocm-smi --showfan shows 'unable to detect fan'
Code changes related to the following:
  * Reverts earlier fix for the same issue
  * Check for existence of files before reading

Change-Id: I175b20c3343c414b12b79dc3fc404f53fbaabf3a
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-30 14:45:05 -05:00
Galantsev, Dmitrii 84e90e55d5 TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 471fbfddc1 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
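The commit text explains the symptom: a value of -1 stored in an unsigned integer shows up as a very large number. The sketch below reproduces that effect with Python's ctypes. The 32-bit width and the helper names `as_unsigned` and `as_signed` are assumptions made for illustration and are not taken from the library.

```python
import ctypes


def as_unsigned(value):
    """Reinterpret value as a 32-bit unsigned integer (assumed width)."""
    return ctypes.c_uint32(value).value


def as_signed(value):
    """Reinterpret value as a 32-bit signed integer (assumed width)."""
    return ctypes.c_int32(value).value


no_affinity = -1                     # sentinel mentioned in the commit message
wrapped = as_unsigned(no_affinity)   # the "large number" symptom
restored = as_signed(wrapped)        # reading it as signed recovers -1
```

Storing the value in a signed type from the start avoids the wraparound entirely, which matches the change the commit describes.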
Oliveira, Daniel 3602447109 rocm_smi_lib/rocm_smi.py: Fix Add 'GPU name' in rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 654f65118b rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 10:54:50 -05:00
Oliveira, Daniel f9fd6b0a96 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist, then "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-23 20:43:08 -05:00
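The two cases listed in the commit message can be sketched as a small decision function. This is only an illustration of the described behaviour, not the code from rocm_smi.py. The helper names `read_pwm` and `classify_fan` and the example path are hypothetical.

```python
from pathlib import Path


def read_pwm(path):
    """Return the integer in a pwm file, or None if the file does not exist.

    An empty or non-numeric file raises ValueError in this sketch.
    """
    p = Path(path)
    if not p.is_file():
        return None
    return int(p.read_text().strip())


def classify_fan(reading):
    """Map a pwm reading to the message described in the commit."""
    if reading is None:
        return "Not supported"               # sysfs file is absent
    if reading == 0:
        return "Unable to detect fan speed"  # file exists but reports zero
    return str(reading)


# Hypothetical path used only to show the call shape.
message = classify_fan(read_pwm("/nonexistent/hwmon/pwm1"))
```

Keeping the file read separate from the classification lets the two outcomes be checked without touching sysfs.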