Commit Graph

479 Commits

Author SHA1 Message Date
Galantsev, Dmitrii b0fe2fbd07 Add .cache to gitignore
Change-Id: Ida03bf1f50704bea44827d7578cd74c1896d4368
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-17 15:52:23 -04:00
Bill(Shuzhou) Liu 0aeb6025bd rocm-smi --showevents shows wrong gpuID
Use the gpuid returned from the event data instead.

Change-Id: I7f286cc105f7ea12985223e603504f0ef3d9724e
2023-07-13 08:28:53 -05:00
Galantsev, Dmitrii e6c42c6626 Simplify gitignore
Remove generic gitignore to simplify tracking of generated files

Change-Id: Idf1f9719b2cfd16b31332a3ed87be5943c2c1ce7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-07 11:48:09 -04:00
Jeremy Newton 2d2c73a5e6 Fix python loading of librocm_smi64
The librocm_smi64.so is used for development, while
librocm_smi64.so.MAJOR is used for runtime, thus the python front end
should not be loading the .so binary, but rather the .so.MAJOR binary.

As well, it's good not to hardcode "lib" as some distros will change
this.

rsmiBindings.py is now generated with CMake

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I7cb745f8936fdf10d3ebd6c1e606031f713184ca
2023-07-06 09:52:56 -04:00
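
The loading rule described in commit 2d2c73a5e6 (load the versioned `.so.MAJOR` runtime library and do not hardcode the `lib` directory) might look roughly like the sketch below. This is not the generated rsmiBindings.py; the function names, the prefix, the library directory, and the major version used in the usage note are all assumptions made for illustration.

```python
import ctypes
from pathlib import PurePosixPath

# Development symlink name taken from the commit message; the runtime
# library appends the major version to it.
LIBRARY_STEM = "librocm_smi64.so"


def runtime_library_path(prefix, libdir, major):
    """Build the path of the versioned runtime library (.so.MAJOR).

    prefix, libdir and major are hypothetical inputs; the commit says the
    real values are filled in when CMake generates the bindings.
    """
    return PurePosixPath(prefix) / libdir / f"{LIBRARY_STEM}.{major}"


def try_load(path):
    """Return a ctypes handle, or None if the library cannot be loaded."""
    try:
        return ctypes.CDLL(str(path))
    except OSError:
        return None
```

For example, `runtime_library_path("/opt/rocm", "lib64", 5)` yields `/opt/rocm/lib64/librocm_smi64.so.5`, so a distribution that uses `lib64` instead of `lib` only changes one argument.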
Jeremy Newton 828f46b445 Only install asan license if enabled
Change-Id: I79c6fce84c23ed12e65db8e234a29dbfedd11f68
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 23:34:43 -04:00
Jeremy Newton 4f481dd7f3 Actually fix version string
There seems to be a scope issue with the existing variables, but just
putting in the pkg version string seems sufficient.

Change-Id: I4ccef872ff848a70cb2abc07bf605c5f29a608e8
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 23:34:14 -04:00
Tom Rix 19c3e2aff9 Improve handling of ContructBDFID errors
Building this package on Fedora reports this warning:
In file included from rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:62:
In member function 'amd::smi::Device::set_bdfid(unsigned long)',
    inlined from 'amd::smi::RocmSMI::Initialize(unsigned long)' at rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:330:27:
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/include/rocm_smi/rocm_smi_device.h:199:42: warning: 'bdfid' may be used uninitialized [-Wmaybe-uninitialized]
  199 |     void set_bdfid(uint64_t val) {bdfid_ = val;}
      |                                   ~~~~~~~^~~~~
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc: In member function 'amd::smi::RocmSMI::Initialize(unsigned long)':
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:324:12: note: 'bdfid' was declared here
  324 |   uint64_t bdfid;
      |            ^~~~~

Only set the bdfid when it is known to be valid.

Signed-off-by: Tom Rix <trix@redhat.com>
Change-Id: I839b4d2d2d4e3b25469cf5972245b9630da00c87
2023-06-30 00:16:44 -04:00
Jeremy Newton 74dc98114f Update default version to match tags
When building from github, these tags don't exist, so the defaults
should try to match the internal tags

Change-Id: Id570341f27e21916b1a7f3605ee2b5b9716cad9b
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 00:16:22 -04:00
Jeremy Newton 1a86dd75bb Fix version file generation
This looks like a typo, as the following variables are not defined:
- AMD_SMI_LIBS_TARGET_VERSION_MAJOR
- AMD_SMI_LIBS_TARGET_VERSION_MINOR
- AMD_SMI_LIBS_TARGET_VERSION_PATCH

Change-Id: I43449e7bd2a2de643d33e79fad063a7859679c8d
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-29 14:42:30 -04:00
Jeremy Newton d00d885394 Fix python script install permissions
The keyword "PROGRAMS" should be used in place of "FILES" in order to
make sure executable scripts have the correct permissions.

Change-Id: I6c287dc1291774ad6d97a04d621957dea0a1b697
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-27 14:57:59 -04:00
Bill(Shuzhou) Liu 910bf677a9 Crash if no hwmon sysfs
Return NOT_SUPPORTED if no hwmon sysfs.

Change-Id: I01356a21f004ab552ca6ef7ffb49934bfdfd5e31
2023-06-26 08:00:32 -05:00
Galantsev, Dmitrii 82078565e9 SWDEV-406542 - Add gtest to install targets
Change-Id: I116505aaa33109fce66ab8daf9921e2de11a27d4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-20 11:14:56 -05:00
Galantsev, Dmitrii 9519d5b8cf SWDEV-391041 - Disable TestPowerReadWrite
Change-Id: I56b5bea3e5206a6f0d5ecdb482103881f80f0b8b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-16 15:18:27 -04:00
Galantsev, Dmitrii e7585cc045 Assign tests to aqua_vanjaram
Change-Id: Iee78b1e810356327261006087b081e39dab0b9e8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-16 15:18:27 -04:00
Bill(Shuzhou) Liu d9b6af7a09 Expand showpids to provide more details
Provide details of GPU usage by an application.

Change-Id: I0f36df7d358754c2c8a60432b736d98f667ee99c
2023-06-16 08:52:18 -04:00
Galantsev, Dmitrii 0478d53e23 SWDEV-340919 - Package rsmitst
Similar to I879b21428e6642f19fda67092b365d8b78b7ba7b.

Main CMake improvements:

* Add rsmitst with -DBUILD_TESTS=ON
* Package tests into rocm-smi-lib-tests.deb and .rpm
* Note - this breaks build_rsmitst.sh

Misc improvements:

* Add .editorconfig to normalize code formatting
* Export compile_commands.json
* Remove gtest source and pull from github instead

Change-Id: Ib87ed4a5acd9f78badae6d028e5ff3d4f56dafc2
Depends-On: I8b26795471ad1432c805e45d8b58d7bb34abfcfc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-13 22:52:10 -05:00
Galantsev, Dmitrii ac94bf5ed5 Temporarily ignore TestFrequencies
See SWDEV-391039 and SWDEV-391040 for details

Change-Id: I662ba43363d949465454ea4af4d4586b3d47a811
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-12 19:26:21 -05:00
Galantsev, Dmitrii 713f85721b --showtempgraph - Show N/A when no temp found
If temp in hwmon was missing - rocm-smi crashed.
e.g. /sys/class/drm/card1/device/hwmon/hwmon5/temp1_input

This change displays "N/A" for temp instead of crashing.

Change-Id: I02f84a466bd3acfbd9b65e7e4ca0f18e76606c3b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-12 19:16:39 -05:00
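
The behavior described in commit 713f85721b (show "N/A" when the hwmon temperature file is missing) can be illustrated with a small sketch. The helper name `format_temp` is hypothetical and this is not the rocm-smi implementation; it relies only on the Linux hwmon convention that `temp*_input` files report millidegrees Celsius.

```python
def format_temp(path):
    """Return the temperature in degrees Celsius as text, or "N/A".

    A missing, unreadable, or non-numeric file produces "N/A" instead of
    raising an exception.
    """
    try:
        with open(path) as f:
            millidegrees = int(f.read().strip())
    except (OSError, ValueError):
        return "N/A"
    return f"{millidegrees / 1000:.1f}"
```

Catching both `OSError` and `ValueError` covers the absent file as well as a file whose content is not an integer.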
Maisam Arif 00e170c2f5 SWDEV-404157 - Fixed printLog delimiter parsing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3d8e22d185790f4325aeacc18e4bfcfe8777d356
2023-06-08 20:02:51 -05:00
Galantsev, Dmitrii f78f9a4082 Fix test temp blacklist, ignore TestVoltCurvRead
Change-Id: I86fa14fdc06e1b170a0bc0c0727fc08e4f4e2074
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-06 17:02:14 -04:00
Charis Poag e2dec17284 [SWDEV-402336 + SWDEV-398070] Fix RPM install part2
Updates:
    [rocm-smi] RPM installation comment included a macro,
    now removed

Change-Id: Ifa7a8d2d1a713940c39e20df9d02635e0e623dd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-06-05 13:50:57 -05:00
Galantsev, Dmitrii e8391c9d7c Clean-up python errors and warnings
Used pyright to show errors and warnings and resolved most

Change-Id: I0fdf7dcdf08db5c35dec80f6645e0a395fbe4197
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-01 17:37:57 -04:00
Charis Poag b0f2a9d2ef [SWDEV-402336 + SWDEV-398070] Fix RPM install - override macros
Updates:
    * [rocm-smi] RPM installation now overrides macro usage

Change-Id: I2a5ba14670becc178f672182eabe71965a526178
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-06-01 11:58:42 -04:00
Galantsev, Dmitrii 2048f8978f Fix memset compile warning
Change-Id: If31210f3c6038e56f43ae8631ed1657d1509488e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-05-31 21:54:32 -04:00
Bill(Shuzhou) Liu a6467c4083 Fall back to gpu_metrics if the sysfs is not available
gpu_metrics may provide the required PCI link width and speed.

Change-Id: I939d733f5f6a71088545ba042345eb1b6ad20ee5
2023-05-24 14:51:43 -05:00
Bill(Shuzhou) Liu 160c99d12d SWDEV-400644: Reset the mutex only if errors
To avoid resetting the mutex while it is in use, only reset the mutex
if it cannot be acquired.

Change-Id: I95e0ed1bf543f285ce81b4df9c51e16a88081d38
2023-05-22 11:20:44 -04:00
Charis Poag c3a095a180 [SWDEV-398070] Adding logging to ROCm SMI (by default off)
Updates:
    * [rocm-smi] Provide a thread-safe logging feature
    * [rocm-smi] Adding logrotation into install/upgrade/remove
      scripts
    * [rocm-smi] Updated cmake lists to include rocm_smi_logger
    * [rocm-smi] Updated DEB/RPM install/remove logging file &
      folder with all users having r/w privileges for
      /var/log/rocm_smi_lib/ROCm-SMI-lib.log
    * [rocm-smi] Added ability to do a glob search for multiple files
      (globFileExists), assists doing file searches with * strings
    * [rocm-smi] Added ability to log system details when RSMI_LOGGING
      is turned on (getSystemDetails())
    * [rocm-smi] Added logging to provide which ROCm API is being called
      when RSMI_LOGGING is on
    * [rocm-smi] Added logging to provide SYSFS path and read value,
      when RSMI_LOGGING is on. Provides error response on failure.
    * [rocm-smi] Added environment variable RSMI_LOGGING to control
      when logging is enabled or disabled. By default, by not
      setting this env. variable, logging is turned off. When
      setting RSMI_LOGGING=<any value>, logging is enabled
      which is placed in /var/log/rocm_smi_lib/ROCm-SMI-lib.log file.
      Setting RSMI_LOGGING is allowed in both debug and release builds.
    * [rocm-smi] Removed an initialize procedure which keeps
      debug_inf_loop. Seems this feature is not being used.

Change-Id: I79b48387609c6233c6f05b04fb8bba66b68c2399
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-05-17 21:18:52 -05:00
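
The RSMI_LOGGING switch described in commit c3a095a180 (off when the variable is unset, on for any value) can be sketched as follows. The function name is hypothetical and the real check lives in the library, not in Python; treating an empty value as "set" is an assumption, since the commit message does not say how an empty string is handled.

```python
import os


def logging_enabled(environ=None):
    """Return True when RSMI_LOGGING is present in the environment.

    Assumption: any value, including an empty string, enables logging.
    """
    env = os.environ if environ is None else environ
    return "RSMI_LOGGING" in env
```

Passing a plain dictionary, as in `logging_enabled({"RSMI_LOGGING": "1"})`, makes the check easy to exercise without modifying the process environment.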
Sam Wu ed74bc6eca sphinx documentation
ref: https://github.com/RadeonOpenCompute/rocm_smi_lib/pull/119

fix formatting in docs/index.md

Change-Id: I940ef8147a40bd3b702aa591bd56557a870621fb
2023-05-11 10:41:45 -04:00
Ranjith Ramakrishnan daffcdb930 SWDEV-383221 - Set the default value of ROCM_HEADER_WRAPPER_WERROR to OFF
Using wrapper header files will result in #warning message by default

Change-Id: I8941a96bdc1b921a7646ccb353130cb283957ff8
2023-05-08 16:56:52 -07:00
Charis Poag 6be92b9e26 [SWDEV-392571] Fix concise info when missing VRAM info
Updates:
    * [rocm-smi] Added larger app width size, which helps
      display missing device info
    * [rocm-smi] Added better context when rsmi_ret_ok
      does not return with RSMI_STATUS_SUCCESS
    * [rocm-smi] Removed all references to an
      undefined function (printLogNoDev())
    * [rocm-smi] Fixed not detecting non-int
      values when setting the voltage curve
    * [rocm-smi] Added better context on missing
      sysfs file when setting clock overdrive
      values
    * [rocm-smi] Fixed getMemInfo() calls not
      referencing tuple values (making it easier
      to read)
    * [rocm-smi] Silenced concise info spitting
      out errors for missing VRAM files, instead
      display which metric is "unsupported" if
      the files are missing
    * [rocm-smi] Updated function descriptions for
      rsmi_ret_ok & getMemInfo
    * [rocm-smi] Updated getMemInfo to provide a
      quiet call, to silence for concise info calls.
      This provides a way to keep the output clean.
    * [rocm-smi-lib] When debug sysfs files are
      in use, state which enums are enabled
      for debug

Change-Id: I0e9e0c97ccf71467ced0e1a1f71803327a8be2b7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-04-13 15:11:35 -04:00
Bill(Shuzhou) Liu b6789891b0 Validate the clock frequency when setting it
Add a check of the clock frequency when setting it.

Change-Id: I707291bfb5007bb69100c780af50a4b0f697bb37
2023-04-06 11:54:38 -04:00
Charis Poag 78a0812f7f [SWDEV-391036 + SWDEV-392933] Fixes for VoltRead and ComputePart.
Updates:
    * VoltRead - needed to properly send out RSMI_STATUS_NOT_SUPPORTED
      when device does not have voltage hwmon files
    * ComputePart. - test failure was likely caused due to EvtNotif
      causing conflicts (unknown exactly why). Test passes when
      moving it ahead of the event notifier. Both API calls may have
      a system resource issue, TBD.
    * rocm_smi_example - now indicates when an API call
      returns RSMI_STATUS_NOT_SUPPORTED or
      RSMI_STATUS_NOT_YET_IMPLEMENTED. Allows example to fully complete
      on systems which may not provide support for all API calls.

Change-Id: I520b8584e078d412414e8e5797c664220a7e823a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-04-05 12:44:29 -05:00
Bill(Shuzhou) Liu 58c83eb379 Increase the max BDF ID length
Increase the max length from 256 to 512.

Change-Id: I3114f7ce6852aafa9dfec0186f27c1121c939c69
2023-03-29 10:04:28 -04:00
Bill(Shuzhou) Liu 0c82a9d577 Correct subsystem name by matching device id.
The rsmi_dev_subsystem_name_get() only matches subvendor id and
subdevice id for a vendor. The change will also match device id.

Change-Id: Ife3aedaf6fc7390ed7fa62edbde40c2340689b23
2023-03-28 15:48:31 -05:00
AravindanC 778f3b7fdc SWDEV-351540 - ASAN packaging for rocm_smi_lib
Change-Id: Iab354d02d261a0270a3d118b825835fc6f021c15
2023-03-20 13:14:53 -07:00
Charis Poag f44d1ea8bc [SWDEV-387906] Fix rocm-smi initialize crash
Fix was needed due to hwmon updates.
Several voltage sensors (ex. vddgfx/vddnb)
are unsupported or not applicable
to upcoming hardware. This was not the case
for previous hardware sensors, resulting in
the rocm-smi crash observed.

Change-Id: Ib8593e10811638def26fc7a1eda29309e328db09
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-03-17 15:04:34 -05:00
Bill(Shuzhou) Liu 1b7eb4e1f4 Fix cppcheck static analysis report warning
Fix some warnings from the static analysis tool.

Change-Id: I7e8c2f5d6f79aff5fdcad81b1fd832900f213c47
2023-03-13 09:27:19 -05:00
Ranjith Ramakrishnan 14b86107a7 SWDEV-366831 - Compile time flag to switch between #warning and #error message
Using backward compatibility paths will provide an #error message. Compile time option added to enable/disable the #error message.
Disabling the same will provide a #warning message

Change-Id: Ib49633501aa6eb6d97158b1ecfc47de6f18fba85
2023-03-10 08:56:45 -08:00
Bill(Shuzhou) Liu 710649ab66 Filter out the GPUs not assigned to a container in showpid
The process IDs of other containers are still visible in the sysfs file;
filter them out to prevent a crash.

Change-Id: I665912cd09c606804186aff8cba5c24f5e58ded7
2023-03-06 11:05:02 -06:00
Charis Poag c252ecccd1 [SWDEV-335697 + SWDEV-342812] Fix NPS & Compute tests
Updates:
    * Fixed rsmi_dev_compute_partition_get
      & rsmi_dev_nps_mode_get to properly check
      for invalid arguments
    * Updated compute partition & NPS mode tests
      - Now properly confirms the invalid
        argument is seen
      - Spacing for multiple devices is added
        to better see distinction between
        separate device's tests (for verbose output)
      - Changed expect to assert calls, so errors
        are observed faster for test failures
      - Fixed multiple device testing where a
        variable should have been unset, but
        having multiple devices caused it to
        set
      - Updated multiple device testing to iterate
        across all devices (previously returned,
        instead of continuing checking support
        after RSMI_STATUS_NOT_SUPPORTED detected)
      - Fixed a few spelling errors & verbose output

Change-Id: Ieba9e5b46763c6cd880fbf27fcdf58be8ecbc683
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-03-02 13:24:38 -06:00
Bill(Shuzhou) Liu fcb6afa289 mem_use_pct uninitialized error
Initialize mem_use_pct if the memory info is not available.

Change-Id: Id8e285050149c51077356826c8f99719b473060d
2023-02-27 16:47:45 -06:00
Charis Poag 0d3558945b [SWDEV-335697] Add RSMI_STATUS_SETTING_UNAVAILABLE for dynamic partition
Updates:
    * Added RSMI_STATUS_SETTING_UNAVAILABLE for
      rsmi_dev_compute_partition_set - gives users
      better error output when attempting to set
      compute partition to values not listed in
      available_compute_partition SYSFS
    * Updated python --setcomputepartition to
      provide better output when receiving
      RSMI_STATUS_SETTING_UNAVAILABLE
    * Updated all test & example files to check for
      RSMI_STATUS_SETTING_UNAVAILABLE when doing
      rsmi_dev_compute_partition_set

Change-Id: Ida5d54880d9b9b6e4a0468cdb962fdc0c18d6257
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-27 11:17:44 -06:00
Bill(Shuzhou) Liu 55bc2e2072 Memory usage division by zero
showAllConcise fails with a division-by-zero error.

Change-Id: I469f1b9f268842cd51662be6f9036f555a8949b2
2023-02-24 10:12:36 -06:00
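
The two memory-usage fixes above (fcb6afa289 initializes mem_use_pct when memory info is unavailable, 55bc2e2072 avoids the division by zero) amount to a guard like the one sketched here. The function signature and the choice of 0 as the fallback value are assumptions; the commit messages do not state what value is reported.

```python
def mem_use_pct(used_bytes, total_bytes):
    """Return memory usage as a whole-number percentage.

    A zero or missing total yields 0 instead of raising
    ZeroDivisionError (fallback value assumed for this sketch).
    """
    if not total_bytes:
        return 0
    return round(100 * used_bytes / total_bytes)
```

Returning a defined value in the guarded branch also means the caller never sees an uninitialized result when the total cannot be read.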
Bill(Shuzhou) Liu b40933b895 Use Unified Changelog Template
The CHANGELOG.md is added to track changes.

Change-Id: I33547cb7f1596b4b8abf206aebdd664649d4d19f
2023-02-21 14:27:55 -06:00
Charis Poag 77c950a4bf [SWDEV-381630] Add reset partition functionality
Updates:
    * Added rsmi_dev_compute_partition_reset & rsmi_dev_nps_mode_reset
    * Added --resetcomputepartition and --resetnpsmode python smi calls
    * Added temp data files rocmsmi_boot_compute_partition_<device num>
      & rocmsmi_boot_nps_mode_partition_<device num>, writes UNKNOWN
      if data cannot be read or device does not support
    * Cleaned up NPS & compute API documentation
    * Added creation and reading of API temp files (used in reset
      functionality)
    * Cleaned up output of rocm_smi_example
    * Updated rocm_smi_example to check if running with sudo permission
      before executing write API calls (cleans up erroneous output)
    * Added template specialization for storing temp data, requires
      specific rsmi_type_t enums (restricts what data can be stored)
    * Added storage of temp data, if temp files do not exist
    * Updated google tests for NPS & compute to include reset API calls

Change-Id: I69895a466b97107617e6dbb355737b84499a76c9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-17 12:55:08 -06:00
Charis Poag 9ef376cd61 SWDEV-342812- Add NPS support
Updates:
    * Added rsmi_dev_nps_mode_set and rsmi_dev_nps_mode_get
    * Added ability to set multiple SYSFS files in debug build
    * Added ability to see user's env variables set for debug build
    * Added tests for rsmi_dev_nps_mode_set and rsmi_dev_nps_mode_get
    * Added ability to restart AMD GPU driver, used in nps_mode_set
    * Updated ROCm_SMI_Manual.pdf to include new APIs
    * Added progress bar for long running python_smi_tools, used
      in setting nps_mode if runs longer than .1 seconds

Change-Id: I6d61bedd28d7cba6aff432ad2d127ba741b7d15a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-14 11:54:24 -06:00
Bill(Shuzhou) Liu ae10e842af rocm-smi --showxgmierr returns an error instead of error counter values
rocm-smi passes the wrong arguments.

Change-Id: I3a3923abdd263d4af87f3ec90670bb16afa2ef9b
2023-02-13 16:36:24 -05:00
Bill(Shuzhou) Liu 00a6c78a51 Display printable device name
Fall back to other methods if the device name in sysfs is not printable.

Change-Id: I20b22950399d4a515d2688b955248a3de3c61d05
2023-02-10 11:32:46 -05:00
Ranjith Ramakrishnan 02141a7f1d SWDEV-366831 - File reorg backward compatibility message changed to #error
Change-Id: I3d3b220b31f42140eab5404df790a130d2c238c4
2023-02-08 14:25:16 -08:00
Ori Messinger 56f9d6bfc0 ROCm SMI CLI: Fix --showproductname bug
This patch fixes a --showproductname bug, which is related to the
device's SKU. If a device with a VBIOS value that cannot be decoded
is used, that device's SKU cannot be parsed out of the VBIOS string.

Now, when the VBIOS value cannot be decoded, an error will be
printed instead of crashing with an 'UnboundLocalError' message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I680a182e94107e782235b8a2477ab165988f7703
2023-02-02 14:52:13 -05:00
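
The 'UnboundLocalError' mentioned in commit 56f9d6bfc0 is the error Python raises when a local variable is read before any branch has assigned it. A minimal sketch of the usual remedy follows: bind the variable before branching and report an error when decoding fails. The function names and the dash-separated VBIOS layout are invented for illustration and do not reflect the real VBIOS string format or the rocm-smi code.

```python
def sku_from_vbios(vbios):
    """Extract a SKU field from a VBIOS string, or return None.

    Assumption: the SKU is the second dash-separated field. This layout
    is hypothetical.
    """
    sku = None  # bound up front, so no code path leaves it undefined
    if isinstance(vbios, str):
        fields = vbios.split("-")
        if len(fields) >= 2 and fields[1]:
            sku = fields[1]
    return sku


def product_sku_line(vbios):
    """Return a display line, with an error text when decoding fails."""
    sku = sku_from_vbios(vbios)
    if sku is None:
        return "Unable to decode VBIOS value; SKU unavailable"
    return f"SKU: {sku}"
```

With this shape, an undecodable value such as `None` produces the error line instead of an exception.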