Commit graph

506 Commits

Author SHA1 Message Date
Galantsev, Dmitrii 76aec0097a TESTS - re-enable frequency tests on aqua_vanjaram
Change-Id: I8fcd9418da5b973897ccfffc7d8a2f3ea833ea77
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: ff992e9b56]
2023-09-11 19:43:25 -05:00
Galantsev, Dmitrii ab1ca937c7 SWDEV-409184 - Fix erroneous 'not supported' when HWMON is absent
Change-Id: Ic5ff406977d962fadc709a03853dac61b5460a26
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 41ade41d84]
2023-09-11 19:34:30 -05:00
Charis Poag d975792f47 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info from throwing exception
      with no amdgpu driver loaded
    - Fix for ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added to logs build type, lib path,
      and set env. variables

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: ed6777a8e7]
2023-09-07 22:17:54 -05:00
Galantsev, Dmitrii 9da052436a Cleanup rocm_smi.cc
Change-Id: Ia676c237222b0dd5d9e8a054a93776f3b11e2225
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 4aef767596]
2023-09-07 15:50:40 -04:00
Bill(Shuzhou) Liu ac4131905d Fix doxygen warning messages
Doxygen will be configured to treat warnings as errors.

Change-Id: Ie7a7c9a823388c4140f31489604d65ec43005772


[ROCm/rocm_smi_lib commit: fab0542ab1]
2023-09-07 08:48:38 -04:00
Oliveira, Daniel a044785231 rocm_smi_lib: Fix rocm-smi --showfan shows 'unable to detect fan'
Code changes related to the following:
  * Reverts earlier fix for the same issue
  * Check for existence of files before reading

Change-Id: I175b20c3343c414b12b79dc3fc404f53fbaabf3a
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 328ce0150b]
2023-08-30 14:45:05 -05:00
Galantsev, Dmitrii 0679c33b1a TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 84e90e55d5]
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 087a642570 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56


[ROCm/rocm_smi_lib commit: 471fbfddc1]
2023-08-25 09:01:08 -04:00
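The "large number" symptom fixed in 087a642570 can be reproduced in a few lines. This is a minimal sketch, assuming the -1 is a "no NUMA node" sentinel and that the affinity was stored as a 32-bit value; the variable names are illustrative and are not taken from the library.

```python
import ctypes

# Assumption: -1 is the sentinel meaning "no NUMA node associated".
raw_affinity = -1

# Stored in an unsigned 32-bit integer, the sentinel wraps around.
as_unsigned = ctypes.c_uint32(raw_affinity).value

# Stored in a signed 32-bit integer, the sentinel survives unchanged.
as_signed = ctypes.c_int32(raw_affinity).value

print(as_unsigned)  # 4294967295
print(as_signed)    # -1
```

A signed type lets callers test for the sentinel directly instead of comparing against the wrapped maximum value.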
Oliveira, Daniel 2b199f14ec rocm_smi_lib/rocm_smi.py: Fix Add 'GPU name' in rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 3602447109]
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 3992aafc08 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 654f65118b]
2023-08-24 10:54:50 -05:00
Oliveira, Daniel c166a66017 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist, then "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: f9fd6b0a96]
2023-08-23 20:43:08 -05:00
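The two fan cases described in c166a66017 can be sketched as follows. This is an illustration only, not the code in rocm_smi.py: the function name `fan_status` is hypothetical, and treating an unparsable reading the same as zero is an assumption made here.

```python
import os


def fan_status(pwm_path: str) -> str:
    """Classify a fan reading from a sysfs pwm# file (hypothetical helper)."""
    # Case 2 from the commit message: the file does not exist.
    if not os.path.isfile(pwm_path):
        return "Not supported"

    with open(pwm_path) as handle:
        raw = handle.read().strip()

    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable reading is handled like a zero reading.
        value = 0

    # Case 1 from the commit message: the file exists but reports zero.
    if value == 0:
        return "Unable to detect fan speed"
    return str(value)
```

Checking for the file before reading it keeps "the hardware has no fan control" separate from "the fan exists but reports nothing".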
Charis Poag ae5f3d6ceb Error handling for unset freqs
Sending RSMI_STATUS_UNEXPECTED_DATA for drivers
which do not set some clock freqs

Change-Id: I43a9515c2757dddd412bb25cfd54095e63367030
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: f191c2753c]
2023-08-23 10:44:57 -05:00
Galantsev, Dmitrii 8c557e16cc TESTS - Fix incorrect TestVoltCurvRead assert if not supported
Change-Id: I2242aa9be84543276c63f1f57fdc489754c9ee07
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 613bd8ad1d]
2023-08-22 16:51:42 -04:00
Galantsev, Dmitrii 7d96f95fb2 .editorconfig - Remove broken whitespace rule
Change-Id: I67260f1f1952609dc89834d0763acd732bf39860
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 548b68cb67]
2023-08-22 16:51:20 -04:00
Galantsev, Dmitrii 2b64402b90 TESTS - Use gpu version as a workaround for a missing name
Depends-On: Ifbd38f11fbde7ba28af4be1d611310dea1b5112a
Change-Id: Ia7b7975f03424854df0a470b2719cf2ff2cf8e40
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 62f01cb150]
2023-08-21 19:18:22 -04:00
Bill(Shuzhou) Liu c695336b6d Fallback to kfd node when VRAM sysfs not available
The driver may not expose VRAM sysfs on certain systems. Add a
fallback to the kfd node.

Change-Id: Ib3be71b4f4d2c79318d5026b0a97f3657d8a97b6


[ROCm/rocm_smi_lib commit: a10f00bf57]
2023-08-17 14:36:03 -05:00
Charis Poag 47420111a8 [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices which have junction sensors but no edge sensors
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 755e14dbad]
2023-08-10 19:53:38 -04:00
Oliveira, Daniel a75b7f741c Fix rsmitstReadWrite.TestPowerReadWrite test failure
Code changes related to the following:
  * All reinforcement work moved to their own files
  * Self contained changes only to support them
  * New files added to CMakeLists.txt

Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: cc5ab079df]
2023-08-09 21:51:05 -05:00
Ranjith Ramakrishnan 9d347e9e2f SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I1de06d0d6a30c8c862d768b58460ef1b49d15e29


[ROCm/rocm_smi_lib commit: 9406cdd832]
2023-08-07 09:21:19 -07:00
Charis Poag 4e39fe3e25 [lib] Enhance Logger: gpu_metrics + enable console out
* Updates:
    - Env variable RSMI_LOGGING=0 or any other value
        -> all logging off
    - Env variable RSMI_LOGGING=1 -> logs only
    - Env variable RSMI_LOGGING=2 -> console only
    - Env variable RSMI_LOGGING=3 -> both logs + console
    - Metrics output includes hexdump of current file
      and decoded metrics (functions: logHexDump
      and log_gpu_metrics)
    - System info gathered now includes the system's
      perceived endianness (little or big endian),
      helpful for viewing decoded hexdump or any
      binary translation
    - Added templates for printing unsigned hex
      (print_unsigned_hex_and_int), unsigned integers
      (print_unsigned_int), and printing both unsigned
      hex and int with an optional header
      (print_unsigned_hex_and_int)
    - Fixed some build compile warnings/errors,
      e.g. strncpy calls for sku or board names
      (this operation is expected and needed);
      unsuccessful temp file writes
      now properly return RSMI_STATUS_FILE_ERROR
    - Fixed logrotate not initializing properly
      on RHEL 8.8/9.x

Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 9c7eed7edc]
2023-08-01 21:46:19 -05:00
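The RSMI_LOGGING values listed in 4e39fe3e25 map onto two independent switches, file logging and console output. The sketch below shows that mapping under stated assumptions: `logging_targets` is a hypothetical helper, not a function from the library, and treating an unset variable as "off" is an assumption, since the commit message only covers set values.

```python
import os
from typing import Mapping, Tuple


def logging_targets(env: Mapping[str, str] = os.environ) -> Tuple[bool, bool]:
    """Return (log_to_file, log_to_console) for the RSMI_LOGGING value."""
    value = env.get("RSMI_LOGGING", "").strip()
    table = {
        "1": (True, False),   # logs only
        "2": (False, True),   # console only
        "3": (True, True),    # both logs and console
    }
    # 0, any other value, or (assumed here) an unset variable: all logging off.
    return table.get(value, (False, False))
```

For example, `logging_targets({"RSMI_LOGGING": "3"})` yields `(True, True)`, while `logging_targets({"RSMI_LOGGING": "verbose"})` yields `(False, False)`.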
Bill(Shuzhou) Liu 3bdcaa1deb Crash when ecc count sysfile cannot be read
Replace assert with error handling code.

Change-Id: I6500ae4d38a8caea87828aa7d76373d20c8354c7


[ROCm/rocm_smi_lib commit: 0522439ac2]
2023-07-31 08:36:53 -05:00
Bill(Shuzhou) Liu f9f936e3d2 Change reset power error message to logging
Since the reset will continue if the reset power and current power
are the same, an error message may confuse the user.

Change-Id: I35b9ef17afd47b5af5bd2b8882a44f63991fe509


[ROCm/rocm_smi_lib commit: aeb6c61f54]
2023-07-27 15:18:28 -05:00
Bill(Shuzhou) Liu aca23ecb0b Handle csv output when the command is not based on the device
Fix the error where only one csv line is printed when the output
is not based on device.

Change-Id: Idacc5d98acc223e932fb3d46c888bfa04778b73c


[ROCm/rocm_smi_lib commit: 80d650b95a]
2023-07-26 15:28:18 -05:00
Maisam Arif 60a4b3cb19 SWDEV-394316 - Handle not applicable vbios
Change-Id: I3390078a63c9a5eff67024b84a3be1369c4b1460
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>


[ROCm/rocm_smi_lib commit: c78ec46671]
2023-07-25 16:33:22 -05:00
Charis Poag 9b5289aff7 Update logging and README for other project usage
Updates:
    * [rocm-smi] Logging now can update files on
      per-project-basis for install/remove
    * [rocm-smi] README now has latest build
      instructions, including test builds
    * [rocm-smi] Updated README to include
      revision dates

Change-Id: Ifb19a6f32ccf6938f47225db53fef88021909264
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4613e8dec3]
2023-07-20 19:09:11 -05:00
Oliveira, Daniel 4c063f4038 Add revision to --showhw
Code changes related to the following:
  * Added 'rsmi_dev_revision_get()' related code
  * Test code
  * Functional tests

Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/rocm_smi_lib commit: 573620f586]
2023-07-18 16:17:33 -05:00
Galantsev, Dmitrii 9045bac955 Fix sys and id tests
The following read tests were failing:
*.TestIdInfoRead
*.TestSysInfoRead

1. *.TestIdInfoRead failed because rsmi_dev_brand_get did not specify
   dependency on vbios_version.

2. *.TestSysInfoRead failed because the test didn't expect vbios_version to
   be missing, which is a new behavior in Aqua Vanjaram.

Change-Id: I9ee88a12fcf6cff2032049e2ecdfb2957efb03ab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 8fe848d10e]
2023-07-17 15:52:23 -04:00
Galantsev, Dmitrii e2bee8b2f9 Add .cache to gitignore
Change-Id: Ida03bf1f50704bea44827d7578cd74c1896d4368
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: b0fe2fbd07]
2023-07-17 15:52:23 -04:00
Bill(Shuzhou) Liu 8c41a911b4 rocm-smi --showevents shows wrong gpuID
Use the gpuid returned from the event data instead.

Change-Id: I7f286cc105f7ea12985223e603504f0ef3d9724e


[ROCm/rocm_smi_lib commit: 0aeb6025bd]
2023-07-13 08:28:53 -05:00
Galantsev, Dmitrii 48712cbeb5 Simplify gitignore
Remove generic gitignore to simplify tracking of generated files

Change-Id: Idf1f9719b2cfd16b31332a3ed87be5943c2c1ce7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e6c42c6626]
2023-07-07 11:48:09 -04:00
Jeremy Newton 7bcd452099 Fix python loading of librocm_smi64
The librocm_smi64.so is used for development, while
librocm_smi64.so.MAJOR is used for runtime, thus the python front end
should not be loading the .so binary, but rather the .so.MAJOR binary.

As well, it's good not to hardcode "lib" as some distros will change
this.

rsmiBindings.py is now generated with CMake

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I7cb745f8936fdf10d3ebd6c1e606031f713184ca


[ROCm/rocm_smi_lib commit: 2d2c73a5e6]
2023-07-06 09:52:56 -04:00
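The idea behind 7bcd452099, loading the versioned runtime library rather than the unversioned development symlink, can be sketched with ctypes. This is not the generated rsmiBindings.py: both function names are hypothetical, and the major version is left as a parameter because the build system supplies it.

```python
import ctypes


def runtime_library_name(base: str, major: int) -> str:
    """Build the versioned runtime name, e.g. lib<base>.so.<major>."""
    return f"lib{base}.so.{major}"


def load_runtime_library(base: str, major: int) -> ctypes.CDLL:
    """Load by bare soname so the dynamic loader picks the directory."""
    # No "lib" or "lib64" directory is hardcoded here; the loader's normal
    # search path decides where the file is found.
    return ctypes.CDLL(runtime_library_name(base, major))
```

Passing only the soname to `ctypes.CDLL` leaves the directory lookup to the system loader, which sidesteps the distro-specific library directory the commit message mentions.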
Jeremy Newton 367d83b5e1 Only install asan license if enabled
Change-Id: I79c6fce84c23ed12e65db8e234a29dbfedd11f68
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/rocm_smi_lib commit: 828f46b445]
2023-06-30 23:34:43 -04:00
Jeremy Newton 0998343abd Actually fix version string
There seems to be a scope issue with the existing variables, but just
putting in the pkg version string seems sufficient.

Change-Id: I4ccef872ff848a70cb2abc07bf605c5f29a608e8
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/rocm_smi_lib commit: 4f481dd7f3]
2023-06-30 23:34:14 -04:00
Tom Rix 4480132c6d Improve handling of ContructBDFID errors
Building this package on Fedora reports this warning:
In file included from rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:62:
In member function 'amd::smi::Device::set_bdfid(unsigned long)',
    inlined from 'amd::smi::RocmSMI::Initialize(unsigned long)' at rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:330:27:
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/include/rocm_smi/rocm_smi_device.h:199:42: warning: 'bdfid' may be used uninitialized [-Wmaybe-uninitialized]
  199 |     void set_bdfid(uint64_t val) {bdfid_ = val;}
      |                                   ~~~~~~~^~~~~
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc: In member function 'amd::smi::RocmSMI::Initialize(unsigned long)':
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:324:12: note: 'bdfid' was declared here
  324 |   uint64_t bdfid;
      |            ^~~~~

Only set the bdfid when it is known to be valid.

Signed-off-by: Tom Rix <trix@redhat.com>
Change-Id: I839b4d2d2d4e3b25469cf5972245b9630da00c87


[ROCm/rocm_smi_lib commit: 19c3e2aff9]
2023-06-30 00:16:44 -04:00
Jeremy Newton 54d51a154b Update default version to match tags
When building from github, these tags don't exist, so the defaults
should try to match the internal tags

Change-Id: Id570341f27e21916b1a7f3605ee2b5b9716cad9b
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/rocm_smi_lib commit: 74dc98114f]
2023-06-30 00:16:22 -04:00
Jeremy Newton 6933161e82 Fix version file generation
This looks like a typo, as the following variables are not defined:
- AMD_SMI_LIBS_TARGET_VERSION_MAJOR
- AMD_SMI_LIBS_TARGET_VERSION_MINOR
- AMD_SMI_LIBS_TARGET_VERSION_PATCH

Change-Id: I43449e7bd2a2de643d33e79fad063a7859679c8d
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/rocm_smi_lib commit: 1a86dd75bb]
2023-06-29 14:42:30 -04:00
Jeremy Newton cca727f7e3 Fix python script install permissions
The keyword "PROGRAMS" should be used in place of "FILES" in order to
make sure executable scripts have the correct permissions.

Change-Id: I6c287dc1291774ad6d97a04d621957dea0a1b697
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/rocm_smi_lib commit: d00d885394]
2023-06-27 14:57:59 -04:00
Bill(Shuzhou) Liu 87c5c38d44 Crash if no hwmon sysfs
Return NOT_SUPPORTED if no hwmon sysfs.

Change-Id: I01356a21f004ab552ca6ef7ffb49934bfdfd5e31


[ROCm/rocm_smi_lib commit: 910bf677a9]
2023-06-26 08:00:32 -05:00
Galantsev, Dmitrii e9dd6bfb51 SWDEV-406542 - Add gtest to install targets
Change-Id: I116505aaa33109fce66ab8daf9921e2de11a27d4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 82078565e9]
2023-06-20 11:14:56 -05:00
Galantsev, Dmitrii 5b99944421 SWDEV-391041 - Disable TestPowerReadWrite
Change-Id: I56b5bea3e5206a6f0d5ecdb482103881f80f0b8b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 9519d5b8cf]
2023-06-16 15:18:27 -04:00
Galantsev, Dmitrii ab5f6e5872 Assign tests to aqua_vanjaram
Change-Id: Iee78b1e810356327261006087b081e39dab0b9e8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e7585cc045]
2023-06-16 15:18:27 -04:00
Bill(Shuzhou) Liu 86a1deaca9 Expand showpids to provide more details
Provide details of GPU usage by an application.

Change-Id: I0f36df7d358754c2c8a60432b736d98f667ee99c


[ROCm/rocm_smi_lib commit: d9b6af7a09]
2023-06-16 08:52:18 -04:00
Galantsev, Dmitrii 9f534b1c6b SWDEV-340919 - Package rsmitst
Similar to I879b21428e6642f19fda67092b365d8b78b7ba7b.

Main CMake improvements:

* Add rsmitst with -DBUILD_TESTS=ON
* Package tests into rocm-smi-lib-tests.deb and .rpm
* Note - this breaks build_rsmitst.sh

Misc improvements:

* Add .editorconfig to normalize code formatting
* Export compile_commands.json
* Remove gtest source and pull from github instead

Change-Id: Ib87ed4a5acd9f78badae6d028e5ff3d4f56dafc2
Depends-On: I8b26795471ad1432c805e45d8b58d7bb34abfcfc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 0478d53e23]
2023-06-13 22:52:10 -05:00
Galantsev, Dmitrii f46d020feb Temporarily ignore TestFrequencies
See SWDEV-391039 and SWDEV-391040 for details

Change-Id: I662ba43363d949465454ea4af4d4586b3d47a811
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: ac94bf5ed5]
2023-06-12 19:26:21 -05:00
Galantsev, Dmitrii 61ac3f8ca9 --showtempgraph - Show N/A when no temp found
If temp in hwmon was missing - rocm-smi crashed.
e.g. /sys/class/drm/card1/device/hwmon/hwmon5/temp1_input

This change displays "N/A" for temp instead of crashing.

Change-Id: I02f84a466bd3acfbd9b65e7e4ca0f18e76606c3b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 713f85721b]
2023-06-12 19:16:39 -05:00
Maisam Arif a9d4f69eea SWDEV-404157 - Fixed printLog delimiter parsing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3d8e22d185790f4325aeacc18e4bfcfe8777d356


[ROCm/rocm_smi_lib commit: 00e170c2f5]
2023-06-08 20:02:51 -05:00
Galantsev, Dmitrii 432e74ba08 Fix test temp blacklist, ignore TestVoltCurvRead
Change-Id: I86fa14fdc06e1b170a0bc0c0727fc08e4f4e2074
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: f78f9a4082]
2023-06-06 17:02:14 -04:00
Charis Poag 4cab3bb312 [SWDEV-402336 + SWDEV-398070] Fix RPM install part2
Updates:
    [rocm-smi] RPM installation comment included a macro,
    now removed

Change-Id: Ifa7a8d2d1a713940c39e20df9d02635e0e623dd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: e2dec17284]
2023-06-05 13:50:57 -05:00
Galantsev, Dmitrii 6aef6b09ea Clean-up python errors and warnings
Used pyright to show errors and warnings and resolved most of them

Change-Id: I0fdf7dcdf08db5c35dec80f6645e0a395fbe4197
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: e8391c9d7c]
2023-06-01 17:37:57 -04:00
Charis Poag b96cc5b897 [SWDEV-402336 + SWDEV-398070] Fix RPM install - override macros
Updates:
    * [rocm-smi] RPM installation now overrides macro usage

Change-Id: I2a5ba14670becc178f672182eabe71965a526178
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: b0f2a9d2ef]
2023-06-01 11:58:42 -04:00