Commit Graph

556 Commits

Author SHA1 Message Date
Galantsev, Dmitrii ff992e9b56 TESTS - re-enable frequency tests on aqua_vanjaram
Change-Id: I8fcd9418da5b973897ccfffc7d8a2f3ea833ea77
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-11 19:43:25 -05:00
Galantsev, Dmitrii 41ade41d84 SWDEV-409184 - Fix erroneous 'not supported' when HWMON is absent
Change-Id: Ic5ff406977d962fadc709a03853dac61b5460a26
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-11 19:34:30 -05:00
Charis Poag ed6777a8e7 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info throwing an exception
      when no amdgpu driver is loaded
    - Fixed ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added build type, lib path, and set
      env. variables to logs

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-07 22:17:54 -05:00
Galantsev, Dmitrii 4aef767596 Cleanup rocm_smi.cc
Change-Id: Ia676c237222b0dd5d9e8a054a93776f3b11e2225
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-07 15:50:40 -04:00
Bill(Shuzhou) Liu fab0542ab1 Fix doxygen warning messages
Doxygen will treat warnings as errors.

Change-Id: Ie7a7c9a823388c4140f31489604d65ec43005772
2023-09-07 08:48:38 -04:00
Oliveira, Daniel 328ce0150b rocm_smi_lib: Fix rocm-smi --showfan shows 'unable to detect fan'
Code changes related to the following:
  * Reverts earlier fix for the same issue
  * Check for existence of files before reading

Change-Id: I175b20c3343c414b12b79dc3fc404f53fbaabf3a
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-30 14:45:05 -05:00
Galantsev, Dmitrii 84e90e55d5 TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 471fbfddc1 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
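The "large number" in the affinity commit is the usual wrap-around of -1 in an unsigned field. A minimal sketch using Python's ctypes, assuming a 32-bit field and -1 as the "no NUMA affinity" value (the field width is an assumption, not taken from the source):

```python
import ctypes

# Assumed sentinel meaning "no NUMA affinity"
NO_AFFINITY = -1

# In an unsigned 32-bit field, -1 wraps around to 2**32 - 1
as_unsigned = ctypes.c_uint32(NO_AFFINITY).value

# In a signed 32-bit field, -1 is preserved
as_signed = ctypes.c_int32(NO_AFFINITY).value

print(as_unsigned)  # 4294967295
print(as_signed)    # -1
```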
Oliveira, Daniel 3602447109 rocm_smi_lib/rocm_smi.py: Fix Add 'GPU name' in rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 654f65118b rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 10:54:50 -05:00
Oliveira, Daniel f9fd6b0a96 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist, then "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-23 20:43:08 -05:00
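The two fan cases above can be sketched as follows. This is an illustrative approximation, not the rocm_smi.py code: `fan_speed_status` is a hypothetical helper, and a temporary directory stands in for the hwmon sysfs directory.

```python
import os
import tempfile


def fan_speed_status(pwm_path: str) -> str:
    """Classify a pwm# reading as described in the commit message (hypothetical helper)."""
    if not os.path.exists(pwm_path):
        # No pwm# file at all: fan readings are not exposed
        return "Not supported"
    with open(pwm_path) as f:
        value = int(f.read().strip())
    if value == 0:
        # File exists but reports zero
        return "Unable to detect fan speed"
    return str(value)


# Demonstration against a throwaway directory standing in for hwmonN
with tempfile.TemporaryDirectory() as hwmon:
    pwm1 = os.path.join(hwmon, "pwm1")
    missing = fan_speed_status(pwm1)   # file absent
    with open(pwm1, "w") as f:
        f.write("0\n")
    zero = fan_speed_status(pwm1)      # file present, reads 0
    with open(pwm1, "w") as f:
        f.write("128\n")
    spinning = fan_speed_status(pwm1)  # file present, nonzero
```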
Charis Poag f191c2753c Error handling for unset freqs
Sending RSMI_STATUS_UNEXPECTED_DATA for drivers
which do not set some clock freqs

Change-Id: I43a9515c2757dddd412bb25cfd54095e63367030
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-23 10:44:57 -05:00
Galantsev, Dmitrii 613bd8ad1d TESTS - Fix incorrect TestVoltCurvRead assert if not supported
Change-Id: I2242aa9be84543276c63f1f57fdc489754c9ee07
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:42 -04:00
Galantsev, Dmitrii 548b68cb67 .editorconfig - Remove broken whitespace rule
Change-Id: I67260f1f1952609dc89834d0763acd732bf39860
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:20 -04:00
Galantsev, Dmitrii 62f01cb150 TESTS - Use gpu version as a workaround for a missing name
Depends-On: Ifbd38f11fbde7ba28af4be1d611310dea1b5112a
Change-Id: Ia7b7975f03424854df0a470b2719cf2ff2cf8e40
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-21 19:18:22 -04:00
Bill(Shuzhou) Liu a10f00bf57 Fallback to kfd node when VRAM sysfs not available
The driver may not expose VRAM sysfs on certain systems. Add a
fallback to the KFD node.

Change-Id: Ib3be71b4f4d2c79318d5026b0a97f3657d8a97b6
2023-08-17 14:36:03 -05:00
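A minimal sketch of the fallback order, assuming the primary source is a file like `mem_info_vram_total` and the KFD fallback is a topology `properties` file with `key value` lines including `size_in_bytes`. Both paths and the helper name `read_vram_total` are assumptions for illustration, not the library's actual code.

```python
import os
import tempfile
from typing import Optional


def read_vram_total(vram_sysfs: str, kfd_mem_props: str) -> Optional[int]:
    """Return total VRAM in bytes, preferring VRAM sysfs and falling back to KFD."""
    if os.path.exists(vram_sysfs):
        with open(vram_sysfs) as f:
            return int(f.read().strip())
    if os.path.exists(kfd_mem_props):
        # Assumed format: one "key value" pair per line
        with open(kfd_mem_props) as f:
            for line in f:
                key, _, value = line.partition(" ")
                if key == "size_in_bytes":
                    return int(value)
    return None


# VRAM sysfs deliberately absent; only the KFD properties file exists
with tempfile.TemporaryDirectory() as d:
    vram = os.path.join(d, "mem_info_vram_total")
    kfd = os.path.join(d, "properties")
    with open(kfd, "w") as f:
        f.write("heap_type 1\nsize_in_bytes 17163091968\n")
    total = read_vram_total(vram, kfd)
```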
Charis Poag 755e14dbad [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices which have junction but not edge sensors
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-10 19:53:38 -04:00
Oliveira, Daniel cc5ab079df Fix rsmitstReadWrite.TestPowerReadWrite test failure
Code changes related to the following:
  * All reinforcement work moved to their own files
  * Self contained changes only to support them
  * New files added to CMakeLists.txt

Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-09 21:51:05 -05:00
Ranjith Ramakrishnan 9406cdd832 SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I1de06d0d6a30c8c862d768b58460ef1b49d15e29
2023-08-07 09:21:19 -07:00
Charis Poag 9c7eed7edc [lib] Enhance Logger: gpu_metrics + enable console out
* Updates:
    - Env variable RSMI_LOGGING=0 or any other value
        -> all logging off
    - Env variable RSMI_LOGGING=1 -> logs only
    - Env variable RSMI_LOGGING=2 -> console only
    - Env variable RSMI_LOGGING=3 -> both logs + console
    - Metrics output includes hexdump of current file
      and decoded metrics (functions: logHexDump
      and log_gpu_metrics)
    - System info gathered now includes the system's
      perceived endianness (little or big endian),
      helpful for viewing the decoded hexdump or any
      binary translation
    - Added templates for printing unsigned hex
      (print_unsigned_hex_and_int), unsigned integers
      (print_unsigned_int), and printing both unsigned
      hex and int with an optional header
      (print_unsigned_hex_and_int)
    - Fixed some build warnings/errors, e.g.
      strncpy calls for SKU or board names (this
      operation is expected and needed), and temp
      file writes, which now properly return
      RSMI_STATUS_FILE_ERROR if unsuccessful
    - Fixed logrotate not initializing properly
      on RHEL 8.8/9.x

Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-01 21:46:19 -05:00
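The RSMI_LOGGING values listed in this commit map onto two independent switches. A minimal sketch of that table, plus the endianness value the system info now records; `logging_targets` is a hypothetical helper, and treating an unset variable as "off" is an assumption.

```python
import os
import sys
from typing import Mapping, Optional, Tuple

# Values taken from the commit message: (log_to_file, log_to_console)
_TARGETS = {
    "1": (True, False),   # logs only
    "2": (False, True),   # console only
    "3": (True, True),    # both logs + console
}


def logging_targets(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, bool]:
    """Return (log_to_file, log_to_console) for the RSMI_LOGGING value."""
    if env is None:
        env = os.environ
    # "0", any other value, and (by assumption) an unset variable all mean off
    return _TARGETS.get(env.get("RSMI_LOGGING", "0"), (False, False))


# Endianness as Python sees it: sys.byteorder is "little" or "big"
ENDIANNESS = sys.byteorder
```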
Bill(Shuzhou) Liu 0522439ac2 Crash when ecc count sysfile cannot be read
Replace assert with error handling code.

Change-Id: I6500ae4d38a8caea87828aa7d76373d20c8354c7
2023-07-31 08:36:53 -05:00
Bill(Shuzhou) Liu aeb6c61f54 Change reset power error message to logging
Since the reset continues when the reset power and current power
are the same, an error message may confuse the user.

Change-Id: I35b9ef17afd47b5af5bd2b8882a44f63991fe509
2023-07-27 15:18:28 -05:00
Bill(Shuzhou) Liu 80d650b95a Handle csv output when the command is not based on the device
Fix the error where only one CSV line is printed when output
is not device-based.

Change-Id: Idacc5d98acc223e932fb3d46c888bfa04778b73c
2023-07-26 15:28:18 -05:00
Maisam Arif c78ec46671 SWDEV-394316 - Handle not applicable vbios
Change-Id: I3390078a63c9a5eff67024b84a3be1369c4b1460
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2023-07-25 16:33:22 -05:00
Charis Poag 4613e8dec3 Update logging and README for other project usage
Updates:
    * [rocm-smi] Logging can now update files on
      a per-project basis for install/remove
    * [rocm-smi] README now has latest build
      instructions, including test builds
    * [rocm-smi] Updated README to include
      revision dates

Change-Id: Ifb19a6f32ccf6938f47225db53fef88021909264
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-07-20 19:09:11 -05:00
Oliveira, Daniel 573620f586 Add revision to --showhw
Code changes related to the following:
  * Added 'rsmi_dev_revision_get()' related code
  * Test code
  * Functional tests

Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-07-18 16:17:33 -05:00
Galantsev, Dmitrii 8fe848d10e Fix sys and id tests
The following read tests were failing:
*.TestIdInfoRead
*.TestSysInfoRead

1. *.TestIdInfoRead failed because rsmi_dev_brand_get did not specify
   dependency on vbios_version.

2. *.TestSysInfoRead failed because the test didn't expect vbios_version to
   be missing, which is new behavior on Aqua Vanjaram.

Change-Id: I9ee88a12fcf6cff2032049e2ecdfb2957efb03ab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-17 15:52:23 -04:00
Galantsev, Dmitrii b0fe2fbd07 Add .cache to gitignore
Change-Id: Ida03bf1f50704bea44827d7578cd74c1896d4368
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-17 15:52:23 -04:00
Bill(Shuzhou) Liu 0aeb6025bd rocm-smi --showevents shows wrong gpuID
Use the gpuid returned from the event data instead.

Change-Id: I7f286cc105f7ea12985223e603504f0ef3d9724e
2023-07-13 08:28:53 -05:00
Galantsev, Dmitrii e6c42c6626 Simplify gitignore
Remove generic gitignore to simplify tracking of generated files

Change-Id: Idf1f9719b2cfd16b31332a3ed87be5943c2c1ce7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-07 11:48:09 -04:00
Jeremy Newton 2d2c73a5e6 Fix python loading of librocm_smi64
The librocm_smi64.so is used for development, while
librocm_smi64.so.MAJOR is used for runtime, thus the python front end
should not be loading the .so binary, but rather the .so.MAJOR binary.

As well, it's good not to hardcode "lib" as some distros will change
this.

rsmiBindings.py is now generated with CMake

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I7cb745f8936fdf10d3ebd6c1e606031f713184ca
2023-07-06 09:52:56 -04:00
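The loading rule above (versioned runtime library, no hard-coded "lib") can be sketched with ctypes. The helper names and the idea that CMake supplies the library directory and major version are assumptions for illustration; this is not the generated rsmiBindings.py.

```python
import ctypes
import os


def rocm_smi_soname(major: int) -> str:
    # Runtime consumers should load the versioned SONAME, not the dev .so symlink
    return f"librocm_smi64.so.{major}"


def load_rocm_smi(lib_dir: str, major: int) -> ctypes.CDLL:
    # lib_dir is assumed to come from the build configuration (e.g. lib or lib64),
    # rather than a hard-coded "lib"
    return ctypes.CDLL(os.path.join(lib_dir, rocm_smi_soname(major)))
```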
Jeremy Newton 828f46b445 Only install asan license if enabled
Change-Id: I79c6fce84c23ed12e65db8e234a29dbfedd11f68
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 23:34:43 -04:00
Jeremy Newton 4f481dd7f3 Actually fix version string
There seems to be a scope issue with the existing variables, but just
putting in the pkg version string seems sufficient.

Change-Id: I4ccef872ff848a70cb2abc07bf605c5f29a608e8
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 23:34:14 -04:00
Tom Rix 19c3e2aff9 Improve handling of ConstructBDFID errors
Building this package on Fedora reports this warning:
In file included from rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:62:
In member function 'amd::smi::Device::set_bdfid(unsigned long)',
    inlined from 'amd::smi::RocmSMI::Initialize(unsigned long)' at rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:330:27:
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/include/rocm_smi/rocm_smi_device.h:199:42: warning: 'bdfid' may be used uninitialized [-Wmaybe-uninitialized]
  199 |     void set_bdfid(uint64_t val) {bdfid_ = val;}
      |                                   ~~~~~~~^~~~~
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc: In member function 'amd::smi::RocmSMI::Initialize(unsigned long)':
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:324:12: note: 'bdfid' was declared here
  324 |   uint64_t bdfid;
      |            ^~~~~

Only set the bdfid when it is known to be valid.

Signed-off-by: Tom Rix <trix@redhat.com>
Change-Id: I839b4d2d2d4e3b25469cf5972245b9630da00c87
2023-06-30 00:16:44 -04:00
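The fix follows a common pattern: compute a value that may fail, and assign it only on success, so nothing uninitialized is ever stored. A minimal Python sketch of that pattern; `construct_bdfid` and `Device` are hypothetical stand-ins, and the domain/bus/device/function bit packing is an assumption used only to produce a concrete value.

```python
from dataclasses import dataclass
from typing import Optional


def construct_bdfid(pci_addr: str) -> Optional[int]:
    """Pack 'DDDD:BB:DD.F' into an integer, or return None if parsing fails."""
    try:
        domain_s, bus_s, devfn_s = pci_addr.split(":")
        dev_s, fn_s = devfn_s.split(".")
        d, b, v, f = (int(x, 16) for x in (domain_s, bus_s, dev_s, fn_s))
    except ValueError:
        return None
    # Assumed packing: domain in the upper 32 bits, then bus/device/function
    return (d << 32) | ((b & 0xFF) << 8) | ((v & 0x1F) << 3) | (f & 0x7)


@dataclass
class Device:
    bdfid: Optional[int] = None

    def set_bdfid(self, val: int) -> None:
        self.bdfid = val


good = Device()
bdfid = construct_bdfid("0000:03:00.0")
if bdfid is not None:  # only set when the value is known to be valid
    good.set_bdfid(bdfid)

bad = Device()
bdfid = construct_bdfid("not-a-pci-address")
if bdfid is not None:
    bad.set_bdfid(bdfid)
```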
Jeremy Newton 74dc98114f Update default version to match tags
When building from github, these tags don't exist, so the defaults
should try to match the internal tags

Change-Id: Id570341f27e21916b1a7f3605ee2b5b9716cad9b
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 00:16:22 -04:00
Jeremy Newton 1a86dd75bb Fix version file generation
This looks like a typo, as the following variables are not defined:
- AMD_SMI_LIBS_TARGET_VERSION_MAJOR
- AMD_SMI_LIBS_TARGET_VERSION_MINOR
- AMD_SMI_LIBS_TARGET_VERSION_PATCH

Change-Id: I43449e7bd2a2de643d33e79fad063a7859679c8d
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-29 14:42:30 -04:00
Jeremy Newton d00d885394 Fix python script install permissions
The keyword "PROGRAMS" should be used in place of "FILES" in order to
make sure executable scripts have the correct permissions.

Change-Id: I6c287dc1291774ad6d97a04d621957dea0a1b697
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-27 14:57:59 -04:00
Bill(Shuzhou) Liu 910bf677a9 Crash if no hwmon sysfs
Return NOT_SUPPORTED if no hwmon sysfs.

Change-Id: I01356a21f004ab552ca6ef7ffb49934bfdfd5e31
2023-06-26 08:00:32 -05:00
Galantsev, Dmitrii 82078565e9 SWDEV-406542 - Add gtest to install targets
Change-Id: I116505aaa33109fce66ab8daf9921e2de11a27d4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-20 11:14:56 -05:00
Galantsev, Dmitrii 9519d5b8cf SWDEV-391041 - Disable TestPowerReadWrite
Change-Id: I56b5bea3e5206a6f0d5ecdb482103881f80f0b8b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-16 15:18:27 -04:00
Galantsev, Dmitrii e7585cc045 Assign tests to aqua_vanjaram
Change-Id: Iee78b1e810356327261006087b081e39dab0b9e8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-16 15:18:27 -04:00
Bill(Shuzhou) Liu d9b6af7a09 Expand showpids to provide more details
Provide details of GPU usage by an application.

Change-Id: I0f36df7d358754c2c8a60432b736d98f667ee99c
2023-06-16 08:52:18 -04:00
Galantsev, Dmitrii 0478d53e23 SWDEV-340919 - Package rsmitst
Similar to I879b21428e6642f19fda67092b365d8b78b7ba7b.

Main CMake improvements:

* Add rsmitst with -DBUILD_TESTS=ON
* Package tests into rocm-smi-lib-tests.deb and .rpm
* Note - this breaks build_rsmitst.sh

Misc improvements:

* Add .editorconfig to normalize code formatting
* Export compile_commands.json
* Remove gtest source and pull from github instead

Change-Id: Ib87ed4a5acd9f78badae6d028e5ff3d4f56dafc2
Depends-On: I8b26795471ad1432c805e45d8b58d7bb34abfcfc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-13 22:52:10 -05:00
Galantsev, Dmitrii ac94bf5ed5 Temporarily ignore TestFrequencies
See SWDEV-391039 and SWDEV-391040 for details

Change-Id: I662ba43363d949465454ea4af4d4586b3d47a811
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-12 19:26:21 -05:00
Galantsev, Dmitrii 713f85721b --showtempgraph - Show N/A when no temp found
If a temp file in hwmon was missing, rocm-smi crashed,
e.g. /sys/class/drm/card1/device/hwmon/hwmon5/temp1_input

This change displays "N/A" for temp instead of crashing.

Change-Id: I02f84a466bd3acfbd9b65e7e4ca0f18e76606c3b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-12 19:16:39 -05:00
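A minimal sketch of the "N/A instead of crashing" behavior, assuming a hypothetical helper `read_temp_celsius`. The millidegree conversion follows the Linux hwmon sysfs convention for temp*_input; a temporary directory stands in for the hwmon path.

```python
import os
import tempfile


def read_temp_celsius(temp_input_path: str) -> str:
    """Return the temperature in degrees C, or "N/A" if the hwmon file is missing."""
    if not os.path.isfile(temp_input_path):
        return "N/A"
    with open(temp_input_path) as f:
        millidegrees = int(f.read().strip())
    # hwmon temp*_input values are in millidegrees Celsius
    return f"{millidegrees / 1000:.1f}"


with tempfile.TemporaryDirectory() as hwmon:
    temp1 = os.path.join(hwmon, "temp1_input")
    before = read_temp_celsius(temp1)  # file absent
    with open(temp1, "w") as f:
        f.write("45000\n")
    after = read_temp_celsius(temp1)
```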
Maisam Arif 00e170c2f5 SWDEV-404157 - Fixed printLog delimiter parsing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3d8e22d185790f4325aeacc18e4bfcfe8777d356
2023-06-08 20:02:51 -05:00
Galantsev, Dmitrii f78f9a4082 Fix test temp blacklist, ignore TestVoltCurvRead
Change-Id: I86fa14fdc06e1b170a0bc0c0727fc08e4f4e2074
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-06 17:02:14 -04:00
Charis Poag e2dec17284 [SWDEV-402336 + SWDEV-398070] Fix RPM install part2
Updates:
    [rocm-smi] RPM installation comment included a macro,
    now removed

Change-Id: Ifa7a8d2d1a713940c39e20df9d02635e0e623dd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-06-05 13:50:57 -05:00
Galantsev, Dmitrii e8391c9d7c Clean-up python errors and warnings
Used pyright to show errors and warnings and resolved most of them

Change-Id: I0fdf7dcdf08db5c35dec80f6645e0a395fbe4197
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-01 17:37:57 -04:00
Charis Poag b0f2a9d2ef [SWDEV-402336 + SWDEV-398070] Fix RPM install - override macros
Updates:
    * [rocm-smi] RPM installation now overrides macro usage

Change-Id: I2a5ba14670becc178f672182eabe71965a526178
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-06-01 11:58:42 -04:00