* Updates:
- Fix for devices which have junction sensors but no edge sensors
- Added partitioning (memory and dynamic) displays for
base rocm-smi CLI calls
- Added subheading for base rocm-smi call output
- Added better hwmon and device detection logging
Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Code changes related to the following:
* All reinforcement work moved to their own files
* Self contained changes only to support them
* New files added to CMakeLists.txt
Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
* Updates:
- Env variable RSMI_LOGGING=0 or any other value
-> all logging off
- Env variable RSMI_LOGGING=1 -> logs only
- Env variable RSMI_LOGGING=2 -> console only
- Env variable RSMI_LOGGING=3 -> both logs + console
- Metrics output includes hexdump of current file
and decoded metrics (functions: logHexDump
and log_gpu_metrics)
- System info gathered now includes the system's
perceived endianness (little or big endian), which is
helpful for viewing the decoded hexdump or any
binary translation
- Added templates for printing unsigned hex
(print_unsigned_hex_and_int), unsigned integers
(print_unsigned_int), and printing both unsigned
hex and int with an optional header
(print_unsigned_hex_and_int)
- Fixed some build compile warnings/errors,
e.g. strncpy calls for SKU or board names
(this operation is expected and needed);
temp file writes that are unsuccessful
now properly return RSMI_STATUS_FILE_ERROR
- Fixed logrotate not initializing properly
on RHEL 8.8/9.x
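As an illustration of the RSMI_LOGGING values listed above, here is a
minimal Python sketch of the mapping. It is not the library's C++
implementation; the function name logging_targets is hypothetical and
only mirrors the value table in this message.

```python
import os


def logging_targets(env=None):
    """Return (write_to_log_file, write_to_console) for RSMI_LOGGING.

    Hypothetical helper: it only mirrors the value table in the notes above.
    """
    env = os.environ if env is None else env
    value = env.get("RSMI_LOGGING", "")
    table = {
        "1": (True, False),  # logs only
        "2": (False, True),  # console only
        "3": (True, True),   # both logs + console
    }
    # Unset, "0", or any other value turns all logging off.
    return table.get(value, (False, False))
```

Passing a plain dict instead of os.environ keeps the sketch easy to
exercise without touching the real environment.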
Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Since the reset will continue if the reset power and current power
are the same, the error may confuse the user.
Change-Id: I35b9ef17afd47b5af5bd2b8882a44f63991fe509
Updates:
* [rocm-smi] Logging now can update files on
per-project-basis for install/remove
* [rocm-smi] README now has latest build
instructions, including test builds
* [rocm-smi] Updated README to include
revision dates
Change-Id: Ifb19a6f32ccf6938f47225db53fef88021909264
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Code changes related to the following:
* Added 'rsmi_dev_revision_get()' related code
* Test code
* Functional tests
Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
The following read tests were failing:
*.TestIdInfoRead
*.TestSysInfoRead
1. *.TestIdInfoRead failed because rsmi_dev_brand_get did not specify
dependency on vbios_version.
2. *.TestSysInfoRead failed because the test didn't expect vbios_version to
be missing, which is new behavior in Aqua Vanjaram.
Change-Id: I9ee88a12fcf6cff2032049e2ecdfb2957efb03ab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
The librocm_smi64.so is used for development, while
librocm_smi64.so.MAJOR is used for runtime, thus the python front end
should not be loading the .so binary, but rather the .so.MAJOR binary.
As well, it's good not to hardcode "lib" as some distros will change
this.
rsmiBindings.py is now generated with CMake
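A minimal sketch of the idea, assuming the generated bindings receive
the library directory and the major version from CMake at configure
time. The helper name runtime_library_path and its parameters are
hypothetical; the real generated file may look different.

```python
import os


def runtime_library_path(libdir, major, name="librocm_smi64.so"):
    """Build the path of the runtime (.so.MAJOR) library.

    libdir and major are assumed to be substituted in by the build system,
    so neither "lib" nor the version is hardcoded in the Python source.
    """
    return os.path.join(libdir, "{}.{}".format(name, major))


# Loading would then look like this (left commented out because it needs
# the library to be installed):
#   import ctypes
#   rocmsmi = ctypes.CDLL(runtime_library_path(libdir, major))
```

Loading the versioned name means the front end keeps working when only
the runtime package is installed and the development symlink is absent.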
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I7cb745f8936fdf10d3ebd6c1e606031f713184ca
There seems to be a scope issue with the existing variables, but just
putting in the pkg version string seems sufficient.
Change-Id: I4ccef872ff848a70cb2abc07bf605c5f29a608e8
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Building this package on Fedora reports this warning:
In file included from rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:62:
In member function 'amd::smi::Device::set_bdfid(unsigned long)',
inlined from 'amd::smi::RocmSMI::Initialize(unsigned long)' at rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:330:27:
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/include/rocm_smi/rocm_smi_device.h:199:42: warning: 'bdfid' may be used uninitialized [-Wmaybe-uninitialized]
199 | void set_bdfid(uint64_t val) {bdfid_ = val;}
| ~~~~~~~^~~~~
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc: In member function 'amd::smi::RocmSMI::Initialize(unsigned long)':
rpmbuild/BUILD/rocm_smi_lib-rocm-5.5.1/src/rocm_smi_main.cc:324:12: note: 'bdfid' was declared here
324 | uint64_t bdfid;
| ^~~~~
Only set the bdfid when it is known to be valid.
Signed-off-by: Tom Rix <trix@redhat.com>
Change-Id: I839b4d2d2d4e3b25469cf5972245b9630da00c87
When building from github, these tags don't exist, so the defaults
should try to match the internal tags
Change-Id: Id570341f27e21916b1a7f3605ee2b5b9716cad9b
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
This looks like a typo, as the following variables are not defined:
- AMD_SMI_LIBS_TARGET_VERSION_MAJOR
- AMD_SMI_LIBS_TARGET_VERSION_MINOR
- AMD_SMI_LIBS_TARGET_VERSION_PATCH
Change-Id: I43449e7bd2a2de643d33e79fad063a7859679c8d
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
The keyword "PROGRAMS" should be used in place of "FILES" in order to
make sure executable scripts have the correct permissions.
Change-Id: I6c287dc1291774ad6d97a04d621957dea0a1b697
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
See SWDEV-391039 and SWDEV-391040 for details
Change-Id: I662ba43363d949465454ea4af4d4586b3d47a811
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
If temp in hwmon was missing - rocm-smi crashed.
e.g. /sys/class/drm/card1/device/hwmon/hwmon5/temp1_input
This change displays "N/A" for temp instead of crashing.
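A minimal sketch of the fallback described above. The function name
read_temp_celsius is hypothetical and this is not the actual rocm-smi
code; it only relies on hwmon temp*_input files reporting millidegrees
Celsius.

```python
def read_temp_celsius(path):
    """Return the temperature as a string, or "N/A" if it cannot be read."""
    try:
        with open(path) as f:
            millidegrees = int(f.read().strip())
    except (OSError, ValueError):
        # Missing file, unreadable file, or unexpected contents: report
        # "N/A" instead of letting the exception end the program.
        return "N/A"
    return "{:.1f}".format(millidegrees / 1000)
```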
Change-Id: I02f84a466bd3acfbd9b65e7e4ca0f18e76606c3b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Used pyright to show errors and warnings and resolved most of them
Change-Id: I0fdf7dcdf08db5c35dec80f6645e0a395fbe4197
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Updates:
* [rocm-smi] Provide a thread-safe logging feature
* [rocm-smi] Adding logrotation into install/upgrade/remove
scripts
* [rocm-smi] Updated cmake lists to include rocm_smi_logger
* [rocm-smi] Updated DEB/RPM install/remove logging file &
folder with all users having r/w privileges for
/var/log/rocm_smi_lib/ROCm-SMI-lib.log
* [rocm-smi] Added ability to do a glob search for multiple files
(globFileExists), assists doing file searches with * strings
* [rocm-smi] Added ability to log system details when RSMI_LOGGING
is turned on (getSystemDetails())
* [rocm-smi] Added logging to provide which ROCm API is being called
when RSMI_LOGGING is on
* [rocm-smi] Added logging to provide SYSFS path and read value,
when RSMI_LOGGING is on. Provides error response on failure.
* [rocm-smi] Added environment variable RSMI_LOGGING to control
when logging is enabled or disabled. By default, by not
setting this env. variable, logging is turned off. When
setting RSMI_LOGGING=<any value>, logging is enabled
which is placed in /var/log/rocm_smi_lib/ROCm-SMI-lib.log file.
Setting RSMI_LOGGING is allowed in both debug and release builds.
* [rocm-smi] Removed an initialization procedure involving
debug_inf_loop. This feature does not appear to be used.
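To illustrate the glob search mentioned above (globFileExists), here is
a rough Python analogue. The library function itself is C++; the name
glob_file_exists and its behavior are assumptions made for this sketch.

```python
import glob


def glob_file_exists(pattern):
    """Return True if at least one path matches the wildcard pattern."""
    return len(glob.glob(pattern)) > 0
```

A caller can then check for a whole family of files in one call, for
example a pattern that ends in "*" to cover several numbered entries.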
Change-Id: I79b48387609c6233c6f05b04fb8bba66b68c2399
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Updates:
* [rocm-smi] Added larger app width size, which helps
display missing device info
* [rocm-smi] Added better context when rsmi_ret_ok
does not return with RSMI_STATUS_SUCCESS
* [rocm-smi] Removed all references to an
undefined function (printLogNoDev())
* [rocm-smi] Fixed not detecting non-int
values when setting the voltage curve
* [rocm-smi] Added better context on missing
sysfs file when setting clock overdrive
values
* [rocm-smi] Fixed getMemInfo() calls not
referencing tuple values (making it easier
to read)
* [rocm-smi] Silenced concise info spitting
out errors for missing VRAM files, instead
display which metric is "unsupported" if
the files are missing
* [rocm-smi] Updated function descriptions for
rsmi_ret_ok & getMemInfo
* [rocm-smi] Updated getMemInfo to provide a
quiet call, to silence for concise info calls.
This provides a way to keep the output clean.
* [rocm-smi-lib] When using debug sysfs
files, now states which enums are enabled
for debug
Change-Id: I0e9e0c97ccf71467ced0e1a1f71803327a8be2b7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Updates:
* VoltRead - needed to properly send out RSMI_STATUS_NOT_SUPPORTED
when device does not have voltage hwmon files
* ComputePart. - test failure was likely caused by conflicts
with EvtNotif (exact cause unknown). The test passes when
moved ahead of the event notifier. Both API calls may have
a system resource issue, TBD.
* rocm_smi_example - now indicates when an API call
returns RSMI_STATUS_NOT_SUPPORTED or
RSMI_STATUS_NOT_YET_IMPLEMENTED. Allows example to fully complete
on systems which may not provide support for all API calls.
Change-Id: I520b8584e078d412414e8e5797c664220a7e823a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
The rsmi_dev_subsystem_name_get() only matches subvendor id and
subdevice id for a vendor. The change will also match device id.
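A minimal sketch of the stricter match, assuming the names come from a
pci.ids-style database (vendor lines unindented, device lines indented
with one tab, subsystem lines with two tabs). The function
subsystem_name and the sample entries are made up for illustration and
are not the library's implementation or real PCI IDs.

```python
def subsystem_name(pci_ids_text, vendor, device, subvendor, subdevice):
    """Look up a subsystem name, requiring vendor AND device to match."""
    cur_vendor = cur_device = None
    for line in pci_ids_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("\t\t"):
            # Subsystem line: "<subvendor> <subdevice>  <name>"
            parts = line.strip().split(None, 2)
            if (len(parts) == 3
                    and cur_vendor == vendor and cur_device == device
                    and parts[0] == subvendor and parts[1] == subdevice):
                return parts[2]
        elif line.startswith("\t"):
            # Device line: remember which device the next subsystems belong to.
            cur_device = line.strip().split(None, 1)[0]
        else:
            # Vendor line: start a new vendor section.
            cur_vendor = line.split(None, 1)[0]
            cur_device = None
    return None


# Made-up sample: two devices share the same subvendor/subdevice pair.
SAMPLE = (
    "aaaa  Example Vendor\n"
    "\t1111  Example Device A\n"
    "\t\tbbbb 0001  Board for Device A\n"
    "\t2222  Example Device B\n"
    "\t\tbbbb 0001  Board for Device B\n"
)
```

Without the device check, a lookup for device 2222 in this sample would
stop at the first matching subsystem line and return the Device A name.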
Change-Id: Ife3aedaf6fc7390ed7fa62edbde40c2340689b23
Fix was needed due to hwmon updates.
Several voltage sensors (ex. vddgfx/vddnb)
are unsupported or not applicable
to upcoming hardware. This was not the case
for previous hardware sensors, resulting in
the rocm-smi crash observed.
Change-Id: Ib8593e10811638def26fc7a1eda29309e328db09
Signed-off-by: Charis Poag <Charis.Poag@amd.com>