When built with LTO enabled, the linking of liboam.so chokes on the
following error, which is somewhat similar to the Debian bug #1030876
affecting PA-RISC, although the symptoms subtly differs in that it
suggests to build using -fPIC:
/usr/bin/ld: /tmp/cc0wF8Kx.ltrans0.ltrans.o: relocation R_X86_64_PC32 against symbol `_ZTVSt9exception@@GLIBCXX_3.4' can not be used when making a shared object; recompile with -fPIC
The -fPIC argument is passed appropriately down to the build command,
however it looks to be erased by the late introduction of -fPIE flag
by upstream build system. Erasing this flag allows the build to go
through, both with LTO and on PA-RISC.
Bug: https://github.com/RadeonOpenCompute/rocm_smi_lib/issues/111
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015653
Change-Id: I8b35fd4b62cfa1a9ddb145362464df5dd276e2f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
* Updates:
- [API] After discovering all amd gpus, we now properly
map correct bdf (xgmi nodes). Especially important for
partition changes - aka secondary nodes.
- [API] While adding new secondary nodes we now have
better grouping -> due to resorting based on
kfd properties list & matching to primary uniqueid
- [API] All secondary nodes are now AddToDeviceList
with correct bdf (location id), provided by kfd
- [API] Modified AddToDeviceList(..., uint64_t bdfid):
providing an optional field - bdfid. This allows working
around primary pcie cards with xgmi nodes
- [API] Utils - cpplint minor fixes
- [Example] Removed all endl references w/ newline, fixed
spacing, and some incorrect values displaying as hex
(needed dec representation)
- [API] kfd node functions - now print full path of file
for trace logs
- [Tests] power_read.cc: Added in generic power test to
confirm guaranteeing specific return values
Change-Id: I143474e8d64c4915a966e789be6bcea4fa7f4472
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
* Updates:
- [API] Added rsmi_dev_power_get(uint32_t dv_ind,
uint64_t *power,
RSMI_POWER_TYPE
*type)
provides generic get to average or
current power & provides backwards
compatibility
- Added a utility function to get MonitorTypes
(monitor_type_string(type)) &
RSMI_POWER_TYPE (power_type_string(type))
strings
- [Tests] Added rsmi_dev_power_get tests and
provided better verification of return values for
all power APIs
- [Tests] Updated power outputs to show correct
units
- [example] Now uses avg, current, and generic
power functions with type output response
Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Adds support for 'gpu_metrics_v1_4' and new counters
Code changes related to the following:
* rsmi gpu_metrics APIs
* rsmi gpu_metrics Logs
* The new gpu_metrics are now part of the Device
Build changes related to the following: None
Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
* Updates:
- rocm_smi_lib + CLI:
Rename all "NPS mode" -> "memory partition"
related files/functions/API/CLI to align with correct
technical naming
- rocm_smi_main: fixed identifying primary card's unique id
utilize rsmi_dev_unique_id_get to map which
KFD nodes belong to it
- rsmi_dev_*_partition*: now have better logging output
- compute partition tests:
Added 20 sec delay for workaround until GPU
busy is confirmed as the issue
- CPPLint fixes/formatting
- [Example] Moved all endl to "\n" for efficiency
- [Example] Added Edge & Junction temperature examples
- [Example] Added rsmi_minmax_bandwidth_get() example - WIP
Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
- Return from freq_output function early if clock is unsupported
- Right-align frequencies
Change-Id: I799c9351dac8a5be161bc9243cd3816539728357
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
The purpose of this patch is to add the following missing firmware
blocks to the SMI CLI:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: If9cabdc60ffcf08f27c9e6bdc20e8a26b192a738
Also change the TARGET from amd_smi_libraries to rocm_smi_libraries
This helps reduce confusion between rocm-smi and amd-smi
Change-Id: Ie54cedd831ba24bd9afc341ad15b7e8e20732059
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
get_od_clk_volt_info assumed the size of the file instead of checking
the length. This caused out-of-bounds array element access.
Change-Id: Ibda8f0c3a6d1623d48964641ae5ef610d2072e94
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
* Updates:
- rocm_smi_logger:
General cleanup &
Aligned to cpplint rules for usage
- rocm_smi_monitor:
Fixed MonitorTypes
from not displaying properly in logs
& Added socket power label + current
socket power MonitorTypes
- rocm_smi API:
Added rsmi_dev_current_socket_power_get API
- rocm_smi CLI:
General cleanup,
Concise info now displays device data
in variable width (see printLogSpacer's
new field),
printLogSpacer now as an adjustable
variable that overrides appWidth,
Added Socket Power to base rocm-smi +
--showpower CLI calls,
--showpower & base rocm-smi CLI defaults
to printing socket power (if not available,
displays average power)
- Cleaned up temp label references
- power_read gtests:
Added current socket power to testing
Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630
This commit makes sure GTest is always compiled with rocm_smi_lib_tests.
GTest installation was inconsistent outside of AMD CI environment.
libgtest.so wouldn't get installed with rocm_smi_lib_tests if gtest
existed on the build machine. Which is undesirable when packaging.
Change-Id: I607df6c67c81480e3b6487b28f14924e8bf56ad4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Implements APIs for 'gpu_metrics_v1_3' utilization averages
Code changes related to the following:
* rsmi_dev_activity_metric_get()
* rsmi_dev_activity_avg_mm_get()
* CLI shows "Avg.Memory Bandwidth" under "--showmemuse"
Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Allow for configureLogrotate to fail without failing configure
In previous commit I forgot to invert the check when switching
"IS_SYSTEMD" and "!IS_SYSTEMD" if-else statements.
Change-Id: I8eb8e7981c6353a2e60064eb3a6e35821ea2a0d0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
- Clean-up packaging scripts. More consistent with RDC.
- Remove all 'sudo' calls. all these scripts are to be ran by root.
- Reduce scope of variables.
- Remove unnecessary functions
Change-Id: Ib90f8e66ef4eae24f73e940fff44f515e12233f5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
When piping rocm_smi into 'head' it failed with "Broken pipe" error. The
error can be safely ignored. head closes the pipe early which causes
calls a SIGPIPE signal to be raised.
https://docs.python.org/3/library/signal.html#note-on-sigpipe
Change-Id: I4a589c6ed9a8c5b50de84b33e28115c6b510045f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Library path was printed at all times even with --json flag.
This commit adds a mandatory initRsmiBindings function which is a core
component of the rsmiBindings.py library.
It **MUST** be called on import.
Change-Id: Ic6ae1ec5d1fabba288910e6aed6c4706e53e5cd7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
It is not guaranteed that power can be read or set for some GPUs
(MI300). It is also not guaranteed that frequencies can be set.
As this is not a tool issue - we simply skip the failing test.
Change-Id: I134e96a476040cef513cd924f00e30cd6dea42a5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
For operations related to:
--resetfans
--setfan
We report 'Not supported' for these cases instead of 'Permission denied'
Code changes related to the following:
* rocm_smi_properties
* rocm_smi related APIs
Change-Id: I144646efc3804fabd45cc5a46351803950b4feb7
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
* Updates:
- Fixed infinit loop on systems
which did not have VRAM files
- Fixed concise info from throwing exception
with no amdgpu driver loaded
- Fix for ability to see all nodes when
after switching partitions (mirrors
original card display/settings)
- Added to logs build type, lib path,
and set env. variables
Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Code changes related to the following:
* Reverts earlier fix for the same issue
* Check for existence of files before reading
Change-Id: I175b20c3343c414b12b79dc3fc404f53fbaabf3a
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Code changes related to the following:
* rocm_smi.py
Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
* sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
* sysfs file (pwm#) does not exist, then "Not supported"
Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>