Previously we assumed membership in the "render" group was enough and
only used "video" as a fallback. On some systems, such as SLES, this may
not be sufficient.
One issue happened when starting rocprofiler as part of RDC
initialization:
what(): hsa error code: 4104 HSA_STATUS_ERROR_OUT_OF_RESOURCES
The issue only happened when RDC was started with systemd.
It turns out the "rdc" user (under which systemd starts RDC) was only in
the render group, not the video group. Adding the user to the video
group solved the issue.
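
A minimal sketch for checking which of the required groups an account is
missing. The helper name missing_groups is hypothetical, and the example
call assumes a service account named "rdc" exists on the system.

```shell
# Print each required group that is absent from a space-separated group list.
# Usage: missing_groups "<output of id -nG USER>" group1 [group2 ...]
missing_groups() {
    have=" $1 "
    shift
    for grp in "$@"; do
        case "$have" in
            *" $grp "*) ;;               # group present, nothing to report
            *) printf '%s\n' "$grp" ;;   # group missing
        esac
    done
}

# Example (assumes the "rdc" account exists):
#   missing_groups "$(id -nG rdc)" render video
```

Empty output means the account already belongs to every listed group.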
Change-Id: Idf6a9521ae72a0b28a428869aa7ab1edde3ae259
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
When RDC is built with the configure option
-DBUILD_STANDALONE=OFF
CMake fails with this error:
CMake Error at rdc_libs/CMakeLists.txt:106 (export):
export given target "rdc_client" which is not built by this project.
Resolve this by making the export conditional.
Change-Id: I3f6bb2946c609c7db9fc38015b7d9c8ae766f3a0
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
The RAS plugin loaded rocm-smi, which conflicts with the amd-smi library.
The main source of grief was the map 'devInfoTypesStrings', which is
defined in both rocm-smi and amd-smi.
We assume that rocm-smi would get lazy-loaded by the RAS library and
overwrite symbols defined in amd-smi. devInfoTypesStrings in rocm-smi
contains a different number of elements, and the enums also differ.
RDC relies on amd-smi's enums.
One such enum is kDevGpuMetrics:
rocm-smi: kDevGpuMetrics = 68
amd-smi: kDevGpuMetrics = 75
Example of overlapping map definitions:
$ objdump --dynamic-syms /opt/rocm/lib/libamd_smi.so | grep devInfoTypesStrings
00000000003c4980 g    DO .data.rel.ro  0000000000000008  Base  devInfoTypesStrings
00000000003db830 g    DO .bss          0000000000000030  Base  _ZN3amd3smi6Device19devInfoTypesStringsE
$ objdump --dynamic-syms /opt/rocm/lib/librocm_smi64.so | grep devInfoTypesStrings
00000000003dc590 g    DO .bss          0000000000000030  Base  _ZN3amd3smi6Device19devInfoTypesStringsE
00000000003c9c68 g    DO .data.rel.ro  0000000000000008  Base  devInfoTypesStrings
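
To look for other clashes of this kind, the defined dynamic symbols of
the two libraries can be intersected. A minimal sketch, assuming GNU
binutils nm is available; the helper names and the .syms file names are
hypothetical.

```shell
# Print the names of all defined dynamic symbols in a shared library,
# one per line, sorted and de-duplicated. Any "@VERSION" suffix is dropped.
list_defined_symbols() {
    nm -D --defined-only "$1" | awk '{ sub(/@.*/, "", $NF); print $NF }' | sort -u
}

# Print the lines (symbol names) that occur in both list files.
# grep -F: fixed strings, -x: whole-line match, -f: read patterns from file.
common_symbols() {
    grep -Fxf "$1" "$2" | sort -u
}

# Example:
#   list_defined_symbols /opt/rocm/lib/libamd_smi.so    > amd_smi.syms
#   list_defined_symbols /opt/rocm/lib/librocm_smi64.so > rocm_smi.syms
#   common_symbols amd_smi.syms rocm_smi.syms
```

Every name printed by common_symbols is exported by both libraries and
is a candidate for the kind of override described above.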
Change-Id: Ib2f2db32b6abd7ebe84e7807c25581461eb86bae
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
- Replace non-working fields with working ones
- Remove CU_OCCUPANCY completely as it isn't well supported
- Fix rocprofiler initialization with shared_ptr and rdc_module_init
- Replace env var ROCPROFILER_METRICS_PATH with ROCP_METRICS
  - ROCPROFILER_METRICS_PATH is only relevant for rocprofv2
  - ROCP_METRICS is only relevant for rocprofv1 (which we are using)
Change-Id: I21e6fa3f0e1694c38f44ca0e5659d672559f7380
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
On new ASICs, RDC_EVNT_XGMI, RDC_FI_PCIE_RX and RDC_FI_PCIE_TX
are not supported. The new fields RDC_FI_XGMI and RDC_FI_PCIE_BANDWIDTH
should be used instead.
Change-Id: Iff5bbef4c07994090fa7c4e9b319966215525283
Modifying the /opt/rocm/etc/rdc file changes the RDC launch options. If
the file doesn't exist, the service should still launch (though a new
file should likely be included with the next released package of 'rdc').
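
The optional-file behaviour can be illustrated in shell. This is only a
sketch: it does not show how the service itself reads the file, and the
function name read_launch_opts and the variable RDC_OPTS are
hypothetical names chosen for the example.

```shell
# Print launch options from an optional settings file, or a default.
# $1: path to the options file
# $2: default options used when the file is missing or unreadable
# RDC_OPTS is a hypothetical variable name used only in this sketch.
read_launch_opts() {
    opts_file=$1
    RDC_OPTS=$2
    if [ -r "$opts_file" ]; then
        # The file is expected to hold shell-style assignments,
        # for example: RDC_OPTS="--some-flag"
        . "$opts_file"
    fi
    printf '%s\n' "$RDC_OPTS"
}

# Example:
#   read_launch_opts /opt/rocm/etc/rdc ""
```

A missing file is not treated as an error; the defaults are used.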
Change-Id: I1a1891e9c5c3e6048754eb555779a97a170754c0
There are several ways to ignore the formatting commit:
1. Configure local project:
git config --local blame.ignoreRevsFile .git-blame-ignore-revs
2. Run blame with an argument:
--ignore-revs-file .git-blame-ignore-revs
example:
git blame --ignore-revs-file .git-blame-ignore-revs rdci/src/rdci.cc
Change-Id: Ic6eaa740850d9f1462d841361480307646e46b5e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
rdc.service invoked the rdcd executable through an absolute path. Using
update-alternatives allows the binary to be invoked from anywhere, so no
absolute path is required.
Change-Id: I2f3d6fcbf9dd854870cfc2e00532c504ce6cd6fc
rdc.service was started from the preinstall scripts. It should be
started after the rdc package is installed, so the functionality was
moved to the postinstall scripts.
Change-Id: I9a8c733beea43f95474b990a35a431db287b9a8e
These RUNPATH changes let libraries be found without setting
LD_LIBRARY_PATH.
Mostly tested on installed RDC binaries and libraries. Binaries in the
build directory should also work.
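
One way to confirm which RUNPATH a binary or library carries is to
filter the dynamic section printed by readelf. A minimal sketch; the
helper name extract_runpath is hypothetical and the path in the example
is a placeholder.

```shell
# Read `readelf -d` output on stdin and print the RUNPATH value, if any.
# A RUNPATH entry looks like:
#   0x000000000000001d (RUNPATH)   Library runpath: [<value>]
extract_runpath() {
    sed -n 's/.*(RUNPATH).*\[\(.*\)\].*/\1/p'
}

# Example (placeholder path):
#   readelf -d /path/to/binary | extract_runpath
```

No output means the file has no RUNPATH entry.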
Change-Id: Ifd908a5b61d24dfcbb1d08d21b4ee830156d8643
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
If the card does not have edge temperature, fall back to junction
temperature. If the card only has socket power, then use socket
power instead.
Change-Id: I053a67a89cf3b29a34e82123f522c08d7dd68916
Also add a stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5, where templates are not well supported.
Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
NOTE: RVS Build is disabled by default due to CI build issues.
Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>