Also force unset HSA_TOOLS_LIB so it doesn't break rocprofiler-sdk
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/rdc commit: e73eaf8115]
Provide support for reliable metrics and hide experimental in current
release.
Further ROCMTools integration development is pushed out to ROCm 5.6.
Change-Id: Iae7a0ed3991588c833bd8ef580b02b9c71390d55
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/rdc commit: 4536a453db]
This commit adds integration with ROCmTools
Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools
Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
Majority of the metrics are only supported by Instict (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information
Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/rdc commit: 861a843ed7]
Main CMake improvements:
* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm
Misc improvements:
* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b
[ROCm/rdc commit: 2c171767b3]
A new diagnostic module librdc_rocr.so is created. The
module uses Rocr to test the memory allocation, memory access
and compute queue ready status.
Change-Id: I9098f4fc3209bf381b7cb3658a4e94c2e22f2fe9
[ROCm/rdc commit: 78e2f2486b]
Installing files to standard path across each version and using
ldconfig has issues with side-by-side install.
Usage of RUNPATH/RPATH for ROCm to ensure all ROCm libraries are
picked without the need for ldconfig.
For RDC server to be picked up by systemctl, service config file
shall be a symlink from /lib/systemctl/system/rdc.service to
corresponding RDC file path in a given version of ROCm
For side-by-side install packages of RDC post install scripts
will be removed. Hence Use will have to set the symlink explicitly
for now.
Change-Id: I916da7cf132f0f9c667e2470fac2b0875e3db9d0
[ROCm/rdc commit: fe1593dda5]
When RDC are only used as the libraries, the user can choose not to build
the rdci and rdcd, which will remove the dependencies to the gRPC and protoc.
The -DBUILD_STANDALONE=off should be pass to the cmake.
* Change README.md for the instructions.
* Move the python_binding installation from client/CMakeLists.txt to CMakeLists.txt
so that the RDC library only build will also install the folder.
* Change CMakeLists.txt and rdc_libs/CMakeLists.txt to build with gRPC only if
the BUILD_STANDALONE is enabled.
Change-Id: If9cfe9fc298a83636d85fe352a311fe2fe041661
[ROCm/rdc commit: 105675aeeb]
Also:
* consolidated the info in the previous rdc/README.md into
the README.md that was moved from docs/ directory.
* added missing information to get grpc into the default
library path (needed to add the grpc dir with ldconfig).
* formatting fixes
Change-Id: Id61e761ad7bdee40364bb8837be8705ed5ca53d1
[ROCm/rdc commit: bf412e3f76]
Add the function to start and stop the job recording.
Add the function to get the job stats for each GPU and summary of multiple GPUs
Add the function to remove the jobs.
Add a class RdcLogger which can control the log level using the environment variable RDC_LOG.
This is similar to GRPC_VERBOSITY gRPC. When the customer has the issues, he can enable the verbose
log to help us to troubleshoot the issues.
Add the -u support in the rdci group, fieldgroup and dmon for connecting to rdcd without authentication.
Change-Id: I22c591823c1ee6485db106b911bed8271d1b2769
[ROCm/rdc commit: a547dc7efd]
Add the support for rdci subsystem group create, delete and query
Add the support for rdci subsystem fieldgroup create, delete and query
Add the support for rdci dmon system. The dmon system may show the stats every
a few seconds until press Ctrl-C. To cleanup the resources (for example, unwatch),
a signal handler is added.
Change-Id: Ib22a8a43b7083c7c72819ca21145e22702d9ad6c
[ROCm/rdc commit: 16bce67835]
The RDC API is changed to pass the certificates to the gRPC.
Add the support to add all GPUs in the host to a group. Also before
add a GPU to a group, the RDC API will verify that GPU exists or not.
Add the support to fetch the temperature metrics.
Change-Id: I5857ef03fede233d16e8b2836be120f33172da93
[ROCm/rdc commit: 66e4e790c3]
Create the skeleton implementation of rdc_client.so and rdci. Modify current rdcd to
integrate the RDC API service:
rdc.proto is changed to add a new RdcAPI service which defined the interfaces for the RDC API.
RdcStandaloneHandler.cpp is added to send the request using gRPC to the rdcd. It is built into
the rdc_client.so
rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement skeleton rdci.
Currently, the discovery subsystem is supported.
rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so which should also be
removed in the future.
Add the instruction how to run the rdcd and rdci in the build folder in the README.md.
Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c
[ROCm/rdc commit: 020f6939f7]