Modifying the /opt/rocm/etc/rdc file modifies RDC launch options. If
the file doesn't exist, the service should still launch (though a new
file should likely be included with the next released package of 'rdc'.
Change-Id: I1a1891e9c5c3e6048754eb555779a97a170754c0
The executable rdcd was using an absolute path in rdc.service. Using update-alternatives gives the flexibility to invoke the binary from anywhere and no absolute path is required.
Change-Id: I2f3d6fcbf9dd854870cfc2e00532c504ce6cd6fc
These RUNPATH changes make it so libraries can be found without setting
LD_LIBRARY_PATH.
Mostly tested on installed RDC binaries and libraries. The
build binaries should also work.
Change-Id: Ifd908a5b61d24dfcbb1d08d21b4ee830156d8643
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Also add stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5 and templates are not well supported.
Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
NOTE: RVS Build is disabled by default due to CI build issues.
Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Join the signal handling thread instead of cancel it to prevent
crash with "terminate called without an active exception".
Change-Id: I2e18eb825728fd3a94f67b1b0049516bb7b6ebbc
- Replace gRPC library with gRPC package
- Relax RUNPATH
- Make LINKER_FLAGS global
gRPC package includes its dependencies:
SSL, UPB, ABSL, and etc.
Change-Id: Ieb198ad96e26e89b09cb85986214a5b1451b17a6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
- Respect CMAKE_INSTALL_PREFIX and ignore RDC_CLIENT_INSTALL_PREFIX
- Move example and rdctst from rocm/bin to rocm/share/rdc
- Add README for examples
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I0b1d996d206327fd1b51ac6e82d548829bdb1570
Main CMake improvements:
* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm
Misc improvements:
* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b
With file reorganization changes binaries are moved to /opt/rocm-ver/bin.
Similarly rdc.service moved to /opt/rocm-ver/libexec/rdc
Test suites still used old paths
Once test suites changes are made, backward compatibility for binaries and rdc.service can be removed
Corrcted binary path in rdc.service.in
Corrected GRPC runpath
Change-Id: I306924d81cedc19586305a79d51eea8af6e70e83
SWDEV-291455 - Binary , header files and libraries installed in bin,include and lib folder under /opt/rocm-ver
Prebuilt ras library with updated search path
cmake config files in lib/cmake/rdc
grpc,sp3,hsaco and private libraries installed in lib/rdc
config installed in share/rdc
authentication and python_binding installed in libexec/rdc
Backward compatibility added for header files and libraries
Depends-On: I3f3d192935923f71737b3fe55ded536654a73dd7
Change-Id: Ia1a6cadc59034b155631a1ee5fdbe692d2a8a71b
grpc v1.44.0 needs to link to library absl_synchronization. The
CMakeLists.txt is changed to link to that library if available.
Change-Id: I92f7247473a70e7a83416b9744e788e45d104565
Provides a RdcSmiDiagnostic module, which will call rocm_smi_lib.
It will support following diagnostics: Get GPU Topology, Check GPU
parameters and check processes running on the GPUs.
The grpc client and server side diagnostics function is added.
The diag module is added to the rdci.
Change-Id: I10a0cf3c20556a61373ab686f82cae75acaa40dd
Write access is required for some RSMI services. This change
temporarily permits write access so configuration can be done,
and then turns it off.
To help with this, the ScopedCapability struct is introduced to
provide scope limited access, helping to ensure a process is not
left with extra capability, should an exception occur.
Change-Id: I4978a1a688db935b8bfc27b3b537a0dd07959d3f
Install the grpc lib to rdc/grpc/lib and add miss libraries.
Add “--no-as-needed” and all extra grpc libraries in rdci/rdcd as
RUNPATH will only search direct dependencies.
Change-Id: I596acb2eb3a7228d703e79db64699bc20d0e7c09
Installing files to standard path across each version and using
ldconfig has issues with side-by-side install.
Usage of RUNPATH/RPATH for ROCm to ensure all ROCm libraries are
picked without the need for ldconfig.
For RDC server to be picked up by systemctl, service config file
shall be a symlink from /lib/systemctl/system/rdc.service to
corresponding RDC file path in a given version of ROCm
For side-by-side install packages of RDC post install scripts
will be removed. Hence Use will have to set the symlink explicitly
for now.
Change-Id: I916da7cf132f0f9c667e2470fac2b0875e3db9d0
Also:
* print header line every 50 line on output
* print events that are being listened for with header
* cpplint clean-up
Change-Id: Ic049eb79156a9528b556e56f0fa43e1344f898cc
Make the RDC use the new rdc_field_t enum instead of uint32_t.
This will help prevent invalid field types from being passed in.
Also, centralize where data related to fields is kept. This will
reduce the number of places where changes are required each
time a new field is added.
Finally, cleaned up several cpplint issues.
Change-Id: I48e4512e18c164411d8b09ae3d4bed99fba359ec
In the job stats, in addition to the max, min and average,
it will also display the standard deviation.
A new option --json is added to the rdci to output the results
in json format.
In the job stats, using the GMT time instead of timestamp
for start and end time.
Change-Id: If245c4fc4854a1dc867f97ff5aa9112af7962eca
Also:
* update README documentation
* correct postinst scripts for deb and rpm
* add lib64/ to link_directories (needed for CentOS and others)
* remove a redundant "rdc" from the package names
* rearrange the package names to conform to convention
For example:
rdc-server_1.0.0.0.local-build-0-c3187fb-dirty_amd64.deb
rdc-server_1.0.0.0.local-build-0-c3187fb-dirty.x86_64.rpm
* fix issues that result from having, in essence, 2 different
install prefixes, 1 for the client and 1 for the server.
Change-Id: I88f0e1b8b72df2793c35ed71534afd91142da012
Remove the check whether the rdcd is started by rdc user.
Add the read access check for the private key and certificates if
the authentication is enabled.
Change-Id: I0e7a7eafb7985801572f809da0cb3e4012683153
Remove the * in the rdci stats
When a group is created, the GPUs can be added in the same command.
Add the support to the memory temperature.
Add the support to the memory clock.
Add the support to report the ECC errors.
Add the support to report the PCIe bandwidth throughput.
Since the RX/TX throughput may take 1 second to retreive, an async fetch is implemented
in the RdcMetricFetcherImpl.
Change-Id: If04f602fe1f2d14dbf7c2fb189549fd030523f9a
Add the job stats APIs in the rdc_api_service at the server side rdcd
Add the job stats APIs for the RdcStandaloneHandler at the client side
Make the load librdc.so and librdc_client.so thread safe.
Impelement async update all fields in RdcEmbeddedHandler.
Change-Id: I659d91efb32d1094d3b7f0f2cec39518cd7336ce
Depending on how a user starts rdcd, rdcd will either have
full monitor/control capabilities or have just monitoring
capabilties.
The only 2 user ids allowed are "rdc" and root.
Change-Id: Ie296a2f68c9723bef5945b1af1070ef99eeea93b
Implement the APIs defined in the RdcStandaloneHandler to make gRPC call to daemon
Implement the APIs defined in the RdcAPIServiceImpl to handle the gRPC calls in daemon
Add two APIs to get all GPU groups and field groups: rdc_group_get_all_ids()
and rdc_group_field_all_ids()
Those two APIs are required by the rdci group and fieldgroup
sub-modules.
Change-Id: I066091423146dea180c16af212688ed43dc44611
Create the skeleton implementation of rdc_client.so and rdci. Modify current rdcd to
integrate the RDC API service:
rdc.proto is changed to add a new RdcAPI service which defined the interfaces for the RDC API.
RdcStandaloneHandler.cpp is added to send the request using gRPC to the rdcd. It is built into
the rdc_client.so
rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement skeleton rdci.
Currently, the discovery subsystem is supported.
rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so which should also be
removed in the future.
Add the instruction how to run the rdcd and rdci in the build folder in the README.md.
Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c
The rdc account will be created on installation if it does
not already exist. It will be a system account with no
home directory.
rdcd will be started as a systemd service, but change to
user "rdc". The rdc user will drop all priviliges except
CAP_DAC_OVERRIDE, permitted. This means the default mode
will have no special privileges, but have the ability to
gain write access (e.g., to sysfs) when needed.
rdc tests were being inadvertantly added to the
installation. This was adversely impacting the new
functionality, so it was corrected in this commit.
Also included are a few small formatting changes.
Change-Id: I9c6bb132fee28119fd3960594dfb97bd2e7b282a
Initial testing include an "id test", which really just a
template test at this point, and a temperature sensor test.
The google test code is included in this commit. It will
eventually be taken out and replaced with a pull from a google
external repo.
Change-Id: I591818a9c169f4654fc8d8f17cf648f227d72545