Commit graph

46 Commits

Autor SHA1 Nachricht Datum
Galantsev, Dmitrii 796435c568 Fix runpath for rdci and rdcd
Change-Id: Ic131e9a5abfdf26f2b8e78799fe0e3450171d20d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-05-07 04:39:39 -05:00
Brandon Bagwell de3cb36ce0 Adds the ability to modify 'rdc' options
Modifying the /opt/rocm/etc/rdc file modifies RDC launch options.  If
the file doesn't exist, the service should still launch (though a new
file should likely be included with the next released package of 'rdc'.

Change-Id: I1a1891e9c5c3e6048754eb555779a97a170754c0
2024-04-30 10:28:16 -05:00
Galantsev, Dmitrii 9702d0f2d7 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-04-09 20:19:28 -05:00
Ranjith Ramakrishnan 0ca6d6fa59 Remove hard coded ROCm path in rdc.service
The executable rdcd was using an absolute path in rdc.service. Using update-alternatives gives the flexibility to invoke the binary from anywhere and no absolute path is required.

Change-Id: I2f3d6fcbf9dd854870cfc2e00532c504ce6cd6fc
2024-04-09 10:27:19 -05:00
Galantsev, Dmitrii 32806681ca SWDEV-444700 - CMAKE - Fix RUNPATH
These RUNPATH changes make it so libraries can be found without setting
LD_LIBRARY_PATH.

Mostly tested on installed RDC binaries and libraries. The
build binaries should also work.

Change-Id: Ifd908a5b61d24dfcbb1d08d21b4ee830156d8643
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-02-13 16:56:28 -06:00
Galantsev, Dmitrii f9e80cc37a Use templates for module population
Also add stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5 and templates are not well supported.

Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-10 00:27:09 -06:00
Galantsev, Dmitrii eaa1862a80 RVS: Finish initial RVS integration
NOTE: RVS Build is disabled by default due to CI build issues.

Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-10 00:27:04 -06:00
Galantsev, Dmitrii 434e40305d LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-09 11:37:11 -06:00
Galantsev, Dmitrii ed3cfffd7e Server - Add -a/--address option
Change-Id: Ia9e8d76b9a4ba0aadc567142601a87f0ad0b69e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-12-04 15:26:44 -06:00
Bill(Shuzhou) Liu 1ab4110d46 RDC crash when exit
Join the signal handling thread instead of cancel it to prevent
crash with "terminate called without an active exception".

Change-Id: I2e18eb825728fd3a94f67b1b0049516bb7b6ebbc
2023-11-03 09:10:22 -04:00
Galantsev, Dmitrii 8f9a6796f1 Upgrade to CXX-17 gtest-1.14
Change-Id: I1c7316f151128cbc9318b226dac14950e399d2c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-28 12:54:49 -05:00
Galantsev, Dmitrii 8f6bf948cc ASAN: Shutdown the signaling thread on exit
Change-Id: Ica546db354430f5f4adc33d8d92e09927d40f75b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-03-28 11:26:38 -05:00
Galantsev, Dmitrii 35edaa2322 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-16 19:16:26 -06:00
Galantsev, Dmitrii 6e0c5d1d56 Fix rdcd crash on rocmtools fields read
- Solve issue that resulted in rdcd crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii 3e4c55ec6c SWDEV-352414 - Fix gRPC linker issues
- Replace gRPC library with gRPC package
- Relax RUNPATH
- Make LINKER_FLAGS global

gRPC package includes its dependencies:
SSL, UPB, ABSL, and etc.

Change-Id: Ieb198ad96e26e89b09cb85986214a5b1451b17a6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-04 18:50:07 -06:00
Galantsev, Dmitrii f6efd7fbf6 Improve CMake and relocate tests
- Respect CMAKE_INSTALL_PREFIX and ignore RDC_CLIENT_INSTALL_PREFIX
- Move example and rdctst from rocm/bin to rocm/share/rdc
- Add README for examples

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I0b1d996d206327fd1b51ac6e82d548829bdb1570
2022-10-27 13:49:54 -05:00
Galantsev, Dmitrii 2c171767b3 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b
2022-10-07 13:58:50 -05:00
Ranjith Ramakrishnan c3ea96dd71 SWDEV-350674 - Added backward compatibility for binary files and rdc.service
With file reorganization changes binaries are moved to /opt/rocm-ver/bin.
Similarly rdc.service moved to /opt/rocm-ver/libexec/rdc
Test suites still used old paths
Once test suites changes are made, backward compatibility for binaries and rdc.service can be removed
Corrcted binary path in rdc.service.in
Corrected GRPC runpath

Change-Id: I306924d81cedc19586305a79d51eea8af6e70e83
2022-08-09 17:45:22 -07:00
Ranjith Ramakrishnan 52a3463147 File reorganization with backward compatibility
SWDEV-291455 -  Binary , header files and libraries installed in bin,include and lib folder under /opt/rocm-ver
Prebuilt ras library with updated search path
cmake config files in lib/cmake/rdc
grpc,sp3,hsaco and private libraries installed in lib/rdc
config  installed in share/rdc
authentication and python_binding installed in libexec/rdc
Backward compatibility added for header files and libraries

Depends-On: I3f3d192935923f71737b3fe55ded536654a73dd7
Change-Id: Ia1a6cadc59034b155631a1ee5fdbe692d2a8a71b
2022-08-04 23:42:42 -07:00
Bill(Shuzhou) Liu c4dab3b2bd Add run path dependency on grpc libraries
Add run path dependency for grpc libabsl_*.so required by RHEL.

Change-Id: Ie033cc25019e0cb46a895e8c3e583a0d22ab4561
2022-03-28 09:04:17 -04:00
Bill(Shuzhou) Liu 2a46ee2ab2 Enable the support to grpc v1.44.0
grpc v1.44.0 needs to link to library absl_synchronization. The
CMakeLists.txt is changed to link to that library if available.

Change-Id: I92f7247473a70e7a83416b9744e788e45d104565
2022-03-03 16:11:09 -05:00
Bill(Shuzhou) Liu 76ccf58008 Add the RdcSmiDiagnostic module
Provides a RdcSmiDiagnostic module, which will call rocm_smi_lib.

It will support following diagnostics: Get GPU Topology, Check GPU
parameters and check processes running on the GPUs.

The grpc client and server side diagnostics function is added.

The diag module is added to the rdci.

Change-Id: I10a0cf3c20556a61373ab686f82cae75acaa40dd
2021-07-26 14:56:17 -04:00
Chris Freehill 7a05145542 Fix some lintian errors
Fix lintian errors related to maintainer, postinst script and
permissions.

Change-Id: I6924ff92ff5453fa7e562a6188c2c91cea87df68
2021-03-03 19:35:24 -06:00
Chris Freehill 6b5aeaaa23 Turn on/off DAC capabilities as needed
Write access is required for some RSMI services. This change
temporarily permits write access so configuration can be done,
and then turns it off.

To help with this, the ScopedCapability struct is introduced to
provide scope limited access, helping to ensure a process is not
left with extra capability, should an exception occur.

Change-Id: I4978a1a688db935b8bfc27b3b537a0dd07959d3f
2021-02-04 12:25:26 -06:00
Bill(Shuzhou) Liu 07d4d5376e Install grpc lib to rdc folder
Install the grpc lib to rdc/grpc/lib and add miss libraries.

Add “--no-as-needed” and all extra grpc libraries in rdci/rdcd as
RUNPATH will only search direct dependencies.

Change-Id: I596acb2eb3a7228d703e79db64699bc20d0e7c09
2021-01-25 14:55:45 -05:00
Freddy Paul fe1593dda5 RDC:Move rdc deamon to rocm path.
Installing files to standard path across each version and using
ldconfig has issues with side-by-side install.

Usage of RUNPATH/RPATH for ROCm to ensure all ROCm libraries are
picked without the need for ldconfig.

For RDC server to be picked up by systemctl, service config file
shall be a symlink from /lib/systemctl/system/rdc.service to
corresponding RDC file path in a given version of ROCm

For side-by-side install packages of RDC post install scripts
will be removed. Hence Use will have to set the symlink explicitly
for now.

Change-Id: I916da7cf132f0f9c667e2470fac2b0875e3db9d0
2020-12-04 14:43:06 -05:00
Chris Freehill b278cd379b Add event notification support and rdci timestamps
Also:
* print header line every 50 line on output
* print events that are being listened for with header
* cpplint clean-up

Change-Id: Ic049eb79156a9528b556e56f0fa43e1344f898cc
2020-11-22 07:10:39 -05:00
Chris Freehill 5950ebadc4 rdc_field_t replaces uint32_t; centralize field data
Make the RDC use the new rdc_field_t enum instead of uint32_t.
This will help prevent invalid field types from being passed in.

Also, centralize where data related to fields is kept. This will
reduce the number of places where changes are required each
time a new field is added.

Finally, cleaned up several cpplint issues.

Change-Id: I48e4512e18c164411d8b09ae3d4bed99fba359ec
2020-08-17 14:09:37 -05:00
Bill(Shuzhou) Liu e6d910f67a Support standard deviation and json output for job stats
In the job stats, in addition to the max, min and average,
it will also display the standard deviation.

A new option --json is added to the rdci to output the results
in json format.

In the job stats, using the GMT time instead of timestamp
for start and end time.

Change-Id: If245c4fc4854a1dc867f97ff5aa9112af7962eca
2020-08-17 14:09:37 -05:00
Chris Freehill 4008dd8eac Add .x86_64 and _amd64 suffixes to .rpm and .debs
Also:
* update README documentation
* correct postinst scripts for deb and rpm
* add lib64/ to link_directories (needed for CentOS and others)
* remove a redundant "rdc" from the package names
* rearrange the package names to conform to convention
For example:
rdc-server_1.0.0.0.local-build-0-c3187fb-dirty_amd64.deb
rdc-server_1.0.0.0.local-build-0-c3187fb-dirty.x86_64.rpm

* fix issues that result from having, in essence, 2 different
  install prefixes, 1 for the client and 1 for the server.

Change-Id: I88f0e1b8b72df2793c35ed71534afd91142da012
2020-08-17 14:09:37 -05:00
Chris Freehill 8e4d1e7f33 Make job id char array const in rdc api
Also make adjustments to packaging.

Change-Id: I73cc18ce67f833ff563cb1488b000b69b315979a
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 96afb24845 Allow the rdcd to be started by user other than rdc or root
Remove the check whether the rdcd is started by rdc user.
Add the read access check for the private key and certificates if
the authentication is enabled.

Change-Id: I0e7a7eafb7985801572f809da0cb3e4012683153
2020-08-17 14:07:25 -05:00
Chris Freehill b6da10f1f4 Separate client/server packages, build_ scripts mods
Change-Id: Ic553be523e7c6ae8ac930fa2126add45f33645b7
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu f4a3fd4dda Support extra metrics in the RDC
Remove the * in the rdci stats
When a group is created, the GPUs can be added in the same command.
Add the support to the memory temperature.
Add the support to the memory clock.
Add the support to report the ECC errors.
Add the support to report the PCIe bandwidth throughput.

Since the RX/TX throughput may take 1 second to retreive, an async fetch is implemented
in the RdcMetricFetcherImpl.

Change-Id: If04f602fe1f2d14dbf7c2fb189549fd030523f9a
2020-08-17 14:07:25 -05:00
Chris Freehill 1b58033183 Make GPRC and protobuf external components to RDC
Pass in GRPC root (or use default location) for RDC to use
when building RDC components.

Change-Id: I89db2ac2be27ab6449c817d210a94c11fef965fd
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu fe3e75edfa Implement the gRPC APIs for the job stats
Add the job stats APIs in the rdc_api_service at the server side rdcd
Add the job stats APIs for the RdcStandaloneHandler at the client side
Make the load librdc.so and librdc_client.so thread safe.
Impelement async update all fields in RdcEmbeddedHandler.

Change-Id: I659d91efb32d1094d3b7f0f2cec39518cd7336ce
2020-08-17 14:07:25 -05:00
Chris Freehill a6acf24ae7 Handle different levels of rdcd privilege
Depending on how a user starts rdcd, rdcd will either have
full monitor/control capabilities or have just monitoring
capabilties.

The only 2 user ids allowed are "rdc" and root.

Change-Id: Ie296a2f68c9723bef5945b1af1070ef99eeea93b
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 7ee29b6cdd Implement the APIs for gRPC calls in client/server
Implement the APIs defined in the RdcStandaloneHandler to make gRPC call to daemon

Implement the APIs defined in the RdcAPIServiceImpl to handle the gRPC calls in daemon

Add two APIs to get all GPU groups and field groups: rdc_group_get_all_ids()
and rdc_group_field_all_ids()
Those two APIs are required by the rdci group and fieldgroup
sub-modules.

Change-Id: I066091423146dea180c16af212688ed43dc44611
2020-08-17 14:07:25 -05:00
Chris Freehill 47fdfa4c7e Add support for gRPC authenticated communications
Also, make a few namespace corrections and some minor refactoring.

Change-Id: Iedcaf6b43cb7576bc11dfefe980abd190c838831
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 020f6939f7 SWDEV-209060 - Create the Skeleton RDC CLI and daemon
Create the skeleton implementation of rdc_client.so and rdci. Modify current rdcd to
integrate the RDC API service:

rdc.proto is changed to add a new RdcAPI service which defined the interfaces for the RDC API.

RdcStandaloneHandler.cpp is added to send the request using gRPC to the rdcd. It is built into
the rdc_client.so

rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement skeleton rdci.
Currently, the discovery subsystem is supported.

rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so which should also be
removed in the future.

Add the instruction how to run the rdcd and rdci in the build folder in the README.md.

Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c
2020-08-17 14:07:25 -05:00
Chris Freehill 5cc498c6aa Make rdcd run as user "rdc"
The rdc account will be created on installation if it does
not already exist. It will be a system account with no
home directory.

rdcd will be started as a systemd service, but change to
user "rdc". The rdc user will drop all priviliges except
CAP_DAC_OVERRIDE, permitted. This means the default mode
will have no special privileges, but have the ability to
gain write access (e.g., to sysfs) when needed.

rdc tests were being inadvertantly added to the
installation. This was adversely impacting the new
functionality, so it was corrected in this commit.

Also included are a few small formatting changes.

Change-Id: I9c6bb132fee28119fd3960594dfb97bd2e7b282a
2020-08-17 14:07:25 -05:00
Chris Freehill 4729c47866 Add read fan values and associated tests
Change-Id: I89322e93d5f3110adace15e5a576f00d4934be79
2020-08-17 14:07:25 -05:00
Chris Freehill 02c6d3fb4d Add use of namespaces
Change-Id: I962eb808b3b874d1c3bf4cb418bf36952f88e3e2
2020-08-17 14:07:25 -05:00
Chris Freehill ca4344f5fa Add Google test based tests.
Initial testing include an "id test", which really just a
template test at this point, and a temperature sensor test.

The google test code is included in this commit. It will
eventually be taken out and replaced with a pull from a google
external repo.

Change-Id: I591818a9c169f4654fc8d8f17cf648f227d72545
2020-08-17 14:06:56 -05:00
Chris Freehill dc6f6f3e9a Break srvs. into rsmi & admin srvs. Add VerifyConnection api.
Change-Id: I67567264c37e31f3409062a14e56eba4801cd944
2020-01-09 20:02:33 -06:00
Chris Freehill 5898345d17 Initial RDC commit
Includes server, client and example targets.

Change-Id: I30596fb0453af71d49b8390a8468a6d073200836
2020-01-09 17:57:29 -06:00