70 Commits

Author SHA1 Message Date
Pryor, Adam 0e9c3b2c4f [SWDEV-243250] RDC Process Start/Stop integration (#189)
Change-Id: I3d2be33b5d23cd259b3d06fb572f81d19e6c3798

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-06-02 14:42:21 -05:00
Galantsev, Dmitrii fa8b89f4ae CMAKE - Format with cmake-format
Change-Id: I08e71fc5060b1f6e0168225cc5fe66886c2044bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-05-06 17:28:14 -05:00
Galantsev, Dmitrii ac50573e67 CMAKE - Bump version to 1.1.0
Change-Id: I0fbc0f6d842c034ad858f30fa6418afd01e11a4f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-11 17:27:27 -05:00
Galantsev, Dmitrii 24024f0e4f Revert "Implement CPU discovery support"
This reverts commit f967f8a17d15e148464393fcd145af01dc0e1525.
2025-04-07 20:45:19 -05:00
Yuan, Perry 3bdca8b8b6 Implement CPU discovery support (#77)
* Implement CPU discovery support

SWDEV-482949:

Enable CPU model name info support in RDC so that the rdci command
can detect GPU and CPU modules at the same time.
It queries the CPU info through the amdsmi interface, as shown below:

1 GPUs found.
-----------------------------------------------------------------
GPU Index        Device Information
0               AMD Radeon PRO W7800
=================================================================
1 CPUs found.
-----------------------------------------------------------------
CPU Index        Device Information
0               AMD Ryzen Threadripper PRO 7995WX 96-Cores
-----------------------------------------------------------------

Change-Id: Ibc6533c9a61000cd86c45b1bae14c3eb6788c119
Signed-off-by: Perry Yuan <perry.yuan@amd.com>

* CMAKE - Add required version for amdsmi

Change-Id: I341a89351d196ec66cce215a5d1d3953302fcc66
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-31 10:58:36 +08:00
Galantsev, Dmitrii 80ee980cdb CMAKE - Fix build types
Addresses issue https://github.com/ROCm/rdc/issues/43

Change-Id: I456184358524a6feef4bf83eecb655678c3bc42d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-30 18:54:54 -05:00
Pryor, Adam 47692d3ed5 [SWDEV-498711] RDC Partition Implementation (#119)
* [SWDEV-498711] RDC Partition Implementation

Change-Id: Ibfc3709793770537e4c9d36458f34c6b4f461724
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-03-27 14:10:11 -05:00
Galantsev, Dmitrii a8d479c147 CMAKE - Fix ABSL in clang18+ (#106)
Please see:
- https://github.com/abseil/abseil-cpp/issues/1747
- https://github.com/llvm/llvm-project/issues/102443

When GRPC is compiled with a different compiler than RDC, the ABI breaks,
possibly because some templates are not instantiated.
Setting '-fclang-abi-compat=17' fixes the issue.

Change-Id: Ic6409cf413c87b135f334e5b03145cb1c63356d4

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-30 10:33:58 -06:00
Galantsev, Dmitrii 99d4d77e20 CMAKE - Move rdc_options into share/rdc/conf/
Change-Id: Ib2e792aef180f0f267d86d68c57b852b2cdc8ea6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-24 12:06:05 -06:00
Galantsev, Dmitrii e033fd4c55 CMAKE - Rename SMI_*_DIR into AMD_SMI_*_DIR
Change-Id: I3b8b852e6b68f1448c8ed5d5e6ea4579c470ff53
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-23 20:56:00 -06:00
stali a76760db8c fix group policy reg issue
2025-01-07 15:02:17 +08:00
Li, Star bd7d7c99c1 Fix unit issue in policy feature (#78)
1. For temperature, the unit is millidegrees Celsius.
2. For power, the unit is microwatts.
3. Fix the second register call to rdcd not working because of the start flag.

Co-authored-by: Chao Fei <chao.fei@amd.com>
2025-01-06 09:21:08 +08:00
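As a hedged illustration of the unit fix in commit bd7d7c99c1 (temperature in millidegrees Celsius, power in microwatts), the C++ sketch below converts human-readable policy thresholds into those units. The helper names are hypothetical and are not part of the RDC API; the sketch only demonstrates the scale factors.

```cpp
#include <cstdint>

// Hypothetical helpers (not RDC API): scale human-friendly policy thresholds
// into the units the policy feature uses after commit bd7d7c99c1.
constexpr std::int64_t CelsiusToMilliCelsius(std::int64_t celsius) {
  return celsius * 1000;  // 1 degree C = 1000 millidegrees C
}

constexpr std::int64_t WattsToMicrowatts(std::int64_t watts) {
  return watts * 1000000;  // 1 W = 1,000,000 microwatts
}

// Compile-time checks of the scale factors.
static_assert(CelsiusToMilliCelsius(90) == 90000, "90 C is 90000 mC");
static_assert(WattsToMicrowatts(300) == 300000000, "300 W is 3e8 uW");
```

Under these units, passing 90 instead of 90000 as a temperature threshold would mean 0.09 °C, which shows why the unit has to be stated explicitly.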
Pryor, Adam 60b7359161 Implementation for adding pcie_total (#40)
* Implementation for adding pcie_total

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I4b0cfd7095e9d984e939283ee7169d01f55a1847
Signed-off-by: adapryor <Adam.pryor@amd.com>

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I021f29083de651cab9fbe7db98acbe20f65948d4

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I42f3207b745fa787dabe30a85c8e063159d1337d

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-26 18:36:41 -06:00
stali 29b6699b62 Enable RDC link status feature
1. Add link status APIs
2. Add link status example for link status API usage
2024-12-23 09:30:21 +08:00
Adam Pryor df170c8801 Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-20 16:03:33 -06:00
stali 8bcb5f7068 Enable RDC topology feature
1. Add topology APIs
2. Add topology example for topology API usage

Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c
2024-12-16 10:02:41 +08:00
limeng12 853d3b0cc5 Background health check
Add the RdcSmiHealth module, which will call rocm_smi_lib.
It will support the following health checks:
 - XGMI error detected
 - PCIE replay count detected
 - Memory check
 - InfoROM check
 - Power/Thermal check
The gRPC client- and server-side health functions are added.
The health module is added to rdci.

At present, the XGMI and PCIe checks and part of the memory check have
been implemented. The others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
2024-11-19 14:00:49 +08:00
Galantsev, Dmitrii 9c77312c51 Finish basic logging impl
Change-Id: Ia3d6ac80f4832f1bfb63573c543659abd5f84341
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Chao Fei 345ac64a43 Enable RDC policy feature
1. Add policy APIs
2. Add policy example for policy API usage

Change-Id: I14deb7c809d0b865b7bb083842092fc37868025e
Signed-off-by: Chao Fei <Chao.Fei@amd.com>
2024-10-23 20:37:27 -04:00
Li Ma ca569346a3 SWDEV-445415 - Pthread detach instead of pthread join
Detach the thread that handles shutdown signals instead of joining it;
this avoids a segfault on specific ASICs.

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I74ac53c027ac370605caaa87115c83fd8027526a
2024-09-13 18:32:37 -04:00
Chen Gong 1edd04d84e Implement the code related to GetMixedComponentVersion()
Change-Id: I98aad97b4cb6498b7f2fc03a2d5ee7c9e949d5f1
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 5a3fd9fbc1 Implement rdc_device_get_component_version API related code
Implement an API to obtain the version information of the components that RDC calls.
See rdc_component_t for details on available components.
It can be expanded later if necessary.

Change-Id: I03b48f774179c52c57b606704283add74ca39a02
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 9f8d447e75 Add output of rdcd version information
Change-Id: I0572fd4b98f697660ab9099deabfd4f0fce802f3
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong ac874d3921 Get the hash value and pass it to rdcd and rdci
This makes it possible to display version information along with the hash value.

Change-Id: I0f9ad576f8f66747ce2e84d4f524ccd16d399927
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Galantsev, Dmitrii 796435c568 Fix runpath for rdci and rdcd
Change-Id: Ic131e9a5abfdf26f2b8e78799fe0e3450171d20d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-05-07 04:39:39 -05:00
Brandon Bagwell de3cb36ce0 Adds the ability to modify 'rdc' options
Modifying the /opt/rocm/etc/rdc file modifies RDC launch options. If
the file doesn't exist, the service should still launch (though a new
file should likely be included with the next released package of 'rdc').

Change-Id: I1a1891e9c5c3e6048754eb555779a97a170754c0
2024-04-30 10:28:16 -05:00
Galantsev, Dmitrii 9702d0f2d7 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-04-09 20:19:28 -05:00
Ranjith Ramakrishnan 0ca6d6fa59 Remove hard coded ROCm path in rdc.service
The executable rdcd was using an absolute path in rdc.service. Using update-alternatives gives the flexibility to invoke the binary from anywhere and no absolute path is required.

Change-Id: I2f3d6fcbf9dd854870cfc2e00532c504ce6cd6fc
2024-04-09 10:27:19 -05:00
Galantsev, Dmitrii 32806681ca SWDEV-444700 - CMAKE - Fix RUNPATH
These RUNPATH changes allow libraries to be found without setting
LD_LIBRARY_PATH.

Mostly tested on installed RDC binaries and libraries. The
build binaries should also work.

Change-Id: Ifd908a5b61d24dfcbb1d08d21b4ee830156d8643
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-02-13 16:56:28 -06:00
Galantsev, Dmitrii f9e80cc37a Use templates for module population
Also add a stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5 and templates are not well supported.

Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-10 00:27:09 -06:00
Galantsev, Dmitrii eaa1862a80 RVS: Finish initial RVS integration
NOTE: RVS Build is disabled by default due to CI build issues.

Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-10 00:27:04 -06:00
Galantsev, Dmitrii 434e40305d LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-09 11:37:11 -06:00
Galantsev, Dmitrii ed3cfffd7e Server - Add -a/--address option
Change-Id: Ia9e8d76b9a4ba0aadc567142601a87f0ad0b69e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-12-04 15:26:44 -06:00
Bill(Shuzhou) Liu 1ab4110d46 RDC crash on exit
Join the signal handling thread instead of canceling it to prevent a
crash with "terminate called without an active exception".

Change-Id: I2e18eb825728fd3a94f67b1b0049516bb7b6ebbc
2023-11-03 09:10:22 -04:00
Galantsev, Dmitrii 8f9a6796f1 Upgrade to CXX-17 gtest-1.14
Change-Id: I1c7316f151128cbc9318b226dac14950e399d2c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-28 12:54:49 -05:00
Galantsev, Dmitrii 8f6bf948cc ASAN: Shutdown the signaling thread on exit
Change-Id: Ica546db354430f5f4adc33d8d92e09927d40f75b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-03-28 11:26:38 -05:00
Galantsev, Dmitrii 35edaa2322 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-16 19:16:26 -06:00
Galantsev, Dmitrii 6e0c5d1d56 Fix rdcd crash on rocmtools fields read
- Solve an issue that caused rdcd to crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii 3e4c55ec6c SWDEV-352414 - Fix gRPC linker issues
- Replace gRPC library with gRPC package
- Relax RUNPATH
- Make LINKER_FLAGS global

The gRPC package includes its dependencies:
SSL, UPB, ABSL, etc.

Change-Id: Ieb198ad96e26e89b09cb85986214a5b1451b17a6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-04 18:50:07 -06:00
Galantsev, Dmitrii f6efd7fbf6 Improve CMake and relocate tests
- Respect CMAKE_INSTALL_PREFIX and ignore RDC_CLIENT_INSTALL_PREFIX
- Move example and rdctst from rocm/bin to rocm/share/rdc
- Add README for examples

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I0b1d996d206327fd1b51ac6e82d548829bdb1570
2022-10-27 13:49:54 -05:00
Galantsev, Dmitrii 2c171767b3 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b
2022-10-07 13:58:50 -05:00
Ranjith Ramakrishnan c3ea96dd71 SWDEV-350674 - Added backward compatibility for binary files and rdc.service
With the file reorganization changes, binaries are moved to /opt/rocm-ver/bin.
Similarly, rdc.service is moved to /opt/rocm-ver/libexec/rdc.
Test suites still use the old paths.
Once the test suite changes are made, backward compatibility for binaries and rdc.service can be removed.
Corrected binary path in rdc.service.in
Corrected GRPC runpath

Change-Id: I306924d81cedc19586305a79d51eea8af6e70e83
2022-08-09 17:45:22 -07:00
Ranjith Ramakrishnan 52a3463147 File reorganization with backward compatibility
SWDEV-291455 - Binaries, header files and libraries installed in the bin, include and lib folders under /opt/rocm-ver
Prebuilt ras library with updated search path
cmake config files in lib/cmake/rdc
grpc, sp3, hsaco and private libraries installed in lib/rdc
config installed in share/rdc
authentication and python_binding installed in libexec/rdc
Backward compatibility added for header files and libraries

Depends-On: I3f3d192935923f71737b3fe55ded536654a73dd7
Change-Id: Ia1a6cadc59034b155631a1ee5fdbe692d2a8a71b
2022-08-04 23:42:42 -07:00
Bill(Shuzhou) Liu c4dab3b2bd Add run path dependency on grpc libraries
Add run path dependency for grpc libabsl_*.so required by RHEL.

Change-Id: Ie033cc25019e0cb46a895e8c3e583a0d22ab4561
2022-03-28 09:04:17 -04:00
Bill(Shuzhou) Liu 2a46ee2ab2 Enable the support to grpc v1.44.0
grpc v1.44.0 needs to link to library absl_synchronization. The
CMakeLists.txt is changed to link to that library if available.

Change-Id: I92f7247473a70e7a83416b9744e788e45d104565
2022-03-03 16:11:09 -05:00
Bill(Shuzhou) Liu 76ccf58008 Add the RdcSmiDiagnostic module
Provides an RdcSmiDiagnostic module, which will call rocm_smi_lib.

It will support the following diagnostics: Get GPU Topology, Check GPU
parameters and check processes running on the GPUs.

The gRPC client- and server-side diagnostics functions are added.

The diag module is added to rdci.

Change-Id: I10a0cf3c20556a61373ab686f82cae75acaa40dd
2021-07-26 14:56:17 -04:00
Chris Freehill 7a05145542 Fix some lintian errors
Fix lintian errors related to maintainer, postinst script and
permissions.

Change-Id: I6924ff92ff5453fa7e562a6188c2c91cea87df68
2021-03-03 19:35:24 -06:00
Chris Freehill 6b5aeaaa23 Turn on/off DAC capabilities as needed
Write access is required for some RSMI services. This change
temporarily permits write access so configuration can be done,
and then turns it off.

To help with this, the ScopedCapability struct is introduced to
provide scope limited access, helping to ensure a process is not
left with extra capability, should an exception occur.

Change-Id: I4978a1a688db935b8bfc27b3b537a0dd07959d3f
2021-02-04 12:25:26 -06:00
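Commit 6b5aeaaa23 describes ScopedCapability as a way to grant capability only for a limited scope, so a process is not left holding it if an exception occurs. The C++ sketch below shows that RAII idea under stated assumptions: the real struct presumably toggles Linux DAC capabilities, which this sketch replaces with a hypothetical in-process flag (`g_dac_override`) and caller-supplied enable/disable callbacks. `ConfigureWithCapability` is likewise a hypothetical stand-in for a configuration write.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard modeled on the ScopedCapability idea: the
// capability is enabled in the constructor and disabled in the destructor,
// so it is dropped even if the guarded code throws.
class ScopedCapability {
 public:
  ScopedCapability(const std::function<void()>& enable,
                   std::function<void()> disable)
      : disable_(std::move(disable)) {
    enable();
  }
  ~ScopedCapability() { disable_(); }

  // Non-copyable (and therefore non-movable) so disable runs exactly once.
  ScopedCapability(const ScopedCapability&) = delete;
  ScopedCapability& operator=(const ScopedCapability&) = delete;

 private:
  std::function<void()> disable_;
};

// Simulated capability state standing in for a real DAC capability.
static bool g_dac_override = false;

// Simulates a configuration write that needs the capability; returns whether
// the capability was held while the write ran.
static bool ConfigureWithCapability(bool fail) {
  ScopedCapability cap([] { g_dac_override = true; },
                       [] { g_dac_override = false; });
  const bool held_during_write = g_dac_override;
  if (fail) throw std::runtime_error("write failed");
  return held_during_write;
}
```

Because the release happens in the destructor, both the normal return path and the exception path leave `g_dac_override` false once the function exits.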
Bill(Shuzhou) Liu 07d4d5376e Install grpc lib to rdc folder
Install the grpc lib to rdc/grpc/lib and add missing libraries.

Add “--no-as-needed” and all extra grpc libraries to rdci/rdcd, as
RUNPATH is only used to resolve direct dependencies.

Change-Id: I596acb2eb3a7228d703e79db64699bc20d0e7c09
2021-01-25 14:55:45 -05:00
Freddy Paul fe1593dda5 RDC: Move rdc daemon to rocm path.
Installing files to a standard path across each version and using
ldconfig has issues with side-by-side installs.

Use RUNPATH/RPATH for ROCm to ensure all ROCm libraries are
picked up without the need for ldconfig.

For the RDC server to be picked up by systemctl, the service config file
shall be a symlink from /lib/systemd/system/rdc.service to the
corresponding RDC file path in a given version of ROCm.

For side-by-side install packages of RDC, post-install scripts
will be removed. Hence the user will have to set the symlink explicitly
for now.

Change-Id: I916da7cf132f0f9c667e2470fac2b0875e3db9d0
2020-12-04 14:43:06 -05:00