44 Revisions

Author SHA1 Message Date
Swati Rawat cb257ab9f7 [rdc] Replace readme link rdc -> rocm-systems/projects/rdc (#1758)
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-11-14 13:19:26 +01:00
Galantsev, Dmitrii cccfe3e0f1 README - Add libcap-dev dependency
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c401a6bed6]
2025-07-18 12:51:55 -05:00
Galantsev, Dmitrii 89a495e493 Profiler - Remove rocprofiler-v1 remnants
Also force unset HSA_TOOLS_LIB so it doesn't break rocprofiler-sdk

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e73eaf8115]
2025-06-27 13:15:52 -05:00
srawat e56a809946 Refactor RDC documentation
Change-Id: Ieaba84992a8cbd185f4c2d1dc36a175c0429b754


[ROCm/rdc commit: a865793b70]
2025-03-07 19:50:08 -06:00
Galantsev, Dmitrii b48d03515e Update gRPC to 1.67.1
Change-Id: I911878a3aeec8c9234b0e1ac4447364f2ed845cc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 8b249046c0]
2025-03-07 18:36:34 -06:00
Justin Williams 1a0e1ff280 Make README.md pretty
Change-Id: I7c3341deaf3621ebbc9e495b023b1dd4971a5f1d


[ROCm/rdc commit: f106364fc7]
2025-01-31 12:22:45 -06:00
Galantsev, Dmitrii 0bb38058e7 Revert "Dgalants/add auth script location (#108)"
This reverts commit 2f68fe1efe.


[ROCm/rdc commit: bee9991c4a]
2025-01-31 12:22:45 -06:00
Pryor, Adam 2f68fe1efe Dgalants/add auth script location (#108)
* DOCS: Add authentication scripts location

Change-Id: Ie285d80ea6d9bb8f710998208d0aa7c6db661d02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* Make README.md pretty (#44)

Change-Id: I7c3341deaf3621ebbc9e495b023b1dd4971a5f1d

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Williams, Justin <Justin.Williams@amd.com>

[ROCm/rdc commit: a70aa81cfd]
2025-01-30 12:08:11 -06:00
Galantsev, Dmitrii b4dd8b40ab CMAKE - Fix ABSL in clang18+ (#106)
Please see:
- https://github.com/abseil/abseil-cpp/issues/1747
- https://github.com/llvm/llvm-project/issues/102443

When gRPC is compiled with a different compiler than RDC, the ABI breaks,
possibly because some templates are not instantiated.
Setting '-fclang-abi-compat=17' fixes the issue.

Change-Id: Ic6409cf413c87b135f334e5b03145cb1c63356d4

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

[ROCm/rdc commit: a8d479c147]
2025-01-30 10:33:58 -06:00
Galantsev, Dmitrii d5ce61d95e CMAKE - Move rdc_options into share/rdc/conf/
Change-Id: Ib2e792aef180f0f267d86d68c57b852b2cdc8ea6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 99d4d77e20]
2025-01-24 12:06:05 -06:00
Galantsev, Dmitrii 793b2de0cb Profiler - Modify metrics
Remove occupancy metrics and replace with OccupancyPercent

Add OCCUPANCY_PERCENT which uses OccupancyPercent
Add GR_ENGINE_ACTIVE which uses GPU_UTIL/100
Add TENSOR_ACTIVE_PERCENT which uses MfmaUtil
Modify FLOPS_64 to use FP64_ACTIVE
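
As an illustration of the GR_ENGINE_ACTIVE derivation listed above (GPU_UTIL/100), here is a minimal Python sketch. It assumes GPU_UTIL is reported as a percentage in the range 0 to 100; the function name and the range check are hypothetical and are not taken from the RDC sources.

```python
def gr_engine_active(gpu_util_percent: float) -> float:
    """Scale a GPU_UTIL percentage into a 0.0-1.0 fraction (GPU_UTIL / 100).

    Assumption: GPU_UTIL arrives as a percentage between 0 and 100.
    """
    if not 0.0 <= gpu_util_percent <= 100.0:
        raise ValueError(f"GPU_UTIL out of range: {gpu_util_percent}")
    return gpu_util_percent / 100.0


if __name__ == "__main__":
    # A half-busy GPU maps to a fraction of one half.
    print(gr_engine_active(50))
```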

Change-Id: I5f30d77a0c80f5ac78abd1a9e57f8a0a3c6cc00b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 28acbf0436]
2024-10-15 19:00:30 -05:00
Galantsev, Dmitrii a4e55e52ec README: Add known issues section
Change-Id: I298750fdafed556480271cfce31c3fc88984cf0b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 04be1211c1]
2024-09-25 15:10:41 -05:00
Galantsev, Dmitrii 970cc3e72a Update CHANGELOG.md and README.md for ROCm 6.2
Change-Id: If062cb23290469beef0b04a146c485602377be5d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: bd9901324c]
2024-06-26 17:40:59 -05:00
randyh62 41c946a4f8 link updates, spelling
Change-Id: I71aafc2a0145d139c5c9ca6cb53214c77d88acc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 383c0b19e8]
2024-05-08 18:15:38 -05:00
Galantsev, Dmitrii 8b317a6490 Add rocprofiler plugin
Rename ROCR -> Runtime and ROCP -> Profiler

Change-Id: If90953da8fa5d695b681813dad4a3e7ec26a9c7e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 234b2d835b]
2024-05-07 04:39:39 -05:00
Galantsev, Dmitrii 028355dff0 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9702d0f2d7]
2024-04-09 20:19:28 -05:00
Galantsev, Dmitrii 006f6b5fc7 Fix links and add certificate gen guide
Change-Id: Ieece04baade54ee3a7cde968aa08077e0d0d8391
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 67578106c4]
2024-03-19 14:41:16 -05:00
Galantsev, Dmitrii 9db00be1c1 README - Fix URLs and add lychee config
Use Lychee[1] to check dead links

[1] - https://github.com/lycheeverse/lychee

Change-Id: I0e8aade7879748dbcb4700a527bcae5a2c29ecb5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2c27473d6f]
2024-02-08 17:06:02 -06:00
Galantsev, Dmitrii d4308e5175 Upgrade gRPC v1.59.1 -> v1.61.0
Change-Id: I8a3f13dd8f264e28474bd65e92ac53f87ab7db3f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: Icbb7b4a580894d78d8ef992befa26ce20fcf3309


[ROCm/rdc commit: f13a1fbea8]
2024-02-06 19:39:50 -06:00
Sam Wu a9ad3af5e2 Standardize documentation for ReadtheDocs
Relates to https://github.com/RadeonOpenCompute/rocm-docs-core/issues/330

Change-Id: Ic9370548bb8d919376b20f7e1800fe620369e69b


[ROCm/rdc commit: 5890852ff1]
2023-12-08 16:56:59 -05:00
Galantsev, Dmitrii ff9f16b7b5 Upgrade gRPC v1.44.0 -> v1.59.1
Change-Id: Ib43a41c61d4028ec029a8c179a94060315870fbb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e579cb04b2]
2023-10-19 17:29:36 -05:00
Galantsev, Dmitrii 9b41583cfb README - Update documentation links
Change-Id: I2e778a766e6a4489280fe7b86f33a6c597983167
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: f6ace9fa14]
2023-09-13 19:34:28 -05:00
Public Profile 6533962e0f fix broken links
Change-Id: Ibd941eb116fd9ae4ed7deeeb3a07324a2a3ca3c3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: a3ac4bac21]
2023-08-09 00:13:09 -05:00
Sam Wu 041868928a add configs for read the docs
add handbook, user, install, and integration guides

Change-Id: I996f6909f4fdf76910981c0224f5a0266907e27a

remove old documentation steps

Change-Id: Icfad09926e67a2dfa1de0e182fc3cd534f0448f7

formatting fixes

Change-Id: I704bbbbf6ad384178f804e4a3f5e621f9c3d33b9


[ROCm/rdc commit: 1335d19020]
2023-05-05 15:44:34 -06:00
Galantsev, Dmitrii a337dc062b SWDEV-392942 - Disable rocmtools
Temporarily disable rocmtools because of hsa_shut_down issues

Change-Id: I5e8b6729b8200ccdd5c399862bfc632ba69f884c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 90e824c63b]
2023-04-05 13:20:19 -05:00
Galantsev, Dmitrii 6be2c8784d SWDEV-342533 - Hide WIP fields
Provide support for reliable metrics and hide experimental ones in the
current release.

Further ROCMTools integration development is pushed out to ROCm 5.6.

Change-Id: Iae7a0ed3991588c833bd8ef580b02b9c71390d55
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 4536a453db]
2023-01-23 15:31:46 -06:00
Galantsev, Dmitrii c59365f813 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 35edaa2322]
2023-01-16 19:16:26 -06:00
Galantsev, Dmitrii 9cdf52b0b7 Fix rdcd crash on rocmtools fields read
- Solve issue that resulted in rdcd crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6e0c5d1d56]
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii eccb4e202c Add rocmtools support
This commit adds integration with ROCmTools

Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools

Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
The majority of the metrics are only supported by Instinct (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information

Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 861a843ed7]
2022-12-16 12:19:59 -06:00
Galantsev, Dmitrii 9ff80828e5 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b


[ROCm/rdc commit: 2c171767b3]
2022-10-07 13:58:50 -05:00
Bill(Shuzhou) Liu c6a69f8e59 Update RDC document
Update README.md to refer to document portal.

Change-Id: I427122751fec5a27936b345a3ac76c96478be164


[ROCm/rdc commit: 2cd7f66154]
2022-04-27 14:38:48 -04:00
Bill(Shuzhou) Liu 6b700f8005 Support GPU memory test and compute queue test using Rocr
A new diagnostic module librdc_rocr.so is created. The
module uses Rocr to test the memory allocation, memory access
and compute queue ready status.

Change-Id: I9098f4fc3209bf381b7cb3658a4e94c2e22f2fe9


[ROCm/rdc commit: 78e2f2486b]
2021-10-21 11:01:12 -04:00
Bill(Shuzhou) Liu 114470e450 Change CMakeLists.txt to include the libras
The CMakeLists.txt is changed to add instructions to build raslib.

Change-Id: I0779046f28cbc7af292c83f3ae3ed7bcda5c57eb


[ROCm/rdc commit: 5b4fbe08d2]
2021-02-23 14:49:18 -05:00
Freddy Paul 8761ba13aa RDC: Move rdc daemon to rocm path.
Installing files to a standard path across each version and using
ldconfig has issues with side-by-side installs.

Use RUNPATH/RPATH for ROCm to ensure all ROCm libraries are
picked up without the need for ldconfig.

For the RDC server to be picked up by systemctl, the service config file
must be a symlink from /lib/systemctl/system/rdc.service to the
corresponding RDC file path in a given version of ROCm.

For side-by-side install packages of RDC, post-install scripts
will be removed, so the user will have to set the symlink explicitly
for now.

Change-Id: I916da7cf132f0f9c667e2470fac2b0875e3db9d0


[ROCm/rdc commit: fe1593dda5]
2020-12-04 14:43:06 -05:00
Bill(Shuzhou) Liu dbacfc2d6a Add a CMake option to build RDC library only
When RDC is used only as a library, the user can choose not to build
rdci and rdcd, which removes the dependencies on gRPC and protoc.
-DBUILD_STANDALONE=off should be passed to cmake.
* Change README.md for the instructions.
* Move the python_binding installation from client/CMakeLists.txt to CMakeLists.txt
  so that the RDC library only build will also install the folder.
* Change CMakeLists.txt and rdc_libs/CMakeLists.txt to build with gRPC only if
  the BUILD_STANDALONE is enabled.

Change-Id: If9cfe9fc298a83636d85fe352a311fe2fe041661


[ROCm/rdc commit: 105675aeeb]
2020-11-11 08:48:40 -05:00
Chris Freehill 54f755a345 Update README with ldconfig instructions
Change-Id: Id033122d0b2f74b52a95a2ace99889c5d090cab3
(cherry picked from commit 29a3aee72f9546743d25ebae8c356b33933d3657)


[ROCm/rdc commit: 6fb4c79784]
2020-09-15 10:11:34 -04:00
Chris Freehill 13cfa9cb31 Add grpc to build
Also:
* fix typo in rpm post install script
* for RPM, tell CPack to exclude intermediate directories
  in rpm file

Change-Id: I9dbb4901298d3699e092b53b339f5cb1d77b4edb
(cherry picked from commit e894cfa757aae8343afb373ce4ae60a1aa950a91)


[ROCm/rdc commit: 9051b752c4]
2020-09-12 09:52:48 -04:00
Harish Kasiviswanathan 04d8f623a2 Update README.md document
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I365acc202442495daf89df1328e58c92457ab10d


[ROCm/rdc commit: 5e1111d4cb]
2020-09-02 20:07:05 -04:00
Chris Freehill dbfc169a4e Move docs/README.md to root
Also:
* consolidated the info in the previous rdc/README.md into
the README.md that was moved from docs/ directory.
* added missing information to get grpc into the default
library path (needed to add the grpc dir with ldconfig).
* formatting fixes

Change-Id: Id61e761ad7bdee40364bb8837be8705ed5ca53d1


[ROCm/rdc commit: bf412e3f76]
2020-08-18 17:45:33 -04:00
Bill(Shuzhou) Liu 0813e7052f Implement the rdc_lib API to support the job stats
Add functions to start and stop job recording.
Add a function to get the job stats for each GPU and a summary across multiple GPUs.
Add a function to remove jobs.

Add a class RdcLogger which can control the log level using the environment variable RDC_LOG.
This is similar to GRPC_VERBOSITY in gRPC. When customers run into issues, they can enable verbose
logging to help us troubleshoot.

Add -u support to rdci group, fieldgroup and dmon for connecting to rdcd without authentication.
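
The idea of a log level driven by an environment variable can be sketched in Python as follows. This is not the RdcLogger implementation: the level names (DEBUG, INFO, ERROR), the ERROR default, and the helper names are assumptions made for illustration, and only the variable name RDC_LOG comes from the message above.

```python
import logging
import os

# Hypothetical level names; the values RdcLogger actually accepts are not
# stated in the commit message.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "ERROR": logging.ERROR,
}


def level_from_env(env=None, var="RDC_LOG", default=logging.ERROR):
    """Return the log level named by `var`, or `default` if unset or unknown."""
    env = os.environ if env is None else env
    name = env.get(var, "").strip().upper()
    return _LEVELS.get(name, default)


def make_logger(env=None):
    """Create a logger whose verbosity follows the environment variable."""
    logger = logging.getLogger("rdc_sketch")
    logger.setLevel(level_from_env(env))
    return logger


if __name__ == "__main__":
    logging.basicConfig()
    make_logger().debug("only shown when RDC_LOG=DEBUG")
```

Passing the environment as a mapping keeps the lookup easy to test without touching the real process environment.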

Change-Id: I22c591823c1ee6485db106b911bed8271d1b2769


[ROCm/rdc commit: a547dc7efd]
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu aef3d29925 Implement the rdci subsystem: group, fieldgroup and dmon
Add support for rdci subsystem group create, delete and query.

Add support for rdci subsystem fieldgroup create, delete and query.

Add support for the rdci dmon subsystem. dmon can show the stats every
few seconds until Ctrl-C is pressed. To clean up resources (for example, unwatch),
a signal handler is added.
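
The pattern described here, a polling loop that stops on Ctrl-C and then releases its resources, can be sketched in Python as below. The Monitor class, its poll and cleanup callbacks, and the max_iterations parameter are hypothetical names for illustration; rdci itself is not implemented this way in Python.

```python
import signal
import time


class Monitor:
    """Poll periodically until SIGINT arrives, then run a cleanup step."""

    def __init__(self, poll, cleanup, interval=1.0):
        self.poll = poll            # called once per cycle, e.g. print stats
        self.cleanup = cleanup      # called once on exit, e.g. unwatch fields
        self.interval = interval    # seconds between polls
        self.stop_requested = False

    def handle_sigint(self, signum, frame):
        # Only set a flag here; the loop exits at its next check.
        self.stop_requested = True

    def run(self, max_iterations=None):
        # Install the handler, and restore the previous one on exit.
        previous = signal.signal(signal.SIGINT, self.handle_sigint)
        iterations = 0
        try:
            while not self.stop_requested:
                if max_iterations is not None and iterations >= max_iterations:
                    break
                self.poll()
                iterations += 1
                time.sleep(self.interval)
        finally:
            signal.signal(signal.SIGINT, previous)
            self.cleanup()
        return iterations


if __name__ == "__main__":
    monitor = Monitor(poll=lambda: print("stats..."),
                      cleanup=lambda: print("unwatch"),
                      interval=0.5)
    monitor.run(max_iterations=3)
```

Setting a flag in the handler and cleaning up in a finally block means the cleanup runs exactly once, whether the loop ends by signal or by reaching its iteration limit.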

Change-Id: Ib22a8a43b7083c7c72819ca21145e22702d9ad6c


[ROCm/rdc commit: 16bce67835]
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 0a20efdbf3 Add SSL mutual authentication support for rdci
The RDC API is changed to pass the certificates to the gRPC.

Add support for adding all GPUs in the host to a group. Also, before
adding a GPU to a group, the RDC API verifies that the GPU exists.

Add the support to fetch the temperature metrics.

Change-Id: I5857ef03fede233d16e8b2836be120f33172da93


[ROCm/rdc commit: 66e4e790c3]
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 199f085ce3 SWDEV-209060 - Create the Skeleton RDC CLI and daemon
Create the skeleton implementation of rdc_client.so and rdci. Modify the current rdcd to
integrate the RDC API service:

rdc.proto is changed to add a new RdcAPI service which defines the interfaces for the RDC API.

RdcStandaloneHandler.cpp is added to send requests to rdcd using gRPC. It is built into
rdc_client.so.

rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement a skeleton rdci.
Currently, only the discovery subsystem is supported.

rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only the discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so, which should also be
removed in the future.

Add instructions to README.md for running rdcd and rdci from the build folder.

Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c


[ROCm/rdc commit: 020f6939f7]
2020-08-17 14:07:25 -05:00
Chris Freehill dd48b63bbf Initial commit
Change-Id: I30d87413f6771d1d9d67cd4b2d65ed788d275533


[ROCm/rdc commit: 0de56e087a]
2020-01-09 17:57:19 -06:00