41 Commits

Author SHA1 Message Date
srawat a865793b70 Refactor RDC documentation
Change-Id: Ieaba84992a8cbd185f4c2d1dc36a175c0429b754
2025-03-07 19:50:08 -06:00
Galantsev, Dmitrii 8b249046c0 Update gRPC to 1.67.1
Change-Id: I911878a3aeec8c9234b0e1ac4447364f2ed845cc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-07 18:36:34 -06:00
Justin Williams f106364fc7 Make README.md pretty
Change-Id: I7c3341deaf3621ebbc9e495b023b1dd4971a5f1d
2025-01-31 12:22:45 -06:00
Galantsev, Dmitrii bee9991c4a Revert "Dgalants/add auth script location (#108)"
This reverts commit a70aa81cfd.
2025-01-31 12:22:45 -06:00
Pryor, Adam a70aa81cfd Dgalants/add auth script location (#108)
* DOCS: Add authentication scripts location

Change-Id: Ie285d80ea6d9bb8f710998208d0aa7c6db661d02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* Make README.md pretty (#44)

Change-Id: I7c3341deaf3621ebbc9e495b023b1dd4971a5f1d

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Williams, Justin <Justin.Williams@amd.com>
2025-01-30 12:08:11 -06:00
Galantsev, Dmitrii a8d479c147 CMAKE - Fix ABSL in clang18+ (#106)
Please see:
- https://github.com/abseil/abseil-cpp/issues/1747
- https://github.com/llvm/llvm-project/issues/102443

When gRPC is compiled with a different compiler than RDC, the ABI breaks.
Possibly because some templates were not instantiated.
Setting '-fclang-abi-compat=17' fixes the issue.

Change-Id: Ic6409cf413c87b135f334e5b03145cb1c63356d4

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-30 10:33:58 -06:00
Galantsev, Dmitrii 99d4d77e20 CMAKE - Move rdc_options into share/rdc/conf/
Change-Id: Ib2e792aef180f0f267d86d68c57b852b2cdc8ea6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-24 12:06:05 -06:00
Galantsev, Dmitrii 28acbf0436 Profiler - Modify metrics
Remove occupancy metrics and replace with OccupancyPercent

Add OCCUPANCY_PERCENT which uses OccupancyPercent
Add GR_ENGINE_ACTIVE which uses GPU_UTIL/100
Add TENSOR_ACTIVE_PERCENT which uses MfmaUtil
Modify FLOPS_64 to use FP64_ACTIVE

Change-Id: I5f30d77a0c80f5ac78abd1a9e57f8a0a3c6cc00b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-10-15 19:00:30 -05:00
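
The field-to-counter pairings listed in the "Profiler - Modify metrics" commit above can be sketched as a small lookup table. This is only an illustration, not RDC's implementation: the table layout and the `derive_field` helper are hypothetical, and every transform other than `GPU_UTIL/100` is assumed to be the identity because the commit message does not say otherwise.

```python
from typing import Callable, Dict, Tuple

# Field name -> (underlying profiler counter, transform applied to its value).
# The pairings come from the commit message; the table itself is hypothetical,
# and the identity transforms are an assumption.
FIELD_SOURCES: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "OCCUPANCY_PERCENT": ("OccupancyPercent", lambda v: v),
    "GR_ENGINE_ACTIVE": ("GPU_UTIL", lambda v: v / 100),
    "TENSOR_ACTIVE_PERCENT": ("MfmaUtil", lambda v: v),
    "FLOPS_64": ("FP64_ACTIVE", lambda v: v),
}


def derive_field(field: str, counters: Dict[str, float]) -> float:
    """Compute one field from a dict of raw counter samples."""
    counter_name, transform = FIELD_SOURCES[field]
    return transform(counters[counter_name])
```

For example, a raw `GPU_UTIL` sample of 50 would be reported as a `GR_ENGINE_ACTIVE` value of 0.5 under this sketch.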
Galantsev, Dmitrii 04be1211c1 README: Add known issues section
Change-Id: I298750fdafed556480271cfce31c3fc88984cf0b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-25 15:10:41 -05:00
Galantsev, Dmitrii bd9901324c Update CHANGELOG.md and README.md for ROCm 6.2
Change-Id: If062cb23290469beef0b04a146c485602377be5d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-06-26 17:40:59 -05:00
randyh62 383c0b19e8 link updates, spelling
Change-Id: I71aafc2a0145d139c5c9ca6cb53214c77d88acc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-05-08 18:15:38 -05:00
Galantsev, Dmitrii 234b2d835b Add rocprofiler plugin
Rename ROCR -> Runtime and ROCP -> Profiler

Change-Id: If90953da8fa5d695b681813dad4a3e7ec26a9c7e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-05-07 04:39:39 -05:00
Galantsev, Dmitrii 9702d0f2d7 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-04-09 20:19:28 -05:00
Galantsev, Dmitrii 67578106c4 Fix links and add certificate gen guide
Change-Id: Ieece04baade54ee3a7cde968aa08077e0d0d8391
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-03-19 14:41:16 -05:00
Galantsev, Dmitrii 2c27473d6f README - Fix URLs and add lychee config
Use Lychee[1] to check dead links

[1] - https://github.com/lycheeverse/lychee

Change-Id: I0e8aade7879748dbcb4700a527bcae5a2c29ecb5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-02-08 17:06:02 -06:00
Galantsev, Dmitrii f13a1fbea8 Upgrade gRPC v1.59.1 -> v1.61.0
Change-Id: I8a3f13dd8f264e28474bd65e92ac53f87ab7db3f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: Icbb7b4a580894d78d8ef992befa26ce20fcf3309
2024-02-06 19:39:50 -06:00
Sam Wu 5890852ff1 Standardize documentation for ReadtheDocs
Relates to https://github.com/RadeonOpenCompute/rocm-docs-core/issues/330

Change-Id: Ic9370548bb8d919376b20f7e1800fe620369e69b
2023-12-08 16:56:59 -05:00
Galantsev, Dmitrii e579cb04b2 Upgrade gRPC v1.44.0 -> v1.59.1
Change-Id: Ib43a41c61d4028ec029a8c179a94060315870fbb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-19 17:29:36 -05:00
Galantsev, Dmitrii f6ace9fa14 README - Update documentation links
Change-Id: I2e778a766e6a4489280fe7b86f33a6c597983167
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-13 19:34:28 -05:00
Public Profile a3ac4bac21 fix broken links
Change-Id: Ibd941eb116fd9ae4ed7deeeb3a07324a2a3ca3c3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-09 00:13:09 -05:00
Sam Wu 1335d19020 add configs for read the docs
add handbook, user, install, and integration guides

Change-Id: I996f6909f4fdf76910981c0224f5a0266907e27a

remove old documentation steps

Change-Id: Icfad09926e67a2dfa1de0e182fc3cd534f0448f7

formatting fixes

Change-Id: I704bbbbf6ad384178f804e4a3f5e621f9c3d33b9
2023-05-05 15:44:34 -06:00
Galantsev, Dmitrii 90e824c63b SWDEV-392942 - Disable rocmtools
Temporarily disable rocmtools because of hsa_shut_down issues

Change-Id: I5e8b6729b8200ccdd5c399862bfc632ba69f884c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-04-05 13:20:19 -05:00
Galantsev, Dmitrii 4536a453db SWDEV-342533 - Hide WIP fields
Provide support for reliable metrics and hide experimental in current
release.

Further ROCMTools integration development is pushed out to ROCm 5.6.

Change-Id: Iae7a0ed3991588c833bd8ef580b02b9c71390d55
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-23 15:31:46 -06:00
Galantsev, Dmitrii 35edaa2322 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-16 19:16:26 -06:00
Galantsev, Dmitrii 6e0c5d1d56 Fix rdcd crash on rocmtools fields read
- Solve issue that resulted in rdcd crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii 861a843ed7 Add rocmtools support
This commit adds integration with ROCmTools

Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools

Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
The majority of the metrics are only supported by Instinct (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information

Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-12-16 12:19:59 -06:00
Galantsev, Dmitrii 2c171767b3 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b
2022-10-07 13:58:50 -05:00
Bill(Shuzhou) Liu 2cd7f66154 Update RDC document
Update README.md to refer to document portal.

Change-Id: I427122751fec5a27936b345a3ac76c96478be164
2022-04-27 14:38:48 -04:00
Bill(Shuzhou) Liu 78e2f2486b Support GPU memory test and compute queue test using Rocr
A new diagnostic module librdc_rocr.so is created. The
module uses Rocr to test the memory allocation, memory access
and compute queue ready status.

Change-Id: I9098f4fc3209bf381b7cb3658a4e94c2e22f2fe9
2021-10-21 11:01:12 -04:00
Bill(Shuzhou) Liu 5b4fbe08d2 Change CMakeLists.txt to include the libras
The CMakeLists.txt is changed to add instructions to build raslib.

Change-Id: I0779046f28cbc7af292c83f3ae3ed7bcda5c57eb
2021-02-23 14:49:18 -05:00
Freddy Paul fe1593dda5 RDC: Move rdc daemon to rocm path.
Installing files to standard path across each version and using
ldconfig has issues with side-by-side install.

Usage of RUNPATH/RPATH for ROCm to ensure all ROCm libraries are
picked without the need for ldconfig.

For RDC server to be picked up by systemctl, service config file
shall be a symlink from /lib/systemctl/system/rdc.service to
corresponding RDC file path in a given version of ROCm

For side-by-side install packages of RDC, post-install scripts
will be removed. Hence the user will have to set the symlink explicitly
for now.

Change-Id: I916da7cf132f0f9c667e2470fac2b0875e3db9d0
2020-12-04 14:43:06 -05:00
Bill(Shuzhou) Liu 105675aeeb Add a CMake option to build RDC library only
When RDC is used only as a library, the user can choose not to build
rdci and rdcd, which removes the dependencies on gRPC and protoc.
-DBUILD_STANDALONE=off should be passed to cmake.
* Change README.md for the instructions.
* Move the python_binding installation from client/CMakeLists.txt to CMakeLists.txt
  so that the RDC library only build will also install the folder.
* Change CMakeLists.txt and rdc_libs/CMakeLists.txt to build with gRPC only if
  the BUILD_STANDALONE is enabled.

Change-Id: If9cfe9fc298a83636d85fe352a311fe2fe041661
2020-11-11 08:48:40 -05:00
Chris Freehill 6fb4c79784 Update README with ldconfig instructions
Change-Id: Id033122d0b2f74b52a95a2ace99889c5d090cab3
(cherry picked from commit 29a3aee72f9546743d25ebae8c356b33933d3657)
2020-09-15 10:11:34 -04:00
Chris Freehill 9051b752c4 Add grpc to build
Also:
* fix typo in rpm post install script
* for RPM, tell CPack to exclude intermediate directories
  in rpm file

Change-Id: I9dbb4901298d3699e092b53b339f5cb1d77b4edb
(cherry picked from commit e894cfa757aae8343afb373ce4ae60a1aa950a91)
2020-09-12 09:52:48 -04:00
Harish Kasiviswanathan 5e1111d4cb Update README.md document
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I365acc202442495daf89df1328e58c92457ab10d
2020-09-02 20:07:05 -04:00
Chris Freehill bf412e3f76 Move docs/README.md to root
Also:
* consolidated the info in the previous rdc/README.md into
the README.md that was moved from docs/ directory.
* added missing information to get grpc into the default
library path (needed to add the grpc dir with ldconfig).
* formatting fixes

Change-Id: Id61e761ad7bdee40364bb8837be8705ed5ca53d1
2020-08-18 17:45:33 -04:00
Bill(Shuzhou) Liu a547dc7efd Implement the rdc_lib API to support the job stats
Add the function to start and stop the job recording.
Add the function to get the job stats for each GPU and summary of multiple GPUs
Add the function to remove the jobs.

Add a class RdcLogger which can control the log level using the environment variable RDC_LOG.
This is similar to gRPC's GRPC_VERBOSITY. When customers run into issues, they can enable the verbose
log to help us troubleshoot.

Add the -u support in the rdci group, fieldgroup and dmon for connecting to rdcd without authentication.

Change-Id: I22c591823c1ee6485db106b911bed8271d1b2769
2020-08-17 14:07:25 -05:00
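
The environment-controlled log level described for RdcLogger in the commit above can be illustrated with a short sketch. It is not the RdcLogger code: the level names, the default level, and the function names are all assumptions, since the commit message states only that `RDC_LOG` controls the log level.

```python
import logging
import os

# Hypothetical level names; the commit only says RDC_LOG controls the level.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "ERROR": logging.ERROR,
}


def level_from_env(var: str = "RDC_LOG", default: int = logging.ERROR) -> int:
    """Return the level named by the environment variable, or the default."""
    name = os.environ.get(var, "").strip().upper()
    return _LEVELS.get(name, default)


def make_logger(name: str = "rdc_sketch") -> logging.Logger:
    """Create a logger whose verbosity follows the environment variable."""
    logger = logging.getLogger(name)
    logger.setLevel(level_from_env())
    return logger
```

Reading the variable once at logger creation keeps the lookup out of the hot path; an unset or unrecognised value falls back to the quiet default.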
Bill(Shuzhou) Liu 16bce67835 Implement the rdci subsystem: group, fieldgroup and dmon
Add the support for rdci subsystem group create, delete and query

Add the support for rdci subsystem fieldgroup create, delete and query

Add the support for the rdci dmon system. The dmon system shows the stats every
few seconds until the user presses Ctrl-C. To clean up the resources (for example, unwatch),
a signal handler is added.

Change-Id: Ib22a8a43b7083c7c72819ca21145e22702d9ad6c
2020-08-17 14:07:25 -05:00
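
The Ctrl-C handling described for dmon in the commit above follows a common pattern: the signal handler only records that a stop was requested, and the cleanup runs after the polling loop exits. A minimal sketch of that pattern, assuming hypothetical `sample` and `cleanup` callbacks (neither is an RDC API), and assuming it is called from the main thread:

```python
import signal
import time
from typing import Callable, List


def monitor(sample: Callable[[], str], cleanup: Callable[[], None],
            interval_s: float = 1.0) -> List[str]:
    """Collect samples until SIGINT arrives, then run cleanup once."""
    stop_requested = False

    def on_sigint(signum, frame):
        # Only record the request; the actual cleanup happens outside the handler.
        nonlocal stop_requested
        stop_requested = True

    previous = signal.signal(signal.SIGINT, on_sigint)
    rows: List[str] = []
    try:
        while not stop_requested:
            rows.append(sample())
            time.sleep(interval_s)
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the earlier handler
        cleanup()  # stands in for releasing resources, e.g. an unwatch call
    return rows
```

Keeping the handler down to a flag assignment avoids doing real work while the main loop is interrupted at an arbitrary point.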
Bill(Shuzhou) Liu 66e4e790c3 Add SSL mutual authentication support for rdci
The RDC API is changed to pass the certificates to the gRPC.

Add the support to add all GPUs in the host to a group. Also, before
adding a GPU to a group, the RDC API will verify that the GPU exists.

Add the support to fetch the temperature metrics.

Change-Id: I5857ef03fede233d16e8b2836be120f33172da93
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 020f6939f7 SWDEV-209060 - Create the Skeleton RDC CLI and daemon
Create the skeleton implementation of rdc_client.so and rdci. Modify current rdcd to
integrate the RDC API service:

rdc.proto is changed to add a new RdcAPI service which defines the interfaces for the RDC API.

RdcStandaloneHandler.cpp is added to send the request using gRPC to the rdcd. It is built into
the rdc_client.so

rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement skeleton rdci.
Currently, the discovery subsystem is supported.

rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so which should also be
removed in the future.

Add instructions on how to run rdcd and rdci from the build folder to the README.md.

Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c
2020-08-17 14:07:25 -05:00
Chris Freehill 0de56e087a Initial commit
Change-Id: I30d87413f6771d1d9d67cd4b2d65ed788d275533
2020-01-09 17:57:19 -06:00