
150 Commits

Author SHA1 Message Date
Galantsev, Dmitrii 2095dbbe8d SWDEV-436561 - Add CODEOWNERS
Change-Id: Ie806f1ba714a88643c0e5f9cb65bf70f8d59f1fb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 82e4ea3b6f]
2023-12-12 12:07:47 -06:00
Sam Wu a9ad3af5e2 Standardize documentation for ReadtheDocs
Relates to https://github.com/RadeonOpenCompute/rocm-docs-core/issues/330

Change-Id: Ic9370548bb8d919376b20f7e1800fe620369e69b


[ROCm/rdc commit: 5890852ff1]
2023-12-08 16:56:59 -05:00
Galantsev, Dmitrii 45d7a2df04 Server - Add -a/--address option
Change-Id: Ia9e8d76b9a4ba0aadc567142601a87f0ad0b69e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: ed3cfffd7e]
2023-12-04 15:26:44 -06:00
AravindanC 413163541a SWDEV-426649 - config file rocmpath hard coding removed
Change-Id: I01df16392201cc112c7533e8c092e4e336237b0b


[ROCm/rdc commit: c661bab06f]
2023-11-23 17:31:45 -05:00
Bill(Shuzhou) Liu 4acaddc32d Sort the ROCr gpu index based on BDF
The rocm-smi index was changed to sort based on BDF. The ROCr plugin
is changed to match.

Change-Id: I5851431db336d50266b253dec1894a7bd9f3554b


[ROCm/rdc commit: 61a2773875]
2023-11-16 09:07:22 -05:00
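The BDF ordering from "Sort the ROCr gpu index based on BDF" can be illustrated with a minimal Python sketch. It assumes PCI addresses in the usual `domain:bus:device.function` hexadecimal form (for example `0000:03:00.0`). The function names are hypothetical, and this is not the ROCr plugin's actual code.

```python
from typing import List, Tuple


def bdf_key(bdf: str) -> Tuple[int, int, int, int]:
    """Split a 'DDDD:BB:DD.F' PCI address into numeric (domain, bus, device, function)."""
    domain, bus, dev_fn = bdf.split(":")
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def index_by_bdf(bdfs: List[str]) -> List[str]:
    """Return the addresses in ascending BDF order; list position is the GPU index."""
    return sorted(bdfs, key=bdf_key)


if __name__ == "__main__":
    gpus = ["0000:c3:00.0", "0000:03:00.0", "0000:83:00.0", "0000:23:00.0"]
    for index, bdf in enumerate(index_by_bdf(gpus)):
        print(index, bdf)
```

Parsing each field as an integer means the order does not depend on hex letter case or zero padding.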
Bill(Shuzhou) Liu a59c9e655b RDC crash when exit
Join the signal handling thread instead of cancelling it to prevent
a crash with "terminate called without an active exception".

Change-Id: I2e18eb825728fd3a94f67b1b0049516bb7b6ebbc


[ROCm/rdc commit: 1ab4110d46]
2023-11-03 09:10:22 -04:00
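The shutdown pattern in "RDC crash when exit" is to ask the signal handling thread to stop, then join it rather than cancel it. The Python sketch below illustrates that pattern. Python threads cannot be cancelled at all, so it shows only the cooperative stop-and-join half of the change. The `SignalWatcher` class and its members are hypothetical and do not mirror RDC's C++ code.

```python
import threading


class SignalWatcher:
    """Hypothetical background thread that runs until it is asked to stop."""

    def __init__(self) -> None:
        self._stop_requested = threading.Event()
        self.handled_shutdown = False
        self._thread = threading.Thread(target=self._run, name="signal-watcher")

    def _run(self) -> None:
        # Wake up periodically; a real handler would check for pending signals here.
        while not self._stop_requested.wait(timeout=0.05):
            pass
        # The loop exits normally, so cleanup at the end of the thread still runs.
        self.handled_shutdown = True

    def start(self) -> None:
        self._thread.start()

    def shutdown(self) -> None:
        # Request the stop, then wait for the thread to finish instead of killing it.
        self._stop_requested.set()
        self._thread.join()

    def is_alive(self) -> bool:
        return self._thread.is_alive()


if __name__ == "__main__":
    watcher = SignalWatcher()
    watcher.start()
    watcher.shutdown()
    print("joined:", not watcher.is_alive(), "clean exit:", watcher.handled_shutdown)
```

In C++ the corresponding rule is that a `std::thread` must be joined (or detached) before it is destroyed. Destroying a joinable `std::thread` calls `std::terminate`.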
Galantsev, Dmitrii ff9f16b7b5 Upgrade gRPC v1.44.0 -> v1.59.1
Change-Id: Ib43a41c61d4028ec029a8c179a94060315870fbb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e579cb04b2]
2023-10-19 17:29:36 -05:00
Galantsev, Dmitrii d4440d392e Upgrade to CXX-17 gtest-1.14
Change-Id: I1c7316f151128cbc9318b226dac14950e399d2c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 8f9a6796f1]
2023-09-28 12:54:49 -05:00
Galantsev, Dmitrii 9b41583cfb README - Update documentation links
Change-Id: I2e778a766e6a4489280fe7b86f33a6c597983167
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: f6ace9fa14]
2023-09-13 19:34:28 -05:00
Galantsev, Dmitrii dffa733579 .gitignore - Ignore more build files
Change-Id: I5b5207e65cc3fd6537800db388da142c0e76c3ff
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: fc852fc915]
2023-09-06 10:21:11 -05:00
Galantsev, Dmitrii a0b6940fdd .editorconfig - Remove whitespace rule
Change-Id: Ia928dcb49fc094889784a0afcbc4abbe35bd59c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 824056b0be]
2023-09-06 10:20:43 -05:00
Galantsev, Dmitrii 393bb97d99 SWDEV-410524 - Doxygen add WARN_AS_ERROR
Change-Id: I714712d61d1526cb75122a2f23e293745d41a701
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: de252b21a4]
2023-08-11 11:57:44 -05:00
Public Profile 6533962e0f fix broken links
Change-Id: Ibd941eb116fd9ae4ed7deeeb3a07324a2a3ca3c3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: a3ac4bac21]
2023-08-09 00:13:09 -05:00
Ranjith Ramakrishnan 52187e010d SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I9c4201d7786be2e3f77bc1d4d15887741ba59ec5


[ROCm/rdc commit: 2e096d9009]
2023-08-07 09:25:00 -07:00
Galantsev, Dmitrii cbe2be5bf2 SYSTEMCTL: Check if running before stopping
When uninstalling the RDC application, the user is greeted with an
annoying "Failed to stop rdc.service..." message if RDC is not
running.
This change makes sure RDC is active before trying to stop it.

Change-Id: I6fa57bfd4b9c348514cd6c38e60ed3930d32b62c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6e52a113a2]
2023-05-31 11:41:25 -05:00
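The check from "SYSTEMCTL: Check if running before stopping" can be sketched in Python with `systemctl is-active --quiet`, which prints nothing and exits with status 0 only when the unit is active. The real change lives in the package scripts. The helper and its injectable `run` parameter are hypothetical; `run` exists so the logic can be tested without systemd.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def _run(cmd: List[str]) -> int:
    """Run a command and return its exit status without raising on failure."""
    return subprocess.run(cmd, check=False).returncode


def stop_if_running(unit: str = "rdc.service", run: Runner = _run) -> bool:
    """Stop `unit` only if systemd reports it as active; return True if a stop was issued."""
    if run(["systemctl", "is-active", "--quiet", unit]) != 0:
        return False  # Not active: skip the stop entirely.
    run(["systemctl", "stop", unit])
    return True
```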
Ranjith Ramakrishnan 1bdd193581 SWDEV-310152 - Removed the RUNPATH setting in source code
Use the RUNPATH provided by build scripts

Change-Id: Ib5b3f689dc20aeecf6974281625865fe650bfa72


[ROCm/rdc commit: c82fdeab8d]
2023-05-30 16:17:08 -04:00
Sam Wu 4b5c27bf33 update documentation dependencies via requirements
rocm-docs-core v0.11.0

Change-Id: I2ecc8c6015b9bb186e1b3241eb84bcbda9c46152


[ROCm/rdc commit: 74dce41f4f]
2023-05-18 13:29:08 -06:00
Ranjith Ramakrishnan 068850c8dc SWDEV-383221 - Set the default value of ROCM_HEADER_WRAPPER_WERROR to OFF
Using wrapper header files results in a #warning message by default.

Change-Id: If5847e1b03523251238018b2cf0725b302619963


[ROCm/rdc commit: bf49b88866]
2023-05-08 20:45:08 -07:00
Sam Wu 041868928a add configs for read the docs
add handbook, user, install, and integration guides

Change-Id: I996f6909f4fdf76910981c0224f5a0266907e27a

remove old documentation steps

Change-Id: Icfad09926e67a2dfa1de0e182fc3cd534f0448f7

formatting fixes

Change-Id: I704bbbbf6ad384178f804e4a3f5e621f9c3d33b9


[ROCm/rdc commit: 1335d19020]
2023-05-05 15:44:34 -06:00
Bill(Shuzhou) Liu 302c78b62b Rebuild librdc_ras.so on RHEL7
Build the library on lower OS for compatibility.

Change-Id: I24c5670d98131b739753e66fdd49b2e69759073c


[ROCm/rdc commit: 8e686daa7d]
2023-04-28 14:00:11 -05:00
Galantsev, Dmitrii fe28405d3a CMAKE: Fix RPM version
before fix:
CPack: - package: ... rdc-0.6.0-local.9999.el9.x86_64.rpm generated.

after fix:
CPack: - package: ... rdc-0.6.0.50600-local.9999.el9.x86_64.rpm generated.

Change-Id: I684816f3b4cad787eec6abbb40598d05c89d4f5d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 418167b43e]
2023-04-18 17:41:37 -04:00
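"CMAKE: Fix RPM version" appends a numeric suffix (50600 in the example) to the RDC version. The Python sketch below assumes that this suffix encodes the ROCm release as major * 10000 + minor * 100 + patch, so ROCm 5.6.0 gives 50600. The function names are hypothetical, and the real logic is in RDC's CMake files.

```python
from typing import Tuple


def rocm_version_number(major: int, minor: int, patch: int) -> int:
    # Assumed encoding of a ROCm release as a single integer: 5.6.0 -> 50600.
    return major * 10000 + minor * 100 + patch


def package_version(rdc_version: str, rocm: Tuple[int, int, int]) -> str:
    # Append the encoded ROCm release to the RDC version string.
    return f"{rdc_version}.{rocm_version_number(*rocm)}"


if __name__ == "__main__":
    print(package_version("0.6.0", (5, 6, 0)))  # prints 0.6.0.50600
```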
Galantsev, Dmitrii a337dc062b SWDEV-392942 - Disable rocmtools
Temporarily disable rocmtools because of hsa_shut_down issues

Change-Id: I5e8b6729b8200ccdd5c399862bfc632ba69f884c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 90e824c63b]
2023-04-05 13:20:19 -05:00
Galantsev, Dmitrii 95a9b1965c ASAN: Shutdown the signaling thread on exit
Change-Id: Ica546db354430f5f4adc33d8d92e09927d40f75b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 8f6bf948cc]
2023-03-28 11:26:38 -05:00
Bill(Shuzhou) Liu df95a71a09 Rebuild rdc_ras library on Ubuntu 20.04
Rebuild rdc_ras library on Ubuntu 20.04 for backward compatibility.
Fall back to rocm_smi for ECC errors if the rdc_ras library is not available.

Change-Id: I8db9687e3eb54a6f62fce2c8d57a796c6da6b5c4


[ROCm/rdc commit: 29551b1fd0]
2023-03-16 10:02:15 -04:00
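The fallback in "Rebuild rdc_ras library on Ubuntu 20.04" uses rocm_smi for ECC errors when the rdc_ras library is unavailable, which is a common try-load-else-fall-back pattern. The Python sketch below shows that pattern with `ctypes.CDLL`, which raises `OSError` when a shared library cannot be loaded. The function name and the returned labels are hypothetical, and RDC itself does this in C++.

```python
import ctypes


def pick_ecc_backend(ras_lib: str = "librdc_ras.so") -> str:
    """Return which backend would serve ECC error counts."""
    try:
        ctypes.CDLL(ras_lib)
    except OSError:
        # The library is missing or cannot be loaded, so fall back.
        return "rocm_smi"
    return "rdc_ras"


if __name__ == "__main__":
    print(pick_ecc_backend())
```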
Ranjith Ramakrishnan ade4945ad4 SWDEV-366831 - Compile time flag to switch between #warning and #error message
Using backward compatibility paths produces an #error message. A compile-time option is added to enable/disable the #error message.
Disabling it produces a #warning message instead.

Change-Id: I45f987b572a306036a72525d2b90d366459117ad


[ROCm/rdc commit: f962d0959a]
2023-03-10 13:19:00 -08:00
Ranjith Ramakrishnan 758c906a20 SWDEV-366831 - File reorg backward compatibility message changed to #error
Change-Id: Ic6fceb5cf92cca0a3c0a7a78c81cbb69ca82dd5e


[ROCm/rdc commit: 32844ddfee]
2023-02-08 22:55:49 -08:00
Galantsev, Dmitrii c1a76d532a SWDEV-380364 - Resolve dmon + rocmtools halt
* Move hsa_init out of rocmtools and into RDC
* Remove secondary hsa_shut_down from ROCR module

Change-Id: I57d84d41ddc51595b98e734265f10bc5129a7352
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I2b389ee1a9ba3507b2df1fc2fe83598f67731aac


[ROCm/rdc commit: 24b3f138e9]
2023-02-02 18:33:14 -06:00
Galantsev, Dmitrii 6be2c8784d SWDEV-342533 - Hide WIP fields
Provide support for reliable metrics and hide experimental ones in the
current release.

Further ROCMTools integration development is pushed out to ROCm 5.6.

Change-Id: Iae7a0ed3991588c833bd8ef580b02b9c71390d55
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 4536a453db]
2023-01-23 15:31:46 -06:00
Bill(Shuzhou) Liu 4b78871796 The RAS library SIGSEGV
When librdc_ras.so fails to find librocm_smi64.so, it crashes.

Change-Id: I611465f6b5de87cd41ba9be9bd6ae35f66d92a3b


[ROCm/rdc commit: f467f802ad]
2023-01-19 13:09:22 -06:00
Galantsev, Dmitrii 8fc6d04a54 Format DOUBLE as a fixed floating point number
previous format:
1.20758e+06
0.370689
0.00014128

new format:
1207583.000
0.371
0.000

Change-Id: I00f41d841e5e62c4b25dc5e646b6487449773e01
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 4d35ff6092]
2023-01-18 11:18:57 -05:00
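The new DOUBLE output from "Format DOUBLE as a fixed floating point number" is fixed-point notation with three decimal places. The C++ equivalent would be `std::fixed` with `std::setprecision(3)`, while the old output matches the default six-significant-digit style of `%g`. The Python sketch below reproduces the before and after values from the commit message. The input values are assumed from the printed numbers, and the function names are hypothetical.

```python
def format_double_old(value: float) -> str:
    """Previous style: six significant digits, switching to scientific notation for large values (%g)."""
    return f"{value:g}"


def format_double(value: float) -> str:
    """New style: fixed-point with three digits after the decimal point."""
    return f"{value:.3f}"


if __name__ == "__main__":
    for v in (1207583.0, 0.370689, 0.00014128):
        print(format_double_old(v), "->", format_double(v))
```

Fixed notation keeps columns aligned in `rdci` output. The trade-off is that very small values such as 0.00014128 round to 0.000.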
Galantsev, Dmitrii c59365f813 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 35edaa2322]
2023-01-16 19:16:26 -06:00
Bill(Shuzhou) Liu 152ff718b5 Add the changelog
Add the CHANGELOG.md for the release.

Change-Id: Ia6f0afaece9fca4df4dd2042e7a41fd91edd853c


[ROCm/rdc commit: 6687239cff]
2023-01-13 16:07:34 -05:00
Galantsev, Dmitrii 4091faf4f4 SWDEV-376779 - Fix linking for rdctst
Ieb198ad96e26e89b09cb85986214a5b1451b17a6 broke linking
for rdctst and rdcd by removing the "../lib/rdc" path.
This change adds it back and makes the paths more visible.

- Link librdc_ras and librdc_rocp to rdctst
- Add longer RUNPATH for rdctst to link rdc libraries

Change-Id: Id4f128c217a6de8bb67df6750ecafdb96545811b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: fc097d44ff]
2023-01-11 19:40:59 -05:00
Galantsev, Dmitrii ece8443715 SYSTEMD: Remove set -e to allow failure
The 'set -e' flag prevents falling back to 'return 0' when a command
fails. As installation/removal should not fail in most cases, the 'set -e'
flag has been removed.

The page below recommends caution when using 'set -e':
https://www.debian.org/doc/debian-policy/ch-opersys.html#writing-the-scripts

Change-Id: If4439aaff66747bceabd3beb7e00ae12ce950e43
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2787767bf0]
2023-01-09 13:18:47 -06:00
Galantsev, Dmitrii 9cdf52b0b7 Fix rdcd crash on rocmtools fields read
- Solve an issue that caused rdcd to crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6e0c5d1d56]
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii 5c803f6b03 SWDEV-352414 - Fix gRPC linker issues
- Replace gRPC library with gRPC package
- Relax RUNPATH
- Make LINKER_FLAGS global

gRPC package includes its dependencies:
SSL, UPB, ABSL, etc.

Change-Id: Ieb198ad96e26e89b09cb85986214a5b1451b17a6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 3e4c55ec6c]
2023-01-04 18:50:07 -06:00
Galantsev, Dmitrii eccb4e202c Add rocmtools support
This commit adds integration with ROCmTools

Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools

Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
The majority of the metrics are only supported by Instinct (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information

Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 861a843ed7]
2022-12-16 12:19:59 -06:00
Bill(Shuzhou) Liu 001461e975 gcc error when const is used
Change to #define to be compatible with gcc.

Change-Id: Id2a2b3dbeaaf7aea9b0e2075320c30f8bad50fc7


[ROCm/rdc commit: 73d8b35610]
2022-12-05 16:27:59 -06:00
Ranjith Ramakrishnan 2c614f6beb SWDEV-366823 - Change pragma message to warning
The file reorganization feature was implemented with backward compatibility.
The backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition.

Change-Id: Ib616bf4b7358a0607832a8af423c75e0bf2ab72d


[ROCm/rdc commit: 9c4ce805cf]
2022-11-21 00:59:02 -08:00
Galantsev, Dmitrii 2b89ab397c Improve CMake and relocate tests
- Respect CMAKE_INSTALL_PREFIX and ignore RDC_CLIENT_INSTALL_PREFIX
- Move example and rdctst from rocm/bin to rocm/share/rdc
- Add README for examples

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I0b1d996d206327fd1b51ac6e82d548829bdb1570


[ROCm/rdc commit: f6efd7fbf6]
2022-10-27 13:49:54 -05:00
Galantsev, Dmitrii 9ff80828e5 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b


[ROCm/rdc commit: 2c171767b3]
2022-10-07 13:58:50 -05:00
Ranjith Ramakrishnan 5d08745c53 SWDEV-345870 - Use actual header files and libraries of rocm smi rather than using wrapper header files and library softlinks
Rearranged the project name declaration and the inclusion of GNUInstallDirs so that GNU variables can be used

Change-Id: I85d8e69e0faf4f9f634295c3064b9f4f64f8e9b8


[ROCm/rdc commit: 3214c21e6e]
2022-09-22 13:37:58 -04:00
Galantsev, Dmitrii 0ae88bc221 Remove __pycache__ before uninstall
`__pycache__` might be created when a Python script is run, which
prevents `rpm -e` and `dpkg --remove` from completely removing the
application. This patch removes `__pycache__` early in the uninstall
process.

A similar issue is resolved in:
rocm_smi_lib Change-Id: I695bd085d4a43b678b563b4c35f6d2e8ddfa7d7c

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I9fe0cd61570c2bd83cf9a45c95837ee6ad11e84b


[ROCm/rdc commit: 69a5a1d6bc]
2022-08-15 17:23:19 -04:00
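The cleanup from "Remove __pycache__ before uninstall" amounts to deleting every `__pycache__` directory under the installed Python files before the package manager removes them. The real change is in the package scripts, so this Python version is only a sketch. The example directory is an assumption based on the file reorganization commit, which installs python_binding under libexec/rdc, and the function name is hypothetical.

```python
import os
import shutil
from typing import List


def remove_pycache(root: str) -> List[str]:
    """Delete every __pycache__ directory below `root`; return the paths removed."""
    removed = []
    # topdown=True lets the loop prune deleted directories from the walk.
    for dirpath, dirnames, _ in os.walk(root, topdown=True):
        if "__pycache__" in dirnames:
            path = os.path.join(dirpath, "__pycache__")
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
            dirnames.remove("__pycache__")  # Do not descend into the deleted directory.
    return removed


if __name__ == "__main__":
    # Hypothetical install location; adjust to the actual ROCm prefix.
    print(remove_pycache("/opt/rocm/libexec/rdc/python_binding"))
```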
Ranjith Ramakrishnan 92e6ab4bb8 SWDEV-350674 - Added backward compatibility for binary files and rdc.service
With the file reorganization changes, binaries are moved to /opt/rocm-ver/bin.
Similarly, rdc.service is moved to /opt/rocm-ver/libexec/rdc.
Test suites still use the old paths.
Once the test suite changes are made, backward compatibility for binaries and rdc.service can be removed.
Corrected binary path in rdc.service.in
Corrected GRPC runpath

Change-Id: I306924d81cedc19586305a79d51eea8af6e70e83


[ROCm/rdc commit: c3ea96dd71]
2022-08-09 17:45:22 -07:00
Ranjith Ramakrishnan 1e80f183d8 Use function to find all libraries with absl prefix rather than hard coding the library names
Some absl libraries were missing from the list, which resulted in an error while loading the rdci binary.
Use CMake logic to find all the required absl libraries instead of hard coding them.
A similar implementation was done on the RDC server side.

Change-Id: Ia04bcd137f892c2577a4a458b92ca212d42aef80


[ROCm/rdc commit: ba9633b74b]
2022-08-05 14:06:25 -07:00
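"Use function to find all libraries with absl prefix" discovers the libabsl_* libraries at build time instead of listing them by hand. In CMake, a comparable approach is `file(GLOB ...)` over the gRPC library directory. The Python sketch below shows the same discovery step, assuming shared libraries named `libabsl_*.so*` in a single directory. The function name and the directory layout are hypothetical.

```python
import glob
import os
from typing import List


def find_absl_libs(lib_dir: str) -> List[str]:
    """Return every libabsl_* shared library in `lib_dir`, sorted for stable output."""
    # The trailing '*' also matches versioned names such as libabsl_base.so.2301.0.0.
    pattern = os.path.join(lib_dir, "libabsl_*.so*")
    return sorted(glob.glob(pattern))
```

Globbing picks up any new absl component that a gRPC upgrade adds. The trade-off is that unrelated files matching the pattern are also included.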
Ranjith Ramakrishnan 3df8b88ca6 File reorganization with backward compatibility
SWDEV-291455 - Binary, header files and libraries installed in bin, include and lib folders under /opt/rocm-ver
Prebuilt ras library with updated search path
cmake config files in lib/cmake/rdc
grpc, sp3, hsaco and private libraries installed in lib/rdc
config installed in share/rdc
authentication and python_binding installed in libexec/rdc
Backward compatibility added for header files and libraries

Depends-On: I3f3d192935923f71737b3fe55ded536654a73dd7
Change-Id: Ia1a6cadc59034b155631a1ee5fdbe692d2a8a71b


[ROCm/rdc commit: 52a3463147]
2022-08-04 23:42:42 -07:00
Ranjith Ramakrishnan d8f996d16d SWDEV-321112 - Use GNUInstallDirs
Use GNUInstallDirs variables to determine the location of DOCDIR

Change-Id: I12b016bc1fe66afd92478c3a940093a4b35de0dc


[ROCm/rdc commit: 23ac479cbd]
2022-06-08 09:05:17 -07:00
Bill(Shuzhou) Liu e16a8bcaf5 Identify GPUs using PCI device identifier in RDC Prometheus plugin
Add a new option --enable_pci_id to the Prometheus plugin, which maps
the GPU index to the PCI device identifier.

Change-Id: I38a2a7e4841975da095391002397d4515ffb8e0d


[ROCm/rdc commit: 23ab2c0671]
2022-05-05 09:16:05 -04:00
Bill(Shuzhou) Liu c6a69f8e59 Update RDC document
Update README.md to refer to document portal.

Change-Id: I427122751fec5a27936b345a3ac76c96478be164


[ROCm/rdc commit: 2cd7f66154]
2022-04-27 14:38:48 -04:00
Bill(Shuzhou) Liu ec564c1d2c Add run path dependency on grpc libraries
Add run path dependency for grpc libabsl_*.so required by RHEL.

Change-Id: Ie033cc25019e0cb46a895e8c3e583a0d22ab4561


[ROCm/rdc commit: c4dab3b2bd]
2022-03-28 09:04:17 -04:00