160 commits

Author SHA1 Message Date
Galantsev, Dmitrii aa5448fc16 CMAKE: Reduce install messages size
Change-Id: I6fa7cfe986b1de702492a96bddbfd406501bba50
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-02-06 00:31:32 -06:00
Bill(Shuzhou) Liu 5cfe2b4169 Fallback to junction temperature and socket power
If the card does not have an edge temperature, fall back to the junction
temperature. If the card only has socket power, then use socket
power instead.

Change-Id: I053a67a89cf3b29a34e82123f522c08d7dd68916
2024-02-05 10:10:26 -06:00
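
The fallback described in this commit can be sketched as a preference-ordered lookup. This is only an illustration in Python, not the project's code: the sensor keys (`edge`, `junction`, `average`, `socket`) and the helper `first_available` are hypothetical names, and the assumption that socket power replaces an average-power reading is not stated in the log.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical sensor keys; the identifiers actually used by RDC are not shown in the log.
TEMP_PREFERENCE = ("edge", "junction")
# Assumption: socket power stands in for an "average" power reading when that is missing.
POWER_PREFERENCE = ("average", "socket")


def first_available(
    readings: Mapping[str, Optional[float]],
    preference: Sequence[str],
) -> Optional[float]:
    """Return the first reading, in preference order, that the card actually reports."""
    for key in preference:
        value = readings.get(key)
        if value is not None:
            return value
    return None  # the card reports none of the preferred sensors


# A card without an edge sensor falls back to the junction temperature.
print(first_available({"edge": None, "junction": 61.0}, TEMP_PREFERENCE))
# A card that only exposes socket power uses that value.
print(first_available({"socket": 120.0}, POWER_PREFERENCE))
```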
Galantsev, Dmitrii adf0d7094f Add __pycache__ to .gitignore
Change-Id: I815cf3cdb644978d959b80136ac7e95da3d2ca8d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-19 09:32:35 -06:00
Galantsev, Dmitrii 70ada65079 Rebuild librdc_ras.so
- Make librdc_ras.so executable

Change-Id: I715ef1d828fe4d0ecf63b8272ffeccbab280f9dc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-17 15:19:14 -06:00
Galantsev, Dmitrii f9e80cc37a Use templates for module population
Also add stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5 and templates are not well supported.

Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-10 00:27:09 -06:00
Galantsev, Dmitrii eaa1862a80 RVS: Finish initial RVS integration
NOTE: RVS Build is disabled by default due to CI build issues.

Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-10 00:27:04 -06:00
Galantsev, Dmitrii 434e40305d LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-09 11:37:11 -06:00
Galantsev, Dmitrii 95e057c88d Simplify ModuleMgr
Change-Id: I3a57876c73e50771fcedb7ca4c67d55ac406b34d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-09 11:37:11 -06:00
Sam Wu a5906e9363 Update rocm-doc-core to v0.30.3
Documentation theme updates

Change-Id: I043d34b2947b5b27e06ce6a4f4c32f4b1e8ad039
2023-12-21 16:43:17 -07:00
Galantsev, Dmitrii 82e4ea3b6f SWDEV-436561 - Add CODEOWNERS
Change-Id: Ie806f1ba714a88643c0e5f9cb65bf70f8d59f1fb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-12-12 12:07:47 -06:00
Sam Wu 5890852ff1 Standardize documentation for Read the Docs
Relates to https://github.com/RadeonOpenCompute/rocm-docs-core/issues/330

Change-Id: Ic9370548bb8d919376b20f7e1800fe620369e69b
2023-12-08 16:56:59 -05:00
Galantsev, Dmitrii ed3cfffd7e Server - Add -a/--address option
Change-Id: Ia9e8d76b9a4ba0aadc567142601a87f0ad0b69e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-12-04 15:26:44 -06:00
AravindanC c661bab06f SWDEV-426649 - config file rocmpath hard coding removed
Change-Id: I01df16392201cc112c7533e8c092e4e336237b0b
2023-11-23 17:31:45 -05:00
Bill(Shuzhou) Liu 61a2773875 Sort the ROCr gpu index based on BDF
The rocm-smi index is now sorted by BDF. The ROCr plugin
is changed to match.

Change-Id: I5851431db336d50266b253dec1894a7bd9f3554b
2023-11-16 09:07:22 -05:00
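
As a rough illustration of BDF-based indexing, the sketch below sorts PCI addresses numerically and assigns indices in that order. It is written in Python for brevity and is not taken from the project: `bdf_key` and `index_by_bdf` are hypothetical helpers, the `DDDD:BB:DD.F` string form is an assumed input, and the exact comparison rocm-smi uses is not confirmed by the log.

```python
def bdf_key(bdf: str):
    """Parse 'DDDD:BB:DD.F' (hex fields) into a tuple of ints for numeric ordering."""
    domain, bus, dev_fn = bdf.split(":")
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def index_by_bdf(bdfs):
    """Map each BDF string to its position in BDF-sorted order."""
    return {bdf: i for i, bdf in enumerate(sorted(bdfs, key=bdf_key))}


# Example addresses are made up for illustration.
gpus = ["0000:83:00.0", "0000:03:00.0", "0000:c1:00.0", "0000:23:00.0"]
for bdf, idx in index_by_bdf(gpus).items():
    print(idx, bdf)
```

Sorting on parsed integers rather than raw strings keeps the order stable even if the addresses differ in letter case or zero padding.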
Bill(Shuzhou) Liu 1ab4110d46 RDC crash when exit
Join the signal handling thread instead of canceling it to prevent a
crash with "terminate called without an active exception".

Change-Id: I2e18eb825728fd3a94f67b1b0049516bb7b6ebbc
2023-11-03 09:10:22 -04:00
Galantsev, Dmitrii e579cb04b2 Upgrade gRPC v1.44.0 -> v1.59.1
Change-Id: Ib43a41c61d4028ec029a8c179a94060315870fbb
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-19 17:29:36 -05:00
Galantsev, Dmitrii 8f9a6796f1 Upgrade to CXX-17 gtest-1.14
Change-Id: I1c7316f151128cbc9318b226dac14950e399d2c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-28 12:54:49 -05:00
Galantsev, Dmitrii f6ace9fa14 README - Update documentation links
Change-Id: I2e778a766e6a4489280fe7b86f33a6c597983167
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-13 19:34:28 -05:00
Galantsev, Dmitrii fc852fc915 .gitignore - Ignore more build files
Change-Id: I5b5207e65cc3fd6537800db388da142c0e76c3ff
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-06 10:21:11 -05:00
Galantsev, Dmitrii 824056b0be .editorconfig - Remove whitespace rule
Change-Id: Ia928dcb49fc094889784a0afcbc4abbe35bd59c7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-06 10:20:43 -05:00
Galantsev, Dmitrii de252b21a4 SWDEV-410524 - Doxygen add WARN_AS_ERROR
Change-Id: I714712d61d1526cb75122a2f23e293745d41a701
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-11 11:57:44 -05:00
Public Profile a3ac4bac21 fix broken links
Change-Id: Ibd941eb116fd9ae4ed7deeeb3a07324a2a3ca3c3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-09 00:13:09 -05:00
Ranjith Ramakrishnan 2e096d9009 SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I9c4201d7786be2e3f77bc1d4d15887741ba59ec5
2023-08-07 09:25:00 -07:00
Galantsev, Dmitrii 6e52a113a2 SYSTEMCTL: Check if running before stopping
When uninstalling the RDC application - the user is greeted with an
annoying "Failed to stop rdc.service..." message if the RDC is not
running.
This change makes sure RDC is active before trying to stop it.

Change-Id: I6fa57bfd4b9c348514cd6c38e60ed3930d32b62c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-05-31 11:41:25 -05:00
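
The check-before-stop idea can be sketched as follows. The actual change most likely lives in the package's uninstall script, which the log does not show, so this Python version is only an illustration: `stop_if_active` is a hypothetical helper, and the injectable `run` parameter exists purely so the logic can be exercised without systemd. The commands `systemctl is-active --quiet` and `systemctl stop` are standard systemd calls.

```python
import subprocess


def stop_if_active(unit="rdc.service", run=subprocess.run):
    """Stop `unit` only if systemd reports it active; return True if a stop was issued."""
    # `systemctl is-active --quiet` exits with status 0 only when the unit is active.
    active = run(["systemctl", "is-active", "--quiet", unit]).returncode == 0
    if not active:
        return False  # nothing to stop, so no "Failed to stop ..." message appears
    run(["systemctl", "stop", unit])
    return True
```

Calling `stop_if_active()` on a host where the service is not running returns `False` without invoking `systemctl stop`.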
Ranjith Ramakrishnan c82fdeab8d SWDEV-310152 - Removed the RUNPATH setting in source code
Use the RUNPATH provided by build scripts

Change-Id: Ib5b3f689dc20aeecf6974281625865fe650bfa72
2023-05-30 16:17:08 -04:00
Sam Wu 74dce41f4f update documentation dependencies via requirements
rocm-docs-core v0.11.0

Change-Id: I2ecc8c6015b9bb186e1b3241eb84bcbda9c46152
2023-05-18 13:29:08 -06:00
Ranjith Ramakrishnan bf49b88866 SWDEV-383221 - Set the default value of ROCM_HEADER_WRAPPER_WERROR to OFF
Using wrapper header files will result in #warning message by default

Change-Id: If5847e1b03523251238018b2cf0725b302619963
2023-05-08 20:45:08 -07:00
Sam Wu 1335d19020 add configs for read the docs
add handbook, user, install, and integration guides

Change-Id: I996f6909f4fdf76910981c0224f5a0266907e27a

remove old documentation steps

Change-Id: Icfad09926e67a2dfa1de0e182fc3cd534f0448f7

formatting fixes

Change-Id: I704bbbbf6ad384178f804e4a3f5e621f9c3d33b9
2023-05-05 15:44:34 -06:00
Bill(Shuzhou) Liu 8e686daa7d Rebuild librdc_ras.so on RHEL7
Build the library on an older OS for compatibility.

Change-Id: I24c5670d98131b739753e66fdd49b2e69759073c
2023-04-28 14:00:11 -05:00
Galantsev, Dmitrii 418167b43e CMAKE: Fix RPM version
before fix:
CPack: - package: ... rdc-0.6.0-local.9999.el9.x86_64.rpm generated.

after fix:
CPack: - package: ... rdc-0.6.0.50600-local.9999.el9.x86_64.rpm generated.

Change-Id: I684816f3b4cad787eec6abbb40598d05c89d4f5d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-04-18 17:41:37 -04:00
Galantsev, Dmitrii 90e824c63b SWDEV-392942 - Disable rocmtools
Temporarily disable rocmtools because of hsa_shut_down issues

Change-Id: I5e8b6729b8200ccdd5c399862bfc632ba69f884c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-04-05 13:20:19 -05:00
Galantsev, Dmitrii 8f6bf948cc ASAN: Shutdown the signaling thread on exit
Change-Id: Ica546db354430f5f4adc33d8d92e09927d40f75b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-03-28 11:26:38 -05:00
Bill(Shuzhou) Liu 29551b1fd0 Rebuild rdc_ras library on Ubuntu 20.04
Rebuild rdc_ras library on Ubuntu 20.04 for backward compatibility.
Fall back to rocm_smi for ECC errors if the rdc_ras library is not available.

Change-Id: I8db9687e3eb54a6f62fce2c8d57a796c6da6b5c4
2023-03-16 10:02:15 -04:00
Ranjith Ramakrishnan f962d0959a SWDEV-366831 - Compile time flag to switch between #warning and #error message
Using backward compatibility paths now produces an #error message. A compile-time option was added to enable or disable the #error message;
disabling it produces a #warning message instead.

Change-Id: I45f987b572a306036a72525d2b90d366459117ad
2023-03-10 13:19:00 -08:00
Ranjith Ramakrishnan 32844ddfee SWDEV-366831 - File reorg backward compatibility message changed to #error
Change-Id: Ic6fceb5cf92cca0a3c0a7a78c81cbb69ca82dd5e
2023-02-08 22:55:49 -08:00
Galantsev, Dmitrii 24b3f138e9 SWDEV-380364 - Resolve dmon + rocmtools halt
* Move hsa_init out of rocmtools and into RDC
* Remove secondary hsa_shut_down from ROCR module

Change-Id: I57d84d41ddc51595b98e734265f10bc5129a7352
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I2b389ee1a9ba3507b2df1fc2fe83598f67731aac
2023-02-02 18:33:14 -06:00
Galantsev, Dmitrii 4536a453db SWDEV-342533 - Hide WIP fields
Provide support for reliable metrics and hide experimental ones in the current
release.

Further ROCMTools integration development is pushed out to ROCm 5.6.

Change-Id: Iae7a0ed3991588c833bd8ef580b02b9c71390d55
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-23 15:31:46 -06:00
Bill(Shuzhou) Liu f467f802ad The RAS library SIGSEGV
When librdc_ras.so fails to find librocm_smi64.so, it crashes.

Change-Id: I611465f6b5de87cd41ba9be9bd6ae35f66d92a3b
2023-01-19 13:09:22 -06:00
Galantsev, Dmitrii 4d35ff6092 Format DOUBLE as a fixed floating point number
previous format:
1.20758e+06
0.370689
0.00014128

new format:
1207583.000
0.371
0.000

Change-Id: I00f41d841e5e62c4b25dc5e646b6487449773e01
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-18 11:18:57 -05:00
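
The difference between the two formats can be reproduced with a small sketch. The project itself is C++, so this Python snippet is only an analogue: `%g` is used here as a stand-in for a default six-significant-digit float format, and `.3f` for fixed notation with three decimals. The function names are hypothetical; the sample values are the ones quoted in the commit message.

```python
values = [1207583.0, 0.370689, 0.00014128]


def previous_format(x: float) -> str:
    """General format: six significant digits, switching to exponent form for large values."""
    return "%g" % x


def new_format(x: float) -> str:
    """Fixed notation with exactly three digits after the decimal point."""
    return f"{x:.3f}"


for v in values:
    print(previous_format(v), "->", new_format(v))
# 1.20758e+06 -> 1207583.000
# 0.370689 -> 0.371
# 0.00014128 -> 0.000
```

Fixed notation keeps columns aligned and avoids exponents, at the cost of showing very small values as 0.000.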
Galantsev, Dmitrii 35edaa2322 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-16 19:16:26 -06:00
Bill(Shuzhou) Liu 6687239cff Add the changelog
Add the CHANGELOG.md for the release.

Change-Id: Ia6f0afaece9fca4df4dd2042e7a41fd91edd853c
2023-01-13 16:07:34 -05:00
Galantsev, Dmitrii fc097d44ff SWDEV-376779 - Fix linking for rdctst
Ieb198ad96e26e89b09cb85986214a5b1451b17a6 broke linking
for rdctst and rdcd by removing "../lib/rdc" path.
This change adds it back and makes the paths more visible.

- Link librdc_ras and librdc_rocp to rdctst
- Add longer RUNPATH for rdctst to link rdc libraries

Change-Id: Id4f128c217a6de8bb67df6750ecafdb96545811b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-11 19:40:59 -05:00
Galantsev, Dmitrii 2787767bf0 SYSTEMD: Remove set -e to allow failure
The 'set -e' flag prevents falling back to 'return 0' when a command
fails. As installation/removal should not fail in most cases, the 'set -e'
flag has been removed.

The page below recommends caution with 'set -e' usage:
https://www.debian.org/doc/debian-policy/ch-opersys.html#writing-the-scripts

Change-Id: If4439aaff66747bceabd3beb7e00ae12ce950e43
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-09 13:18:47 -06:00
Galantsev, Dmitrii 6e0c5d1d56 Fix rdcd crash on rocmtools fields read
- Solve issue that resulted in rdcd crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii 3e4c55ec6c SWDEV-352414 - Fix gRPC linker issues
- Replace gRPC library with gRPC package
- Relax RUNPATH
- Make LINKER_FLAGS global

gRPC package includes its dependencies:
SSL, UPB, ABSL, etc.

Change-Id: Ieb198ad96e26e89b09cb85986214a5b1451b17a6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-04 18:50:07 -06:00
Galantsev, Dmitrii 861a843ed7 Add rocmtools support
This commit adds integration with ROCmTools

Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools

Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
The majority of the metrics are only supported by Instinct (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information

Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-12-16 12:19:59 -06:00
Bill(Shuzhou) Liu 73d8b35610 gcc error when const is used
Change to #define to be compatible with gcc.

Change-Id: Id2a2b3dbeaaf7aea9b0e2075320c30f8bad50fc7
2022-12-05 16:27:59 -06:00
Ranjith Ramakrishnan 9c4ce805cf SWDEV-366823 - Change pragma message to warning
The file reorganization feature was implemented with backward compatibility.
Backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition.

Change-Id: Ib616bf4b7358a0607832a8af423c75e0bf2ab72d
2022-11-21 00:59:02 -08:00
Galantsev, Dmitrii f6efd7fbf6 Improve CMake and relocate tests
- Respect CMAKE_INSTALL_PREFIX and ignore RDC_CLIENT_INSTALL_PREFIX
- Move example and rdctst from rocm/bin to rocm/share/rdc
- Add README for examples

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I0b1d996d206327fd1b51ac6e82d548829bdb1570
2022-10-27 13:49:54 -05:00
Galantsev, Dmitrii 2c171767b3 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b
2022-10-07 13:58:50 -05:00