Commit Graph

130 Commits

Author SHA1 Message Date
Galantsev, Dmitrii fe28405d3a CMAKE: Fix RPM version
before fix:
CPack: - package: ... rdc-0.6.0-local.9999.el9.x86_64.rpm generated.

after fix:
CPack: - package: ... rdc-0.6.0.50600-local.9999.el9.x86_64.rpm generated.

Change-Id: I684816f3b4cad787eec6abbb40598d05c89d4f5d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 418167b43e]
2023-04-18 17:41:37 -04:00
Galantsev, Dmitrii a337dc062b SWDEV-392942 - Disable rocmtools
Temporarily disable rocmtools because of hsa_shut_down issues

Change-Id: I5e8b6729b8200ccdd5c399862bfc632ba69f884c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 90e824c63b]
2023-04-05 13:20:19 -05:00
Galantsev, Dmitrii 95a9b1965c ASAN: Shutdown the signaling thread on exit
Change-Id: Ica546db354430f5f4adc33d8d92e09927d40f75b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 8f6bf948cc]
2023-03-28 11:26:38 -05:00
Bill(Shuzhou) Liu df95a71a09 Rebuild rdc_ras library on Ubuntu 20.04
Rebuild rdc_ras library on Ubuntu 20.04 for backward compatibility.
Fall back to rocm_smi for ECC errors if the rdc_ras library is not available.

Change-Id: I8db9687e3eb54a6f62fce2c8d57a796c6da6b5c4


[ROCm/rdc commit: 29551b1fd0]
2023-03-16 10:02:15 -04:00
Ranjith Ramakrishnan ade4945ad4 SWDEV-366831 - Compile time flag to switch between #warning and #error message
Using backward compatibility paths now produces an #error message. A compile-time option was added to enable or disable the #error message.
Disabling it produces a #warning message instead.

Change-Id: I45f987b572a306036a72525d2b90d366459117ad


[ROCm/rdc commit: f962d0959a]
2023-03-10 13:19:00 -08:00
Ranjith Ramakrishnan 758c906a20 SWDEV-366831 - File reorg backward compatibility message changed to #error
Change-Id: Ic6fceb5cf92cca0a3c0a7a78c81cbb69ca82dd5e


[ROCm/rdc commit: 32844ddfee]
2023-02-08 22:55:49 -08:00
Galantsev, Dmitrii c1a76d532a SWDEV-380364 - Resolve dmon + rocmtools halt
* Move hsa_init out of rocmtools and into RDC
* Remove secondary hsa_shut_down from ROCR module

Change-Id: I57d84d41ddc51595b98e734265f10bc5129a7352
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I2b389ee1a9ba3507b2df1fc2fe83598f67731aac


[ROCm/rdc commit: 24b3f138e9]
2023-02-02 18:33:14 -06:00
Galantsev, Dmitrii 6be2c8784d SWDEV-342533 - Hide WIP fields
Provide support for reliable metrics and hide experimental ones in the
current release.

Further ROCMTools integration development is pushed out to ROCm 5.6.

Change-Id: Iae7a0ed3991588c833bd8ef580b02b9c71390d55
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 4536a453db]
2023-01-23 15:31:46 -06:00
Bill(Shuzhou) Liu 4b78871796 The RAS library SIGSEGV
When librdc_ras.so fails to find librocm_smi64.so, it crashes.

Change-Id: I611465f6b5de87cd41ba9be9bd6ae35f66d92a3b


[ROCm/rdc commit: f467f802ad]
2023-01-19 13:09:22 -06:00
Galantsev, Dmitrii 8fc6d04a54 Format DOUBLE as a fixed floating point number
previous format:
1.20758e+06
0.370689
0.00014128

new format:
1207583.000
0.371
0.000

Change-Id: I00f41d841e5e62c4b25dc5e646b6487449773e01
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 4d35ff6092]
2023-01-18 11:18:57 -05:00
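
The "previous format" and "new format" values listed in commit 8fc6d04a54 match what the C++ standard stream manipulators produce by default and with fixed notation at three decimals. The following is a minimal sketch of that formatting, not the actual RDC change; `format_fixed` is a hypothetical helper name, and the precision of 3 is inferred from the sample output in the commit message.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not taken from the RDC source): render a double in
// fixed notation with a set number of digits after the decimal point.
std::string format_fixed(double value, int precision = 3) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision) << value;
    return out.str();
}
```

Under these assumptions, `format_fixed(0.370689)` yields `0.371`, whereas streaming the same value without `std::fixed` would keep the default general notation.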
Galantsev, Dmitrii c59365f813 Remove rocmtools environment variable
- Set ROCMTOOLS_METRICS_PATH inside rdcd
- Add nullptr checks for rocmtools library functions

Change-Id: Ibbe4fed90df20e68b1a7971533765d831860c16f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 35edaa2322]
2023-01-16 19:16:26 -06:00
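
Commit c59365f813 mentions nullptr checks for rocmtools library functions. A general pattern for guarding a function pointer that may not have been resolved at runtime could look like the sketch below. The names `MetricFn`, `call_if_loaded`, and `sample_metric` are hypothetical and do not come from RDC or rocmtools.

```cpp
// Hypothetical signature for a function that is resolved at runtime
// (for example from a shared library) and may therefore be missing.
using MetricFn = int (*)(int field_id);

// Call fn only if it was resolved. Returns false and leaves *out untouched
// when the pointer is null, instead of dereferencing it and crashing.
bool call_if_loaded(MetricFn fn, int field_id, int* out) {
    if (fn == nullptr || out == nullptr) {
        return false;
    }
    *out = fn(field_id);
    return true;
}

// Stand-in for a successfully resolved library function (illustration only).
int sample_metric(int field_id) {
    return field_id + 1;
}
```

A caller would pass whatever pointer the loader returned and treat a `false` result as "metric unavailable".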
Bill(Shuzhou) Liu 152ff718b5 Add the changelog
Add the CHANGELOG.md for the release.

Change-Id: Ia6f0afaece9fca4df4dd2042e7a41fd91edd853c


[ROCm/rdc commit: 6687239cff]
2023-01-13 16:07:34 -05:00
Galantsev, Dmitrii 4091faf4f4 SWDEV-376779 - Fix linking for rdctst
Ieb198ad96e26e89b09cb85986214a5b1451b17a6 broke linking
for rdctst and rdcd by removing "../lib/rdc" path.
This change adds it back and makes the paths more visible.

- Link librdc_ras and librdc_rocp to rdctst
- Add longer RUNPATH for rdctst to link rdc libraries

Change-Id: Id4f128c217a6de8bb67df6750ecafdb96545811b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: fc097d44ff]
2023-01-11 19:40:59 -05:00
Galantsev, Dmitrii ece8443715 SYSTEMD: Remove set -e to allow failure
The 'set -e' flag prevents falling back to 'return 0' when a command
fails. Since installation/removal should not fail in most cases, the
'set -e' flag has been removed.

The page below recommends caution with 'set -e' usage:
https://www.debian.org/doc/debian-policy/ch-opersys.html#writing-the-scripts

Change-Id: If4439aaff66747bceabd3beb7e00ae12ce950e43
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2787767bf0]
2023-01-09 13:18:47 -06:00
Galantsev, Dmitrii 9cdf52b0b7 Fix rdcd crash on rocmtools fields read
- Solve issue that resulted in rdcd crash when reading registers 700-799
  by setting ROCMTOOLS_METRICS_PATH in rdc.service

README changes:
- Change default install path for gRPC
- Simplify install instructions
- Make more commands copy-pasteable
- Replace /opt/rocm-<version> with /opt/rocm
- Misc fixes

Change-Id: I39a2896ed2af5a3889f4b36cd8bcc8d3e9593585
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6e0c5d1d56]
2023-01-06 16:39:17 -06:00
Galantsev, Dmitrii 5c803f6b03 SWDEV-352414 - Fix gRPC linker issues
- Replace gRPC library with gRPC package
- Relax RUNPATH
- Make LINKER_FLAGS global

gRPC package includes its dependencies:
SSL, UPB, ABSL, etc.

Change-Id: Ieb198ad96e26e89b09cb85986214a5b1451b17a6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 3e4c55ec6c]
2023-01-04 18:50:07 -06:00
Galantsev, Dmitrii eccb4e202c Add rocmtools support
This commit adds integration with ROCmTools

Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools

Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
The majority of the metrics are only supported by Instinct (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information

Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 861a843ed7]
2022-12-16 12:19:59 -06:00
Bill(Shuzhou) Liu 001461e975 gcc error when const is used
Change to #define to be compatible with gcc.

Change-Id: Id2a2b3dbeaaf7aea9b0e2075320c30f8bad50fc7


[ROCm/rdc commit: 73d8b35610]
2022-12-05 16:27:59 -06:00
Ranjith Ramakrishnan 2c614f6beb SWDEV-366823 - Change pragma message to warning
The file reorganization feature was implemented with backward compatibility.
The backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition

Change-Id: Ib616bf4b7358a0607832a8af423c75e0bf2ab72d


[ROCm/rdc commit: 9c4ce805cf]
2022-11-21 00:59:02 -08:00
Galantsev, Dmitrii 2b89ab397c Improve CMake and relocate tests
- Respect CMAKE_INSTALL_PREFIX and ignore RDC_CLIENT_INSTALL_PREFIX
- Move example and rdctst from rocm/bin to rocm/share/rdc
- Add README for examples

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I0b1d996d206327fd1b51ac6e82d548829bdb1570


[ROCm/rdc commit: f6efd7fbf6]
2022-10-27 13:49:54 -05:00
Galantsev, Dmitrii 9ff80828e5 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b


[ROCm/rdc commit: 2c171767b3]
2022-10-07 13:58:50 -05:00
Ranjith Ramakrishnan 5d08745c53 SWDEV-345870 - Use actual header files and libraries of rocm smi rather than using wrapper header files and library softlinks
Rearranged the project name and the inclusion of GNUInstallDirs so that GNU variables can be used

Change-Id: I85d8e69e0faf4f9f634295c3064b9f4f64f8e9b8


[ROCm/rdc commit: 3214c21e6e]
2022-09-22 13:37:58 -04:00
Galantsev, Dmitrii 0ae88bc221 Remove __pycache__ before uninstall
`__pycache__` might be created when a Python script is run, which
prevents `rpm -e` and `dpkg --remove` from completely removing the
application. This patch removes `__pycache__` early in the uninstall
process.

A similar issue is resolved in:
rocm_smi_lib Change-Id: I695bd085d4a43b678b563b4c35f6d2e8ddfa7d7c

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I9fe0cd61570c2bd83cf9a45c95837ee6ad11e84b


[ROCm/rdc commit: 69a5a1d6bc]
2022-08-15 17:23:19 -04:00
Ranjith Ramakrishnan 92e6ab4bb8 SWDEV-350674 - Added backward compatibility for binary files and rdc.service
With the file reorganization changes, binaries are moved to /opt/rocm-ver/bin.
Similarly, rdc.service is moved to /opt/rocm-ver/libexec/rdc.
Test suites still use the old paths.
Once the test suites are updated, backward compatibility for binaries and rdc.service can be removed.
Corrected binary path in rdc.service.in
Corrected GRPC runpath

Change-Id: I306924d81cedc19586305a79d51eea8af6e70e83


[ROCm/rdc commit: c3ea96dd71]
2022-08-09 17:45:22 -07:00
Ranjith Ramakrishnan 1e80f183d8 Use function to find all libraries with absl prefix rather than hard coding the library names
Some absl libraries are missing from the list, which results in an error while loading the rdci binary.
Use cmake logic to find all the required absl libraries instead of hard coding them.
A similar implementation was done on the rdc server side.

Change-Id: Ia04bcd137f892c2577a4a458b92ca212d42aef80


[ROCm/rdc commit: ba9633b74b]
2022-08-05 14:06:25 -07:00
Ranjith Ramakrishnan 3df8b88ca6 File reorganization with backward compatibility
SWDEV-291455 - Binaries, header files and libraries installed in the bin, include and lib folders under /opt/rocm-ver
Prebuilt ras library with updated search path
cmake config files in lib/cmake/rdc
grpc, sp3, hsaco and private libraries installed in lib/rdc
config installed in share/rdc
authentication and python_binding installed in libexec/rdc
Backward compatibility added for header files and libraries

Depends-On: I3f3d192935923f71737b3fe55ded536654a73dd7
Change-Id: Ia1a6cadc59034b155631a1ee5fdbe692d2a8a71b


[ROCm/rdc commit: 52a3463147]
2022-08-04 23:42:42 -07:00
Ranjith Ramakrishnan d8f996d16d SWDEV-321112 - Use GNUInstallDirs
Use GNUInstallDirs variables to determine the location of DOCDIR

Change-Id: I12b016bc1fe66afd92478c3a940093a4b35de0dc


[ROCm/rdc commit: 23ac479cbd]
2022-06-08 09:05:17 -07:00
Bill(Shuzhou) Liu e16a8bcaf5 Identify GPUs using PCI device identifier in RDC Prometheus plugin
Add a new option --enable_pci_id to Prometheus plugin, which will map
the GPU index to the PCI Device Identifier.

Change-Id: I38a2a7e4841975da095391002397d4515ffb8e0d


[ROCm/rdc commit: 23ab2c0671]
2022-05-05 09:16:05 -04:00
Bill(Shuzhou) Liu c6a69f8e59 Update RDC document
Update README.md to refer to the documentation portal.

Change-Id: I427122751fec5a27936b345a3ac76c96478be164


[ROCm/rdc commit: 2cd7f66154]
2022-04-27 14:38:48 -04:00
Bill(Shuzhou) Liu ec564c1d2c Add run path dependency on grpc libraries
Add run path dependency for grpc libabsl_*.so required by RHEL.

Change-Id: Ie033cc25019e0cb46a895e8c3e583a0d22ab4561


[ROCm/rdc commit: c4dab3b2bd]
2022-03-28 09:04:17 -04:00
Bill(Shuzhou) Liu b50c3b485b Add user rdc to render or video group
Add user rdc to render or video group to access KFD for diagnostic.

Change-Id: Ie9b4ea65402319a2aae255063f8c79e56979a47f


[ROCm/rdc commit: d54c5715f0]
2022-03-23 13:53:42 -04:00
Bill(Shuzhou) Liu 843ba8f3b8 Install License file under /opt/rocm
Change-Id: Ifbb7edddfd5eda173039399c47f2d20f813f1710


[ROCm/rdc commit: 346eb5981d]
2022-03-15 13:18:58 -04:00
Bill(Shuzhou) Liu 234ef250e2 Upgrade GoogleTest to v1.11.0
The old GoogleTest has compile errors on CentOS 9. Upgrade it
to the latest version.

Change-Id: Ifc95c68ddf2321509b90e20af11c8d468a63f431


[ROCm/rdc commit: c465d29d8c]
2022-03-14 10:23:06 -04:00
Bill(Shuzhou) Liu a408ccb983 Enable the support to grpc v1.44.0
grpc v1.44.0 needs to link to library absl_synchronization. The
CMakeLists.txt is changed to link to that library if available.

Change-Id: I92f7247473a70e7a83416b9744e788e45d104565


[ROCm/rdc commit: 2a46ee2ab2]
2022-03-03 16:11:09 -05:00
Saravanan Solaiyappan e99c8cb29e Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in the package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ide29b84e59eea6154276f790e353a25506dd3bdb


[ROCm/rdc commit: 6a345a064e]
2022-02-24 13:31:03 -05:00
Bill(Shuzhou) Liu 5e7c19624b Install libprotobuf under lib64 folder
When compiling grpc on SLES, libprotobuf is created under the lib64
folder; install it to the lib folder as well.

Change-Id: I9ccf2133c3b1b71e623d9009a86cf580a19e76cf


[ROCm/rdc commit: ffc5db221b]
2022-02-04 09:02:31 -05:00
Bill(Shuzhou) Liu a3c3283aa6 Add rpm License header
Add rpm License header for cpack

Change-Id: I3e8d05abe69749abe6ce28751e7da9bb229aa08d


[ROCm/rdc commit: 7eeb7f9388]
2022-01-20 13:33:08 -05:00
Bill(Shuzhou) Liu 38aaf423ca Add license file to rdc package
Install LICENSE.txt to share/doc/rdc

Change-Id: Ife9872aa745cb6fcf79976bf6453098a6594572a


[ROCm/rdc commit: 0273dd6b9e]
2022-01-18 10:50:31 -05:00
Bill(Shuzhou) Liu fa3a258bb6 Add MI200 kernel files for RDC diagnostic
Add the kernel files compiled for MI200.

Change-Id: Ib61795809c14457e332a77d7182992f245ff5b31


[ROCm/rdc commit: 179bd293ef]
2022-01-11 09:28:30 -05:00
Bill(Shuzhou) Liu 8c772e1b90 Fix the compile error for gcc-11
Fix the error: 'sleep_for' is not a member of 'std::this_thread'

Change-Id: If25ef03023df17081878f9b44c3a68195f07c653


[ROCm/rdc commit: adfa89631d]
2021-10-26 15:36:52 -04:00
Bill(Shuzhou) Liu 6b700f8005 Support GPU memory test and compute queue test using Rocr
A new diagnostic module librdc_rocr.so is created. The
module uses Rocr to test the memory allocation, memory access
and compute queue ready status.

Change-Id: I9098f4fc3209bf381b7cb3658a4e94c2e22f2fe9


[ROCm/rdc commit: 78e2f2486b]
2021-10-21 11:01:12 -04:00
Bill(Shuzhou) Liu ed96db8cba Correct the install path of grpc
Correct the grpc install path, which is missing a $.

Change-Id: I17736a81ee24d2abc680a3646b1536efafcb3d69


[ROCm/rdc commit: 6ab71e1a4a]
2021-10-19 17:11:35 -04:00
Bill(Shuzhou) Liu 57f1f72eb6 Add cmake target for RDC
RDC will provide cmake files exporting the INCLUDE/LIBRARY targets.

Change-Id: I8e8aeff426c45eae823d988f6473424ccf29687c


[ROCm/rdc commit: a640e5c821]
2021-09-28 13:53:44 -04:00
Icarus Sparry 651c149772 Add owner write to rdci permissions
Needed to work around a Debian packaging bug if debug information is
being produced in a separate package.

Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: Ieab3cc3515eeeb952159acea3dc1effd14613eeb


[ROCm/rdc commit: 2a1a002f74]
2021-09-22 18:11:50 +00:00
Bill(Shuzhou) Liu ab8edea8d1 Make RDC respect the LD_LIBRARY_PATH passed by cmake
RDC overrides LD_LIBRARY_PATH to force use of the current grpc
path. This change also appends the original LD_LIBRARY_PATH to it.

Change-Id: I48da84c3135c6ede129c3cb9148dbb1896b652c3


[ROCm/rdc commit: bd034263d4]
2021-09-17 15:52:30 -04:00
Bill(Shuzhou) Liu 84eca4cf9e Add -g compiler option for ADDRESS_SANITIZER
Add -g compiler option for Address Sanitizer

Change-Id: I5c4a72dd06a7242715c537fc0d44770b126862d2


[ROCm/rdc commit: 6f95200387]
2021-08-03 13:52:21 -04:00
Icarus Sparry 506a3072e9 Add dependency on rocm-core
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: I5783b116b098bc8ebad62a4fad407a29c80f19af


[ROCm/rdc commit: 13c550d861]
2021-07-27 08:43:48 -04:00
Bill(Shuzhou) Liu fa9c6ad6f8 Add the RdcSmiDiagnostic module
Provides an RdcSmiDiagnostic module, which calls rocm_smi_lib.

It supports the following diagnostics: get GPU topology, check GPU
parameters, and check processes running on the GPUs.

The grpc client-side and server-side diagnostic functions are added.

The diag module is added to rdci.

Change-Id: I10a0cf3c20556a61373ab686f82cae75acaa40dd


[ROCm/rdc commit: 76ccf58008]
2021-07-26 14:56:17 -04:00
Bill(Shuzhou) Liu ac15d50b0c rdcd process uses 100% CPU
rdcd uses a separate thread to listen for GPU events. That thread
runs in a tight loop, which consumes 100% CPU.
The fix adds a sleep to yield the CPU.

Bug: SWDEV-291576
Change-Id: I7996720aab4a80346d79b1c73ee532d2abcd93cc


[ROCm/rdc commit: 5a4bf97327]
2021-06-18 13:49:45 -04:00
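
The fix described in commit ac15d50b0c, sleeping inside a polling loop so the thread yields the CPU, can be sketched as follows. This is an illustration of the general technique and not the rdcd code; `poll_with_sleep`, its parameters, and the bounded poll count are assumptions made so the example terminates.

```cpp
#include <chrono>
#include <functional>
#include <thread>  // declares std::this_thread::sleep_for

// Poll `ready` until it returns true or `max_polls` is reached, sleeping
// between polls so the thread gives up the CPU instead of spinning.
// Returns the number of polls performed.
int poll_with_sleep(const std::function<bool()>& ready,
                    std::chrono::milliseconds interval,
                    int max_polls) {
    int polls = 0;
    while (polls < max_polls) {
        ++polls;
        if (ready()) {
            break;
        }
        // Without this sleep the loop would busy-wait at full CPU usage.
        std::this_thread::sleep_for(interval);
    }
    return polls;
}
```

The interval trades event latency against CPU usage; the value used in rdcd is not stated in the commit message.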
Bill(Shuzhou) Liu 7d7a5bfd1c Disable bulk fetch. Add environment variable to enable it
RDC can optimize by bulk fetching multiple metrics using a single
rocm_smi call. However, this is currently not fully supported on
all ASIC generations, so it is disabled by default for now.

Set environment variable RDC_BULK_FETCH_ENABLED=TRUE to enable
RDC bulk fetch.

BUG: SWDEV-289316

Change-Id: Ibb55514f198356dccf5f47bb0fd2d53c17acb251


[ROCm/rdc commit: 673f5a4ee1]
2021-06-09 15:53:17 -04:00
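
Commit 7d7a5bfd1c gates bulk fetch behind the environment variable RDC_BULK_FETCH_ENABLED=TRUE. A minimal sketch of such an opt-in check is shown below. How RDC actually parses the value is not stated in the commit message, so the exact-match comparison against "TRUE" is an assumption, and `flag_enabled` and `bulk_fetch_enabled` are hypothetical names.

```cpp
#include <cstdlib>
#include <string>

// Interpret an environment-variable value as an opt-in flag.
// Assumption: only the exact string "TRUE" enables the feature; an unset
// variable (nullptr) or any other value leaves it disabled.
bool flag_enabled(const char* value) {
    return value != nullptr && std::string(value) == "TRUE";
}

// Read RDC_BULK_FETCH_ENABLED from the process environment.
bool bulk_fetch_enabled() {
    return flag_enabled(std::getenv("RDC_BULK_FETCH_ENABLED"));
}
```

Keeping the parsing separate from the `std::getenv` call makes the default-off behavior easy to check without modifying the environment.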