Γράφημα Υποβολών

32 Υποβολές

Συγγραφέας SHA1 Μήνυμα Ημερομηνία
Chen Gong a8086b484d rocprofiler: add valu utilization
SWDEV-475242

For the description of "FP32 Engine Activity" and "FP64 Engine Activity" in dcgm,
It seems that we do not have an equivalent to these pipe-utilizations on our hardware.

In rocprofiler, I think VALU Utilization is the closest to what we want.

Change-Id: Ibce8835ef4757084cdfd73258de6fc1606ca0158
Signed-off-by: Chen Gong <curry.gong@amd.com>


[ROCm/rdc commit: 251fcbe49d]
2024-11-21 15:24:01 +08:00
Galantsev, Dmitrii 8e657c165c RVS - Fix cookie_t -> rdc_diag_callback_t types issue
Issue introduced in ae9030ab1a

Change-Id: I2b6a8024d45fc44d92cf2770be9887dfc0fb3ede
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e1b57c43f3]
2024-11-12 10:36:52 -06:00
Galantsev, Dmitrii ae9030ab1a RVS - Report test progress in realtime
Change-Id: Id9fea71f242f372f408ecd777c030465b7ef9989
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 37ddd5bf50]
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 73c79fcd83 Finish basic logging impl
Change-Id: Ia3d6ac80f4832f1bfb63573c543659abd5f84341
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9c77312c51]
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii b0035605ee CMAKE - Find modules at build time
Change-Id: I9370ef1433579aff1a37f3636050f525638d8658
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: cdf1588974]
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 39687e8d96 CMAKE - Fix RVS include
Change-Id: I65095cc3d04fc2a5daeee5c809f635cb1662822f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Revert "Disable RVS as the error scares people"

This reverts commit f3450f61bf.

Change-Id: I5086c25772444aa3bfc4c10abc1ea58d3f3f1f27
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: dd50027748]
2024-11-07 11:18:41 -06:00
Galantsev, Dmitrii 793b2de0cb Profiler - Modify metrics
Remove occupancy metrics and replace with OccupancyPercent

Add OCCUPANCY_PERCENT which uses OccupancyPercent
Add GR_ENGINE_ACTIVE which uses GPU_UTIL/100
Add TENSOR_ACTIVE_PERCENT which uses MfmaUtil
Modify FLOPS_64 to use FP64_ACTIVE

Change-Id: I5f30d77a0c80f5ac78abd1a9e57f8a0a3c6cc00b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 28acbf0436]
2024-10-15 19:00:30 -05:00
Galantsev, Dmitrii 999cae5e2c SWDEV-466829 - Disable ROCP when in GTest
Change-Id: I3b218fe256717c1dc9187d5f17476dfc990656c2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c40a6308c5]
2024-09-26 17:00:05 -05:00
Bill(Shuzhou) Liu 6372df9447 Update the hsaco for diagonstic on MI300X
Add hsaco for gfx940, gfx941 and gfx942

Change-Id: Ibd55fcc2d036d1190357e1e86d4e170568426d94


[ROCm/rdc commit: 9800528c19]
2024-09-17 14:15:35 -05:00
Galantsev, Dmitrii b50c64b868 Use correct rocprofiler metrics
Change-Id: I26603de7425abb6588f770ed68c22e14d6d20d56
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: d4bb33d100]
2024-06-11 11:15:18 -05:00
Galantsev, Dmitrii 73948f95e2 Rewrite rocprofiler plugin
Change-Id: Ic7dd967cc60cacd2b16a465180505ea2a342fccf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 3514225b83]
2024-06-11 03:11:15 -05:00
Galantsev, Dmitrii 29b86095ed Fix rocprofiler plugin
- Replace non-working fields with working ones
    - remove CU_OCCUPANCY completely as it isn't well supported
- Fix rocprofiler initialization with shared_ptr and rdc_module_init
- Replace env var ROCPROFILER_METRICS_PATH with ROCP_METRICS
    - ROCPROFILER_METRICS_PATH is only relevant for rocprofv2
    - ROCP_METRICS is only relevant for rocprofv1 (which we are using)

Change-Id: I21e6fa3f0e1694c38f44ca0e5659d672559f7380
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 20ca2ce574]
2024-06-06 01:51:39 -05:00
Galantsev, Dmitrii c2a75bbe4c Finalize the rocprofiler fields
Change-Id: I4ed1c4309f21bdcc7281d911663036caf5947182
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 07c414af5e]
2024-06-04 19:49:06 -05:00
Galantsev, Dmitrii f73e123900 Add GPU indexing and fix check for fields in rocprof
- Fix RUNPATH for tests

Change-Id: I79517592b49d27080a010a2e41e5878adf24a157
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e11afbf60f]
2024-06-04 12:56:22 -05:00
Galantsev, Dmitrii a80dfd4f00 Add memory bandwidth metrics
Change-Id: I310ca8af0536497be619d2bda1e540d1f11c2565
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 53033a5b77]
2024-05-17 14:55:01 -05:00
Galantsev, Dmitrii cff2ac8490 Add rocprofiler_example.cc and fix logging
Change-Id: Ib3ed8754f314edc76ea56bfec9a645d720f8926d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c7fcb1ad25]
2024-05-17 14:55:01 -05:00
Galantsev, Dmitrii 83cf97e280 Profiler - Add all required metrics
Change-Id: Iea3938df9407789c061c3a6ead9167a69069d6e6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c3a4c899d5]
2024-05-09 23:24:02 -05:00
Galantsev, Dmitrii 8b317a6490 Add rocprofiler plugin
Rename ROCR -> Runtime and ROCP -> Profiler

Change-Id: If90953da8fa5d695b681813dad4a3e7ec26a9c7e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 234b2d835b]
2024-05-07 04:39:39 -05:00
Galantsev, Dmitrii 028355dff0 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9702d0f2d7]
2024-04-09 20:19:28 -05:00
Galantsev, Dmitrii c314326da0 Revert "Sort the ROCr gpu index based on BDF"
Fix 'rdcd diag' compute and system tests.
This reverts commit 4acaddc32d.

Change-Id: Ia092c46649c1d6338fb96ffe7e6feba4b045f027


[ROCm/rdc commit: 662cc0f8b2]
2024-04-09 10:27:19 -05:00
Galantsev, Dmitrii 53ecc0fc81 Remove -X from .hsaco files
Change-Id: I1f1b4f07eb854ce2e254564b83719be52b553b02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9d55c26247]
2024-03-27 20:35:08 -05:00
Galantsev, Dmitrii dd257bfcac CMAKE - Find hsa-runtime64
Change-Id: Id877eb9cfcc61d81993a6a43703ef2e5f72e1e8f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6d5d9971c2]
2024-02-19 23:49:38 -05:00
Galantsev, Dmitrii 703d6c0d44 Use templates for module population
Also add stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5 and templates are not well supported.

Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: f9e80cc37a]
2024-01-10 00:27:09 -06:00
Galantsev, Dmitrii 38c60ff90b RVS: Finish initial RVS integration
NOTE: RVS Build is disabled by default due to CI build issues.

Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: eaa1862a80]
2024-01-10 00:27:04 -06:00
Galantsev, Dmitrii ea624cbb7c LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 434e40305d]
2024-01-09 11:37:11 -06:00
Bill(Shuzhou) Liu 4acaddc32d Sort the ROCr gpu index based on BDF
The rocm-smi index is changed to sort based on BDF. The rocr plugin
is also changed based on that.

Change-Id: I5851431db336d50266b253dec1894a7bd9f3554b


[ROCm/rdc commit: 61a2773875]
2023-11-16 09:07:22 -05:00
Galantsev, Dmitrii c1a76d532a SWDEV-380364 - Resolve dmon + rocmtools halt
* Move hsa_init out of rocmtools and into RDC
* Remove secondary hsa_shut_down from ROCR module

Change-Id: I57d84d41ddc51595b98e734265f10bc5129a7352
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I2b389ee1a9ba3507b2df1fc2fe83598f67731aac


[ROCm/rdc commit: 24b3f138e9]
2023-02-02 18:33:14 -06:00
Galantsev, Dmitrii eccb4e202c Add rocmtools support
This commit adds integration with ROCmTools

Additional changes:
- Fix DEB and RPM installation issue when systemd is not present
- Fix typos in rdc.h
- Wrap negative values in parentheses in rdc.h
- CMAKE: Improve rocm_smi searching
- README: Improve formatting, add info about ROCmTools

Metrics added: 700-714
Metrics can be listed with `rdci dmon --list-all`
Majority of the metrics are only supported by Instict (MI) series GPUs
700 RDC_FI_PROF_ELAPSED_CYCLES should be available on most devices
See README for more information

Change-Id: I907d3eacdc92fc5588ca6c76c2fa1ce0ad900770
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 861a843ed7]
2022-12-16 12:19:59 -06:00
Galantsev, Dmitrii 9ff80828e5 Compile rdctst and improve CMakeLists
Main CMake improvements:

* Add rdctst with -DBUILD_TESTS=ON
* Set default ROCM_DIR to /opt/rocm/
* Split rdc_libs/CMakeLists.txt into subdirectories
* Package tests into rdc-tests.deb and .rpm

Misc improvements:

* Add .editorconfig to normalize code formatting
* Add .gitignore
* Expand RPATH for gRPC to reduce LD_LIBRARY_PATH usage
* Export compile_commands.json
* Show warning and do not install gRPC if GRPC_ROOT is left as default
* Move .in files into relevant subdirectories
* Move most variables into project CMakeLists.txt to avoid redefinitions
* Normalize CMakeLists.txt formatting (4 spaces indentation)
* Rename DIAGNOSTIC_LIB to RDC_ROCR_LIB
* Update gRPC version in README to 1.44.0
* Remove gtest source
* Pull gtest from github if not installed

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: I1039ef61247e3f0ff822925cc869fb0c2bf3af85
Change-Id: I879b21428e6642f19fda67092b365d8b78b7ba7b


[ROCm/rdc commit: 2c171767b3]
2022-10-07 13:58:50 -05:00
Ranjith Ramakrishnan 3df8b88ca6 File reorganization with backward compatibility
SWDEV-291455 -  Binary , header files and libraries installed in bin,include and lib folder under /opt/rocm-ver
Prebuilt ras library with updated search path
cmake config files in lib/cmake/rdc
grpc,sp3,hsaco and private libraries installed in lib/rdc
config  installed in share/rdc
authentication and python_binding installed in libexec/rdc
Backward compatibility added for header files and libraries

Depends-On: I3f3d192935923f71737b3fe55ded536654a73dd7
Change-Id: Ia1a6cadc59034b155631a1ee5fdbe692d2a8a71b


[ROCm/rdc commit: 52a3463147]
2022-08-04 23:42:42 -07:00
Bill(Shuzhou) Liu fa3a258bb6 Add MI200 kernel files for RDC diagnostic
Add the kernel files compiled for MI200.

Change-Id: Ib61795809c14457e332a77d7182992f245ff5b31


[ROCm/rdc commit: 179bd293ef]
2022-01-11 09:28:30 -05:00
Bill(Shuzhou) Liu 6b700f8005 Support GPU memory test and compute queue test using Rocr
A new diagnostic module librdc_rocr.so is created. The
module uses Rocr to test the memory allocation, memory access
and compute queue ready status.

Change-Id: I9098f4fc3209bf381b7cb3658a4e94c2e22f2fe9


[ROCm/rdc commit: 78e2f2486b]
2021-10-21 11:01:12 -04:00