175 Tiomáintí

Údar SHA1 Teachtaireacht Dáta
Li Ma 709b621c48 Modify the error log for MM_ENC_UTIL
Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I83805fc8ad7003ecd5189c8f940b44edbf0ebd1f


[ROCm/rdc commit: 26ea06bb69]
2025-03-04 08:42:22 -06:00
Arif, Maisam c26abbbe9a Fixed RDC to work with updated amdsmi_get_power_info() (#115)
Change-Id: Ic9e7a68ae58f61dbe73fc7d1b17af34152933e71

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/rdc commit: 552f15a1fb]
2025-02-11 00:51:29 -06:00
Pryor, Adam c00a9a709d SWDEV-512736 Fix RDC Policy callback printout (#114)
Change-Id: I6e018dcb0a6b272812c959649d913e3ba33def40

[ROCm/rdc commit: 93a8ab8915]
2025-02-10 08:40:03 -06:00
Pryor, Adam c5560793e8 SWDEV-500382 fix energy consumed (#105)
Change-Id: I3f180f34abed763db1287bf01581753534f32828

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/rdc commit: af56e460c4]
2025-01-30 09:38:00 -06:00
Pryor, Adam 0186fc2481 SWDEV-508477 Eval Flops Percent (#85)
SWDEV-508477 - Profiler add FP*_PERCENT

Change-Id: Idb6250fe6b7ba3df6fe7d30861e0fbbda7e9bdce

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

[ROCm/rdc commit: 6f358ddc9e]
2025-01-24 10:07:32 -06:00
Galantsev, Dmitrii 3218c2af5c CMAKE - Rename SMI_*_DIR into AMD_SMI_*_DIR
Change-Id: I3b8b852e6b68f1448c8ed5d5e6ea4579c470ff53
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e033fd4c55]
2025-01-23 20:56:00 -06:00
Ma, Li 25853f01dc Fix Memory Current Bandwidth (#98)
Adjust the calculation order to ensure accuracy.


Change-Id: Ica10769fa3dba10c67428d09ffd454fc09ed0da8

Signed-off-by: Li Ma <li.ma@amd.com>

[ROCm/rdc commit: 9dce427c69]
2025-01-24 10:22:08 +08:00
adapryor c57e200bdc SWDEV-500382 fix energy consumed
Change-Id: I3f180f34abed763db1287bf01581753534f32828


[ROCm/rdc commit: e8057b1042]
2025-01-21 21:49:33 -06:00
adapryor 8286a92fc1 Implementation for RDC_FI_PROF_OCCUPANCY_PER_ACTIVE_CU SWDEV-50895
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I8da7d9846edabe5629c75f50cd2bb4b23e019a17
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/rdc commit: 290b90dc89]
2025-01-21 21:49:19 -06:00
Pryor, Adam 9f1f502d93 SWDEV-510089 Fix rocprof segfaulting on ctrl+c (#94)
Change-Id: Iaa0f3856bb8fed174cbc935b85739414ecd44758

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/rdc commit: 0ae4404a09]
2025-01-21 10:30:31 -06:00
limeng12 4f3b114740 [SWDEV-230863] Improve the functionality of RdcSmiHealth module.
Memory check:get the threshold of retired page number
EEPROM check:read and verify the checksum
Power/Thermal check: power/thermal throttle status counter

Signed-off-by: Meng Li <li.meng@amd.com>
Change-Id: Id2c751416eb5bf007e6e1da8dc05966a6ba1324e


[ROCm/rdc commit: 016a1d9d39]
2025-01-14 08:14:36 +08:00
Galantsev, Dmitrii b78295c8f8 RVS - Add IET and PEBB tests
Change-Id: Ia032901d74c882e5cbfa5a3164199cd4d571341f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 5861ec7663]
2025-01-08 18:23:13 -06:00
Galantsev, Dmitrii 9d32387925 RVS - Add memory bandwidth test
Change-Id: I4c8990170861f6a0f3853615db68634fdaa7a622
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: b058cbecf1]
2025-01-08 18:23:13 -06:00
Pryor, Adam 20f3ba845c Implementation for adding pcie_total (#40)
* Implementation for adding pcie_total

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I4b0cfd7095e9d984e939283ee7169d01f55a1847
Signed-off-by: adapryor <Adam.pryor@amd.com>

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I021f29083de651cab9fbe7db98acbe20f65948d4

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I42f3207b745fa787dabe30a85c8e063159d1337d

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/rdc commit: 60b7359161]
2024-12-26 18:36:41 -06:00
Ma, Li 0e5cf815d8 SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem (#48)
[ROCm/rdc commit: 772481f952]
2024-12-23 10:22:53 +08:00
stali 52bb0d6466 Enable RDC link Status feature
1.add link status APIs
   2.Add link status example for link status API usage


[ROCm/rdc commit: 29b6699b62]
2024-12-23 09:30:21 +08:00
Greg Scaffidi 725599b51c Add RDC_FI_PROF_SM_ACTIVE metric.
Signed-off-by: Greg Scaffidi <salvatore.scaffidi@amd.com>
Change-Id: I63aaf5eb05d74ba696ace2b088e17c2cfb1bd74b
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/rdc commit: f4de4b0529]
2024-12-21 15:21:46 -06:00
Adam Pryor 1c26bf4304 Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/rdc commit: df170c8801]
2024-12-20 16:03:33 -06:00
Galantsev, Dmitrii 755ae0ee5d Profiler - Migrate from rocprofv1 to rocprofv3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Fixed RDC for Rocprofv3

Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd

last updates

Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7

comment

Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8


[ROCm/rdc commit: 7c91a07a43]
2024-12-20 15:39:02 -06:00
adapryor c42d0232f4 Fix for SWDEV-500637
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Id42a2da321bdba74dfc8e16d7dc04d05cef4e34a


[ROCm/rdc commit: e1e7f59269]
2024-12-18 11:10:41 -06:00
stali 1e45293968 Enable RDC topology feature
1.Add topology APIs
2.Add topology example for topology API usage

Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c


[ROCm/rdc commit: 8bcb5f7068]
2024-12-16 10:02:41 +08:00
Li Ma 772c1c0a0d SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem
Implemented max memory bandwith and current memory bandwidth. Added two
new field ids: RDC_FI_GPU_MEMORY_MAX_BANDWIDTH, RDC_FI_GPU_MEMORY_CUR_BANDWIDTH

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I453e49937a84777146575f4f5bdd69fd4fe53bfc


[ROCm/rdc commit: 30f9b2ac2f]
2024-12-16 09:43:20 +08:00
Galantsev, Dmitrii d9b13912c6 Profiler - Remove averaging
Averaging happens very slowly and only confuses people...

Change-Id: I60754d3b896b6ffeb6104bb1c2fcc54e9869b331
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2c61dfe2ce]
2024-12-11 11:58:50 -06:00
Galantsev, Dmitrii fc83179a9d Profiler - Fix fp64 metric
Change-Id: Iab27e21740c2c51143a9e88d085b80716bf193e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2605eda5f3]
2024-12-11 11:27:41 -06:00
zichguan-amd a5227aa61b Make ROCM_DIR default ROCm path for rocprof
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/rdc commit: c042b4f582]
2024-12-02 11:19:56 -05:00
Chen Gong a8086b484d rocprofiler: add valu utilization
SWDEV-475242

For the description of "FP32 Engine Activity" and "FP64 Engine Activity" in dcgm,
It seems that we do not have an equivalent to these pipe-utilizations on our hardware.

In rocprofiler, I think VALU Utilization is the closest to what we want.

Change-Id: Ibce8835ef4757084cdfd73258de6fc1606ca0158
Signed-off-by: Chen Gong <curry.gong@amd.com>


[ROCm/rdc commit: 251fcbe49d]
2024-11-21 15:24:01 +08:00
limeng12 71e2727a8f Backgroud health check
Add the RdcSmiHealth module, which will call rocm_smi_lib.
It will support following health:
 - XGMI error detected
 - PCIE replay count detected
 - Memory check
 - InfoROM check
 - Power/Thermal check
The grpc client and server side health function is added.
The health module is added to the rdci.

At present, XGMI/PCIE and a part of Memory have been implemented.
Others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112


[ROCm/rdc commit: 853d3b0cc5]
2024-11-19 14:00:49 +08:00
Galantsev, Dmitrii 8e657c165c RVS - Fix cookie_t -> rdc_diag_callback_t types issue
Issue introduced in ae9030ab1a

Change-Id: I2b6a8024d45fc44d92cf2770be9887dfc0fb3ede
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e1b57c43f3]
2024-11-12 10:36:52 -06:00
Galantsev, Dmitrii ae9030ab1a RVS - Report test progress in realtime
Change-Id: Id9fea71f242f372f408ecd777c030465b7ef9989
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 37ddd5bf50]
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 73c79fcd83 Finish basic logging impl
Change-Id: Ia3d6ac80f4832f1bfb63573c543659abd5f84341
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9c77312c51]
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii b0035605ee CMAKE - Find modules at build time
Change-Id: I9370ef1433579aff1a37f3636050f525638d8658
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: cdf1588974]
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 39687e8d96 CMAKE - Fix RVS include
Change-Id: I65095cc3d04fc2a5daeee5c809f635cb1662822f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Revert "Disable RVS as the error scares people"

This reverts commit f3450f61bf.

Change-Id: I5086c25772444aa3bfc4c10abc1ea58d3f3f1f27
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: dd50027748]
2024-11-07 11:18:41 -06:00
Chao Fei d489245fbe Enable RDC policy feature
1. Add policy APIs
2. Add policy example for policy API usage

Change-Id: I14deb7c809d0b865b7bb083842092fc37868025e
Signed-off-by: Chao Fei <Chao.Fei@amd.com>


[ROCm/rdc commit: 345ac64a43]
2024-10-23 20:37:27 -04:00
Li Ma 7e3c4b9a21 SWDEV-475244 - Memory Usage and Bandwidth: memory activity
Implemented memory activity and added a new fied id
RDC_FI_GPU_MEMORY_ACTIVITY.

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I11abe356ef6b01ce4917fd19dcc128efbc535f39


[ROCm/rdc commit: 4bd31b605a]
2024-10-22 11:11:31 +08:00
Li Ma 09c718954c SWDEV-475255 - MM Engine Decoding Throughput
Implemented DEC activity for now due to ENC activity is unavailable in
amdsmi.

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I34bb56e6e0d8d2ab91243f8932f0ac10cb2d1e9f


[ROCm/rdc commit: b17abf93fa]
2024-10-18 10:01:41 +08:00
Galantsev, Dmitrii 793b2de0cb Profiler - Modify metrics
Remove occupancy metrics and replace with OccupancyPercent

Add OCCUPANCY_PERCENT which uses OccupancyPercent
Add GR_ENGINE_ACTIVE which uses GPU_UTIL/100
Add TENSOR_ACTIVE_PERCENT which uses MfmaUtil
Modify FLOPS_64 to use FP64_ACTIVE

Change-Id: I5f30d77a0c80f5ac78abd1a9e57f8a0a3c6cc00b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 28acbf0436]
2024-10-15 19:00:30 -05:00
adapryor c283aebd1c Add XGMI read/write sum metrics
Change-Id: I898b779ea7f5336edf0d047fb1e5d3ec40085baa


[ROCm/rdc commit: e20bc58b1c]
2024-10-09 17:02:55 -05:00
Galantsev, Dmitrii 999cae5e2c SWDEV-466829 - Disable ROCP when in GTest
Change-Id: I3b218fe256717c1dc9187d5f17476dfc990656c2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c40a6308c5]
2024-09-26 17:00:05 -05:00
Bill(Shuzhou) Liu 6372df9447 Update the hsaco for diagonstic on MI300X
Add hsaco for gfx940, gfx941 and gfx942

Change-Id: Ibd55fcc2d036d1190357e1e86d4e170568426d94


[ROCm/rdc commit: 9800528c19]
2024-09-17 14:15:35 -05:00
Galantsev, Dmitrii f3450f61bf Disable RVS as the error scares people
Change-Id: I572fdb65dd8882ab4fdc1474cb39fc0e493b1eab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 660c5afaf4]
2024-09-13 10:42:59 -05:00
Chen Gong cd98bb7f90 Implement the code related to the GetMixedComponentVersion()
Change-Id: I98aad97b4cb6498b7f2fc03a2d5ee7c9e949d5f1
Signed-off-by: Chen Gong <curry.gong@amd.com>


[ROCm/rdc commit: 1edd04d84e]
2024-09-10 10:06:44 -05:00
Chen Gong 891039280f Implement rdc_device_get_component_version API related code
Implement an API to obtain the version information of the rdc calling component.
See rdc_component_t for details on available components.
It can be expanded later if necessary.

Change-Id: I03b48f774179c52c57b606704283add74ca39a02
Signed-off-by: Chen Gong <curry.gong@amd.com>


[ROCm/rdc commit: 5a3fd9fbc1]
2024-09-10 10:06:44 -05:00
Galantsev, Dmitrii cc3c3ce9b6 Add OAM_ID
Change-Id: I771b2f7f088940838c09ba3521a7955faa64e7ec
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: bffe4e22fa]
2024-09-09 21:19:33 -05:00
Tom Rix 27f35431ea Fix build with BUILD_STANDALONE=OFF
When the rdc is built with this configure option
-DBUILD_STANDALONE=OFF

This error is caused
CMake Error at rdc_libs/CMakeLists.txt:106 (export):
  export given target "rdc_client" which is not built by this project.

Resolve this by using conditional

Change-Id: I3f6bb2946c609c7db9fc38015b7d9c8ae766f3a0
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6762a6dd8b]
2024-07-08 12:49:09 -05:00
Galantsev, Dmitrii 9a2806ac95 SWDEV-452795 - Disable RAS plugin, fix XGMI
RAS plugin loaded rocm-smi which is in conflict with amd-smi library

Main source of grief was the map 'devInfoTypesStrings' that is defined
in both rocm-smi and amd-smi

We assume that rocm-smi would get lazy-loaded by RAS library and
overwrite symbols defined in amd-smi. devInfoTypesStrings in rocm-smi
contains different number of elements, the enums are also different.
RDC relies on amd-smi's enums.

One such enum is kDevGpuMetrics:
  rocm-smi: kDevGpuMetrics = 68
  amd-smi:  kDevGpuMetrics = 75

Example of overlapping map definitions:

  $ objdump --dynamic-syms /opt/rocm/lib/libamd_smi.so | grep devInfoTypesStrings
  00000000003c4980 g    DO .data.rel.ro0000000000000008  Base        devInfoTypesStrings
  00000000003db830 g    DO .bss0000000000000030  Base        _ZN3amd3smi6Device19devInfoTypesStringsE
  $ objdump --dynamic-syms /opt/rocm/lib/librocm_smi64.so  | grep devInfoTypesStrings
  00000000003dc590 g    DO .bss0000000000000030  Base        _ZN3amd3smi6Device19devInfoTypesStringsE
  00000000003c9c68 g    DO .data.rel.ro0000000000000008  Base        devInfoTypesStrings

Change-Id: Ib2f2db32b6abd7ebe84e7807c25581461eb86bae
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: d85657e5f2]
2024-06-26 03:42:07 -05:00
Galantsev, Dmitrii b50c64b868 Use correct rocprofiler metrics
Change-Id: I26603de7425abb6588f770ed68c22e14d6d20d56
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: d4bb33d100]
2024-06-11 11:15:18 -05:00
Galantsev, Dmitrii 73948f95e2 Rewrite rocprofiler plugin
Change-Id: Ic7dd967cc60cacd2b16a465180505ea2a342fccf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 3514225b83]
2024-06-11 03:11:15 -05:00
Galantsev, Dmitrii 29b86095ed Fix rocprofiler plugin
- Replace non-working fields with working ones
    - remove CU_OCCUPANCY completely as it isn't well supported
- Fix rocprofiler initialization with shared_ptr and rdc_module_init
- Replace env var ROCPROFILER_METRICS_PATH with ROCP_METRICS
    - ROCPROFILER_METRICS_PATH is only relevant for rocprofv2
    - ROCP_METRICS is only relevant for rocprofv1 (which we are using)

Change-Id: I21e6fa3f0e1694c38f44ca0e5659d672559f7380
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 20ca2ce574]
2024-06-06 01:51:39 -05:00
Galantsev, Dmitrii c2a75bbe4c Finalize the rocprofiler fields
Change-Id: I4ed1c4309f21bdcc7281d911663036caf5947182
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 07c414af5e]
2024-06-04 19:49:06 -05:00
Galantsev, Dmitrii f73e123900 Add GPU indexing and fix check for fields in rocprof
- Fix RUNPATH for tests

Change-Id: I79517592b49d27080a010a2e41e5878adf24a157
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e11afbf60f]
2024-06-04 12:56:22 -05:00