Commit Graph

148 Commits

Author SHA1 Message Date
adapryor 33924ea79e Profiler - Fix SIMD Utilization
Change-Id: I6775cce9901a714d20e80c8c17e7a563edeb48a4
2025-05-07 00:56:52 -05:00
Galantsev, Dmitrii fa8b89f4ae CMAKE - Format with cmake-format
Change-Id: I08e71fc5060b1f6e0168225cc5fe66886c2044bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-05-06 17:28:14 -05:00
Galantsev, Dmitrii 02c0786a2c Profiler - Add SIMD_UTILIZATION (#171)
Change-Id: I19d5acd80dbed8c4fc4e1c85eec71ca89398d299

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-05-06 13:20:03 -07:00
Pryor, Adam 2db6ddea69 [SWDEV-523349/SWDEV-527257] Fix Rdci Config (#161)
Change-Id: Iae21ea8061205f186086a3ed59c6259ddeb1dbe7

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-04-28 11:57:51 -05:00
Galantsev, Dmitrii a5cb334f8b Add RDC_FI_GPU_BUSY_PERCENT
AMDSMI needs to merge first and bump the version to at least 24.4.2

Change-Id: I30149bb78c79ebc3de0dabdc8e63fcef12b2f406
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-15 17:00:56 -05:00
Galantsev, Dmitrii ac50573e67 CMAKE - Bump version to 1.1.0
Change-Id: I0fbc0f6d842c034ad858f30fa6418afd01e11a4f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-11 17:27:27 -05:00
Galantsev, Dmitrii dfae9cd37f Profiler - Remove buffer to fix memory leaks
Change-Id: Ia3717ccfc147221557f5469965c2abb76b3f451c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-11 17:27:27 -05:00
Pryor, Adam 58811fecbb [SWDEV-515192] Fix rdc topo (#146)
Change-Id: I64a8077a56e2eaf99735fafb1010d869a1fdb0c3

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-04-10 17:46:08 -05:00
Galantsev, Dmitrii 91be467cad Profiler - Fix eval fields
The 'value' pointer was being written to a lot and then used for reading
within the same function. This likely caused issues all over RDC when
reading the metrics.

This commit changes it so *value is written to only once.

Change-Id: I83c158c1e46c6ce46ff87d8a2e769f26ffa8c0da
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-09 20:06:21 -05:00
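To illustrate the class of bug the "Profiler - Fix eval fields" message describes, here is a minimal sketch. It is not the RDC code: `percent_buggy` and `percent_write_once` are hypothetical names, and the aliasing scenario is an assumption chosen to make the failure visible. The first function uses the output pointer as scratch space and reads it back; the second computes in a local and writes `*value` exactly once.

```cpp
#include <cassert>

// Hypothetical example, not taken from RDC.
// Buggy pattern: *value is written early and then read back. If the caller
// passes an output pointer that aliases an input, the early write clobbers
// the input before it is used.
inline void percent_buggy(const double* used, const double* total, double* value) {
  *value = 0.0;  // early default write; destroys *used when value == used
  if (*total == 0.0) {
    return;
  }
  *value = *used / *total;  // reads the (possibly clobbered) input
  *value = *value * 100.0;  // reads the output pointer back again
}

// Fixed pattern: read all inputs first, compute in a local variable, and
// write through the output pointer exactly once at the end.
inline void percent_write_once(const double* used, const double* total, double* value) {
  double result = 0.0;
  if (*total != 0.0) {
    result = (*used / *total) * 100.0;
  }
  *value = result;  // the only write to *value
}
```

With `used` and `value` pointing at the same variable holding 50 and a total of 200, the first version produces 0 while the second produces 25. The single-write form also means a concurrent reader never observes a half-computed value, though it does not by itself make the function thread-safe.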
Galantsev, Dmitrii 24024f0e4f Revert "Implement CPU discovery support"
This reverts commit f967f8a17d15e148464393fcd145af01dc0e1525.
2025-04-07 20:45:19 -05:00
Galantsev, Dmitrii c96f5db52c Revert "Fix breaking changes introduced with CPU support"
This reverts commit e9ac9e4626e3e45ebdfafb39e251d073091429f1.
2025-04-07 20:45:19 -05:00
Galantsev, Dmitrii 0aeceefcb3 Fix breaking changes introduced with CPU support
Changes introduced in 3bdca8b8b6
broke RDC if it was compiled without ESMI support, or if the ESMI driver is
not loaded when RDC is being used.

Change-Id: Id54e1e9002d2e3cf09240081149eed84178700af
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-07 14:41:46 -05:00
Yuan, Perry 3bdca8b8b6 Implement CPU discovery support (#77)
* Implement CPU discovery support

SWDEV-482949:

Enable CPU model name info support in RDC so that the rdci command
can detect GPU and CPU modules at the same time.
It queries the CPU info through the amdsmi interface and prints output like the following:

1 GPUs found.
-----------------------------------------------------------------
GPU Index        Device Information
0               AMD Radeon PRO W7800
=================================================================
1 CPUs found.
-----------------------------------------------------------------
CPU Index        Device Information
0               AMD Ryzen Threadripper PRO 7995WX 96-Cores
-----------------------------------------------------------------

Change-Id: Ibc6533c9a61000cd86c45b1bae14c3eb6788c119
Signed-off-by: Perry Yuan <perry.yuan@amd.com>

* CMAKE - Add required version for amdsmi

Change-Id: I341a89351d196ec66cce215a5d1d3953302fcc66
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-31 10:58:36 +08:00
Galantsev, Dmitrii 80ee980cdb CMAKE - Fix build types
Addresses issue https://github.com/ROCm/rdc/issues/43

Change-Id: I456184358524a6feef4bf83eecb655678c3bc42d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-30 18:54:54 -05:00
Galantsev, Dmitrii bdb2367010 RVS - Add long-running tests
Change-Id: Iddeb7f2d4fdcd69d7ac1ae94b2fa128ee3011b1a
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-27 23:42:56 -05:00
Galantsev, Dmitrii 58350a8bb8 Profiler - Remove bootstrap link
Change-Id: Ieea57515d77c2d521d95568c3bc2660cc829d829
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-27 23:29:30 -05:00
Galantsev, Dmitrii 059d015ea4 CMAKE - Add BUILD_INTERFACE include dirs for rdc_bootstrap
Change-Id: I93df878b21e245277c7a8d9589102a15c2517f4f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-27 23:29:30 -05:00
Galantsev, Dmitrii 51de344be7 Profiler - Add CPC and CPF metrics
Change-Id: I27fd725e9e1868c9afe7624d6e4aafad2a42d47e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-27 19:01:23 -05:00
Pryor, Adam 47692d3ed5 [SWDEV-498711] RDC Partition Implementation (#119)
* [SWDEV-498711] RDC Partition Implementation

Change-Id: Ibfc3709793770537e4c9d36458f34c6b4f461724
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-03-27 14:10:11 -05:00
Galantsev, Dmitrii 929041b556 Fix amdsmi_get_power_info API
This change adds a workaround for a broken C API in amdsmi.

amdsmi_get_power_info API is broken in rocm 6.4.0 (amdsmi 25.2) and is fixed
in rocm 6.4.1 (amdsmi 25.3).

Breaking AMDSMI change:
https://github.com/ROCm/amdsmi/commit/dc4a16da6fb45d581a6e23c78d340172989418a0

Change-Id: Ib45a2702aa722c7735f3ccd1081d8f62e4d34216
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-19 23:12:45 -05:00
Galantsev, Dmitrii f5a4402ce5 RVS - Use config files and make GPU aware
Change-Id: I7a5c80ed4e6122d102e494d1ae38b4b7d40c42cd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-11 15:39:16 -05:00
Galantsev, Dmitrii 247c8c7d5e RVS - Disable IET test
Change-Id: I015d68735316d2dc6af18d16f972d9f379b76bcf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-11 09:51:08 -05:00
Galantsev, Dmitrii d5f8ff0ab0 CMAKE - Set fallback version to 0.3.0
Change-Id: I2322bdb7d3a8e4f83346ca4f5d24351ad2a4eccc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-04 08:43:32 -06:00
Li Ma 26ea06bb69 Modify the error log for MM_ENC_UTIL
Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I83805fc8ad7003ecd5189c8f940b44edbf0ebd1f
2025-03-04 08:42:22 -06:00
Arif, Maisam 552f15a1fb Fixed RDC to work with updated amdsmi_get_power_info() (#115)
Change-Id: Ic9e7a68ae58f61dbe73fc7d1b17af34152933e71

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-02-11 00:51:29 -06:00
Pryor, Adam 93a8ab8915 SWDEV-512736 Fix RDC Policy callback printout (#114)
Change-Id: I6e018dcb0a6b272812c959649d913e3ba33def40
2025-02-10 08:40:03 -06:00
Pryor, Adam af56e460c4 SWDEV-500382 fix energy consumed (#105)
Change-Id: I3f180f34abed763db1287bf01581753534f32828

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-01-30 09:38:00 -06:00
Pryor, Adam 6f358ddc9e SWDEV-508477 Eval Flops Percent (#85)
SWDEV-508477 - Profiler add FP*_PERCENT

Change-Id: Idb6250fe6b7ba3df6fe7d30861e0fbbda7e9bdce

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-24 10:07:32 -06:00
Galantsev, Dmitrii e033fd4c55 CMAKE - Rename SMI_*_DIR into AMD_SMI_*_DIR
Change-Id: I3b8b852e6b68f1448c8ed5d5e6ea4579c470ff53
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-23 20:56:00 -06:00
Ma, Li 9dce427c69 Fix Memory Current Bandwidth (#98)
Adjust the calculation order to ensure accuracy.


Change-Id: Ica10769fa3dba10c67428d09ffd454fc09ed0da8

Signed-off-by: Li Ma <li.ma@amd.com>
2025-01-24 10:22:08 +08:00
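The "Fix Memory Current Bandwidth" message above says only that the calculation order was adjusted for accuracy; the log does not show the actual change. As a hedged illustration of why order can matter in integer arithmetic, the sketch below uses hypothetical functions (`rate_divide_first`, `rate_multiply_first`) and hypothetical units (bytes and milliseconds), none of which come from RDC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example, not taken from RDC.
// Converts a byte count observed over interval_ms into bytes per second.

// Dividing first truncates the quotient, and the later multiply scales up
// the truncation error.
inline std::uint64_t rate_divide_first(std::uint64_t bytes, std::uint64_t interval_ms) {
  return bytes / interval_ms * 1000;
}

// Multiplying first keeps the full precision until the final division.
// Assumes bytes * 1000 does not overflow 64 bits.
inline std::uint64_t rate_multiply_first(std::uint64_t bytes, std::uint64_t interval_ms) {
  return bytes * 1000 / interval_ms;
}
```

For 1500 bytes over 1000 ms, dividing first yields 1000 bytes per second, while multiplying first yields the expected 1500. The trade-off is that multiplying first needs headroom against overflow, which is why the second function documents that assumption.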
adapryor e8057b1042 SWDEV-500382 fix energy consumed
Change-Id: I3f180f34abed763db1287bf01581753534f32828
2025-01-21 21:49:33 -06:00
adapryor 290b90dc89 Implementation for RDC_FI_PROF_OCCUPANCY_PER_ACTIVE_CU SWDEV-50895
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I8da7d9846edabe5629c75f50cd2bb4b23e019a17
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-01-21 21:49:19 -06:00
Pryor, Adam 0ae4404a09 SWDEV-510089 Fix rocprof segfaulting on ctrl+c (#94)
Change-Id: Iaa0f3856bb8fed174cbc935b85739414ecd44758

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-01-21 10:30:31 -06:00
limeng12 016a1d9d39 [SWDEV-230863] Improve the functionality of RdcSmiHealth module.
Memory check: get the threshold for the number of retired pages
EEPROM check: read and verify the checksum
Power/Thermal check: power/thermal throttle status counter

Signed-off-by: Meng Li <li.meng@amd.com>
Change-Id: Id2c751416eb5bf007e6e1da8dc05966a6ba1324e
2025-01-14 08:14:36 +08:00
Galantsev, Dmitrii 5861ec7663 RVS - Add IET and PEBB tests
Change-Id: Ia032901d74c882e5cbfa5a3164199cd4d571341f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-08 18:23:13 -06:00
Galantsev, Dmitrii b058cbecf1 RVS - Add memory bandwidth test
Change-Id: I4c8990170861f6a0f3853615db68634fdaa7a622
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-08 18:23:13 -06:00
Pryor, Adam 60b7359161 Implementation for adding pcie_total (#40)
* Implementation for adding pcie_total

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I4b0cfd7095e9d984e939283ee7169d01f55a1847
Signed-off-by: adapryor <Adam.pryor@amd.com>

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I021f29083de651cab9fbe7db98acbe20f65948d4

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I42f3207b745fa787dabe30a85c8e063159d1337d

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-26 18:36:41 -06:00
Ma, Li 772481f952 SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem (#48)
2024-12-23 10:22:53 +08:00
stali 29b6699b62 Enable RDC link Status feature
1. Add link status APIs
2. Add link status example for link status API usage
2024-12-23 09:30:21 +08:00
Greg Scaffidi f4de4b0529 Add RDC_FI_PROF_SM_ACTIVE metric.
Signed-off-by: Greg Scaffidi <salvatore.scaffidi@amd.com>
Change-Id: I63aaf5eb05d74ba696ace2b088e17c2cfb1bd74b
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-21 15:21:46 -06:00
Adam Pryor df170c8801 Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-20 16:03:33 -06:00
Galantsev, Dmitrii 7c91a07a43 Profiler - Migrate from rocprofv1 to rocprofv3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Fixed RDC for Rocprofv3

Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd

last updates

Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7

comment

Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8
2024-12-20 15:39:02 -06:00
adapryor e1e7f59269 Fix for SWDEV-500637
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Id42a2da321bdba74dfc8e16d7dc04d05cef4e34a
2024-12-18 11:10:41 -06:00
stali 8bcb5f7068 Enable RDC topology feature
1. Add topology APIs
2. Add topology example for topology API usage

Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c
2024-12-16 10:02:41 +08:00
Li Ma 30f9b2ac2f SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem
Implemented max memory bandwidth and current memory bandwidth. Added two
new field IDs: RDC_FI_GPU_MEMORY_MAX_BANDWIDTH, RDC_FI_GPU_MEMORY_CUR_BANDWIDTH

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I453e49937a84777146575f4f5bdd69fd4fe53bfc
2024-12-16 09:43:20 +08:00
Galantsev, Dmitrii 2c61dfe2ce Profiler - Remove averaging
Averaging happens very slowly and only confuses people...

Change-Id: I60754d3b896b6ffeb6104bb1c2fcc54e9869b331
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-12-11 11:58:50 -06:00
Galantsev, Dmitrii 2605eda5f3 Profiler - Fix fp64 metric
Change-Id: Iab27e21740c2c51143a9e88d085b80716bf193e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-12-11 11:27:41 -06:00
zichguan-amd c042b4f582 Make ROCM_DIR default ROCm path for rocprof
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2024-12-02 11:19:56 -05:00
Chen Gong 251fcbe49d rocprofiler: add valu utilization
SWDEV-475242

Regarding the "FP32 Engine Activity" and "FP64 Engine Activity" metrics described in DCGM,
it seems that we do not have an equivalent to these pipe utilizations on our hardware.

In rocprofiler, I think VALU Utilization is the closest to what we want.

Change-Id: Ibce8835ef4757084cdfd73258de6fc1606ca0158
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-11-21 15:24:01 +08:00
limeng12 853d3b0cc5 Background health check
Add the RdcSmiHealth module, which calls rocm_smi_lib.
It will support the following health checks:
 - XGMI error detected
 - PCIE replay count detected
 - Memory check
 - InfoROM check
 - Power/Thermal check
The gRPC client-side and server-side health functions are added.
The health module is added to rdci.

At present, the XGMI and PCIE checks and part of the Memory check have been implemented.
The others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
2024-11-19 14:00:49 +08:00