Commit graph

284 commits

Author SHA1 Message Date
Pryor, Adam a70aa81cfd Dgalants/add auth script location (#108)
* DOCS: Add authentication scripts location

Change-Id: Ie285d80ea6d9bb8f710998208d0aa7c6db661d02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* Make README.md pretty (#44)

Change-Id: I7c3341deaf3621ebbc9e495b023b1dd4971a5f1d

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Williams, Justin <Justin.Williams@amd.com>
2025-01-30 12:08:11 -06:00
Galantsev, Dmitrii 4da277a64e DOCS: Add authentication scripts location (#96)
Change-Id: Ie285d80ea6d9bb8f710998208d0aa7c6db661d02

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-30 12:06:18 -06:00
Galantsev, Dmitrii a8d479c147 CMAKE - Fix ABSL in clang18+ (#106)
Please see:
- https://github.com/abseil/abseil-cpp/issues/1747
- https://github.com/llvm/llvm-project/issues/102443

When gRPC is compiled with a different compiler than RDC, the ABI breaks.
Possibly because some templates were not instantiated.
Setting '-fclang-abi-compat=17' fixes the issue.

Change-Id: Ic6409cf413c87b135f334e5b03145cb1c63356d4

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-30 10:33:58 -06:00
Pryor, Adam af56e460c4 SWDEV-500382 fix energy consumed (#105)
Change-Id: I3f180f34abed763db1287bf01581753534f32828

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-01-30 09:38:00 -06:00
Galantsev, Dmitrii 99d4d77e20 CMAKE - Move rdc_options into share/rdc/conf/
Change-Id: Ib2e792aef180f0f267d86d68c57b852b2cdc8ea6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-24 12:06:05 -06:00
Pryor, Adam 6f358ddc9e SWDEV-508477 Eval Flops Percent (#85)
SWDEV-508477 - Profiler add FP*_PERCENT

Change-Id: Idb6250fe6b7ba3df6fe7d30861e0fbbda7e9bdce

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-24 10:07:32 -06:00
Galantsev, Dmitrii e033fd4c55 CMAKE - Rename SMI_*_DIR into AMD_SMI_*_DIR
Change-Id: I3b8b852e6b68f1448c8ed5d5e6ea4579c470ff53
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-23 20:56:00 -06:00
Ma, Li 9dce427c69 Fix Memory Current Bandwidth (#98)
Adjust the calculation order to ensure accuracy.


Change-Id: Ica10769fa3dba10c67428d09ffd454fc09ed0da8

Signed-off-by: Li Ma <li.ma@amd.com>
2025-01-24 10:22:08 +08:00
stali e36d3fae22 fix topology issue 2025-01-24 09:22:42 +08:00
Galantsev, Dmitrii ef77c0ed92 Fix workflow for rocprof by specifying GPU_TARGETS
Change-Id: I153f9e73471599fbcf68c73ad0ed9f4db7a742ef
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-23 18:38:34 -06:00
Galantsev, Dmitrii 9dd58b6907 Update workflow to artifacts@v4
Change-Id: Ib08a0afc0954ea2eb581425cbf9cf1d7715cebc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-22 14:31:31 -06:00
adapryor e8057b1042 SWDEV-500382 fix energy consumed
Change-Id: I3f180f34abed763db1287bf01581753534f32828
2025-01-21 21:49:33 -06:00
adapryor 290b90dc89 Implementation for RDC_FI_PROF_OCCUPANCY_PER_ACTIVE_CU SWDEV-50895
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I8da7d9846edabe5629c75f50cd2bb4b23e019a17
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-01-21 21:49:19 -06:00
stali b427c07ffe fixed rdc link state print issue 2025-01-22 09:05:49 +08:00
Pryor, Adam 0ae4404a09 SWDEV-510089 Fix rocprof segfaulting on ctrl+c (#94)
Change-Id: Iaa0f3856bb8fed174cbc935b85739414ecd44758

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-01-21 10:30:31 -06:00
Mallya, Ameya Keshava 0490b1c925 Fixed Workflow for updated KWS structure 2025-01-17 08:21:11 -08:00
Mallya, Ameya Keshava cadbf69b45 Added KWS check (#88) 2025-01-15 11:11:01 -08:00
limeng12 016a1d9d39 [SWDEV-230863] Improve the functionality of RdcSmiHealth module.
Memory check: get the threshold of the retired page count
EEPROM check: read and verify the checksum
Power/Thermal check: power/thermal throttle status counter

Signed-off-by: Meng Li <li.meng@amd.com>
Change-Id: Id2c751416eb5bf007e6e1da8dc05966a6ba1324e
2025-01-14 08:14:36 +08:00
Galantsev, Dmitrii 83f36f1673 Include assert.h during C compilation (#4)
Fix for https://github.com/ROCm/ROCm/issues/3997. When compiling a C program that includes rdc/rdc.h, multiple assertion errors are thrown without this header included.

Change-Id: Ie5b5c1a1a17c8207cf9b1be23b31193e260d5c1a

Co-authored-by: harkgill-amd <harkgill@amd.com>
2025-01-10 11:29:15 -05:00
srawat 0e53160bee Update LICENSE 2025-01-09 13:12:24 -06:00
Galantsev, Dmitrii 5861ec7663 RVS - Add IET and PEBB tests
Change-Id: Ia032901d74c882e5cbfa5a3164199cd4d571341f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-08 18:23:13 -06:00
Galantsev, Dmitrii b058cbecf1 RVS - Add memory bandwidth test
Change-Id: I4c8990170861f6a0f3853615db68634fdaa7a622
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-01-08 18:23:13 -06:00
stali a76760db8c fix group policy reg issue 2025-01-07 15:02:17 +08:00
Li, Star bd7d7c99c1 Fix unit issue in policy feature (#78)
1. For temperature, the unit is millidegrees Celsius.
2. For power, the unit is microwatts.
3. Fix the second register call to rdcd not functioning because of the start flag.

Co-authored-by: Chao Fei <chao.fei@amd.com>
2025-01-06 09:21:08 +08:00
Pryor, Adam 60b7359161 Implementation for adding pcie_total (#40)
* Implementation for adding pcie_total

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I4b0cfd7095e9d984e939283ee7169d01f55a1847
Signed-off-by: adapryor <Adam.pryor@amd.com>

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I021f29083de651cab9fbe7db98acbe20f65948d4

* Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I42f3207b745fa787dabe30a85c8e063159d1337d

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-26 18:36:41 -06:00
Ma, Li 772481f952 SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem (#48) 2024-12-23 10:22:53 +08:00
stali 29b6699b62 Enable RDC link Status feature
1. Add link status APIs
2. Add link status example for link status API usage
2024-12-23 09:30:21 +08:00
Greg Scaffidi f4de4b0529 Add RDC_FI_PROF_SM_ACTIVE metric.
Signed-off-by: Greg Scaffidi <salvatore.scaffidi@amd.com>
Change-Id: I63aaf5eb05d74ba696ace2b088e17c2cfb1bd74b
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-21 15:21:46 -06:00
Adam Pryor df170c8801 Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-20 16:03:33 -06:00
Galantsev, Dmitrii 7c91a07a43 Profiler - Migrate from rocprofv1 to rocprofv3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Fixed RDC for Rocprofv3

Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd

last updates

Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7

comment

Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8
2024-12-20 15:39:02 -06:00
Maisam Arif 35eb8e7c4b Resolve CI caller merge conflicts
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Icb0389f422b6f158492828e79e44fe00e5db07f5
2024-12-19 10:10:23 -06:00
Choudhary, Rahul 69be6f1c16 Update rocm_ci_caller.yml: base ref to support pull and push request
The change is present in mainline and was missing in staging
2024-12-18 11:57:44 -08:00
adapryor e1e7f59269 Fix for SWDEV-500637
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Id42a2da321bdba74dfc8e16d7dc04d05cef4e34a
2024-12-18 11:10:41 -06:00
Choudhary, Rahul 948271bd9b Create rocm_ci_caller.yml enabling PSDB and OSDB for amd-mainline changes 2024-12-17 11:36:22 -08:00
stali 8bcb5f7068 Enable RDC topology feature
1. Add topology APIs
2. Add topology example for topology API usage

Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c
2024-12-16 10:02:41 +08:00
Li Ma 30f9b2ac2f SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem
Implemented max memory bandwidth and current memory bandwidth. Added two
new field IDs: RDC_FI_GPU_MEMORY_MAX_BANDWIDTH, RDC_FI_GPU_MEMORY_CUR_BANDWIDTH

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I453e49937a84777146575f4f5bdd69fd4fe53bfc
2024-12-16 09:43:20 +08:00
Galantsev, Dmitrii 2c61dfe2ce Profiler - Remove averaging
Averaging happens very slowly and only confuses people...

Change-Id: I60754d3b896b6ffeb6104bb1c2fcc54e9869b331
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-12-11 11:58:50 -06:00
Galantsev, Dmitrii 2605eda5f3 Profiler - Fix fp64 metric
Change-Id: Iab27e21740c2c51143a9e88d085b80716bf193e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-12-11 11:27:41 -06:00
Ranjith Ramakrishnan b778a879cb SWDEV-502603 - Use RPM_INSTALL_PREFIX variable rather than hard coded install prefix paths in RPM post/prerm scripts
Change-Id: I2699459e1e3730cf045f24f0c90e09f900701a6f
2024-12-10 21:44:09 -06:00
zichguan-amd c042b4f582 Make ROCM_DIR default ROCm path for rocprof
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2024-12-02 11:19:56 -05:00
Galantsev, Dmitrii b5272fb99c CI - Use vars instead of secrets
Change-Id: Ib917b8677c204a75bedcb345978f2b09216b115f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-26 11:17:50 -06:00
Galantsev, Dmitrii 94005119d6 CI - Add initial config
Change-Id: I02a08e3f761b7997d8835566b81654431423405d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-25 21:43:33 -06:00
Chen Gong 251fcbe49d rocprofiler: add valu utilization
SWDEV-475242

Based on the descriptions of "FP32 Engine Activity" and "FP64 Engine Activity" in DCGM,
it seems that we do not have an equivalent to these pipe utilizations on our hardware.

In rocprofiler, I think VALU Utilization is the closest to what we want.

Change-Id: Ibce8835ef4757084cdfd73258de6fc1606ca0158
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-11-21 15:24:01 +08:00
limeng12 853d3b0cc5 Background health check
Add the RdcSmiHealth module, which will call rocm_smi_lib.
It will support the following health checks:
 - XGMI error detected
 - PCIE replay count detected
 - Memory check
 - InfoROM check
 - Power/Thermal check
The gRPC client-side and server-side health functions are added.
The health module is added to rdci.

At present, the XGMI/PCIE checks and part of the Memory check have been implemented.
Others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
2024-11-19 14:00:49 +08:00
Galantsev, Dmitrii f1428a8226 Update changelog for 6.3
Change-Id: I1b2d26f1e6c7963052fb36fd6c40e3d10c22082d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Rawat, Swati <Swati.Rawat@amd.com>
2024-11-15 14:10:11 -06:00
Bill(Shuzhou) Liu 5e3ebecf80 Correct RDC_FI_PCIE_BANDWIDTH unit
The unit should be mbps instead of GB/second
2024-11-13 09:45:46 -05:00
stali d8fec06bab Enable RDCI policy subsystem
- Enable set and get for policy settings
- Enable register and clear policy events

Change-Id: If4eaaf9b80e668fb21691757210e0aa1532cecae
Signed-off-by: stali <Star.Li@amd.com>
2024-11-12 20:40:08 -06:00
Galantsev, Dmitrii e1b57c43f3 RVS - Fix cookie_t -> rdc_diag_callback_t types issue
Issue introduced in 37ddd5bf50

Change-Id: I2b6a8024d45fc44d92cf2770be9887dfc0fb3ede
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-12 10:36:52 -06:00
Galantsev, Dmitrii 4f7e441566 AMDSMI - Fix kRasErrStateStrings in tests
Change-Id: Ia9498fae215397baf7201715574954313c17da93
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 37ddd5bf50 RVS - Report test progress in realtime
Change-Id: Id9fea71f242f372f408ecd777c030465b7ef9989
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00