Pryor, Adam
a70aa81cfd
Dgalants/add auth script location ( #108 )
...
* DOCS: Add authentication scripts location
Change-Id: Ie285d80ea6d9bb8f710998208d0aa7c6db661d02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
* Make README.md pretty (#44 )
Change-Id: I7c3341deaf3621ebbc9e495b023b1dd4971a5f1d
---------
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
Co-authored-by: Williams, Justin <Justin.Williams@amd.com >
2025-01-30 12:08:11 -06:00
Galantsev, Dmitrii
4da277a64e
DOCS: Add authentication scripts location ( #96 )
...
Change-Id: Ie285d80ea6d9bb8f710998208d0aa7c6db661d02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-30 12:06:18 -06:00
Galantsev, Dmitrii
a8d479c147
CMAKE - Fix ABSL in clang18+ ( #106 )
...
Please see:
- https://github.com/abseil/abseil-cpp/issues/1747
- https://github.com/llvm/llvm-project/issues/102443
When gRPC is compiled with a different compiler than RDC, the ABI breaks.
Possibly because some templates were not instantiated.
Setting '-fclang-abi-compat=17' fixes the issue.
Change-Id: Ic6409cf413c87b135f334e5b03145cb1c63356d4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-30 10:33:58 -06:00
Pryor, Adam
af56e460c4
SWDEV-500382 fix energy consumed ( #105 )
...
Change-Id: I3f180f34abed763db1287bf01581753534f32828
Signed-off-by: adapryor <Adam.pryor@amd.com >
2025-01-30 09:38:00 -06:00
Galantsev, Dmitrii
99d4d77e20
CMAKE - Move rdc_options into share/rdc/conf/
...
Change-Id: Ib2e792aef180f0f267d86d68c57b852b2cdc8ea6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-24 12:06:05 -06:00
Pryor, Adam
6f358ddc9e
SWDEV-508477 Eval Flops Percent ( #85 )
...
SWDEV-508477 - Profiler add FP*_PERCENT
Change-Id: Idb6250fe6b7ba3df6fe7d30861e0fbbda7e9bdce
Signed-off-by: adapryor <Adam.pryor@amd.com >
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-24 10:07:32 -06:00
Galantsev, Dmitrii
e033fd4c55
CMAKE - Rename SMI_*_DIR into AMD_SMI_*_DIR
...
Change-Id: I3b8b852e6b68f1448c8ed5d5e6ea4579c470ff53
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-23 20:56:00 -06:00
Ma, Li
9dce427c69
Fix Memory Current Bandwidth ( #98 )
...
Adjust the calculation order to ensure accuracy.
Change-Id: Ica10769fa3dba10c67428d09ffd454fc09ed0da8
Signed-off-by: Li Ma <li.ma@amd.com >
2025-01-24 10:22:08 +08:00
stali
e36d3fae22
fix topology issue
2025-01-24 09:22:42 +08:00
Galantsev, Dmitrii
ef77c0ed92
Fix workflow for rocprof by specifying GPU_TARGETS
...
Change-Id: I153f9e73471599fbcf68c73ad0ed9f4db7a742ef
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-23 18:38:34 -06:00
Galantsev, Dmitrii
9dd58b6907
Update workflow to artifacts@v4
...
Change-Id: Ib08a0afc0954ea2eb581425cbf9cf1d7715cebc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-22 14:31:31 -06:00
adapryor
e8057b1042
SWDEV-500382 fix energy consumed
...
Change-Id: I3f180f34abed763db1287bf01581753534f32828
2025-01-21 21:49:33 -06:00
adapryor
290b90dc89
Implementation for RDC_FI_PROF_OCCUPANCY_PER_ACTIVE_CU SWDEV-50895
...
Signed-off-by: adapryor <Adam.pryor@amd.com >
Change-Id: I8da7d9846edabe5629c75f50cd2bb4b23e019a17
Signed-off-by: adapryor <Adam.pryor@amd.com >
2025-01-21 21:49:19 -06:00
stali
b427c07ffe
fixed rdc link state print issue
2025-01-22 09:05:49 +08:00
Pryor, Adam
0ae4404a09
SWDEV-510089 Fix rocprof segfaulting on ctrl+c ( #94 )
...
Change-Id: Iaa0f3856bb8fed174cbc935b85739414ecd44758
Signed-off-by: adapryor <Adam.pryor@amd.com >
2025-01-21 10:30:31 -06:00
Mallya, Ameya Keshava
0490b1c925
Fixed Workflow for updated KWS structure
2025-01-17 08:21:11 -08:00
Mallya, Ameya Keshava
cadbf69b45
Added KWS check ( #88 )
2025-01-15 11:11:01 -08:00
limeng12
016a1d9d39
[SWDEV-230863] Improve the functionality of RdcSmiHealth module.
...
Memory check: get the threshold for the number of retired pages
EEPROM check: read and verify the checksum
Power/Thermal check: power/thermal throttle status counter
Signed-off-by: Meng Li <li.meng@amd.com >
Change-Id: Id2c751416eb5bf007e6e1da8dc05966a6ba1324e
2025-01-14 08:14:36 +08:00
Galantsev, Dmitrii
83f36f1673
Include assert.h during C compilation ( #4 )
...
Fix for https://github.com/ROCm/ROCm/issues/3997. When compiling a C program that includes rdc/rdc.h, multiple assertion errors are thrown without this header included.
Change-Id: Ie5b5c1a1a17c8207cf9b1be23b31193e260d5c1a
Co-authored-by: harkgill-amd <harkgill@amd.com >
2025-01-10 11:29:15 -05:00
srawat
0e53160bee
Update LICENSE
2025-01-09 13:12:24 -06:00
Galantsev, Dmitrii
5861ec7663
RVS - Add IET and PEBB tests
...
Change-Id: Ia032901d74c882e5cbfa5a3164199cd4d571341f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-08 18:23:13 -06:00
Galantsev, Dmitrii
b058cbecf1
RVS - Add memory bandwidth test
...
Change-Id: I4c8990170861f6a0f3853615db68634fdaa7a622
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-01-08 18:23:13 -06:00
stali
a76760db8c
fix group policy reg issue
2025-01-07 15:02:17 +08:00
Li, Star
bd7d7c99c1
Fix unit issue in policy feature ( #78 )
...
1. For temperature, the unit is millidegrees Celsius.
2. For power, the unit is microwatts.
3. Fix the second register call to rdcd not working because of the start flag.
Co-authored-by: Chao Fei <chao.fei@amd.com >
2025-01-06 09:21:08 +08:00
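The unit change in the commit above (temperature in millidegrees Celsius, power in microwatts) can be illustrated with a minimal sketch. It assumes a caller starts from values in degrees Celsius and watts; the helper names below are hypothetical and are not part of the RDC API.

```python
# Hypothetical helpers, not part of RDC: convert human-friendly values
# into the units named in the commit message (millidegrees Celsius, microwatts).

MILLI_PER_UNIT = 1_000      # 1 degree Celsius = 1,000 millidegrees Celsius
MICRO_PER_UNIT = 1_000_000  # 1 watt = 1,000,000 microwatts


def celsius_to_millicelsius(celsius: float) -> int:
    """Return a temperature in millidegrees Celsius, rounded to an integer."""
    return round(celsius * MILLI_PER_UNIT)


def watts_to_microwatts(watts: float) -> int:
    """Return a power value in microwatts, rounded to an integer."""
    return round(watts * MICRO_PER_UNIT)


if __name__ == "__main__":
    print(celsius_to_millicelsius(85))  # 85000
    print(watts_to_microwatts(150))     # 150000000
```

Whether a given RDC interface expects raw integers in these units should be checked against its documentation; the sketch only shows the arithmetic.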
Pryor, Adam
60b7359161
Implementation for adding pcie_total ( #40 )
...
* Implementation for adding pcie_total
Signed-off-by: adapryor <Adam.pryor@amd.com >
Change-Id: I4b0cfd7095e9d984e939283ee7169d01f55a1847
Signed-off-by: adapryor <Adam.pryor@amd.com >
* Updates
Signed-off-by: adapryor <Adam.pryor@amd.com >
Change-Id: I021f29083de651cab9fbe7db98acbe20f65948d4
* Updates
Signed-off-by: adapryor <Adam.pryor@amd.com >
Change-Id: I42f3207b745fa787dabe30a85c8e063159d1337d
---------
Signed-off-by: adapryor <Adam.pryor@amd.com >
2024-12-26 18:36:41 -06:00
Ma, Li
772481f952
SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem ( #48 )
2024-12-23 10:22:53 +08:00
stali
29b6699b62
Enable RDC link Status feature
...
1. Add link status APIs
2. Add an example of link status API usage
2024-12-23 09:30:21 +08:00
Greg Scaffidi
f4de4b0529
Add RDC_FI_PROF_SM_ACTIVE metric.
...
Signed-off-by: Greg Scaffidi <salvatore.scaffidi@amd.com >
Change-Id: I63aaf5eb05d74ba696ace2b088e17c2cfb1bd74b
Signed-off-by: adapryor <Adam.pryor@amd.com >
2024-12-21 15:21:46 -06:00
Adam Pryor
df170c8801
Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
...
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com >
2024-12-20 16:03:33 -06:00
Galantsev, Dmitrii
7c91a07a43
Profiler - Migrate from rocprofv1 to rocprofv3
...
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
Fixed RDC for Rocprofv3
Updates
Signed-off-by: adapryor <Adam.pryor@amd.com >
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd
last updates
Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7
comment
Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8
2024-12-20 15:39:02 -06:00
Maisam Arif
35eb8e7c4b
Resolve CI caller merge conflicts
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: Icb0389f422b6f158492828e79e44fe00e5db07f5
2024-12-19 10:10:23 -06:00
Choudhary, Rahul
69be6f1c16
Update rocm_ci_caller.yml: base ref to support pull and push request
...
The change is present in mainline and was missing in staging
2024-12-18 11:57:44 -08:00
adapryor
e1e7f59269
Fix for SWDEV-500637
...
Signed-off-by: adapryor <Adam.pryor@amd.com >
Change-Id: Id42a2da321bdba74dfc8e16d7dc04d05cef4e34a
2024-12-18 11:10:41 -06:00
Choudhary, Rahul
948271bd9b
Create rocm_ci_caller.yml enabling PSDB and OSDB for amd-mainline changes
2024-12-17 11:36:22 -08:00
stali
8bcb5f7068
Enable RDC topology feature
...
1. Add topology APIs
2. Add an example of topology API usage
Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c
2024-12-16 10:02:41 +08:00
Li Ma
30f9b2ac2f
SWDEV-475244 - Memory Usage and Bandwidth: max mem and current mem
...
Implemented max memory bandwidth and current memory bandwidth. Added two
new field IDs: RDC_FI_GPU_MEMORY_MAX_BANDWIDTH, RDC_FI_GPU_MEMORY_CUR_BANDWIDTH
Signed-off-by: Li Ma <li.ma@amd.com >
Change-Id: I453e49937a84777146575f4f5bdd69fd4fe53bfc
2024-12-16 09:43:20 +08:00
Galantsev, Dmitrii
2c61dfe2ce
Profiler - Remove averaging
...
Averaging happens very slowly and only confuses people...
Change-Id: I60754d3b896b6ffeb6104bb1c2fcc54e9869b331
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-12-11 11:58:50 -06:00
Galantsev, Dmitrii
2605eda5f3
Profiler - Fix fp64 metric
...
Change-Id: Iab27e21740c2c51143a9e88d085b80716bf193e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-12-11 11:27:41 -06:00
Ranjith Ramakrishnan
b778a879cb
SWDEV-502603 - Use RPM_INSTALL_PREFIX variable rather than hard coded install prefix paths in RPM post/prerm scripts
...
Change-Id: I2699459e1e3730cf045f24f0c90e09f900701a6f
2024-12-10 21:44:09 -06:00
zichguan-amd
c042b4f582
Make ROCM_DIR default ROCm path for rocprof
...
Signed-off-by: zichguan-amd <zichuan.guan@amd.com >
2024-12-02 11:19:56 -05:00
Galantsev, Dmitrii
b5272fb99c
CI - Use vars instead of secrets
...
Change-Id: Ib917b8677c204a75bedcb345978f2b09216b115f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-11-26 11:17:50 -06:00
Galantsev, Dmitrii
94005119d6
CI - Add initial config
...
Change-Id: I02a08e3f761b7997d8835566b81654431423405d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-11-25 21:43:33 -06:00
Chen Gong
251fcbe49d
rocprofiler: add valu utilization
...
SWDEV-475242
Based on the descriptions of "FP32 Engine Activity" and "FP64 Engine Activity" in DCGM,
it seems that we do not have an equivalent to these pipe utilizations on our hardware.
In rocprofiler, I think VALU Utilization is the closest to what we want.
Change-Id: Ibce8835ef4757084cdfd73258de6fc1606ca0158
Signed-off-by: Chen Gong <curry.gong@amd.com >
2024-11-21 15:24:01 +08:00
limeng12
853d3b0cc5
Background health check
...
Add the RdcSmiHealth module, which will call rocm_smi_lib.
It will support the following health checks:
- XGMI error detected
- PCIE replay count detected
- Memory check
- InfoROM check
- Power/Thermal check
The gRPC client-side and server-side health functions are added.
The health module is added to rdci.
At present, XGMI/PCIE and a part of Memory have been implemented.
Others will be added as soon as possible.
Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
2024-11-19 14:00:49 +08:00
Galantsev, Dmitrii
f1428a8226
Update changelog for 6.3
...
Change-Id: I1b2d26f1e6c7963052fb36fd6c40e3d10c22082d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
Co-authored-by: Rawat, Swati <Swati.Rawat@amd.com >
2024-11-15 14:10:11 -06:00
Bill(Shuzhou) Liu
5e3ebecf80
Correct RDC_FI_PCIE_BANDWIDTH unit
...
The unit should be mbps instead of GB/second
2024-11-13 09:45:46 -05:00
stali
d8fec06bab
Enable RDCI policy subsystem
...
- Enable set and get for policy settings
- Enable register and clear policy events
Change-Id: If4eaaf9b80e668fb21691757210e0aa1532cecae
Signed-off-by: stali <Star.Li@amd.com >
2024-11-12 20:40:08 -06:00
Galantsev, Dmitrii
e1b57c43f3
RVS - Fix cookie_t -> rdc_diag_callback_t types issue
...
Issue introduced in 37ddd5bf50
Change-Id: I2b6a8024d45fc44d92cf2770be9887dfc0fb3ede
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-11-12 10:36:52 -06:00
Galantsev, Dmitrii
4f7e441566
AMDSMI - Fix kRasErrStateStrings in tests
...
Change-Id: Ia9498fae215397baf7201715574954313c17da93
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii
37ddd5bf50
RVS - Report test progress in realtime
...
Change-Id: Id9fea71f242f372f408ecd777c030465b7ef9989
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2024-11-07 11:21:22 -06:00