255 commits

Author SHA1 Message Date
Adam Pryor df170c8801 Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-20 16:03:33 -06:00
Galantsev, Dmitrii 7c91a07a43 Profiler - Migrate from rocprofv1 to rocprofv3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Fixed RDC for Rocprofv3

Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd

last updates

Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7

comment

Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8
2024-12-20 15:39:02 -06:00
Maisam Arif 35eb8e7c4b Resolve CI caller merge conflicts
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Icb0389f422b6f158492828e79e44fe00e5db07f5
2024-12-19 10:10:23 -06:00
Choudhary, Rahul 69be6f1c16 Update rocm_ci_caller.yml: base ref to support pull and push request
The change is present in mainline and was missing in staging
2024-12-18 11:57:44 -08:00
adapryor e1e7f59269 Fix for SWDEV-500637
Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Id42a2da321bdba74dfc8e16d7dc04d05cef4e34a
2024-12-18 11:10:41 -06:00
Choudhary, Rahul 948271bd9b Create rocm_ci_caller.yml enabling PSDB and OSDB for amd-mainline changes
2024-12-17 11:36:22 -08:00
stali 8bcb5f7068 Enable RDC topology feature
1. Add topology APIs
2. Add a topology example for topology API usage

Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c
2024-12-16 10:02:41 +08:00
Galantsev, Dmitrii 2c61dfe2ce Profiler - Remove averaging
Averaging happens very slowly and only confuses people...

Change-Id: I60754d3b896b6ffeb6104bb1c2fcc54e9869b331
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-12-11 11:58:50 -06:00
Galantsev, Dmitrii 2605eda5f3 Profiler - Fix fp64 metric
Change-Id: Iab27e21740c2c51143a9e88d085b80716bf193e2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-12-11 11:27:41 -06:00
Ranjith Ramakrishnan b778a879cb SWDEV-502603 - Use RPM_INSTALL_PREFIX variable rather than hard coded install prefix paths in RPM post/prerm scripts
Change-Id: I2699459e1e3730cf045f24f0c90e09f900701a6f
2024-12-10 21:44:09 -06:00
zichguan-amd c042b4f582 Make ROCM_DIR default ROCm path for rocprof
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2024-12-02 11:19:56 -05:00
Galantsev, Dmitrii b5272fb99c CI - Use vars instead of secrets
Change-Id: Ib917b8677c204a75bedcb345978f2b09216b115f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-26 11:17:50 -06:00
Galantsev, Dmitrii 94005119d6 CI - Add initial config
Change-Id: I02a08e3f761b7997d8835566b81654431423405d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-25 21:43:33 -06:00
Chen Gong 251fcbe49d rocprofiler: add valu utilization
SWDEV-475242

For the "FP32 Engine Activity" and "FP64 Engine Activity" metrics described in DCGM,
it seems that our hardware has no equivalent to these pipe utilizations.

In rocprofiler, I think VALU Utilization is the closest to what we want.

Change-Id: Ibce8835ef4757084cdfd73258de6fc1606ca0158
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-11-21 15:24:01 +08:00
limeng12 853d3b0cc5 Background health check
Add the RdcSmiHealth module, which will call rocm_smi_lib.
It will support the following health checks:
 - XGMI error detected
 - PCIE replay count detected
 - Memory check
 - InfoROM check
 - Power/Thermal check
Health functions are added on the gRPC client and server sides.
The health module is added to rdci.

At present, the XGMI, PCIe and part of the memory checks have been implemented.
The others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
2024-11-19 14:00:49 +08:00
Galantsev, Dmitrii f1428a8226 Update changelog for 6.3
Change-Id: I1b2d26f1e6c7963052fb36fd6c40e3d10c22082d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Rawat, Swati <Swati.Rawat@amd.com>
2024-11-15 14:10:11 -06:00
Bill(Shuzhou) Liu 5e3ebecf80 Correct RDC_FI_PCIE_BANDWIDTH unit
The unit should be mbps instead of GB/second
2024-11-13 09:45:46 -05:00
stali d8fec06bab Enable RDCI policy subsystem
- Enable set and get for policy settings
- Enable register and clear policy events

Change-Id: If4eaaf9b80e668fb21691757210e0aa1532cecae
Signed-off-by: stali <Star.Li@amd.com>
2024-11-12 20:40:08 -06:00
Galantsev, Dmitrii e1b57c43f3 RVS - Fix cookie_t -> rdc_diag_callback_t types issue
Issue introduced in 37ddd5bf50

Change-Id: I2b6a8024d45fc44d92cf2770be9887dfc0fb3ede
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-12 10:36:52 -06:00
Galantsev, Dmitrii 4f7e441566 AMDSMI - Fix kRasErrStateStrings in tests
Change-Id: Ia9498fae215397baf7201715574954313c17da93
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 37ddd5bf50 RVS - Report test progress in realtime
Change-Id: Id9fea71f242f372f408ecd777c030465b7ef9989
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii 9c77312c51 Finish basic logging impl
Change-Id: Ia3d6ac80f4832f1bfb63573c543659abd5f84341
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii cdf1588974 CMAKE - Find modules at build time
Change-Id: I9370ef1433579aff1a37f3636050f525638d8658
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Galantsev, Dmitrii dd50027748 CMAKE - Fix RVS include
Change-Id: I65095cc3d04fc2a5daeee5c809f635cb1662822f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Revert "Disable RVS as the error scares people"

This reverts commit 660c5afaf4.

Change-Id: I5086c25772444aa3bfc4c10abc1ea58d3f3f1f27
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:18:41 -06:00
Chao Fei 345ac64a43 Enable RDC policy feature
1. Add policy APIs
2. Add a policy example for policy API usage

Change-Id: I14deb7c809d0b865b7bb083842092fc37868025e
Signed-off-by: Chao Fei <Chao.Fei@amd.com>
2024-10-23 20:37:27 -04:00
Li Ma 4bd31b605a SWDEV-475244 - Memory Usage and Bandwidth: memory activity
Implemented memory activity and added a new field ID,
RDC_FI_GPU_MEMORY_ACTIVITY.

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I11abe356ef6b01ce4917fd19dcc128efbc535f39
2024-10-22 11:11:31 +08:00
Li Ma b17abf93fa SWDEV-475255 - MM Engine Decoding Throughput
Implemented only DEC activity for now, because ENC activity is unavailable in
amdsmi.

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I34bb56e6e0d8d2ab91243f8932f0ac10cb2d1e9f
2024-10-18 10:01:41 +08:00
Galantsev, Dmitrii 28acbf0436 Profiler - Modify metrics
Remove occupancy metrics and replace with OccupancyPercent

Add OCCUPANCY_PERCENT which uses OccupancyPercent
Add GR_ENGINE_ACTIVE which uses GPU_UTIL/100
Add TENSOR_ACTIVE_PERCENT which uses MfmaUtil
Modify FLOPS_64 to use FP64_ACTIVE

Change-Id: I5f30d77a0c80f5ac78abd1a9e57f8a0a3c6cc00b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-10-15 19:00:30 -05:00
adapryor e20bc58b1c Add XGMI read/write sum metrics
Change-Id: I898b779ea7f5336edf0d047fb1e5d3ec40085baa
2024-10-09 17:02:55 -05:00
Galantsev, Dmitrii c40a6308c5 SWDEV-466829 - Disable ROCP when in GTest
Change-Id: I3b218fe256717c1dc9187d5f17476dfc990656c2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-26 17:00:05 -05:00
Galantsev, Dmitrii d4a868cb69 Increase MAX_NUM_DEVICES limit
Change-Id: I0cf21be156649818fd05a66928054710322b23ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-25 20:58:19 -05:00
Galantsev, Dmitrii 04be1211c1 README: Add known issues section
Change-Id: I298750fdafed556480271cfce31c3fc88984cf0b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-25 15:10:41 -05:00
Bill(Shuzhou) Liu 9800528c19 Update the hsaco for diagnostics on MI300X
Add hsaco for gfx940, gfx941 and gfx942

Change-Id: Ibd55fcc2d036d1190357e1e86d4e170568426d94
2024-09-17 14:15:35 -05:00
Li Ma ca569346a3 SWDEV-445415 - Pthread detach instead of pthread join
Detaching the thread that handles shutdown signals, instead of joining it,
avoids the segfault issue on specific ASICs.

Signed-off-by: Li Ma <li.ma@amd.com>
Change-Id: I74ac53c027ac370605caaa87115c83fd8027526a
2024-09-13 18:32:37 -04:00
Sam Wu b5df3a2135 Bump rocm-docs-core to 1.7.2
Update documentation requirements

Change-Id: I19cd1a96309844898e112777412e9c006a8874a0
2024-09-13 14:01:46 -06:00
Galantsev, Dmitrii 660c5afaf4 Disable RVS as the error scares people
Change-Id: I572fdb65dd8882ab4fdc1474cb39fc0e493b1eab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-13 10:42:59 -05:00
Li Ma 183c65c8b2 SWDEV-483668 - Drop -shared-libasan flag for GCC compiler
GCC links libasan by default, so building RDC with ASAN enabled
under GCC doesn't need -shared-libasan.

Change-Id: I8078f7ea5d46c6beea29c2823db3357a67f00b60
Signed-off-by: Li Ma <li.ma@amd.com>
2024-09-10 23:13:15 -04:00
Chen Gong 0cfca6d93d Implement the discovery -v command line interface
Call the previously implemented get_rdcd_version and rdc_get_smiversion

Change-Id: If76037d462fa9328c3af8c85423ee4547882e36e
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 1edd04d84e Implement the code related to the GetMixedComponentVersion()
Change-Id: I98aad97b4cb6498b7f2fc03a2d5ee7c9e949d5f1
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 8db404f84f Provide a way for rdci to get component version
For rdci, the version information of some components (such as RDCD)
cannot be obtained through the rdc_device_get_component_version API.

This creates a fake rdc API to get them.

Change-Id: I75d8bcd1993873cff209995b58362f75787a4598
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 5a3fd9fbc1 Implement rdc_device_get_component_version API related code
Implement an API to obtain the version information of the components that rdc calls.
See rdc_component_t for details on available components.
It can be expanded later if necessary.

Change-Id: I03b48f774179c52c57b606704283add74ca39a02
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 69a9f24b6e Add an rdc API to get component version
Change-Id: I56250a6101debeb78628f1fd1dfff7f21c52cdc0
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 45c6d0b03b Reorganize the code path of the rdci Discovery Subsystem
Prepare for adding version information detection later

Change-Id: Ib2b5e70b2360b1c5ff87a537f41f34f23c7ed61f
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 6591563d53 Add the function of outputting rdci version information
Change-Id: Iabeec48ba2e109ead7fb6fb07454ebcdc74a11e6
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 9f8d447e75 Add the function of outputting rdcd version information
Change-Id: I0572fd4b98f697660ab9099deabfd4f0fce802f3
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong ac874d3921 Get the hash value and pass it to rdcd and rdci
The goal is to display version information along with the hash value.

Change-Id: I0f9ad576f8f66747ce2e84d4f524ccd16d399927
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Galantsev, Dmitrii bffe4e22fa Add OAM_ID
Change-Id: I771b2f7f088940838c09ba3521a7955faa64e7ec
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-09 21:19:33 -05:00
Galantsev, Dmitrii bbe0b3573c Update python_interface and remove --enable_pci_id
Change-Id: Ie5d511f3da25221bf60bc669ab172323703a1c45
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-08-26 19:55:53 -04:00
Bill(Shuzhou) Liu 56b08ea7c3 Update the document to install the rdc service
Correct the file path of the rdc.service in the document

Change-Id: Ib161e97abdd5e2a117b2758ff5407b55337ab25b
2024-08-21 12:21:57 -05:00
Galantsev, Dmitrii c015f0fcaa INSTALL - Fix rdc groups and lock file check
This fixes issues like:

1.

  /run/lock/rdcd.lock: Bad file descriptor
  Failed to determine owner of lock file.: Numerical result out of range

2.

  rdc.service: Failed to determine group credentials: No such process
  rdc.service: Failed at step GROUP spawning rdcd: No such process

Change-Id: I0ef5eb6ab72d036a3ea8dcb81f7a9108d279f7d6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-08-12 18:54:43 -05:00