Commit Graph

16 Commitit

Tekijä SHA1 Viesti Päivämäärä
Galantsev, Dmitrii 24024f0e4f Revert "Implement CPU discovery support"
This reverts commit f967f8a17d15e148464393fcd145af01dc0e1525.
2025-04-07 20:45:19 -05:00
Yuan, Perry 3bdca8b8b6 Implement CPU discovery support (#77)
* Implement CPU discovery support

SWDEV-482949:

enable the CPU model name info support to the RDC, rdci command
can detect GPU and CPU modules at the same time.
It will query the CPU info through the amdsmi interface like below:

1 GPUs found.
-----------------------------------------------------------------
GPU Index        Device Information
0               AMD Radeon PRO W7800
=================================================================
1 CPUs found.
-----------------------------------------------------------------
CPU Index        Device Information
0               AMD Ryzen Threadripper PRO 7995WX 96-Cores
-----------------------------------------------------------------

Change-Id: Ibc6533c9a61000cd86c45b1bae14c3eb6788c119
Signed-off-by: Perry Yuan <perry.yuan@amd.com>

* CMAKE - Add required version for amdsmi

Change-Id: I341a89351d196ec66cce215a5d1d3953302fcc66
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-31 10:58:36 +08:00
Pryor, Adam 47692d3ed5 [SWDEV-498711] RDC Partition Implementation (#119)
* [SWDEV-498711] RDC Partition Implementation

Change-Id: Ibfc3709793770537e4c9d36458f34c6b4f461724
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-03-27 14:10:11 -05:00
stali 29b6699b62 Enable RDC link Status feature
1.add link status APIs
   2.Add link status example for link status API usage
2024-12-23 09:30:21 +08:00
Adam Pryor df170c8801 Implementation for SWDEV-479728:[RDC] - Clock Speed/Power Cap Control
Change-Id: I767a71325527aa3c691e9607953ceafebacfb4d5
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-12-20 16:03:33 -06:00
stali 8bcb5f7068 Enable RDC topology feature
1.Add topology APIs
2.Add topology example for topology API usage

Change-Id: Ib79c06d0bac85119672f194ba685ebf25029979c
2024-12-16 10:02:41 +08:00
limeng12 853d3b0cc5 Backgroud health check
Add the RdcSmiHealth module, which will call rocm_smi_lib.
It will support following health:
 - XGMI error detected
 - PCIE replay count detected
 - Memory check
 - InfoROM check
 - Power/Thermal check
The grpc client and server side health function is added.
The health module is added to the rdci.

At present, XGMI/PCIE and a part of Memory have been implemented.
Others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
2024-11-19 14:00:49 +08:00
Galantsev, Dmitrii 9c77312c51 Finish basic logging impl
Change-Id: Ia3d6ac80f4832f1bfb63573c543659abd5f84341
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-07 11:21:22 -06:00
Chao Fei 345ac64a43 Enable RDC policy feature
1. Add policy APIs
2. Add policy example for policy API usage

Change-Id: I14deb7c809d0b865b7bb083842092fc37868025e
Signed-off-by: Chao Fei <Chao.Fei@amd.com>
2024-10-23 20:37:27 -04:00
Chen Gong 1edd04d84e Implement the code related to the GetMixedComponentVersion()
Change-Id: I98aad97b4cb6498b7f2fc03a2d5ee7c9e949d5f1
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Chen Gong 5a3fd9fbc1 Implement rdc_device_get_component_version API related code
Implement an API to obtain the version information of the rdc calling component.
See rdc_component_t for details on available components.
It can be expanded later if necessary.

Change-Id: I03b48f774179c52c57b606704283add74ca39a02
Signed-off-by: Chen Gong <curry.gong@amd.com>
2024-09-10 10:06:44 -05:00
Galantsev, Dmitrii 434e40305d LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-01-09 11:37:11 -06:00
Bill(Shuzhou) Liu 76ccf58008 Add the RdcSmiDiagnostic module
Provides a RdcSmiDiagnostic module, which will call rocm_smi_lib.

It will support following diagnostics: Get GPU Topology, Check GPU
parameters and check processes running on the GPUs.

The grpc client and server side diagnostics function is added.

The diag module is added to the rdci.

Change-Id: I10a0cf3c20556a61373ab686f82cae75acaa40dd
2021-07-26 14:56:17 -04:00
Bill(Shuzhou) Liu fe3e75edfa Implement the gRPC APIs for the job stats
Add the job stats APIs in the rdc_api_service at the server side rdcd
Add the job stats APIs for the RdcStandaloneHandler at the client side
Make the load librdc.so and librdc_client.so thread safe.
Impelement async update all fields in RdcEmbeddedHandler.

Change-Id: I659d91efb32d1094d3b7f0f2cec39518cd7336ce
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 7ee29b6cdd Implement the APIs for gRPC calls in client/server
Implement the APIs defined in the RdcStandaloneHandler to make gRPC call to daemon

Implement the APIs defined in the RdcAPIServiceImpl to handle the gRPC calls in daemon

Add two APIs to get all GPU groups and field groups: rdc_group_get_all_ids()
and rdc_group_field_all_ids()
Those two APIs are required by the rdci group and fieldgroup
sub-modules.

Change-Id: I066091423146dea180c16af212688ed43dc44611
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 020f6939f7 SWDEV-209060 - Create the Skeleton RDC CLI and daemon
Create the skeleton implementation of rdc_client.so and rdci. Modify current rdcd to
integrate the RDC API service:

rdc.proto is changed to add a new RdcAPI service which defined the interfaces for the RDC API.

RdcStandaloneHandler.cpp is added to send the request using gRPC to the rdcd. It is built into
the rdc_client.so

rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement skeleton rdci.
Currently, the discovery subsystem is supported.

rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so which should also be
removed in the future.

Add the instruction how to run the rdcd and rdci in the build folder in the README.md.

Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c
2020-08-17 14:07:25 -05:00