Граф коммитов

7 Коммитов

Автор SHA1 Сообщение Дата
Galantsev, Dmitrii 38c60ff90b RVS: Finish initial RVS integration
NOTE: RVS Build is disabled by default due to CI build issues.

Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: eaa1862a80]
2024-01-10 00:27:04 -06:00
Galantsev, Dmitrii ea624cbb7c LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 434e40305d]
2024-01-09 11:37:11 -06:00
Bill(Shuzhou) Liu fa9c6ad6f8 Add the RdcSmiDiagnostic module
Provides a RdcSmiDiagnostic module, which will call rocm_smi_lib.

It will support following diagnostics: Get GPU Topology, Check GPU
parameters and check processes running on the GPUs.

The grpc client and server side diagnostics function is added.

The diag module is added to the rdci.

Change-Id: I10a0cf3c20556a61373ab686f82cae75acaa40dd


[ROCm/rdc commit: 76ccf58008]
2021-07-26 14:56:17 -04:00
Bill(Shuzhou) Liu 7d7a5bfd1c Disable bulk fetch. Add environment variable to enable it
RDC can optimize by bulk fetching multiple metrics using a single
rocm_smi call. However, currently this is not completely supported in
all ASIC generations. By default disable this for now.

Set environment variable RDC_BULK_FETCH_ENABLED=TRUE to enable
RDC bulk fetch.

BUG: SWDEV-289316

Change-Id: Ibb55514f198356dccf5f47bb0fd2d53c17acb251


[ROCm/rdc commit: 673f5a4ee1]
2021-06-09 15:53:17 -04:00
Bill(Shuzhou) Liu f504f697e3 The Diagnostic API interface
The API interface defines how the caller will use the API. An
example also shows how the API can be used.
It also defines the RdcDiagnostic module which can load the
library dynamically and then dispatch diagnostic test to run.

Change-Id: I1e041aab86f7e19338860f5ba65262977f4ea9cb


[ROCm/rdc commit: eab3625d65]
2021-05-27 10:59:11 -04:00
Chris Freehill 79b5e54d3b Add event notification support and rdci timestamps
Also:
* print header line every 50 line on output
* print events that are being listened for with header
* cpplint clean-up

Change-Id: Ic049eb79156a9528b556e56f0fa43e1344f898cc


[ROCm/rdc commit: b278cd379b]
2020-11-22 07:10:39 -05:00
Bill(Shuzhou) Liu 32e91cf8d2 RDC module framework
The framework is required for RAS integration. When the RAS fields
need to be retrieved, the framework will load the RAS library at run time,
and then call the RAS function to retrieve RAS metrics.

* The RdcModuleMgr will be used to manage different modules. RDC
  only has the telemetry module now.
* When RDCTelemetryModule is loaded, it will load the RAS library.
  It will also call rdc_telemetry_fields_query() defined in the RAS
  library for the list of fields RAS supported.
* The RdcSmiLib is a wrapper for the rocm_msi_lib to provide the
  interface required by the RDCTelemetryModule.
* The RdcWatchTable will use the RdcModuleMgr to get the
  RDCTelemetryModule to bulk fetch mulitple fields.
* The RdcTelemetryModule will dispatch those fields to different
  library: RdcSmiLib or RdcRasLib.

The watch() and unwatch() in the RDCTelemetryModule will been implemented
at the next task.

Change-Id: I81b01d5b52d1ea3cdcec7c09af86b6622dd5899e


[ROCm/rdc commit: ba35cdcfe2]
2020-09-02 14:46:40 -04:00