Background health check
Add the RdcSmiHealth module, which calls rocm_smi_lib. It will support the following health checks:
- XGMI error detected
- PCIe replay count detected
- Memory check
- InfoROM check
- Power/Thermal check

The health functions are added on both the gRPC client and server sides, and the health module is added to rdci. At present, the XGMI and PCIe checks and part of the Memory check are implemented. The others will be added as soon as possible.

Change-Id: I1bd99290bdc7dea733f21a41a8c4bcefb2138112
Committed by
Meng, Li (Jassmine)
Parent
f1428a8226
Commit
853d3b0cc5
@@ -163,6 +163,14 @@ class rdc_field_t(c_int):
     RDC_EVNT_NOTIF_PRE_RESET = 2002
     RDC_EVNT_NOTIF_POST_RESET = 2003
     RDC_EVNT_NOTIF_RING_HANG = 2004
+    RDC_HEALTH_XGMI_ERROR = 3000
+    RDC_HEALTH_PCIE_REPLAY_COUNT = 3001
+    RDC_HEALTH_RETIRED_PAGE_NUM = 3002
+    RDC_HEALTH_PENDING_PAGE_NUM = 3003
+    RDC_HEALTH_RETIRED_PAGE_LIMIT = 3004
+    RDC_HEALTH_UNCORRECTABLE_PAGE_LIMIT = 3005
+    RDC_HEALTH_POWER_THROTTLE_TIME = 3006
+    RDC_HEALTH_THERMAL_THROTTLE_TIME = 3007
 
 rdc_handle_t = c_void_p
 rdc_gpu_group_t = c_uint32
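For orientation, the sketch below shows one way the new field IDs could be grouped by the health checks named in the commit message. It is not part of this commit: `HealthField`, `CATEGORY`, and `category_of` are hypothetical names, and the mapping of each field to a check is an assumption drawn from the field names. Only the numeric values come from the diff. No InfoROM field appears in this hunk, so that check has no entry.

```python
from enum import IntEnum
from typing import Optional


class HealthField(IntEnum):
    """Hypothetical mirror of the RDC_HEALTH_* values added in this diff."""
    RDC_HEALTH_XGMI_ERROR = 3000
    RDC_HEALTH_PCIE_REPLAY_COUNT = 3001
    RDC_HEALTH_RETIRED_PAGE_NUM = 3002
    RDC_HEALTH_PENDING_PAGE_NUM = 3003
    RDC_HEALTH_RETIRED_PAGE_LIMIT = 3004
    RDC_HEALTH_UNCORRECTABLE_PAGE_LIMIT = 3005
    RDC_HEALTH_POWER_THROTTLE_TIME = 3006
    RDC_HEALTH_THERMAL_THROTTLE_TIME = 3007


# Assumed grouping, inferred from the field names only.
CATEGORY = {
    HealthField.RDC_HEALTH_XGMI_ERROR: "XGMI",
    HealthField.RDC_HEALTH_PCIE_REPLAY_COUNT: "PCIe",
    HealthField.RDC_HEALTH_RETIRED_PAGE_NUM: "Memory",
    HealthField.RDC_HEALTH_PENDING_PAGE_NUM: "Memory",
    HealthField.RDC_HEALTH_RETIRED_PAGE_LIMIT: "Memory",
    HealthField.RDC_HEALTH_UNCORRECTABLE_PAGE_LIMIT: "Memory",
    HealthField.RDC_HEALTH_POWER_THROTTLE_TIME: "Power/Thermal",
    HealthField.RDC_HEALTH_THERMAL_THROTTLE_TIME: "Power/Thermal",
}


def category_of(field_id: int) -> Optional[str]:
    """Return the assumed health category for a field ID, or None if the
    ID is not one of the health fields (for example an event field)."""
    try:
        return CATEGORY[HealthField(field_id)]
    except ValueError:
        return None
```

Calling `category_of` with an ID outside the 3000 to 3007 range, such as the existing `RDC_EVNT_NOTIF_RING_HANG` value 2004, returns `None`, which keeps health fields separate from event fields.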