
Examples

How do I compile the examples?

NOTE: RDC must already be installed somewhere on your system.

If ROCm (and RDC) are installed under /opt/rocm, you can simply run:

# same as 'mkdir -p build; cd build; cmake ../; cd ../'
cmake -B build
# same as 'cd build; make; cd ../'
make -C build

If ROCm is installed in a different directory, pass that path in one of the following ways:

  • cmake -DROCM_DIR=/custom/rocm/path -B build
  • ROCM_PATH=/custom/rocm/path cmake -B build

followed by make -C build.

You can also export the ROCM_PATH environment variable in your shell beforehand.
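Both ways of pointing the build at ROCm can be wrapped in one shell function. This is a minimal sketch, assuming a POSIX `sh`; `build_examples` and the `DRY_RUN` switch are hypothetical names, not part of RDC or this repository. It uses ROCM_PATH when set, falls back to /opt/rocm otherwise, and passes the result via -DROCM_DIR.

```shell
# Hypothetical helper (not part of RDC): configure and build the examples.
# Uses $ROCM_PATH if it is set, otherwise falls back to /opt/rocm.
# If DRY_RUN is set to a non-empty value, the commands are only printed.
build_examples() {
    root="${ROCM_PATH:-/opt/rocm}"
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty,
    # turning each command into a printout instead of running it.
    ${DRY_RUN:+echo} cmake -DROCM_DIR="$root" -B build &&
        ${DRY_RUN:+echo} make -C build
}
```

For example, `ROCM_PATH=/custom/rocm/path build_examples` configures and builds against a custom install, while `DRY_RUN=1 build_examples` only prints the two commands it would run.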

I can't find rdc!

  • Is RDC installed?
  • Is RDC installed under /opt/rocm?
  • Can you find /opt/rocm/lib/cmake/rdc/rdcTargets.cmake?
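The last check in the list can be scripted. The sketch below assumes a POSIX `sh` and only tests for the file path named above; `check_rdc_cmake` is a hypothetical helper, not something RDC ships.

```shell
# Hypothetical helper (not part of RDC): report whether the RDC CMake package
# file exists under a ROCm root. The root is the first argument if given,
# else $ROCM_PATH, else /opt/rocm.
# Prints the file path and returns 0 if found; prints a hint to stderr and
# returns 1 if not.
check_rdc_cmake() {
    root="${1:-${ROCM_PATH:-/opt/rocm}}"
    targets="$root/lib/cmake/rdc/rdcTargets.cmake"
    if [ -f "$targets" ]; then
        echo "found: $targets"
        return 0
    fi
    echo "missing: $targets (is RDC installed under $root?)" >&2
    return 1
}
```

Run `check_rdc_cmake` to test the default location, or `check_rdc_cmake /custom/rocm/path` for a custom install.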

Where is rdc?

ldd build/diagnostic

Look for librdc_bootstrap.so in the output; the path it resolves to tells you which RDC library the example is using.
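If the `ldd` output is long, a small filter can pull out just the RDC line. This is a sketch assuming the usual glibc `ldd` format (`name => path (address)`, or `name => not found`); `rdc_bootstrap_path` is a hypothetical helper, and the exact versioned file name of librdc_bootstrap.so may differ on your system.

```shell
# Hypothetical helper (not part of RDC): read `ldd` output on stdin and print
# the path librdc_bootstrap.so resolved to, or "not found" if the dynamic
# loader could not resolve it. Returns 1 if the library is not listed at all.
rdc_bootstrap_path() {
    awk '/librdc_bootstrap\.so/ {
        found = 1
        # Field 3 is the resolved path, or the word "not" in "=> not found".
        if ($3 == "not") print "not found"; else print $3
    }
    END { if (!found) exit 1 }'
}
```

`ldd build/diagnostic | rdc_bootstrap_path` then prints either the resolved library path or `not found`.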

diagnostic appears to hang, but the other examples work

Did you wait long enough?

It takes a while to run: 46 seconds on my machine with 2 GPUs.
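To tell a slow run apart from a real hang, you can give diagnostic a generous time limit. This sketch assumes GNU coreutils `timeout`, which exits with status 124 when the limit is hit; `run_with_limit` is a hypothetical helper, and the limit is your choice.

```shell
# Hypothetical helper (not part of RDC): run a command under a time limit
# using GNU coreutils `timeout`, so a slow run can be told apart from a hang.
# Returns the command's own exit status, or 124 if the limit was hit.
run_with_limit() {
    limit="$1"
    shift
    status=0
    # "|| status=$?" keeps this safe under `set -e`.
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s" >&2
    fi
    return "$status"
}
```

For example, `run_with_limit 120 ./build/diagnostic`; 120 seconds is an arbitrary value comfortably above the 46 seconds quoted above, so adjust it for your system.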

Couldn't find the platform configure..

Couldn't find the config for the Device...

That's probably OK; the examples will still run.

To avoid these messages, try changing into the config directory and running the examples from there.
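If you run the examples from the config directory often, a subshell keeps your own working directory untouched. A minimal sketch, assuming a POSIX `sh`; `run_from_dir` is a hypothetical helper, and since this README does not say where the config directory lives, its path in the usage line is a placeholder.

```shell
# Hypothetical helper (not part of RDC): run a program with DIR as its
# working directory. The cd happens in a subshell, so the caller's current
# directory is unchanged. A relative program path is resolved against the
# caller's directory first, so e.g. build/diagnostic still works after the cd.
run_from_dir() {
    dir="$1"
    prog="$2"
    shift 2
    case "$prog" in
        /*) ;;                     # already absolute
        */*) prog="$PWD/$prog" ;;  # relative path such as build/diagnostic
        *) ;;                      # bare name: leave it to a $PATH lookup
    esac
    (cd "$dir" && "$prog" "$@")
}
```

For example, `run_from_dir /path/to/config build/diagnostic`, where /path/to/config stands for the config directory on your system.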