rocm-systems/projects/rdc/python_binding
Bill(Shuzhou) Liu d53a9c4e21 RDC Prometheus plugin return errors when use the --rdc_gpu_indexes
When the above option is used, the plugin returns errors:
  result = rdc.rdc_group_gpu_add(rdc_handle, gpu_group_id, gpu)
  ctypes.ArgumentError: argument 3: <type 'exceptions.TypeError'>: wrong type

rdc_prometheus.py is changed to convert the string to an integer.
RdcUtil.py is also changed to raise exceptions properly.
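
The class of failure described above can be illustrated without RDC itself. The following is a minimal sketch, not the actual change made in rdc_prometheus.py: the helper names `parse_gpu_indexes` and `to_c_uint32` are hypothetical, and the use of a 32-bit unsigned C type for the GPU index is an assumption. It shows command-line strings being converted to integers before they reach a ctypes-typed argument.

```python
import ctypes


def parse_gpu_indexes(raw_values):
    """Convert command-line strings such as ["0", "2"] into a list of ints.

    Hypothetical helper: the real plugin may parse its option differently.
    """
    indexes = []
    for raw in raw_values:
        try:
            indexes.append(int(raw))
        except ValueError:
            # Fail early with a readable message instead of inside ctypes.
            raise ValueError("invalid GPU index: %r" % (raw,))
    return indexes


def to_c_uint32(index):
    """Wrap an integer index in a C type (the exact C type is an assumption)."""
    return ctypes.c_uint32(index)


if __name__ == "__main__":
    for gpu in parse_gpu_indexes(["0", "2"]):
        print(to_c_uint32(gpu).value)
```

Passing the unconverted string instead, for example `ctypes.c_uint32("1")`, raises a `TypeError`, which is the kind of error ctypes wraps in `ArgumentError` when a typed function argument is wrong.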

Change-Id: I9535091ff1fc8882cccd32e5f2810da5241768c3


[ROCm/rdc commit: 7ca7a571a7]
2021-02-23 14:15:04 -05:00

  • Quick start

If RDC is not installed, specify the RDC library path using: % export LD_LIBRARY_PATH=<rdc_libs_path>

Then run RdcReader from the python_binding folder: % python RdcReader.py
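
Before running RdcReader, it can help to confirm that the library path is set as intended. The sketch below is an illustration only: the function `find_in_ld_library_path` is hypothetical, and the file name `librdc_bootstrap.so` is an assumption about what the RDC build produces. It only inspects `LD_LIBRARY_PATH`; the dynamic loader also searches other locations.

```python
import os


def find_in_ld_library_path(lib_name, env=None):
    """Return the first path to lib_name found on LD_LIBRARY_PATH, or None."""
    env = os.environ if env is None else env
    for directory in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not directory:
            # Skip empty entries produced by leading or trailing separators.
            continue
        candidate = os.path.join(directory, lib_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # "librdc_bootstrap.so" is an assumed file name; adjust it to your build.
    print(find_in_ld_library_path("librdc_bootstrap.so"))
```

If this prints `None`, the library is not in any directory listed in `LD_LIBRARY_PATH`, although it may still be found through the system's default search paths.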

  • Prometheus plugin

Install prometheus_client: % pip install prometheus_client

Start rdcd with authentication, then run the plugin to connect to it: % python rdc_prometheus.py

Check the options of the plugin: % python rdc_prometheus.py --help

Verify the plugin is running: % curl localhost:5000
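
The same check can be scripted. The following is a rough sketch that fetches the plugin's output and splits it into samples, assuming the plugin serves the Prometheus text exposition format on port 5000 as the curl command suggests. The function names are hypothetical, and the parser assumes each sample line has the form `name value` with no trailing timestamp.

```python
from urllib.request import urlopen


def fetch_metrics_text(url="http://localhost:5000", timeout=5):
    """Download the raw metrics page served by the plugin."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text):
    """Return (name_with_labels, value) pairs, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # "# HELP" and "# TYPE" lines carry metadata, not samples.
            continue
        # The value is the last whitespace-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples.append((name, float(value)))
    return samples


if __name__ == "__main__":
    for name, value in parse_samples(fetch_metrics_text()):
        print(name, value)
```

An empty result from a reachable endpoint would suggest the plugin is running but not yet exporting any fields.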

On the management computer, install Prometheus from https://github.com/prometheus/prometheus

Modify prometheus_targets.json to add the compute nodes running the plugin, then start Prometheus: % prometheus --config.file=
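
The contents of prometheus_targets.json are not shown here. As a sketch, assuming the file follows Prometheus's file-based service discovery layout (a JSON list of objects with a `targets` list and optional `labels`), the target list could be generated as follows. The host names, the `job` label, and the function `build_targets` are hypothetical; port 5000 is taken from the curl check above.

```python
import json


def build_targets(hosts, port=5000, labels=None):
    """Build a file-based service discovery list for the given compute nodes."""
    entry = {"targets": ["%s:%d" % (host, port) for host in hosts]}
    if labels:
        # Labels are optional and are attached to every target in the entry.
        entry["labels"] = dict(labels)
    return [entry]


if __name__ == "__main__":
    # "node1" and "node2" are placeholder host names.
    targets = build_targets(["node1", "node2"], labels={"job": "rdc"})
    print(json.dumps(targets, indent=2))
```

Compare the printed structure with the prometheus_targets.json shipped in this folder before overwriting it, since the shipped file may use additional fields.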

Browse to localhost:9090 on the management computer to view metrics from RDC.