Add a new option, --enable_pci_id, to the Prometheus plugin, which maps
the GPU index to the PCI device identifier.
Change-Id: I38a2a7e4841975da095391002397d4515ffb8e0d
[ROCm/rdc commit: 23ab2c0671]
When the above option is used, the plugin returns this error:
result = rdc.rdc_group_gpu_add(rdc_handle, gpu_group_id, gpu)
ctypes.ArgumentError: argument 3: <type 'exceptions.TypeError'>: wrong type
rdc_prometheus.py is changed to convert the GPU argument from a string to an integer.
RdcUtil.py is also changed to raise exceptions properly.
Change-Id: I9535091ff1fc8882cccd32e5f2810da5241768c3
[ROCm/rdc commit: 7ca7a571a7]
New raslib fields are added to RDC for dmon.
* rdc_field.data, rdc.h and rdc_bootstrap.py are changed
for the new fields.
* RDC_FI_ECC_CORRECT_TOTAL and RDC_FI_ECC_UNCORRECT_TOTAL are
removed from RdcSmiLib.cc and will be obtained from raslib instead.
Change-Id: I4ee016e3d52e9d38b54406ca129da511f741c6d6
[ROCm/rdc commit: 81ad23343c]
The Python script searches a list of installation folders to
find librdc_bootstrap.so.
Change-Id: I52e444e6d153c318c731c4b2cd0d8e39b0fd31ca
[ROCm/rdc commit: 4b3dbc4697]
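A minimal sketch of searching several installation folders for the shared library. The directory list below is an assumption for illustration; the folders actually searched by the script may differ.

```python
import ctypes
import os

# Assumed candidate directories; the real list used by the script may differ.
DEFAULT_SEARCH_DIRS = [
    "/opt/rocm/rdc/lib",
    "/opt/rocm/lib",
    "/usr/local/lib",
    "/usr/lib",
]


def find_library(name, search_dirs=DEFAULT_SEARCH_DIRS):
    """Return the first existing path to `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_rdc_bootstrap(search_dirs=DEFAULT_SEARCH_DIRS):
    """Load librdc_bootstrap.so from the first folder that contains it."""
    path = find_library("librdc_bootstrap.so", search_dirs)
    if path is None:
        raise OSError("librdc_bootstrap.so not found in: " + ", ".join(search_dirs))
    return ctypes.CDLL(path)
```

Keeping the lookup separate from ctypes.CDLL makes the search order easy to test without a real library.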
Also:
* print the header line every 50 lines of output
* print the events being listened for along with the header
* cpplint clean-up
Change-Id: Ic049eb79156a9528b556e56f0fa43e1344f898cc
[ROCm/rdc commit: b278cd379b]
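The header behaviour can be sketched as a generator that re-emits the header before every 50 data rows. This is an illustration only; make_header, format_output and the tab-separated layout are assumptions, not the tool's actual output code.

```python
HEADER_INTERVAL = 50  # re-print the header before every 50 data lines


def make_header(event_names):
    """Build a header line that lists the events being listened for."""
    return "GPU\t" + "\t".join(event_names)


def format_output(header, rows, interval=HEADER_INTERVAL):
    """Yield output lines, repeating `header` before every `interval` rows."""
    for i, row in enumerate(rows):
        if i % interval == 0:
            yield header
        yield row
```

For 120 rows this yields the header three times, before rows 0, 50 and 100.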
Two files are added to the python_binding folder:
* rdc_collectd.py is a collectd plugin that stores the RDC
metrics in the collectd round-robin database.
* rdc_collectd.conf is a configuration file that controls
which fields are collected and how frequently they are collected,
and configures the plugin to run in embedded mode.
Change-Id: Ief44d004376ca8a82ed0d8ad36805243acb47080
[ROCm/rdc commit: bb6d98b036]
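collectd hands a Python plugin its configuration as a tree of nodes exposing key, values and children. The sketch below parses such a tree with a namedtuple stand-in so it runs outside collectd; the option names Fields, Interval and Embedded are hypothetical and may not match rdc_collectd.conf.

```python
from collections import namedtuple

# Minimal stand-in for collectd's Config node (which exposes .key, .values and
# .children); inside collectd the real node is passed to the config callback.
ConfigNode = namedtuple("ConfigNode", ["key", "values", "children"])


def parse_config(root):
    """Collect hypothetical plugin options from a collectd-style config tree."""
    settings = {"fields": [], "interval": 1.0, "embedded": False}
    for child in root.children:
        key = child.key.lower()
        if key == "fields":
            settings["fields"].extend(str(v) for v in child.values)
        elif key == "interval":
            settings["interval"] = float(child.values[0])
        elif key == "embedded":
            settings["embedded"] = bool(child.values[0])
    return settings
```

Inside collectd, a function like this would be registered with collectd.register_config.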
A new Grafana dashboard file, rdc_grafana_dashboard_example.json,
has been added to the python_binding folder. Users can import
this dashboard to monitor multiple compute nodes.
To display only the host name in the dashboard,
rdc_prometheus_example.yml is also changed to create a new label,
short_instance, which omits the port number.
Change-Id: I9ab91838006d59c8dcb5fea01decb8c799484e1d
[ROCm/rdc commit: aeba7b0f91]
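The short_instance label amounts to taking the instance value ("host:port") and keeping the part before the port. The regex below is an assumed equivalent written in Python for illustration; it is not copied from rdc_prometheus_example.yml.

```python
import re

# Assumed pattern: a host with no colon, followed by an optional ":<port>".
# Prometheus relabeling regexes are fully anchored, hence ^...$ here.
_HOST_PORT = re.compile(r"^(?P<host>[^:]+)(?::\d+)?$")


def short_instance(instance):
    """Strip a trailing port from an instance label; leave other values as-is."""
    match = _HOST_PORT.match(instance)
    return match.group("host") if match else instance
```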
The framework now supports watch() and unwatch(), which the telemetry
libraries can use to initialize events or pre-fetch fields when recording
starts.
* A new header file, RdcTelemetryLibInterface.h, is defined for the
libraries to include.
* RdcWatchTable no longer talks to RdcMetricFetcher directly.
Instead, it calls the framework watch/unwatch, which dispatches the request to the libraries.
* The Python binding is made consistent with the current code.
Change-Id: Ie5731d920ed5928f901369d60c23bd450807a562
[ROCm/rdc commit: 151520b97e]
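The dispatch described above can be modelled as a watch table that only talks to the framework, which routes each field to the library that provides it. This Python sketch is illustrative only: the real interface is the C++ header, and the class and field names here are hypothetical.

```python
class TelemetryLib:
    """Hypothetical library; a real one implements RdcTelemetryLibInterface.h."""

    def __init__(self, name, supported_fields):
        self.name = name
        self.supported = set(supported_fields)
        self.watched = set()

    def watch(self, field):
        # A real library might initialize events or pre-fetch the field here.
        self.watched.add(field)

    def unwatch(self, field):
        self.watched.discard(field)


class TelemetryFramework:
    """Routes watch/unwatch requests to the library that owns each field."""

    def __init__(self, libs):
        self._libs = libs

    def _owner(self, field):
        for lib in self._libs:
            if field in lib.supported:
                return lib
        raise KeyError(f"no telemetry library provides field {field!r}")

    def watch(self, fields):
        for field in fields:
            self._owner(field).watch(field)

    def unwatch(self, fields):
        for field in fields:
            self._owner(field).unwatch(field)


class WatchTable:
    """Talks only to the framework, never to a metric fetcher directly."""

    def __init__(self, framework):
        self._framework = framework

    def start_recording(self, fields):
        self._framework.watch(fields)

    def stop_recording(self, fields):
        self._framework.unwatch(fields)
```

Because the watch table depends only on the framework, adding a library (such as the raslib fields) requires no change to the watch table.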
rdc_prometheus.py is a Prometheus plugin for RDC.
rdc_prometheus_example.yml and prometheus_targets.json are
example Prometheus configuration files. If there are multiple compute
nodes, they can be defined in prometheus_targets.json.
Change-Id: I3611b1e8a166f6608351f6e7644808bf72a4d3a0
[ROCm/rdc commit: 9c7a1347ea]
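Prometheus file-based service discovery reads a JSON list of objects, each with a "targets" array of host:port strings. The sketch below generates such a file for several compute nodes; the host names and port are placeholders, not values from the example files.

```python
import json


def make_targets(hosts, port):
    """Build a Prometheus file_sd target list with one host:port per node."""
    return [{"targets": [f"{host}:{port}" for host in hosts]}]


def write_targets(path, hosts, port):
    """Write the target list as JSON so Prometheus can pick it up."""
    with open(path, "w") as f:
        json.dump(make_targets(hosts, port), f, indent=2)
```

Adding a node then only means regenerating this file, presumably referenced from the Prometheus configuration through file_sd_configs.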
A new folder, python_binding, is created for the RDC Python binding:
* rdc_bootstrap.py is a Python ctypes wrapper for librdc_bootstrap.so.
* RdcUtil.py defines common utilities for RDC to manage groups and field groups.
* RdcReader.py defines a class that simplifies the use of RDC:
- The user only needs to specify which fields to monitor.
RdcReader creates the groups and field groups, watches the fields, and fetches them.
- RdcReader supports both embedded and standalone modes.
- Standalone mode can be used with or without authentication.
- In standalone mode, RdcReader can automatically reconnect to rdcd when the connection is lost.
- When rdcd is restarted, the previously created groups and field groups may be lost.
RdcReader can re-create them and watch the fields again after reconnecting.
- If the client is restarted, RdcReader can detect the groups and field groups
created earlier and avoid re-creating them.
- The user can pass a unit converter to use units other than the RDC defaults.
Change-Id: I109ec86012f37162eb13f7d3e921115b7dd82369
[ROCm/rdc commit: 9209c6c516]
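Two of the RdcReader behaviours above, reusing groups after a client restart and re-creating them after an rdcd restart, reduce to a single "find by name, else create" step. The sketch uses a hypothetical in-memory client and a hypothetical unit-converter mapping; none of these names come from RdcReader.py.

```python
class FakeRdcClient:
    """Hypothetical in-memory stand-in for an rdcd connection."""

    def __init__(self):
        self.groups = {}  # group name -> group id
        self._next_id = 0
        self.create_calls = 0

    def find_group(self, name):
        return self.groups.get(name)

    def create_group(self, name):
        self.create_calls += 1
        self._next_id += 1
        self.groups[name] = self._next_id
        return self._next_id


def ensure_group(client, name):
    """Reuse a group created earlier, or create it if rdcd no longer has it."""
    group_id = client.find_group(name)
    if group_id is None:
        group_id = client.create_group(name)
    return group_id


def convert_value(raw, field, converters=None):
    """Apply a user-supplied converter for `field`, else keep the default unit."""
    convert = (converters or {}).get(field)
    return convert(raw) if convert else raw
```

Calling ensure_group after every (re)connect covers both cases with one code path: a restarted client finds its existing group, while a restarted rdcd gets the group re-created.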