Commit graph

8 Commits

Author SHA1 Message Date
adapryor e847f74f78 Fix Prometheus counters
default to gauge

Change-Id: Ia0428e61f023f10b02b3ebe103870d40c057abe3

Change values in question to gauges

Change-Id: I81c91c880246342a0ad0586f6dbe50b247a01117

fixes

Change-Id: I949438d3d3b511c22649640e082b59a3fb7696e0

Fix info handling

Change-Id: I8091fbfa55ba5a9c21c4569dd40e37fb432924f3

fix default

Change-Id: Ia449fed18730a06a858107e9218dc7b443a681fb
2025-03-07 20:48:11 +00:00
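The "default to gauge" rule from e847f74f78 can be illustrated with a minimal sketch. The field names and the helper `metric_type` below are hypothetical and are not taken from rdc_prometheus.py; the only assumption is that a small, known set of fields is monotonic and everything else is exported as a gauge.

```python
# Hypothetical set of fields that only ever increase. Every other field
# is treated as a gauge. These names are illustrative assumptions.
COUNTER_FIELDS = frozenset({"ECC_CORRECT_TOTAL", "ECC_UNCORRECT_TOTAL"})


def metric_type(field_name, counter_fields=COUNTER_FIELDS):
    """Return "counter" only for known monotonic fields, else "gauge"."""
    return "counter" if field_name in counter_fields else "gauge"


if __name__ == "__main__":
    for name in ("GPU_TEMP", "POWER_USAGE", "ECC_CORRECT_TOTAL"):
        print(name, metric_type(name))
```

Defaulting to a gauge is the safer choice because a Prometheus counter is expected never to decrease except on a reset, while readings such as temperature or power move in both directions.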
Galantsev, Dmitrii 7c91a07a43 Profiler - Migrate from rocprofv1 to rocprofv3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Fixed RDC for Rocprofv3

Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd

last updates

Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7

comment

Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8
2024-12-20 15:39:02 -06:00
Galantsev, Dmitrii bbe0b3573c Update python_interface and remove --enable_pci_id
Change-Id: Ie5d511f3da25221bf60bc669ab172323703a1c45
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-08-26 19:55:53 -04:00
Galantsev, Dmitrii 9702d0f2d7 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-04-09 20:19:28 -05:00
Bill(Shuzhou) Liu 5cfe2b4169 Fallback to junction temperature and socket power
If the card does not have edge temperature, fall back to junction
temperature. If the card only has socket power, then use socket
power instead.

Change-Id: I053a67a89cf3b29a34e82123f522c08d7dd68916
2024-02-05 10:10:26 -06:00
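The fallback described in 5cfe2b4169 can be sketched as an ordered lookup that takes the first reading that is present. The dictionary keys, the sample values, and the helper `first_available` are hypothetical. In particular, `average_power` is only an assumed name for the reading that socket power replaces, since the commit message does not name it.

```python
def first_available(readings, preferred_keys):
    """Return (key, value) for the first key whose reading is not None."""
    for key in preferred_keys:
        value = readings.get(key)
        if value is not None:
            return key, value
    return None, None


# Hypothetical readings from a card with no edge sensor and only socket power.
readings = {
    "edge_temp": None,
    "junction_temp": 61.0,
    "average_power": None,  # assumed name, not stated in the commit
    "socket_power": 145.0,
}

temp = first_available(readings, ["edge_temp", "junction_temp"])
power = first_available(readings, ["average_power", "socket_power"])
```

Keeping the preference order in a list makes the fallback explicit and easy to extend if another sensor needs to be tried.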
Bill(Shuzhou) Liu 23ab2c0671 Identify GPUs using PCI device identifier in RDC Prometheus plugin
Add a new option --enable_pci_id to the Prometheus plugin, which will map
the GPU index to the PCI Device Identifier.

Change-Id: I38a2a7e4841975da095391002397d4515ffb8e0d
2022-05-05 09:16:05 -04:00
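A rough sketch of the index-to-identifier mapping that 23ab2c0671 describes. The helper `gpu_label`, the lookup table, and the identifier format shown are hypothetical; the commit message does not say how the plugin obtains or formats the PCI Device Identifier.

```python
def gpu_label(gpu_index, pci_ids, enable_pci_id=False):
    """Return the PCI identifier when enabled and known, else the index."""
    if enable_pci_id and gpu_index in pci_ids:
        return pci_ids[gpu_index]
    return str(gpu_index)


# Hypothetical lookup table; the identifier format is an assumption.
pci_ids = {0: "0000:03:00.0", 1: "0000:83:00.0"}

by_index = gpu_label(0, pci_ids)
by_pci = gpu_label(0, pci_ids, enable_pci_id=True)
```

Falling back to the plain index when no identifier is known keeps the label populated either way.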
Bill(Shuzhou) Liu 7ca7a571a7 RDC Prometheus plugin returns errors when --rdc_gpu_indexes is used
When the above option is used, the plugin returns errors:
  result = rdc.rdc_group_gpu_add(rdc_handle, gpu_group_id, gpu)
  ctypes.ArgumentError: argument 3: <type 'exceptions.TypeError'>: wrong type

rdc_prometheus.py now converts the string to an integer.
RdcUtil.py now raises exceptions properly.

Change-Id: I9535091ff1fc8882cccd32e5f2810da5241768c3
2021-02-23 14:15:04 -05:00
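The type error fixed in 7ca7a571a7 is the usual result of passing command-line strings to a ctypes function that expects an integer. A minimal sketch of the conversion follows. It assumes the option takes one or more values; the parser setup and the helper `to_gpu_indexes` are hypothetical and do not reproduce the code in rdc_prometheus.py.

```python
import argparse


def to_gpu_indexes(raw_values):
    """Convert command-line strings such as ["0", "1"] to a list of ints."""
    return [int(value) for value in raw_values]


parser = argparse.ArgumentParser()
# Without type=int, argparse hands back strings, which ctypes rejects
# when the C function declares an integer parameter.
parser.add_argument("--rdc_gpu_indexes", nargs="+", default=[])

args = parser.parse_args(["--rdc_gpu_indexes", "0", "1"])
gpu_indexes = to_gpu_indexes(args.rdc_gpu_indexes)
```

`int()` raises `ValueError` on non-numeric input, so a bad index fails at parse time rather than inside the ctypes call.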
Bill(Shuzhou) Liu 9c7a1347ea RDC Prometheus plugin
rdc_prometheus.py is a Prometheus plugin for RDC.
rdc_prometheus_example.yml and prometheus_targets.json are
example Prometheus configuration files. If there are multiple compute
nodes, they can be defined in prometheus_targets.json.

Change-Id: I3611b1e8a166f6608351f6e7644808bf72a4d3a0
2020-08-17 14:09:37 -05:00
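For several compute nodes, a targets file could be generated along the following lines. This sketch assumes prometheus_targets.json follows Prometheus's file-based service discovery format (a list of objects with `targets` and optional `labels`), since the commit does not show the file's contents. The host names, the port, the label, and the helper `build_targets` are hypothetical.

```python
import json


def build_targets(hosts, port, labels=None):
    """Build a file-based service discovery entry for the given hosts."""
    entry = {"targets": ["%s:%d" % (host, port) for host in hosts]}
    if labels:
        entry["labels"] = dict(labels)
    return [entry]


# Hypothetical node names, port, and label.
targets = build_targets(["node1", "node2"], 5000, {"job": "rdc"})
targets_json = json.dumps(targets, indent=2)
```

Writing `targets_json` to the file referenced by a `file_sd_configs` entry lets Prometheus pick up added or removed nodes without a restart.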