Commit graph

9 commits

Author SHA1 Message Date
Dmitrii 0575606e49 chore: [rdc] Add copyright notice (#1098) 2025-09-24 09:07:20 -07:00
adapryor 7113c62704 Fix Prometheus counters
default to gauge

Change-Id: Ia0428e61f023f10b02b3ebe103870d40c057abe3

Change values in question to gauges

Change-Id: I81c91c880246342a0ad0586f6dbe50b247a01117

fixes

Change-Id: I949438d3d3b511c22649640e082b59a3fb7696e0

Fix info handling

Change-Id: I8091fbfa55ba5a9c21c4569dd40e37fb432924f3

fix default

Change-Id: Ia449fed18730a06a858107e9218dc7b443a681fb


[ROCm/rdc commit: e847f74f78]
2025-03-07 20:48:11 +00:00
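
The "default to gauge" change in 7113c62704 can be illustrated with a minimal, stdlib-only sketch. It is not the plugin's actual code: the field names, the `rdc_` prefix, the `gpu_index` label, and the `COUNTER_FIELDS` set are all assumptions made for illustration. The idea is that a value which can go down (temperature, power, utilization) should be exposed as a gauge, and only fields known to be monotonically increasing should be counters.

```python
# Hypothetical set of fields that only ever increase; everything else
# is treated as a gauge by default.
COUNTER_FIELDS = {"ecc_correct_total"}


def metric_type(field_name, counter_fields=COUNTER_FIELDS):
    """Return the Prometheus metric type for a field, defaulting to gauge."""
    return "counter" if field_name in counter_fields else "gauge"


def render_sample(field_name, gpu_index, value):
    """Render one sample in the Prometheus text exposition format."""
    name = "rdc_" + field_name  # the "rdc_" prefix is an assumption
    return (
        f"# TYPE {name} {metric_type(field_name)}\n"
        f'{name}{{gpu_index="{gpu_index}"}} {value}\n'
    )


print(render_sample("gpu_temp", 0, 45.0), end="")
```

Defaulting to gauge is the safer choice here, because Prometheus treats any decrease in a counter as a reset, which distorts `rate()` results for values that legitimately fluctuate.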
Galantsev, Dmitrii 755ae0ee5d Profiler - Migrate from rocprofv1 to rocprofv3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

Fixed RDC for Rocprofv3

Updates

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ic9162bacf1322b265e6bbcdd9fbb9b1fdef414fd

last updates

Change-Id: I12e168501327c5e4cff8a9273b0512fb0e098fe7

comment

Change-Id: I61da61e66dcc017ec46f98ff4c90fb064c9679e8


[ROCm/rdc commit: 7c91a07a43]
2024-12-20 15:39:02 -06:00
Galantsev, Dmitrii 3dd90a6ff2 Update python_interface and remove --enable_pci_id
Change-Id: Ie5d511f3da25221bf60bc669ab172323703a1c45
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: bbe0b3573c]
2024-08-26 19:55:53 -04:00
Galantsev, Dmitrii 028355dff0 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9702d0f2d7]
2024-04-09 20:19:28 -05:00
Bill(Shuzhou) Liu d1efa59fe8 Fallback to junction temperature and socket power
If the card does not have an edge temperature, fall back to the junction
temperature. If the card only has socket power, then use socket
power instead.

Change-Id: I053a67a89cf3b29a34e82123f522c08d7dd68916


[ROCm/rdc commit: 5cfe2b4169]
2024-02-05 10:10:26 -06:00
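
The fallback described in d1efa59fe8 could look roughly like the sketch below. This is an assumption-laden illustration, not the code from the commit: the dictionary keys, the `first_available` helper, and the `default_power` name (standing in for whatever power reading the plugin prefers before socket power) are all hypothetical.

```python
def first_available(readings, preferred_keys):
    """Return (key, value) for the first key whose reading is not None."""
    for key in preferred_keys:
        value = readings.get(key)
        if value is not None:
            return key, value
    return None, None


# Preference orders; all key names are hypothetical.
TEMPERATURE_ORDER = ("edge_temperature", "junction_temperature")
POWER_ORDER = ("default_power", "socket_power")

# A card that reports neither an edge temperature nor the default power.
readings = {
    "edge_temperature": None,
    "junction_temperature": 61,
    "socket_power": 180,
}

temperature = first_available(readings, TEMPERATURE_ORDER)
power = first_available(readings, POWER_ORDER)
```

Returning the key together with the value lets the caller record which sensor was actually used.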
Bill(Shuzhou) Liu e16a8bcaf5 Identify GPUs using PCI device identifier in RDC Prometheus plugin
Add a new option --enable_pci_id to the Prometheus plugin, which maps
the GPU index to the PCI device identifier.

Change-Id: I38a2a7e4841975da095391002397d4515ffb8e0d


[ROCm/rdc commit: 23ab2c0671]
2022-05-05 09:16:05 -04:00
Bill(Shuzhou) Liu d53a9c4e21 RDC Prometheus plugin returns errors when --rdc_gpu_indexes is used
When the above option is used, the plugin returns an error:
  result = rdc.rdc_group_gpu_add(rdc_handle, gpu_group_id, gpu)
  ctypes.ArgumentError: argument 3: <type 'exceptions.TypeError'>: wrong type

rdc_prometheus.py is changed to convert the string to an integer.
RdcUtil.py is also changed to raise an Exception properly.

Change-Id: I9535091ff1fc8882cccd32e5f2810da5241768c3


[ROCm/rdc commit: 7ca7a571a7]
2021-02-23 14:15:04 -05:00
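
The string-to-integer fix in d53a9c4e21 can be sketched as follows. Only the `--rdc_gpu_indexes` option name comes from the commit message. The `nargs="*"` declaration and the `parse_gpu_indexes` helper are assumptions about how such an option might be parsed, not the contents of rdc_prometheus.py. The point is that command-line values arrive as strings, while a ctypes call expecting an integer argument rejects them.

```python
import argparse


def parse_gpu_indexes(raw_values):
    """Convert command-line strings such as ["0", "1"] to integers."""
    indexes = []
    for raw in raw_values:
        try:
            indexes.append(int(raw))
        except ValueError:
            # Raise a clear error instead of passing a bad value to ctypes.
            raise ValueError(f"invalid GPU index: {raw!r}") from None
    return indexes


def build_parser():
    """Build a parser; the option's declaration here is an assumption."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rdc_gpu_indexes", nargs="*", default=[])
    return parser


args = build_parser().parse_args(["--rdc_gpu_indexes", "0", "1"])
gpu_indexes = parse_gpu_indexes(args.rdc_gpu_indexes)
```

Declaring the option with `type=int` would achieve the same conversion inside argparse. The explicit helper is used here only to make the conversion step visible.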
Bill(Shuzhou) Liu b91560f0a8 RDC Prometheus plugin
rdc_prometheus.py is a Prometheus plugin for RDC.
rdc_prometheus_example.yml and prometheus_targets.json are
example Prometheus configuration files. If there are multiple compute
nodes, they can be defined in prometheus_targets.json.

Change-Id: I3611b1e8a166f6608351f6e7644808bf72a4d3a0


[ROCm/rdc commit: 9c7a1347ea]
2020-08-17 14:09:37 -05:00
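
As a rough illustration of how several compute nodes might be listed in a targets file such as prometheus_targets.json, the sketch below builds a list in the layout used by Prometheus file-based service discovery (a JSON array of groups with `targets` and optional `labels`). The host names, port, and label are hypothetical, and the actual contents of the example file in the commit are not shown in this log.

```python
import json


def build_targets(hosts, port, labels=None):
    """Build a file_sd-style target list: one group holding every host."""
    group = {"targets": [f"{host}:{port}" for host in hosts]}
    if labels:
        group["labels"] = dict(labels)
    return [group]


# Hypothetical node names, port, and label, for illustration only.
targets = build_targets(["node1", "node2"], 5000, {"job": "rdc"})
targets_json = json.dumps(targets, indent=2)
print(targets_json)
```

Adding a node then only requires appending its host name to the list and regenerating the file.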