When librocm-smi is pulled in through a dependency, we may end up on a system
without any hardware supported by ROCm. In that case rsmi_init() failing is
expected, and we do not want to frighten the user.
[ROCm/rocm_smi_lib commit: 8ca4207d5c]
liboam.a was missing from the static rocm-smi package, resulting in compilation errors in applications that use rocm-smi.
[ROCm/rocm_smi_lib commit: 59468e3f78]
Changes:
- Updates to APIs to handle null pointers or RSMI_STATUS_NOT_SUPPORTED
- Fixes to tests to handle partitioned configurations correctly
- Synced with latest AMD SMI API changes
Change-Id: I7a932f9336ef29ccb01d3b15e2101f6136b45720
[ROCm/rocm_smi_lib commit: 12b78439d2]
Updated:
- Removed backwards compatibility for jpeg_activity/vcn_activity
- On supported ASICs users can use XCP (partition) stat values:
jpeg_busy and vcn_busy
Change-Id: I78c403f8462668738ec57cac12b107f6a3989b18
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 1c6b2adae7]
[SWDEV-523359] fan_read_write: Add set fan speed validation check.
- Handled NOT_SUPPORTED status, which previously caused rsmitst to report a false failure
- Added continue statement to proceed with the rest of the FanReadWrite test.
- fixed spacing line 140
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
[ROCm/rocm_smi_lib commit: ac31c6e576]
* [SWDEV-531834] Fix AMD GPUs visible, but data is inaccessible:
- Scans directories under /sys/bus/pci/drivers/amdgpu
- Verifies each device's runtime_status to determine if it's active
- Returns False if any device is not in active state
- Handles permission errors gracefully with proper debug logging
- Includes comments explaining behavior differences between Instinct / NAVI hardware
The default status is set to True, assuming devices are active unless
proven otherwise, which accommodates hardware like some Instinct ASICs
which do not support runtime power management.
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
[ROCm/rocm_smi_lib commit: 47f80145cb]
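The runtime-status check described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function signature, the `driver_dir` parameter, and the `power/runtime_status` location under each device directory are assumptions made for the example.

```python
import logging
import os

# Directory named in the commit message.
AMDGPU_DRIVER_DIR = "/sys/bus/pci/drivers/amdgpu"


def check_runtime_status(driver_dir=AMDGPU_DRIVER_DIR):
    """Return False if any device under driver_dir is not in the active state."""
    all_active = True  # default: assume active unless proven otherwise
    try:
        entries = os.listdir(driver_dir)
    except OSError as err:
        logging.debug("Cannot list %s: %s", driver_dir, err)
        return all_active
    for entry in entries:
        # Assumed layout: <driver_dir>/<device>/power/runtime_status
        status_path = os.path.join(driver_dir, entry, "power", "runtime_status")
        try:
            with open(status_path) as status_file:
                status = status_file.read().strip()
        except OSError as err:
            # Covers permission errors, non-device entries, and devices
            # that expose no runtime_status file; the default is kept.
            logging.debug("Skipping %s: %s", status_path, err)
            continue
        if status != "active":
            all_active = False
    return all_active
```

Because a missing or unreadable file leaves the default untouched, hardware without runtime power management is still reported as active in this sketch.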
Added libdrm/libdrm_amdgpu to the package requires/depends list and removed the same from suggests list.
The rocm smi header files are using drm.h
[ROCm/rocm_smi_lib commit: 6d53d9f9cf]
Changes:
- Unique Id tries reading from KGD
-> falls back to use KFD if not found
Change-Id: I8fb8f38df5db7413805f4a20621ad12ed3fc89a3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 4276207ff8]
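The KGD-then-KFD fallback for the unique ID described in the entry above might look like the sketch below. Both reader callables are hypothetical stand-ins for whatever the library actually uses to query each source; only the ordering is taken from the commit message.

```python
from typing import Callable, Optional


def read_unique_id(
    read_from_kgd: Callable[[], Optional[str]],
    read_from_kfd: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer the KGD value; fall back to KFD when KGD yields nothing."""
    unique_id = read_from_kgd()  # hypothetical KGD reader
    if unique_id:
        return unique_id
    return read_from_kfd()  # hypothetical KFD reader, used only as a fallback
```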
* Changes:
- Updates to DRM renderD* / card* pathing for partition devices
- Now use KFD to discover AMD devices and populate accordingly
Device MUST have an accessible KFD node (via cgroups)
- Updated several ROCm SMI CLI outputs to handle SYSFS files
which are not accessible on partition nodes
- Added a new method to help get card/drm info
(rsmi_dev_device_identifiers_get) from ROCm SMI
Change-Id: If844f27ffc595942272abe9c8167ed90a0b0e225
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: a0df877fdf]
rocm-smi is not working in mGPU, Blocking DLM tests
Updates include:
- Created check_runtime_status function to check that the device status is active.
- Added a warning to users that no AMD GPUs are available and to check power status/control.
- Added check for an empty string coming from HWMON; if empty, returns unexpected data.
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
[ROCm/rocm_smi_lib commit: 2630bf0a8c]
* SWDEV-518214: GPU Metrics 1.8 (#31)
- Updates:
- Adding the following metrics to allow new calculations for violation status:
- Per XCP metrics gfx_below_host_limit_ppt_acc
- Per XCP metrics gfx_below_host_limit_thm_acc
- Per XCP metrics gfx_low_utilization_acc
- Per XCP metrics gfx_below_host_limit_total_acc
- Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: f69e65f7bd]
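As an illustration of the JPEG engine note in the entry above, a consumer of the metrics could map the UINT16_MAX sentinel to N/A as sketched here. The function name and the list-of-integers input are assumptions for the example, not part of the library.

```python
UINT16_MAX = 0xFFFF  # sentinel for an unsupported engine, per the commit message


def format_jpeg_activity(values):
    """Render per-engine JPEG values, showing unsupported engines as N/A."""
    return ["N/A" if value == UINT16_MAX else str(value) for value in values]
```

For example, `format_jpeg_activity([12, 0xFFFF, 0])` yields one string per engine, with the second engine shown as N/A.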
Memory partition test was changing the original compute partition based
on the default compute mode. Corrected this to restore the original
compute partition.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/rocm_smi_lib commit: d8de415960]
Guest: Tests needed to account for not supporting changing compute
partitions.
BM: Tests need to account for invalid responses from Driver (due to
static CPX config).
[ROCm/rocm_smi_lib commit: 967493c39a]
Changes:
- Fixed Device Name (market name)
- Added new API rsmi_dev_market_name_get()
- Updated tests
- Updated amdgpu_drm.h to match latest mainline kernel
- Fixed subsystem ID to only show hex value (not subsystem name)
- rocm_smi_lib now has a recommended requirement for libdrm
Change-Id: Ic438529e16c8c3dbbdd620da664918148c40c997
[ROCm/rocm_smi_lib commit: b951a65cf2]
1) When `clang` is used as system compiler, libraries were built without respecting LDFLAGS. For example, this affected LTO flags, if any (and it only affected clang, not gcc).
2) Linker flags are registered as CXX flags, which produces warnings during compilation:
```
clang++: warning: -Wl,-z,noexecstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-znoexecheap: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,relro: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,now: 'linker' input unused [-Wunused-command-line-argument]
```
3) Clang does not support `-Wtrampolines` flag:
```
warning: unknown warning option '-Wtrampolines' [-Wunknown-warning-option]
```
4) No linkers support `noexecheap` anymore. The `noexecheap` linker flag was part of the PaX patches to GNU ld, [which were dropped in 2017](https://www.gentoo.org/support/news-items/2017-08-19-hardened-sources-removal.html). Now ld/ld.lld/ld.gold don't support it and heap protection is managed by the NX bit. Therefore every compiler produces this warning:
```
ld.lld: warning: unknown -z value: noexecheap
```
Closes #210.
Co-authored-by: Sv. Lockal <lockalsash@gmail.com>
[ROCm/rocm_smi_lib commit: 59cbeb57d1]
- Fixes the broken links in rocm_smi.h
- Uses automodule instead of autofunction in docs/reference/python_api.rst
- Fixes some warnings during docs build
- Updates some of the versions in requirements.txt
[ROCm/rocm_smi_lib commit: fc61e40506]