* SWDEV-518214: GPU Metrics 1.8 (#31)
- Updates:
- Adding the following metrics to allow new calculations for violation status:
- Per XCP metrics gfx_below_host_limit_ppt_acc
- Per XCP metrics gfx_below_host_limit_thm_acc
- Per XCP metrics gfx_low_utilization_acc
- Per XCP metrics gfx_below_host_limit_total_acc
- Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: f69e65f7bd]
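The note above about unsupported JPEG engines can be illustrated with a minimal sketch. It assumes a consumer holding a plain list of per-engine readings in which unsupported engines carry the UINT16_MAX sentinel; the helper name `format_jpeg_activity` is hypothetical and not part of rocm_smi_lib.

```python
UINT16_MAX = 0xFFFF  # sentinel the commit message says marks an unsupported engine


def format_jpeg_activity(values):
    """Render per-engine readings, showing 'N/A' for unsupported engines.

    Hypothetical helper for illustration only; not a rocm_smi_lib API.
    """
    return ["N/A" if v == UINT16_MAX else str(v) for v in values]


if __name__ == "__main__":
    # Two supported engines followed by two unsupported ones.
    print(format_jpeg_activity([12, 0, UINT16_MAX, UINT16_MAX]))
```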
The memory partition test was changing the original compute partition
based on the default compute mode. Corrected this to restore the
original compute partition.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/rocm_smi_lib commit: d8de415960]
Guest: Tests needed to account for not supporting changing compute
partitions.
BM: Tests need to account for invalid responses from Driver (due to
static CPX config).
[ROCm/rocm_smi_lib commit: 967493c39a]
Changes: - Fixed Device Name (market name)
- Added new API rsmi_dev_market_name_get()
- Updated tests
- Updated amdgpu_drm.h to match latest mainline kernel
- Fixed subsystem ID to only show hex value (not subsystem name)
- rocm_smi_lib now has a recommended requirement for libdrm
Change-Id: Ic438529e16c8c3dbbdd620da664918148c40c997
[ROCm/rocm_smi_lib commit: b951a65cf2]
1) When `clang` is used as system compiler, libraries were built without respecting LDFLAGS. For example, this affected LTO flags, if any (and it only affected clang, not gcc).
2) Linker flags are registered as CXX flags, which produces warnings during compilation:
```
clang++: warning: -Wl,-z,noexecstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-znoexecheap: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,relro: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,now: 'linker' input unused [-Wunused-command-line-argument]
```
3) Clang does not support `-Wtrampolines` flag:
```
warning: unknown warning option '-Wtrampolines' [-Wunknown-warning-option]
```
4) No linkers support `noexecheap` anymore. The `noexecheap` linker flag was part of the PaX patches to GNU ld, [which were dropped in 2017](https://www.gentoo.org/support/news-items/2017-08-19-hardened-sources-removal.html). Now ld/ld.lld/ld.gold don't support it, and heap protection is managed by the NX bit. Therefore every compiler produces this warning:
```
ld.lld: warning: unknown -z value: noexecheap
```
Closes #210.
Co-authored-by: Sv. Lockal <lockalsash@gmail.com>
[ROCm/rocm_smi_lib commit: 59cbeb57d1]
- Fixes the broken links in rocm_smi.h
- Uses automodule instead of autofunction in docs/reference/python_api.rst
- Fixes some warnings during docs build
- Update some of the versions in requirements.txt
[ROCm/rocm_smi_lib commit: fc61e40506]
The target_graphics_version was not formatted properly and was
showing an incorrect Target Name. Corrected this by formatting the
major, minor and revision numbers.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/rocm_smi_lib commit: 6337f7b05b]
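One way to picture the formatting fix above is the sketch below. It assumes the version arrives as a single integer encoded as major * 10000 + minor * 100 + revision, and that the target name prints the major number in decimal with the minor and revision in hexadecimal. Both the encoding and the helper name `format_target_name` are assumptions made for illustration, not the library's confirmed implementation.

```python
def format_target_name(gfx_target_version: int) -> str:
    """Build a target name such as 'gfx90a' from a packed version integer.

    Assumed encoding (not confirmed by the commit message):
    major * 10000 + minor * 100 + revision.
    """
    major = gfx_target_version // 10000
    minor = (gfx_target_version // 100) % 100
    revision = gfx_target_version % 100
    # Major is printed in decimal; minor and revision are printed in hex.
    return f"gfx{major}{minor:x}{revision:x}"


if __name__ == "__main__":
    for packed in (90010, 90402, 110000):
        print(packed, format_target_name(packed))
```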
- To address https://github.com/ROCm/rocm_smi_lib/issues/208
where use of fake BDFs for partitions can cause confusion. This note
is already in the comments of the function definition, but was not
updated in the function declaration.
- Fix broken formatting for the location table for PCIE coordinate fields
- Tracked in SWDEV-501108
Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d
[ROCm/rocm_smi_lib commit: fdc8d73c64]
Changes:
- Added new GPU metrics:
1) XGMI link status - Up/Down; 1 = up; 0 = down
2) Graphics clocks below host limit (per XCP)
accumulators -> used to help calculate a violation status
3) VRAM max bandwidth at max memory clock
- Updated rocm-smi --showmetrics to include new metrics.
Units/values reflect as indicated by driver, may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.
Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 88a7e4b8ad]
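The XGMI link status encoding described above (1 = up, 0 = down) could be decoded roughly as follows. This is a sketch under stated assumptions: the input is a plain list of integers, and any value other than 0 or 1 is shown as N/A. That fallback and the helper name `describe_xgmi_links` are illustrative choices, not behavior confirmed by the commit.

```python
def describe_xgmi_links(statuses):
    """Map raw XGMI link status values to display labels.

    1 = Up and 0 = Down per the commit message; treating every other
    value as 'N/A' is an assumption made for this sketch.
    """
    labels = {1: "Up", 0: "Down"}
    return [labels.get(status, "N/A") for status in statuses]


if __name__ == "__main__":
    # 0xFFFF stands in for a value the device does not report.
    print(describe_xgmi_links([1, 1, 0, 0xFFFF]))
```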
Guest: Tests needed to account for not supporting changing compute
partitions.
BM: Tests need to account for invalid responses from Driver (due to
static CPX config).
Change-Id: I09ccee981c6b73684b64e5053068920a6c1b6439
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 23e945c6b3]
The memory partition test was changing the original compute partition
based on the default compute mode. Corrected this to restore the
original compute partition.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/rocm_smi_lib commit: 1d6b8d9422]
Guest: Tests needed to account for not supporting changing compute
partitions.
BM: Tests need to account for invalid responses from Driver (due to
static CPX config).
[ROCm/rocm_smi_lib commit: 1dd9ca9df4]
Changes: - Fixed Device Name (market name)
- Added new API rsmi_dev_market_name_get()
- Updated tests
- Updated amdgpu_drm.h to match latest mainline kernel
- Fixed subsystem ID to only show hex value (not subsystem name)
- rocm_smi_lib now has a recommended requirement for libdrm
Change-Id: Ic438529e16c8c3dbbdd620da664918148c40c997
[ROCm/rocm_smi_lib commit: 6a5e94c451]
1) When `clang` is used as system compiler, libraries were built without respecting LDFLAGS. For example, this affected LTO flags, if any (and it only affected clang, not gcc).
2) Linker flags are registered as CXX flags, which produces warnings during compilation:
```
clang++: warning: -Wl,-z,noexecstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-znoexecheap: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,relro: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,now: 'linker' input unused [-Wunused-command-line-argument]
```
3) Clang does not support `-Wtrampolines` flag:
```
warning: unknown warning option '-Wtrampolines' [-Wunknown-warning-option]
```
4) No linkers support `noexecheap` anymore. The `noexecheap` linker flag was part of the PaX patches to GNU ld, [which were dropped in 2017](https://www.gentoo.org/support/news-items/2017-08-19-hardened-sources-removal.html). Now ld/ld.lld/ld.gold don't support it, and heap protection is managed by the NX bit. Therefore every compiler produces this warning:
```
ld.lld: warning: unknown -z value: noexecheap
```
Closes #210.
Co-authored-by: Sv. Lockal <lockalsash@gmail.com>
[ROCm/rocm_smi_lib commit: 4dbc2b6d57]
- Fixes the broken links in rocm_smi.h
- Uses automodule instead of autofunction in docs/reference/python_api.rst
- Fixes some warnings during docs build
- Update some of the versions in requirements.txt
[ROCm/rocm_smi_lib commit: e5c5a1d5b7]
The target_graphics_version was not formatted properly and was
showing an incorrect Target Name. Corrected this by formatting the
major, minor and revision numbers.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/rocm_smi_lib commit: 94dca7073b]
- To address https://github.com/ROCm/rocm_smi_lib/issues/208
where use of fake BDFs for partitions can cause confusion. This note
is already in the comments of the function definition, but was not
updated in the function declaration.
- Fix broken formatting for the location table for PCIE coordinate fields
- Tracked in SWDEV-501108
Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d
[ROCm/rocm_smi_lib commit: 67a0de4279]
Changes:
- Added new GPU metrics:
1) XGMI link status - Up/Down; 1 = up; 0 = down
2) Graphics clocks below host limit (per XCP)
accumulators -> used to help calculate a violation status
3) VRAM max bandwidth at max memory clock
- Updated rocm-smi --showmetrics to include new metrics.
Units/values reflect as indicated by driver, may differ
from AMD SMI or other ROCm SMI interfaces which
use these fields.
- N/A fields mean the device does not support providing this
data.
Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 4de2168866]
Changes:
* [API] Removed checking board name, fixes for other MI ASICs
* [CLI] Increased progress bar to change memory partition modes
to 140 seconds, since driver reload is variable per system
Change-Id: Ifcaf40d28b4adf5eaa800c9e3748d33749dc414a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: d04cec7f1d]
Changes:
- Added warning screen to ROCm SMI users
setting memory partition
- Added new API (rsmi_dev_memory_partition_capabilities_get)
to retrieve memory partition capabilities
(What users can set memory partition modes to)
- Increased time-bar for CLI sets display to 40 seconds
- API now waits until the driver reloads with SYSFS files active
- [SWDEV-475712] [CLI/API] Fixed target_graphics_version field
not properly displaying for MI2x or Navi 3x ASICs.
- Updated tests
Change-Id: Iaf89d1b7ad9ceb449b289bc82ea198fe3b23992e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/rocm_smi_lib commit: 46902274b6]
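The "wait until the driver reloads with SYSFS files active" behavior above can be approximated by polling with a deadline. The sketch below is only an illustration of that pattern: the function name `wait_for_paths`, the example path, and the poll interval are assumptions, and the 40 second default merely mirrors the CLI time-bar value mentioned in the commit.

```python
import os
import time


def wait_for_paths(paths, timeout_s=40.0, poll_s=0.5):
    """Poll until every path exists, or give up once the timeout expires.

    Returns True if all paths appeared in time, False otherwise.
    Hypothetical helper; not the rocm_smi_lib implementation.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if all(os.path.exists(p) for p in paths):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Illustrative path only; real sysfs locations depend on the system.
    example = ["/sys/class/drm/card0/device"]
    print(wait_for_paths(example, timeout_s=2.0, poll_s=0.1))
```

A monotonic clock is used for the deadline so that wall-clock adjustments during a driver reload do not shorten or extend the wait.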