Issues include:
SWDEV-480250
SWDEV-480255
SWDEV-480248
Known issue:
`amd-smi event` has threads taking events from the same device,
which, when GPUs are being reset, makes it seem like some GPUs have
reset multiple times and others have not reset at all.
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7dcc214e0366fc1532ece579d915d34d35d5407
Changes:
* [API] Removed checking board name, fixes for other MI ASICs
* [API] Fixed inability to restart AMD GPU; libdrm was blocking
this operation
* [API] Added ability to unload/reload libdrm
from within AMD SMI APIs
* [CLI] Increased the progress bar duration for changing memory partition
modes to 140 seconds, since driver reload time varies per system
Change-Id: I52f227f2ab850c4a6332ff3ecdc899903b1080f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
- [CLI] Added warning screen to AMD SMI users
setting memory partition
- [CLI] Added a progress bar (time bar) to CLI sets, displayed for 40 seconds
- [API] Updated to wait until the driver has reloaded and its SYSFS files are active
- [CLI] Users can now set or reset without providing `-g all`. Instead of:
amd-smi set -g all <set arguments>
or amd-smi reset -g all <set arguments>
users can now directly call -> sudo amd-smi set <set arguments>
or sudo amd-smi reset <set arguments>
- [SWDEV-475712][CLI/API] Fixed target_graphics_version field
not properly displaying for older MI or Navi ASICs.
- [All APIs] Added a catch for the driver reporting invalid arguments;
these APIs will now show AMDSMI_STATUS_INVAL
(ex. changing to NPS8 if the device does not support it)
- [Install] Modified paths for Python install commands to support
multi-ROCm installs
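
The "wait until the driver reloads with SYSFS files active" behavior above can be pictured as a bounded polling loop. The following is only a minimal sketch of that idea, not the AMD SMI implementation: the helper name `wait_for_paths`, its parameters, and the default timeout and poll interval are all hypothetical, and the caller supplies whichever sysfs paths it considers proof that the driver is back.

```python
import os
import time


def wait_for_paths(paths, timeout_s=40.0, poll_s=0.5):
    """Poll until every path in `paths` exists or `timeout_s` elapses.

    Hypothetical helper: returns True if all paths appeared in time,
    False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # All files present: treat the driver as reloaded.
        if all(os.path.exists(p) for p in paths):
            return True
        # Give up once the deadline has passed.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A caller would pass the device's sysfs files and a timeout suited to the system, since driver reload time varies per system.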
Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Move memory_caps definition and correct the number in reserved to match Confluence
Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: Id94144f4b3d2d3d7b4d7327211ffc1957ffd0a93
Changes:
- Corrected the fastest rate at which users can sample from FW/driver
to once every 100 ms
- Added a warning to the amdsmi_get_violation_status()
call that a 100 ms delay is required between samples
- Removed guest support; this API will not be supported on guests
- Updated CLI `amd-smi metric --throttle` outputs from
XXX_active -> XXX_status
XXX_percent -> XXX_activity
to align with host
- Changelog updated
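
To illustrate the 100 ms sampling constraint noted above, here is a minimal sketch of a sampling loop that never polls faster than that floor. It does not call AMD SMI: `read_sample` is a hypothetical zero-argument callable standing in for whatever wraps amdsmi_get_violation_status(), and `sample_at_interval` and `MIN_SAMPLE_INTERVAL_S` are names invented for this example.

```python
import time

# Minimum delay between samples, per the 100 ms limit described above.
MIN_SAMPLE_INTERVAL_S = 0.1


def sample_at_interval(read_sample, count, interval_s=MIN_SAMPLE_INTERVAL_S):
    """Collect `count` samples, sleeping at least 100 ms between reads.

    `read_sample` is a hypothetical callable supplied by the caller.
    Requested intervals shorter than the floor are raised to the floor.
    """
    interval_s = max(interval_s, MIN_SAMPLE_INTERVAL_S)
    samples = []
    for i in range(count):
        if i:
            # Sleep only between reads, not before the first one.
            time.sleep(interval_s)
        samples.append(read_sample())
    return samples
```

Clamping the interval rather than raising an error keeps callers that request a faster rate working, at the cost of silently sampling more slowly than they asked for.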
Change-Id: Ib30dd35dcc04ff67904ca82c86a55a16689df226
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Reset GPU partition support for both compute and memory was removed.
Code changes related to the following:
* amdsmi_reset_gpu_compute_partition()
* amdsmi_reset_gpu_memory_partition()
* CLI
Change-Id: I372589074b4da172bedd39223edde18939e373ae
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
- Update the API names and parameters to return CPU handles and core
handles in the system.
- Update the amdsmi_wrapper.py.
- Update the amdsmi_interface.py to use the processor handles and
core handles API.
Change-Id: Ie24f62f345864f8b6773fdb3c6369993bca7e25b
Changes:
- amdsmi_violation_status_t now includes current accumulated/counter
values
- Tests/wrapper now include added values
- Removed ASIC references in header for host/bm alignment
- Fix violation_status->per_hbm_thrm /
violation_status->active_hbm_thrm
calculations.
Change-Id: Ic86a7cbad5198a41018f82f6b588b83158d9ba0b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Header Unification Change: "/amdsmi/+/1122408"
Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>
partition_id has also been removed from the `amdsmi_asic_info_t` struct, and
a supporting API has been added for querying partition information.
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081
Changes:
- Updates to amdsmi_asic_info_t structure to include:
target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display new
amdsmi_asic_info_t fields
- Updates to gpu enumeration during amdsmi_init()
to discover all logical GPUs when in a non-SPX mode
(ex. DPX, TPX, QPX, or CPX)
- Updates to amdsmi_get_gpu_bdf_id(..) to include
partition_id details when in BDF or optional bits.
- bits [63:32] = domain
- bits [31:28] or bits [2:0] = partition id
- bits [27:16] = reserved
- bits [15:8] = Bus
- bits [7:3] = Device
- bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non-SPX modes
- C++/Python tests updated to reflect these outputs
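
The bit layout listed above can be decoded with plain shifts and masks. The sketch below is illustrative only: `BdfFields` and `decode_bdf_id` are hypothetical names, not part of the AMD SMI Python interface, and the decoder assumes the partition id is carried in bits [31:28] rather than in the bits [2:0] fallback.

```python
from typing import NamedTuple


class BdfFields(NamedTuple):
    """Fields unpacked from a 64-bit BDF id (hypothetical container)."""
    domain: int
    partition_id: int
    bus: int
    device: int
    function: int


def decode_bdf_id(bdf_id: int) -> BdfFields:
    """Split a 64-bit BDF id into fields using the layout listed above.

    Assumes the partition id lives in bits [31:28]; in the fallback
    case it would instead appear in `function` (bits [2:0]).
    """
    return BdfFields(
        domain=(bdf_id >> 32) & 0xFFFFFFFF,  # bits [63:32]
        partition_id=(bdf_id >> 28) & 0xF,   # bits [31:28]
        bus=(bdf_id >> 8) & 0xFF,            # bits [15:8]
        device=(bdf_id >> 3) & 0x1F,         # bits [7:3]
        function=bdf_id & 0x7,               # bits [2:0]
    )
```

For example, a value built as `(1 << 32) | (3 << 28) | (0x65 << 8)` decodes to domain 1, partition id 3, bus 0x65, device 0, function 0.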
Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
1. Add an API interface amdsmi_topo_get_p2p_status to retrieve
connection type and P2P capabilities between 2 GPUs.
2. Add getting p2p status test in hw_topology_read
to print P2P capability information.
3. Add the tables below to the CLI topology subcommands:
- CACHE COHERANCY TABLE
- ATOMICS TABLE
- DMA TABLE
- BI-DIRECTIONAL TABLE
Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
The number of compute units `amdgpu_gpu_info.num_of_compute_units` is exposed through amdsmi_get_gpu_asic_info().
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Change-Id: Ibeb612d079ed87437a0e56124b8504098fc2dcfd
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Driver info `amdgpu_gpu_info.vram_bit_width` is exposed through amdsmi_get_gpu_vram_info().
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Change-Id: I8abd8db7a603078b2b1c008b2685cecf35caf3d2
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
GPU Metrics info `gpu_metrics.vcn_activity` is exposed through amdsmi_get_utilization_count().
Code changes related to the following:
* API
* CLI
* Unit tests
Change-Id: I831b2a81bdc0e090a6698dcb689d10f91ed87dd9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Fix: `amdsmi_get_gpu_process_list` now requires sudo to access PID and memory information.
Code changes related to the following:
* amdsmi_get_gpu_process_list()
* CLI
Change-Id: I72b154c220276b354c350fcc067c9a7c32e6c173
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>