The target_graphics_version was not formatted properly and was
showing an incorrect Target Name. Corrected this by formatting the
major, minor and revision numbers.
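
As a minimal sketch of the formatting described above (not the patch itself): the function name is hypothetical, and it assumes the LLVM AMDGPU naming convention, where the major number is printed in decimal and the minor and revision numbers as single lowercase hex digits.

```python
def format_target_graphics_version(major: int, minor: int, revision: int) -> str:
    """Build a gfx target name such as 'gfx90a' from its three components.

    Assumption: major is decimal; minor and revision are one lowercase
    hex digit each (LLVM AMDGPU target naming convention).
    """
    if major < 0 or not 0 <= minor <= 0xF or not 0 <= revision <= 0xF:
        raise ValueError("version component out of range")
    return f"gfx{major}{minor:x}{revision:x}"
```

Under that assumption, major 9, minor 0, revision 10 gives gfx90a rather than a decimal string such as gfx9010.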
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Calling rsmi_dev_gpu_metrics_info_get() from a multi-threaded application causes a crash
Code changes related to the following:
* API implementation changes
Change-Id: I1f1fb39c1125569ec5d534b37fd6f68c8829eef7
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Authored-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Update the status type for EPERM and ENOENT based on feedback from the ticket.
Update error output to LOG_ERR.
---------
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Units were off and VCLK/DCLK outputs were not coming through
properly from amdsmi_get_clk_freq().
Now we match the units returned by rsmi_dev_gpu_clk_freq_get (MHz).
The CLI now shows a maximum of 2 VCLK/DCLKs, and shows N/A if there
is no current_freq listed.
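
The display rule above can be sketched as follows. This is an illustration only: format_clocks and MAX_CLOCKS_SHOWN are hypothetical names, and padding a short list with N/A is an assumption about how the CLI treats missing entries.

```python
from typing import List, Optional, Sequence

MAX_CLOCKS_SHOWN = 2  # the CLI shows at most two VCLK/DCLK entries


def format_clocks(current_freqs_mhz: Sequence[Optional[int]]) -> List[str]:
    """Return display strings for up to two clocks, in MHz.

    None stands for "no current_freq listed" and is rendered as N/A.
    Assumption: missing entries are padded with N/A as well.
    """
    shown = list(current_freqs_mhz[:MAX_CLOCKS_SHOWN])
    shown += [None] * (MAX_CLOCKS_SHOWN - len(shown))
    return ["N/A" if freq is None else f"{freq} MHz" for freq in shown]
```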
Change-Id: I8a7b66cbb5263e8d396f8568c104e1ce3512923d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram
- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
Issues include:
SWDEV-480250
SWDEV-480255
SWDEV-480248
Known issue:
`amd-smi event` has threads taking events from the same device
which, in the case of resetting GPUs, makes it seem like some GPUs have
reset multiple times and others have not reset at all.
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7dcc214e0366fc1532ece579d915d34d35d5407
Changes:
* [API] Removed the board name check, which fixes other MI ASICs
* [API] Fixed being unable to restart the AMD GPU; libdrm blocked
this operation
* [API] Added ability to unload/reload libdrm
from within AMD SMI APIs
* [CLI] Increased the progress bar for changing memory partition modes
to 140 seconds, since driver reload time varies per system
Change-Id: I52f227f2ab850c4a6332ff3ecdc899903b1080f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
- [CLI] Added a warning screen for AMD SMI users
setting the memory partition
- [CLI] Added a progress bar to CLI set operations that displays for 40 seconds
- [API] Updated to wait until the driver reloads with SYSFS files active
- [CLI] Users can now set or reset without specifying -g all. Instead of:
amd-smi set -g all <set arguments>
or amd-smi reset -g all <set arguments>
they can directly call -> sudo amd-smi set <set arguments>
or sudo amd-smi reset <set arguments>
- [SWDEV-475712][CLI/API] Fixed target_graphics_version field
not properly displaying for older MI or Navi ASICs.
- [All APIs] Added a catch for invalid arguments reported by the driver;
these APIs now return AMDSMI_STATUS_INVAL
(ex. changing to NPS8 if the device does not support it)
- [Install] Modified paths for Python install commands to support
multi-ROCm installs
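
The "set or reset without -g all" behaviour can be pictured with a small argparse sketch. This is not the amd-smi parser: build_parser, the amd-smi-sketch program name and the --gpu long option are assumptions made for illustration. The point is only that omitting -g falls back to all GPUs.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser where 'set' and 'reset' default to every GPU."""
    parser = argparse.ArgumentParser(prog="amd-smi-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("set", "reset"):
        sub = subparsers.add_parser(name)
        # Omitting -g behaves as if "-g all" had been given.
        sub.add_argument("-g", "--gpu", nargs="+", default=["all"])
    return parser
```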
Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Implements DiscoverIOLinkPerNodeDirection() based on KFD Node infrastructure;
'/kfd/topology/nodes/*/io_links'
Code changes related to the following:
* Internal implementation
Change-Id: Iccd84d1d69234dbeae4d4925f657e7e3bd801106
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Skip vram_str_path and sdma_str_path when their sysfs files have not been created, which happens when some, but not all, GPUs are passed to a Docker container.
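
A minimal sketch of the skip-if-missing pattern, assuming a hypothetical helper read_existing_sysfs that takes plain file paths; the real change is in the library's C++ code and is not shown here.

```python
import os
from typing import Dict, Iterable


def read_existing_sysfs(paths: Iterable[str]) -> Dict[str, str]:
    """Read each sysfs-style file that exists and silently skip the rest."""
    values: Dict[str, str] = {}
    for path in paths:
        if not os.path.isfile(path):
            # File not created for this GPU (e.g. GPU not passed to the container).
            continue
        with open(path) as handle:
            values[path] = handle.read().strip()
    return values
```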
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I83b7a62331672810688a94e4023b0ae740436e6d
Changes:
- Updates to amdsmi_asic_info_t structure to include:
target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display new
amdsmi_asic_info_t fields
- Updates to GPU enumeration during amdsmi_init()
to discover all logical GPUs when in a non-SPX mode
(ex. DPX, TPX, QPX, or CPX)
- Updates to amdsmi_get_gpu_bdf_id(..) to include the
partition_id, either in the optional bits or in the BDF function bits:
- bits [63:32] = domain
- bits [31:28] or bits [2:0] = partition id
- bits [27:16] = reserved
- bits [15:8] = Bus
- bits [7:3] = Device
- bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non-SPX modes
- C++/Python tests updated to reflect these outputs
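
The bit layout listed above can be unpacked as in the sketch below. The Bdf tuple and the decode_bdf/encode_bdf helpers are hypothetical names for illustration. The sketch reads the partition id from bits [31:28] only and does not model the bits [2:0] fallback.

```python
from typing import NamedTuple


class Bdf(NamedTuple):
    domain: int
    partition_id: int
    bus: int
    device: int
    function: int


def decode_bdf(bdf_id: int) -> Bdf:
    """Split a 64-bit BDF id into its fields (bits [27:16] are reserved)."""
    return Bdf(
        domain=(bdf_id >> 32) & 0xFFFFFFFF,  # bits [63:32]
        partition_id=(bdf_id >> 28) & 0xF,   # bits [31:28]
        bus=(bdf_id >> 8) & 0xFF,            # bits [15:8]
        device=(bdf_id >> 3) & 0x1F,         # bits [7:3]
        function=bdf_id & 0x7,               # bits [2:0]
    )


def encode_bdf(domain: int, partition_id: int, bus: int,
               device: int, function: int) -> int:
    """Inverse of decode_bdf; reserved bits are left as zero."""
    return (((domain & 0xFFFFFFFF) << 32) | ((partition_id & 0xF) << 28)
            | ((bus & 0xFF) << 8) | ((device & 0x1F) << 3) | (function & 0x7))
```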
Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
1. Add an API, amdsmi_topo_get_p2p_status, to retrieve the
connection type and P2P capabilities between 2 GPUs.
2. Add a P2P status test to hw_topology_read
that prints P2P capability information.
3. Add the following tables to the CLI topology subcommands:
- CACHE COHERANCY TABLE
- ATOMICS TABLE
- DMA TABLE
- BI-DIRECTIONAL TABLE
Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
I ran a test that exercised this code in dev mode, and ASAN found a memory access issue caused by the iterator returned by lower_bound being dereferenced unconditionally. I believe the right fix is to check whether the iterator is within the map and, if not, go to the else branch.
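
The fix itself is in C++ and is not reproduced here. As a rough Python analogue of the same guard, bisect_left plays the role of lower_bound and an index equal to the list length plays the role of map.end(). The function name and the sorted list of pairs are assumptions for illustration.

```python
import bisect
from typing import List, Optional, Tuple


def lookup_at_or_above(sorted_items: List[Tuple[int, str]], key: int) -> Optional[str]:
    """Return the value of the first entry whose key is >= key, else None."""
    keys = [item_key for item_key, _ in sorted_items]
    index = bisect.bisect_left(keys, key)  # analogue of lower_bound()
    if index == len(keys):                 # analogue of it == map.end()
        return None                        # take the "else" branch instead
    return sorted_items[index][1]
```

In Python an unchecked access would raise IndexError; in C++ dereferencing the end iterator is undefined behaviour, which is what ASAN reported.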
Change-Id: I34fdce634791a09a89eee76c8b2b64a9607d57f9
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
The number of compute units (`amdgpu_gpu_info.num_of_compute_units`) is now exposed through amdsmi_get_gpu_asic_info().
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Change-Id: Ibeb612d079ed87437a0e56124b8504098fc2dcfd
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
The GPU metrics field `gpu_metrics.vcn_activity` is now exposed through amdsmi_get_utilization_count().
Code changes related to the following:
* API
* CLI
* Unit tests
Change-Id: I831b2a81bdc0e090a6698dcb689d10f91ed87dd9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Set the environment variable RSMI_MUTEX_THREAD_ONLY=1 to enable the thread-only mutex.
RSMI_INIT_FLAG_THRAD_ONLY_MUTEX can also be passed to rsmi_init()
to enable the thread-only mutex.
Change-Id: I2d9844039b774e386f03bb9bb130d8c342504ea6
When discovering amdgpu devices, not all GPUs were found if the assigned
card numbers were not consecutive. The code is changed to discover
GPUs based on the max card number.
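
The difference between the two discovery strategies can be sketched as follows. Both function names are hypothetical, and the input is assumed to be a directory listing containing entries such as card0 or renderD128. The actual library code is C++ and is not shown.

```python
import re
from typing import Iterable, List

_CARD_RE = re.compile(r"card(\d+)")


def discover_consecutive(entries: Iterable[str]) -> List[int]:
    """Old behaviour (sketch): stop at the first gap in the card numbers."""
    names = set(entries)
    found = []
    index = 0
    while f"card{index}" in names:
        found.append(index)
        index += 1
    return found


def discover_up_to_max(entries: Iterable[str]) -> List[int]:
    """New behaviour (sketch): probe every number up to the max card number."""
    present = set()
    for entry in entries:
        match = _CARD_RE.fullmatch(entry)
        if match:
            present.add(int(match.group(1)))
    if not present:
        return []
    return [index for index in range(max(present) + 1) if index in present]
```

With entries card0, card2 and card5, the first function finds only card 0 while the second finds 0, 2 and 5.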
Change-Id: Icf4c1df4a1651093b5de3cd7a25a9bd69a299075
Drops checks that are invalid with the new pp_od_clk_voltage format
Code changes related to the following:
* get_od_clk_volt_info()
* get_od_clk_volt_curve_regions()
Change-Id: I534c920e00fa3dacdb980f431db5eef260ac93f5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>