144 commits

Author SHA1 Message Date
Joseph Narlo dc228398d0 [SWDEV-504583] Resolve Additional Compiler Warnings
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-01-28 15:36:44 -06:00
Kanangot Balakrishnan, Bindhiya 6fa991c39c [SWDEV-481004] Fix for incorrect gfx_version number (#52)
The target_graphics_version field was not formatted properly and
showed an incorrect target name. Corrected this by formatting the
major, minor and revision numbers.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-21 15:42:05 -06:00
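A minimal sketch of formatting major, minor and revision separately. It assumes a KFD-style packed value (major * 10000 + minor * 100 + stepping) and LLVM-style target names in which minor and stepping are single hex digits; `format_gfx_target` is a hypothetical helper, not the library's code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper. Assumes a KFD-style packed version:
//   gfx_target_version = major * 10000 + minor * 100 + stepping
std::string format_gfx_target(uint32_t gfx_target_version) {
  const unsigned major = gfx_target_version / 10000;
  const unsigned minor = (gfx_target_version / 100) % 100;
  const unsigned stepping = gfx_target_version % 100;
  char buf[32];
  // Major stays decimal; minor and stepping are printed as hex digits, which
  // is how target names such as gfx90a (stepping 10 -> 'a') are spelled.
  std::snprintf(buf, sizeof(buf), "gfx%u%x%x", major, minor, stepping);
  return std::string(buf);
}
```

Under these assumptions, format_gfx_target(90010) yields "gfx90a" and format_gfx_target(110000) yields "gfx1100".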
Arif, Maisam 03a2368655 [SWDEV-509389] AMD-SMI crash when multiple threads call SMI APIs (#53)
Calling rsmi_dev_gpu_metrics_info_get() from a multi-threaded application causes a crash.

Code changes related to the following:
  * API implementation changes

Change-Id: I1f1fb39c1125569ec5d534b37fd6f68c8829eef7

Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Authored-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2025-01-21 14:00:15 -06:00
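The commit does not say how the crash was fixed. One common way to make a metrics query safe for concurrent callers is to serialize access to shared state with a mutex; the sketch below assumes that approach, and `MetricsCache` is a hypothetical class, not the library's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-device cache that several threads may refresh at once.
// Without the lock, concurrent refreshes would race on `buffer_`.
class MetricsCache {
 public:
  // Refreshes the cached sample and returns a copy taken under the lock.
  std::vector<uint64_t> refresh(uint64_t sample) {
    std::lock_guard<std::mutex> guard(mutex_);
    buffer_.assign(8, sample);  // stand-in for re-reading gpu_metrics
    ++refresh_count_;
    return buffer_;
  }

  uint64_t refresh_count() {
    std::lock_guard<std::mutex> guard(mutex_);
    return refresh_count_;
  }

 private:
  std::mutex mutex_;
  std::vector<uint64_t> buffer_;
  uint64_t refresh_count_ = 0;
};
```

Returning a copy rather than a reference keeps callers from reading the buffer after the lock is released.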
Poag, Charis c1cd2b46ef [SWDEV-488276] Add partition 2.0 functionality (#44)
Changes:
* CLI:
  - Updated amd-smi partition
  - Updated amd-smi partition -c
  - Updated amd-smi partition -m
  - Updated amd-smi partition -a
  - Updated amd-smi set -M <NPS1/NPS2/NPS4/NPS8>
  - Updated amd-smi set -C <SPX/DPX/QPX/TPX/CPX>
  - Updated amd-smi set -C <ACCELERATOR_TYPE> or <PROFILE_INDEX>
    Where PROFILE_INDEX = available ACCELERATOR_TYPES
  - Updated amd-smi set --help, now includes more detail for
    amd-smi set -C <ACCELERATOR_TYPE> or <PROFILE_INDEX>

* API:
  - Added amdsmi_get_gpu_memory_partition_config
  - Added amdsmi_set_gpu_memory_partition_mode
  - Added amdsmi_get_gpu_accelerator_partition_profile_config
  - Updated amdsmi_get_gpu_accelerator_partition_profile_config
  - Added amdsmi_set_gpu_accelerator_partition_profile

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-01-16 00:53:46 -06:00
Castillo, Juan 60492e754f [SWDEV-495169] Update err output to log_err (#24)
Update status type for EPERM and ENOENT based on feedback from ticket.
Update error output to LOG_ERR.

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
2025-01-07 17:35:39 -06:00
gabrpham 23da950ef0 Additional fixes for amd-smi static --clock
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2024-12-20 14:45:20 -06:00
Charis Poag 3226a1d0ea [SWDEV-484382] Fix VCLK/DCLK outputs for monitor, static, metric
Units were incorrect, and VCLK/DCLK values were not returned
properly through amdsmi_get_clk_freq().

Now we match the units returned by rsmi_dev_gpu_clk_freq_get (MHz).

The CLI now shows at most 2 VCLK/DCLK values, and shows N/A if there
is no current_freq listed.

Change-Id: I8a7b66cbb5263e8d396f8568c104e1ce3512923d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-20 14:11:08 -06:00
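A sketch of the CLI display rule described above, assuming the values are already in MHz and that a missing `current_freq` is modeled as an empty `std::optional`; `format_clocks` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical formatter: always emits exactly two slots (at most two
// VCLK or DCLK values), using "N/A" where no current frequency exists.
std::vector<std::string> format_clocks(
    const std::vector<std::optional<uint32_t>> &mhz) {
  constexpr std::size_t kMaxShown = 2;
  std::vector<std::string> out;
  for (std::size_t i = 0; i < kMaxShown; ++i) {
    if (i < mhz.size() && mhz[i].has_value()) {
      out.push_back(std::to_string(*mhz[i]) + " MHz");
    } else {
      out.push_back("N/A");
    }
  }
  return out;
}
```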
Juan Castillo f8b8347627 [SWDEV-496693] GPU Metrics 1.7
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram

- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
2024-12-18 10:57:06 -06:00
Joe Narlo d0a7332d32 SWDEV-492272 [AMDSMI] Build/Compiler warnings messages
Fix compiler warnings

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I10657b8f3ef18a9b45311e8f6509958297a57823
2024-12-13 00:38:07 -05:00
gabrpham bd01cfc203 Fixed post reset and ring_hang issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248
Known issue:
	`amd-smi event` has threads taking events from the same device,
which, when GPUs are reset, makes it appear that some GPUs have
reset multiple times and others have not reset at all.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7dcc214e0366fc1532ece579d915d34d35d5407
2024-12-06 17:46:00 -05:00
Charis Poag 7d061f9ae4 [SWDEV-499029] Fix unable to change memory partition modes
Changes:
  * [API] Removed the board name check, fixing other MI ASICs
  * [API] Fixed inability to restart the AMD GPU; libdrm was blocking
    this operation
  * [API] Added ability to unload/reload libdrm
    from within AMD SMI APIs
  * [CLI] Increased the progress bar for changing memory partition modes
    to 140 seconds, since driver reload time varies per system

Change-Id: I52f227f2ab850c4a6332ff3ecdc899903b1080f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-25 09:28:02 -05:00
Joe Narlo 3052ad4220 SWDEV-495787 [AMDSMI] Different license headers
Change copyrights to MIT and remove date

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
2024-11-22 08:55:28 -05:00
gabrpham 50eaf14b9e [SWDEV-498453] Enabled 'amd-smi set --clk-limit' for virtual environments
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I23e994502d4abc1a602d2341e77ad9c50fcf4839
2024-11-19 16:17:29 -06:00
Charis Poag 3ea4a42a6e [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - [CLI] Added a warning screen for AMD SMI users
    setting the memory partition
  - [CLI] Added a progress bar to the CLI set display, timed to 40 seconds
  - [API] Updated to wait until the driver reloads with SYSFS files active
  - [CLI] Users can now set or reset without providing -g all; instead of:
    amd-smi set -g all <set arguments>
    or amd-smi reset -g all <set arguments>
    they can directly call -> sudo amd-smi set <set arguments>
    or sudo amd-smi reset <set arguments>
  - [SWDEV-475712][CLI/API] Fixed target_graphics_version field
    not displaying properly for older MI or Navi ASICs.
  - [All APIs] Added a catch for invalid arguments reported by the driver;
    these APIs now return AMDSMI_STATUS_INVAL
    (e.g. changing to NPS8 if the device does not support it)
  - [Install] Modified paths for Python install commands to support
    multi-ROCm installs

Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-12 16:50:32 -04:00
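The API change above waits until the driver reloads with its sysfs files active, and the CLI bounds the wait with a timed progress bar. A minimal polling sketch under that reading; `wait_for_path` and the choice of file to poll are assumptions, not the library's actual code:

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <thread>

// Hypothetical helper: polls until `path` exists or `timeout` elapses.
// Returns true if the path appeared in time.
bool wait_for_path(const std::filesystem::path &path,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds interval =
                       std::chrono::milliseconds(100)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    std::error_code ec;  // non-throwing overload: errors count as "absent"
    if (std::filesystem::exists(path, ec)) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(interval);
  }
}
```

A steady clock is used so that wall-clock adjustments during a driver reload cannot stretch or cut the timeout.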
Jorge López 172a3e233b Updates driverInitialized() to support amdgpu built as module as well as kernel built-in. Fixes ROCm/rocm_smi_lib#102 and is an updated version of ROCm/rocm_smi_lib#104
Change-Id: Icb3abe820bc67035b822358a1c04bd09a7c22b6b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-05 16:30:34 -05:00
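A sketch of the module-versus-built-in distinction, assuming the usual Linux sysfs behavior: a loadable module exposes /sys/module/amdgpu/initstate reading "live", while a built-in driver has a /sys/module/amdgpu directory without that file. The function name and the `sysfs_root` parameter are hypothetical, added so the check can run against a scratch directory:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical check, not the actual driverInitialized() implementation.
bool amdgpu_driver_initialized(const fs::path &sysfs_root = "/sys") {
  const fs::path module_dir = sysfs_root / "module" / "amdgpu";
  std::error_code ec;
  if (!fs::is_directory(module_dir, ec)) {
    return false;  // no amdgpu entry at all: driver not present
  }
  const fs::path initstate = module_dir / "initstate";
  if (!fs::exists(initstate, ec)) {
    return true;  // assumed built-in: built-in drivers have no initstate
  }
  std::ifstream in(initstate);
  std::string state;
  in >> state;
  return state == "live";  // a loadable module reports "live" once loaded
}
```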
adapryor 02cbffb42a [SWDEV-412505] Handle mclk permission errors as not supported
Change-Id: Idb3eeed76ff55c507f28b5e692f8704704c3e46e
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-10-31 17:40:34 -04:00
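A sketch of reporting mclk permission errors as "not supported". The `Status` enum and `mclk_read_status` are hypothetical stand-ins; AMD SMI's real status codes differ:

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical status codes for illustration only.
enum class Status { kSuccess, kNotSupported, kUnexpected };

// Maps the errno from reading the memory clock (mclk) sysfs file.
// Per this commit, permission errors are reported as "not supported".
Status mclk_read_status(int err) {
  switch (err) {
    case 0:
      return Status::kSuccess;
    case EACCES:
    case EPERM:
      return Status::kNotSupported;
    default:
      return Status::kUnexpected;
  }
}
```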
Oliveira, Daniel 25bcf6af2a [SWDEV-488526] BI-Direction Table mismatch
Implements DiscoverIOLinkPerNodeDirection() based on the KFD node infrastructure
('/kfd/topology/nodes/*/io_links').

Code changes related to the following:
  * Internal implementation

Change-Id: Iccd84d1d69234dbeae4d4925f657e7e3bd801106
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-10-17 15:27:09 -04:00
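A rough sketch of per-node, per-direction discovery, assuming the standard KFD sysfs layout /sys/class/kfd/kfd/topology/nodes/<N>/io_links/<M>. The function name and the map-of-vectors result are hypothetical, and only link entry names are collected, not their properties:

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical sketch: for each node directory under `nodes_dir`, list the
// entries in that node's io_links subdirectory. Links are keyed by source
// node, so A->B and B->A are discovered independently.
std::map<std::string, std::vector<std::string>> discover_io_links(
    const fs::path &nodes_dir) {
  std::map<std::string, std::vector<std::string>> links;
  std::error_code ec;
  for (const auto &node : fs::directory_iterator(nodes_dir, ec)) {
    std::vector<std::string> ids;
    std::error_code link_ec;  // a node without io_links yields an empty list
    for (const auto &link :
         fs::directory_iterator(node.path() / "io_links", link_ec)) {
      ids.push_back(link.path().filename().string());
    }
    std::sort(ids.begin(), ids.end());
    links[node.path().filename().string()] = ids;
  }
  return links;
}
```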
Maisam Arif 27a48e69d8 Corrected clean local data partition indexing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib0eeb065f160fccd3c3f4a2d13f0869af01a74ae
2024-10-10 10:54:45 -05:00
Charis Poag 3a4abbd8c0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status (e.g. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) PCIe other-end recovery counter
    - CLI, API and test changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-27 12:04:21 -05:00
Maisam Arif b40b405332 [SWDEV-456049] & [SWDEV-442181] Fix early exiting loop while enumerating GPU stats
Skip missing vram_str_path and sdma_str_path when their sysfs files are not created, which happens when some, but not all, GPUs are passed to a Docker container.

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I83b7a62331672810688a94e4023b0ae740436e6d
2024-09-20 03:01:22 -05:00
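The essence of the fix is to skip a GPU whose sysfs file is missing instead of leaving the enumeration loop. A minimal sketch; `collect_existing` is a hypothetical helper and the candidate paths stand in for the per-GPU vram/sdma files:

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical enumeration loop: keep the stat files that exist and skip
// the ones that were never created (e.g. GPUs not passed to a container).
std::vector<fs::path> collect_existing(const std::vector<fs::path> &candidates) {
  std::vector<fs::path> found;
  for (const auto &path : candidates) {
    std::error_code ec;
    if (!fs::exists(path, ec)) {
      continue;  // an early `break` here would drop every later GPU
    }
    found.push_back(path);
  }
  return found;
}
```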
Eisuke Kawashima 1b6ec8df07 chore: unset executable permission
Change-Id: I06727774f3b1657a7955b172a40d0dfc9c76d6b9
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-16 17:34:39 -04:00
Maisam Arif 105db1afcd Updated License Dates
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8ca199c129c06508bc3e23745ab5ac2d20dce928
2024-09-16 16:14:47 -04:00
danzimm 91199279b0 Explicitly specify data_type in capture
Change-Id: I3a49ee3acc235df88c2df1d150803b2db2143aee
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-16 15:45:15 -04:00
Charis Poag a33e4c9e14 [SWDEV-483526] Fix MI3x partitions not showing all logical nodes
Changes:
- Updates to the amdsmi_asic_info_t structure to include:
  target_graphics_version, kfd_id, node_id, partition_id
- Updates to amd-smi static --asic to display the new
  amdsmi_asic_info_t fields
- Updates to GPU enumeration during amdsmi_init()
  to discover all logical GPUs when in a non-SPX mode
  (e.g. DPX, TPX, QPX, or CPX)
- Updates to amdsmi_get_gpu_bdf_id(..) to include
  partition_id details in the BDF ID or its optional bits:
    - bits [63:32] = domain
    - bits [31:28] or bits [2:0] = partition id
    - bits [27:16] = reserved
    - bits [15:8]  = bus
    - bits [7:3]   = device
    - bits [2:0]   = function (partition id may be in bits [2:0]) <-- fallback for non-SPX modes

- C++/Python tests updated to reflect these outputs

Change-Id: I4be0ea35bb98f3109ae2ca9e82f6b21baa38de29
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-11 16:35:17 -05:00
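Decoding the BDF ID per the bit layout listed in this commit can be sketched with shifts and masks. `BdfFields` and `decode_bdf_id` are hypothetical, not part of the AMD SMI API. Both the [31:28] partition bits and the [2:0] function bits are extracted, and which one carries the partition ID is left to the caller:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct mirroring the layout in the commit message.
struct BdfFields {
  uint32_t domain;       // bits [63:32]
  uint8_t partition_id;  // bits [31:28]
  uint8_t bus;           // bits [15:8]
  uint8_t device;        // bits [7:3]
  uint8_t function;      // bits [2:0] (may carry the partition id instead)
};

BdfFields decode_bdf_id(uint64_t bdfid) {
  BdfFields f;
  f.domain = static_cast<uint32_t>(bdfid >> 32);
  f.partition_id = static_cast<uint8_t>((bdfid >> 28) & 0xF);
  f.bus = static_cast<uint8_t>((bdfid >> 8) & 0xFF);
  f.device = static_cast<uint8_t>((bdfid >> 3) & 0x1F);
  f.function = static_cast<uint8_t>(bdfid & 0x7);
  return f;
}
```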
Tim Huang 260edaa752 [SWDEV-463402] - Support retrieving connection type and P2P capabilities between two GPUs
1. Add an API, amdsmi_topo_get_p2p_status, to retrieve the
connection type and P2P capabilities between two GPUs.

2. Add a P2P status test in hw_topology_read
that prints P2P capability information.

3. Add the following tables to the CLI topology subcommand:
  - CACHE COHERANCY TABLE
  - ATOMICS TABLE
  - DMA TABLE
  - BI-DIRECTIONAL TABLE

Change-Id: I199173030d4170115cea27c472958a4826e4e1bf
Signed-off-by: Tim Huang <tim.huang@amd.com>
2024-09-06 09:42:34 -04:00
Maisam Arif 97c487372f Clean up unused files & Update License info
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5b58e8fe3d9eeac207b07ce0fe4134dd717dbd90
2024-09-05 09:52:48 -04:00
Xiaodong Wang 2066872297 Fix ASAN issue in DiscoverAmdgpuDevices
I ran a test that exercised this code in dev mode, and ASAN found a memory access issue caused by the iterator returned by lower_bound being dereferenced unconditionally. I believe the right fix is to check whether the iterator is within the map and, if not, go to the else branch.

Change-Id: I34fdce634791a09a89eee76c8b2b64a9607d57f9
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-09-04 10:14:10 -04:00
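The pattern behind this fix: `std::map::lower_bound` returns `end()` when no key is greater than or equal to the argument, and dereferencing `end()` is undefined behavior. A minimal sketch of the guarded version; the map contents and `first_device_at_or_after` are hypothetical, not the DiscoverAmdgpuDevices code:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Returns the name of the first device whose key is >= `key`, or "none".
std::string first_device_at_or_after(
    const std::map<uint64_t, std::string> &devices, uint64_t key) {
  auto it = devices.lower_bound(key);
  if (it != devices.end()) {
    return it->second;  // safe: `it` points at a real element
  }
  return "none";  // else branch: `it` is end() and must not be dereferenced
}
```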
Charis Poag d9d6637cb7 [SWDEV-451960] [WIP] Add Pytest
Updates:
- Added pytest to shared/pytest folder
- Users can run the tests with:

[pytest]
python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -s -v
python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -s -v

[unittest]
/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v
/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v

- Automatically installs pytest

Change-Id: Ia3281a9608aeeb803b91f8b83f87ff84b01037f4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-08-29 10:09:29 -04:00
Oliveira, Daniel b05849dad0 SWDEV-463401: amdsmi_get_gpu_asic_info() adds num_of_compute_units
The number of compute units (`amdgpu_gpu_info.num_of_compute_units`) is exposed through amdsmi_get_gpu_asic_info().

Code changes related to the following:
  * API
  * CLI
  * Unit tests
  * Examples

Change-Id: Ibeb612d079ed87437a0e56124b8504098fc2dcfd
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-08-28 10:15:07 -04:00
Oliveira, Daniel af3670d758 SWDEV-463372: amdsmi_get_utilization_count() adds decoder_activity
GPU Metrics info `gpu_metrics.vcn_activity` is exposed through amdsmi_get_utilization_count().

Code changes related to the following:
  * API
  * CLI
  * Unit tests

Change-Id: I831b2a81bdc0e090a6698dcb689d10f91ed87dd9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-08-27 16:58:34 -05:00
Bill(Shuzhou) Liu 97e70d44cf Set soft min or max clock
Add an API to set the soft minimum or maximum clock.

Change-Id: Ia34381a721ef3c3d894d5a89d25afa757be46a79
2024-08-20 13:22:32 -04:00
Maisam Arif 8bc8307c60 [SWDEV-474450] Removed DEVICE_MUTEX from gpu_reset
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I706fb47288738bfbde94b56fee66bbf807b3c0cb
2024-07-19 11:47:52 -04:00
Bill(Shuzhou) Liu dbba33d3f5 Support thread only mutex
Set the environment variable RSMI_MUTEX_THREAD_ONLY=1 to enable a thread-only mutex.
Alternatively, pass RSMI_INIT_FLAG_THRAD_ONLY_MUTEX to rsmi_init()
to enable the thread-only mutex.

Change-Id: I2d9844039b774e386f03bb9bb130d8c342504ea6
2024-07-18 20:43:38 -05:00
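A sketch of how the RSMI_MUTEX_THREAD_ONLY switch could select the mutex type. It assumes "thread-only" means a process-private pthread mutex and the default is process-shared; both helper functions are hypothetical and do not reproduce the library's implementation:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

// True when RSMI_MUTEX_THREAD_ONLY is set to exactly "1".
bool thread_only_mutex_requested() {
  const char *value = std::getenv("RSMI_MUTEX_THREAD_ONLY");
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Initializes `m` as PTHREAD_PROCESS_PRIVATE for thread-only locking,
// otherwise as PTHREAD_PROCESS_SHARED (meant for shared memory).
// Returns 0 on success, like the pthread calls it wraps.
int init_device_mutex(pthread_mutex_t *m, bool thread_only) {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_mutexattr_setpshared(
      &attr, thread_only ? PTHREAD_PROCESS_PRIVATE : PTHREAD_PROCESS_SHARED);
  if (rc == 0) rc = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}
```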
Bill(Shuzhou) Liu 33dab0c232 Remove const to avoid compile error
Fix the compile error

Change-Id: I422b606b2b969b418c2e77b47a3afad0cfc732a1
2024-07-18 18:15:43 -04:00
Bill(Shuzhou) Liu 7a617e6ef2 Make the devInfoTypesStrings.at(type) exception safe
Wrap it in a function to make it exception safe.

Change-Id: I29835993ae4fe2b7aa1a7027fab88c05ba89e6e3
2024-06-26 08:33:44 -05:00
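`std::map::at` throws `std::out_of_range` for a missing key, so wrapping the lookup in a function that catches it gives callers a fallback value. A minimal sketch; the enum, table and `dev_info_type_string` are hypothetical stand-ins for devInfoTypesStrings:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real enum and lookup table.
enum DevInfoTypes { kDevName, kDevVendor, kDevSerial };

const std::map<DevInfoTypes, const char *> kDevInfoTypesStrings = {
    {kDevName, "kDevName"},
    {kDevVendor, "kDevVendor"},
};

// Never throws for an unknown type; returns a fallback string instead.
std::string dev_info_type_string(DevInfoTypes type) {
  try {
    return kDevInfoTypesStrings.at(type);
  } catch (const std::out_of_range &) {
    return "unknown type";
  }
}
```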
Bill(Shuzhou) Liu 4441249ffa Add return character when setting the PM policy using sysfs
When setting the PM policy in sysfs, the driver expects a return character.

Change-Id: I83cddb3cdb14c226e6e856776176000eea33b251
2024-06-13 11:02:13 -04:00
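A sketch of writing a sysfs-style value with the trailing character the driver expects, assuming "return character" means a newline ('\n'). Both helpers are hypothetical, and the test below writes to an ordinary file rather than a real sysfs attribute:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: writes `value` followed by a newline.
bool write_sysfs_value(const std::string &path, const std::string &value) {
  std::ofstream out(path);
  if (!out) return false;
  out << value << '\n';
  out.flush();
  return static_cast<bool>(out);
}

// Reads back the whole attribute, including any trailing newline.
std::string read_sysfs_value(const std::string &path) {
  std::ifstream in(path);
  std::ostringstream contents;
  contents << in.rdbuf();
  return contents.str();
}
```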
Bill(Shuzhou) Liu 4cf59c4edb Change the name of clear sram to run cleaner shader
The function cleans the local data in LDS/GPRs, so the name "clear sram"
is misleading.

Change-Id: I0385e6d6348602fe0f347d17e48ed8983f7ceb87
2024-06-05 12:07:39 -05:00
Maisam Arif e5d1ba4621 Use different sysfs for soc_pstate and xgmi_plpd
The sysfs interface now uses the pm_policy folder with multiple
dpm_policy files.

Change-Id: I40fac8de2d0cb127950d238b8196f6d2416778d0
2024-05-31 01:38:41 -04:00
Maisam Arif 7d999aa34c SWDEV-458102 - Updates to pp_od_clk_voltage parsing
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I650dae1a99856dcde914fe66917cf9111f3ce0e2
2024-05-15 03:18:24 -05:00
Bill(Shuzhou) Liu 437cb07db6 Discover the amdgpu when card numbers are not consecutive.
When discovering amdgpu devices whose assigned card numbers are not consecutive,
not all GPUs can be discovered. The code is changed to discover
GPUs based on the maximum card number.

Change-Id: Icf4c1df4a1651093b5de3cd7a25a9bd69a299075
2024-05-13 09:53:09 -04:00
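A sketch of discovery driven by the maximum card number, so that a gap such as card0, card2 does not stop enumeration at card0. It assumes entries named "cardN" under a DRM directory such as /sys/class/drm, skipping connector entries like "card0-DP-1"; `discover_cards` is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Finds the highest N among "cardN" entries, then probes every index
// up to that maximum and keeps the ones that exist.
std::vector<int> discover_cards(const fs::path &drm_dir) {
  int max_card = -1;
  std::error_code ec;
  for (const auto &entry : fs::directory_iterator(drm_dir, ec)) {
    const std::string name = entry.path().filename().string();
    if (name.rfind("card", 0) != 0 || name.size() == 4) continue;
    const std::string digits = name.substr(4);
    if (digits.find_first_not_of("0123456789") != std::string::npos) continue;
    max_card = std::max(max_card, std::stoi(digits));
  }
  std::vector<int> found;
  for (int i = 0; i <= max_card; ++i) {
    if (fs::exists(drm_dir / ("card" + std::to_string(i)), ec)) {
      found.push_back(i);  // gaps are skipped rather than ending the scan
    }
  }
  return found;
}
```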
Maisam Arif 52843152a5 SWDEV-444567 - Added Ring Hang Event
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I2e73ba08ee0004f6f30660b2fa425ea94bafceca
2024-05-03 17:21:28 -04:00
Maisam Arif 11c72946eb Revert "SWDEV-458102 - Deprecated Voltage Curve API"
This reverts commit 1423fb632e.

Change-Id: I8a3eaf0a9f28200e09fb35d5260fbc070fe8a4a9
2024-05-02 15:27:16 -05:00
Maisam Arif 1423fb632e SWDEV-458102 - Deprecated Voltage Curve API
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I111c3ce26d2ab66d5e755432f4b8a9bfa631f805
2024-05-02 02:53:29 -04:00
Bill(Shuzhou) Liu a0d0210761 Process isolation sysfs format change
The process isolation sysfs format has changed. This fix
adapts to the new sysfs format.

Change-Id: Id6fd7eeb3e25525047dccab248fd9cfb206cbf62
2024-04-25 08:11:05 -05:00
Bill(Shuzhou) Liu 7d2ab7970d Process isolation and clean shader
A few APIs and command line options are added to support process
isolation and clean shader.

Change-Id: I98ad3fc9fc7429799a21798b7fca1c307de7f403
2024-04-24 13:22:20 -04:00
Oliveira, Daniel 1ae3a5b6cb fix: [SWDEV-458102] [rocm/amd_smi_lib]
Drops checks that are invalid with the new pp_od_clk_voltage format

Code changes related to the following:
  * get_od_clk_volt_info()
  * get_od_clk_volt_curve_regions()

Change-Id: I534c920e00fa3dacdb980f431db5eef260ac93f5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-04-23 18:23:39 -05:00
Maisam Arif 1bd18c1a65 Added new ecc blocks and adjusted metric --ecc-block filtering
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ib2f69c7d59ee5108024794434fb202b5e4f58738
2024-04-18 15:01:41 -04:00
Oliveira, Daniel 08e2e21bab fix: [SWDEV-442525] [rocm/amd_smi_lib]
Fixes gpu_process_list

Code changes related to the following:
  * amdsmi_get_gpu_process_list()
  * CLI
  * Examples
  * Unit tests
  * Changelog
  * Readme
  * rocm_smi_lib commit: 677433b367

Change-Id: I9210fbca7a5da92d0a8b472b72ca82597c8e4fb5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-03-27 16:48:24 -05:00
Bill(Shuzhou) Liu e4085c6414 Get and set the XGMI PLPD
Update the API and CLI to support XGMI Per-Link Power Down Policy.

Change-Id: Iaf04a771eb8bb0829a5b3088d803a7355a8dfd0b
2024-03-26 01:48:14 -05:00
Bill(Shuzhou) Liu 108e6d4ae6 Set and get DPM policy for GPU device
Add new APIs to set and get dpm policy for the GPU device.

Change-Id: I26fa49cd17d0ce66bda3446c38945a6cf35717ff
2024-03-12 10:32:31 -04:00