* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested
---------
Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
* Fix the amdgpu version string comparison
The intention was to avoid showing the string when it carries no
information.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
* Display the kernel version in amd-smi output
This is an interesting debugging point, especially in the case of
not having a DKMS package installed.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Moving os_kernel_version to static --driver
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
---------
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Currently, if the input file name already exists, the tool
appends output to the existing file. Added overwrite, append,
and no (discard) options to choose from.
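
A minimal sketch of the three choices. The helper `open_output` and the choice strings `overwrite`, `append`, and `no` are hypothetical; the actual prompt and option names in the tool may differ.

```python
import os


def open_output(path, choice):
    """Open `path` for writing according to the user's choice.

    Returns an open file object, or None when the output is discarded.
    `choice` only matters when the file already exists.
    """
    if not os.path.exists(path) or choice == "overwrite":
        return open(path, "w")   # new file, or truncate the old one
    if choice == "append":
        return open(path, "a")   # keep existing content, add to the end
    if choice == "no":
        return None              # discard: leave the existing file untouched
    raise ValueError(f"unknown choice: {choice!r}")
```

Returning `None` for the discard case lets the caller skip writing without special-casing the prompt logic.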
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Read the ids_flags when fetching GPU info
The ids_flags contains the flags that can help identify if a GPU
is a dGPU or an APU.
* Show correct memory pool for APUs
The kernel policy for APUs will be to choose the bigger pool of
memory (GTT or VRAM) for KFD work. Adjust the policy for the monitor
and default commands to show the right memory pool when using an APU.
* Don't require powercap support
APUs don't necessarily support setting a power cap from sysfs.
Ignore failures caused by the file being missing.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Show edge temperature in default output if hotspot is missing
APUs don't have a hotspot temperature, but they do have an edge temperature.
Use that.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Format all "power" keys as watts
There will be more power keys when APU support is added, so format
them properly.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Don't show power limit in output if it's invalid
APUs can't set power limit using power_cap1 interface. The limit
will be 0 and thus the UX looks weird in default output.
Only add the `/power_limit` if it's valid.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Unify sizes of `amdsmi_power_info_t`
Sizes are used inconsistently. This causes tools to not show
N/A when they should. Make them unified.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
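
The APU fallbacks described in the commits above (bigger memory pool, edge temperature when hotspot is missing, no power limit when it is invalid) can be sketched as follows. This is an illustration under stated assumptions: the helper names, the `"N/A"` sentinel, and the exact output strings are hypothetical and are not taken from the amd-smi source.

```python
NA = "N/A"  # assumed sentinel for a missing value


def pick_memory_pool(is_apu, vram_total, gtt_total):
    """Return which pool to report: "vram" or "gtt"."""
    if not is_apu:
        return "vram"  # assumption: discrete GPUs keep reporting VRAM
    # APUs: follow the kernel policy and use the bigger of the two pools.
    return "gtt" if gtt_total > vram_total else "vram"


def pick_temperature(hotspot, edge):
    """Prefer hotspot; fall back to edge when hotspot is missing."""
    return edge if hotspot == NA else hotspot


def format_power(usage_w, limit_w):
    """Render power usage, appending "/limit" only when the limit is valid."""
    if limit_w == NA or limit_w == 0:
        return f"{usage_w} W"
    return f"{usage_w}/{limit_w} W"
```

With a limit of 0, `format_power(17, 0)` yields `17 W`, in line with the `17 W` shown for the APU in the sample output of the later commit.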
* Stop trying to fit too much in one line for default view
The default view is really cramped trying to put a lot of version
information into one line, to the point that some strings are
cropped. Instead of cropping the strings, put each on its
own line.
When running without a ROCm release installed, hide the ROCm version
line.
Sample output:
```
+------------------------------------------------------------------------------+
| AMD-SMI 26.1.0+2a668c34 |
| amdgpu version: Linuxver |
| VBIOS version: 023.010.001.022.000001 |
| Platform: Linux Baremetal |
|-------------------------------------+----------------------------------------|
| BDF GPU-Name | Mem-Uti Temp UEC Power-Usage |
| GPU HIP-ID OAM-ID Partition-Mode | GFX-Uti Fan Mem-Usage |
|=====================================+========================================|
| 0000:c1:00.0 ...adeon 890M Graphics | N/A 59 °C 0 17 W |
| 0 0 N/A N/A | 25 % N/A 479/512 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes: |
| GPU PID Process Name GTT_MEM VRAM_MEM MEM_USAGE CU % |
|==============================================================================|
| No running processes found |
+------------------------------------------------------------------------------+
```
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Don't show amdgpu version on mainline kernels
amdgpu version doesn't exist on a mainline kernel.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Truncate amdgpu version string to 80 characters
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Allow longer AMD-SMI version strings
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Adjusted version header format
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Co-authored-by: Mario Limonciello (AMD) <superm1@kernel.org>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
- **Added evicted_time metric for kfd processes**.
- Time that queues are evicted on a GPU in milliseconds
- Added to CLI in `amd-smi monitor -q` and `amd-smi process`
- Added to C API and Python API:
- amdsmi_get_gpu_process_list()
- amdsmi_get_gpu_compute_process_info()
- amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
[ROCm/amdsmi commit: 2144cfbba4]
Changes:
- Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
- Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
(Violation Status is the first example of this in monitor)
- Improve CLI monitor output:
support multiple GPU lines per GPU, add new columns, and better formatting
- Refactor helpers and logger for flexible unit formatting and table rendering
- Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
new metrics APIs in C++ example
- Sync Python/C++ interface and structures for new metrics fields and naming
- Remove deprecated/unused RSMI activity APIs; documentation is not needed since
the APIs no longer exist in ROCm SMI either.
- Cleanup metric violations + fix handle watch arguments
- Provide better handling/doc for average_flattened_ints()
- Group xcp metrics with brackets in human readable + adjust output size
Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
[ROCm/amdsmi commit: e2e4fc65c1]
The amd-smi command shows only the executable
name of a process by stripping the absolute path. This
caused "N/A" process names to be incorrectly displayed as
"A" in the output. Fixed.
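
A short illustration of how this kind of bug arises, assuming the path is stripped with something like `os.path.basename` (the commit does not say which call is used). The helper `display_name` is hypothetical.

```python
import os


def display_name(raw_name):
    """Return the executable name, leaving the "N/A" placeholder intact."""
    if raw_name == "N/A":
        # "N/A" contains a slash, so path stripping would treat "N" as a
        # directory and return "A".
        return raw_name
    return os.path.basename(raw_name)


# The placeholder is mangled when passed straight to basename.
print(os.path.basename("N/A"))        # A
print(display_name("N/A"))            # N/A
print(display_name("/usr/bin/python3"))  # python3
```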
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: b16a66b2c5]
Modified the file used to fetch process name so that complete name with path can be displayed.
Changes:
amd-smi monitor -q
- human readable format will output only the process name
- csv and json formats will print the full path
amd-smi process
- name will always be the full path to the process
amd-smi (default output)
- name will always be truncated.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 514517e536]
* [SWDEV-537852] Update process name help text
Currently, the process name displays N/A if reading it needs elevated
permissions. Updated the help text of the default amd-smi, process, and monitor
commands to mention the elevated permission requirement.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: ce230efaaa]
The custom_dump function was not printing the keys of lists,
so static numa was not displaying the list keys
CPU affinity and Socket affinity. Updated custom_dump
to print the keys.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 6fbda16098]
The amd-smi partition --json output was not valid JSON.
Changed the output so that it is valid JSON.
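
The commit does not say what made the output invalid. One common cause in CLIs that print per-device results, shown here purely as an assumption, is emitting one JSON object per device back to back instead of a single document. The function names and sample data below are made up for illustration.

```python
import json


def dump_per_device(devices):
    """Anti-pattern: one JSON object per device, concatenated."""
    return "\n".join(json.dumps(dev) for dev in devices)


def dump_as_document(devices):
    """Collect all devices into one list and serialize once."""
    return json.dumps(devices, indent=4)


def is_valid_json(text):
    """Return True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


devices = [{"gpu": 0}, {"gpu": 1}]  # placeholder data
print(is_valid_json(dump_per_device(devices)))   # False
print(is_valid_json(dump_as_document(devices)))  # True
```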
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 872c58b7a3]
The amd-smi xgmi --json output was not valid JSON.
Changed the output so that it is valid JSON.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 8f943b03e1]
* Added degree symbol and fixed power usage
* fixed default command
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
[ROCm/amdsmi commit: bc158d2b51]
When all clocks are N/A, they are all filtered out. To
avoid confusion, a single N/A is shown instead.
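
A minimal sketch of this behavior, assuming the clock values arrive as a flat dictionary and that missing values use the string `"N/A"`. The helper `filter_na` is hypothetical and is not the CLI's actual function.

```python
def filter_na(values):
    """Drop "N/A" entries; collapse to a single "N/A" if nothing remains."""
    kept = {key: val for key, val in values.items() if val != "N/A"}
    return kept if kept else "N/A"


print(filter_na({"clk": 1200, "min_clk": "N/A"}))   # {'clk': 1200}
print(filter_na({"clk": "N/A", "min_clk": "N/A"}))  # N/A
```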
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 62294df49a]
The N/A leaves filtering was removing clock in static.
To avoid this, removed N/A filtering from the single tier.
Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: e26e26e308]
The 'amd-smi metric --clock' was listing values with N/A. Filtered these outputs to show only available values.
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 797e4fba07]
Previously, the amd-smi metric and static JSON output
was not valid JSON. Changed the output so that
it is valid JSON.
---------
Change-Id: I5576333269509f63b3c800f225c3d73127ce80cf
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 8e5f6b1a8d]
Added power cap to display on amd-smi monitor -p.
Updated help and Changelog as well.
Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 7d109001ac]
Previously, amd-smi monitor showed VRAM usage as used and total.
Modified it to display free VRAM and VRAM percentage. Updated
Changelog.
Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 3ddfbcc0a3]
* [SWDEV-513807] Fix amd-smi partition --accelerator not returning AMDSMI_STATUS_NO_PERM
Changes:
- Fixed amdsmi_get_gpu_accelerator_partition_profile_config() not
returning AMDSMI_STATUS_NO_PERM
- Changed amd-smi partition --accelerator to warn the user
when it is run without sudo or root permissions.
- Updated changelog for fixes planned for 6.4.1 release
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 0402bb4d75]
Full output:
$ amd-smi metric:
AttributeError: 'AMDSMILogger' object has no attribute 'clear_multiple_devices_output'. Did you mean: 'clear_multiple_devices_ouput'?
Changes:
* Changed CLI function definition clear_multiple_devices_ouput(self) ->
clear_multiple_devices_output(self)
* Updated all references to clear_multiple_devices_ouput() to use
clear_multiple_devices_output()
Change-Id: Ibd4e210ea30c9dd51fba17981a524b823f2db054
[ROCm/amdsmi commit: 1d2272490e]
Units were off and VCLK/DCLK outputs were not coming in
properly through amdsmi_get_clk_freq().
Now we match units sent back through rsmi_dev_gpu_clk_freq_get (MHz).
The CLI now shows a maximum of 2 VCLK/DCLKs, and shows N/A if there
is no current_freq listed.
Change-Id: I8a7b66cbb5263e8d396f8568c104e1ce3512923d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 3226a1d0ea]
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram
- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
[ROCm/amdsmi commit: f8b8347627]
List string should take into account dictionary value types
Change-Id: Icc08288cb0007d43eacd1aff6d44c40a84ea9448
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 57f45954b7]