* Stop trying to fit too much in one line for default view
The default view is cramped: it tries to fit a lot of version
information into one line, to the point that some strings are
cropped. Instead of cropping the strings, put each on its
own line.
When running without a ROCm release installed, hide the ROCm version
line.
Sample output:
```
+------------------------------------------------------------------------------+
| AMD-SMI 26.1.0+2a668c34 |
| amdgpu version: Linuxver |
| VBIOS version: 023.010.001.022.000001 |
| Platform: Linux Baremetal |
|-------------------------------------+----------------------------------------|
| BDF GPU-Name | Mem-Uti Temp UEC Power-Usage |
| GPU HIP-ID OAM-ID Partition-Mode | GFX-Uti Fan Mem-Usage |
|=====================================+========================================|
| 0000:c1:00.0 ...adeon 890M Graphics | N/A 59 °C 0 17 W |
| 0 0 N/A N/A | 25 % N/A 479/512 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes: |
| GPU PID Process Name GTT_MEM VRAM_MEM MEM_USAGE CU % |
|==============================================================================|
| No running processes found |
+------------------------------------------------------------------------------+
```
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Don't show amdgpu version on mainline kernels
amdgpu version doesn't exist on a mainline kernel.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Truncate amdgpu version string to 80 characters
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Allow longer AMD-SMI version strings
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Adjusted version header format
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Co-authored-by: Mario Limonciello (AMD) <superm1@kernel.org>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
## Motivation
<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.
## Technical Details
<!-- Explain the changes along with any relevant GitHub links. -->
- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
- Not needed, since we build `binutils` with systems anyway
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
- There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
- Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that led to invalid ROCm-OS_VERSION combinations.
- Formatting fixes
- Removed trailing whitespace
- Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
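
The version handling described above can be illustrated with a minimal sketch. The actual logic lives in the `build-docker.sh` shell script and is not reproduced here; the function names `parse_rocm_version` and `format_rocm_version` are hypothetical, and the assumption is only that a version is `MAJOR.MINOR` with an optional `.PATCH`. The point is to parse components numerically instead of stripping a trailing `.0` from the string, which is the kind of trimming that breaks `7.0`.

```python
from typing import Tuple


def parse_rocm_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR[.PATCH]' into integers (hypothetical helper).

    A missing patch component defaults to 0, so '6.2' and '6.2.0'
    compare equal after parsing.
    """
    parts = version.strip().split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected ROCm version string: {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    patch = int(parts[2]) if len(parts) == 3 else 0
    return major, minor, patch


def format_rocm_version(version: str) -> str:
    """Return the full three-component form without trimming any '.0'."""
    major, minor, patch = parse_rocm_version(version)
    return f"{major}.{minor}.{patch}"
```

With this approach `7.0` keeps its minor component, and `6.2` and `6.2.0` resolve to the same version.
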
This patch enhances compatibility for DXG environments by introducing conditional
checks for DRM operations, particularly around buffer object metadata handling
in IPC scenarios. These changes improve robustness in DXG IPC memory management
without impacting existing functionality in standard Linux environments.
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
* Fix powercap default to enum for sensor_ind
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
* [SWDEV-559965] Refactor amdsmi set power cap
Modified power cap set to accept args with
optional power_cap type. Added power_cap helper
validate_and_set_power_cap(). Fixed JSON output
format.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Add exception handling for native tool path search
* Fix formatting in roofline benchmark code
* Fix detection of .so files
* include hip code and native tool code in standalone binary
* add fallback path for ROCM_PATH
Changes:
- Fixed `amd-smi` showing:
```console
$ amd-smi
Traceback (most recent call last):
File "/opt/rocm/bin/amd-smi", line 53, in <module>
from amdsmi_init import *
File "/opt/rocm/libexec/amdsmi_cli/amdsmi_init.py", line 38, in <module>
from amdsmi import amdsmi_interface, amdsmi_exception
File "/usr/local/lib/python3.8/dist-packages/amdsmi/__init__.py", line 24, in <module>
from .amdsmi_interface import amdsmi_init
File "/usr/local/lib/python3.8/dist-packages/amdsmi/amdsmi_interface.py", line 5581, in <module>
) -> tuple[int, int]:
TypeError: 'type' object is not subscriptable
```
This was a python3.8 issue, which is now resolved by using
`Tuple[int, int]` typing for Python 3.8 compatibility.
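
As a minimal sketch of the pattern (the function `split_version` is hypothetical and not part of `amdsmi_interface.py`): subscripting the built-in `tuple` in an annotation is only supported from Python 3.9 onward, so on Python 3.8 the annotation fails when the module is imported. `typing.Tuple` works on both.

```python
from typing import Tuple


# On Python 3.8, "-> tuple[int, int]" raises
# "TypeError: 'type' object is not subscriptable" at import time.
# typing.Tuple is accepted by 3.8 and by later versions.
def split_version(version: str) -> Tuple[int, int]:
    """Hypothetical example: return (major, minor) from 'MAJOR.MINOR...'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)
```
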
* Enable running tests from installation only
* Use cmake option -DTEST_FROM_INSTALL=ON to enable running tests from installation folder only
* It is not possible to run tests from build folder in this case
* This option prevents changing working directory to source folder
* Fix SourceFileLoader to import rocprof-compute main module correctly
* Install sample executables in the test folder
* fix num_xcds_cli_output test
* Fix tests
* Skip autogen. config. test and add a TODO task for re-design of this
test
* Add flexible import of source code in test_gpu_specs.py
* Update cmake to install tests/workloads folder when INSTALL_TESTS=ON
* Fix sys.argv[0] for tests
* fix live attach detach test
* refactor: centralize update_env across binaries with unit test added for testing
* removed unused includes suggested by clangd and small cleanup
* use centralized update_env in argparse as well
* review comments incorporated
* move update_env tests closer to common library
* fix: missing common:: prefix in rocprof-sys-sample
* cmake formatting
* reenable gfx1100
use the modified version of the flat_store_short assembly instruction as suggested by the compiler team (32bit input value instead of 16bit)
* add fix for gfx1201
add the same fix for gfx1201 that was introduced for gfx1100
[ROCm/rocshmem commit: 224c969bef]
* [hip-tests] Fix Unit_Assert_Positive_Basic_KernelFail
This test was expecting a call to abort() when assertions
were hit on AMD devices. This is no longer true since
aborts from assertions are disabled unless
HIP_SKIP_ABORT_ON_GPU_ERROR is set.
This PR simplifies the test by removing the SIGABRT signal
handling (which was also undefined behaviour). Instead,
if HIP_SKIP_ABORT_ON_GPU_ERROR is set, the test is skipped.
## Motivation
Resolved: SWDEV-566226
The current implementation of agents in rocprof-systems keeps only the minimal set of information required to populate the `info_agent` table of the rocpd database. A significant amount of data is therefore left out of the database. This change fixes that by storing the additional agent information as an `extdata` row in the `info_agent` table.
## Technical Details
This PR introduces an additional field in the `agent` structure that holds a JSON-formatted string with all the additional information we can acquire about a particular agent. This data is collected during the initial fetching of agents and is afterwards written to the database.
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* SWDEV-557412 - Incorporate proper chunk offset when remapping virtual memory (#1848)
* SWDEV-557412 - Incorporate proper offset when remapping virtual memory
* Fix condition to check if VMHeap allocation address matches a chunk address
* Move offset calculation outside if/else block
---------
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* SWDEV-567852 - Clean-up hip::init() (#1948)
* SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160)
* SWDEV-548892 - Stop using ocml isinf wrapper (#1854)
* SWDEV-562708 - change default maximum SVM size to 256GB (#1731)
* SWDEV-503089 - Fix and enable disabled HIP tests from math group (#1319)
* SWDEV-503089 - Fix and enable disabled HIP tests from math group
* SWDEV-503089 - Move single precision reduced run to a common function
* SWDEV-548892 - Stop using ockl steadyctr function (#1882)
Directly use the builtin
* Implement PTL support (#1957)
* Implement PTL support
Signed-off-by: adapryor <Adam.pryor@amd.com>
(cherry picked from commit 45bc31292e7940a3b8fca044ef7df22047b95733)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
---------
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
* SWDEV-558080 - Add recommended granularity (#1176)
* Add recommended granularity
* Improve granularity testing
* Update based on feedback
* Fix and enable VMM tests on cuda (#1855)
* Fix and enable VMM tests on cuda
* Minor syntax fixes
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* [rocprofiler-systems] Add support for ompt_callback_thread_begin (#1681)
* Add thread_begin callback
* Make OMPT callbacks that are instant have start_ts = end_ts
* SWDEV-567514: Remove default stream wait (#1977)
- The default stream wait is issued when a virtual map command is called
- It can create a deadlock
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
* Fix flaky test Unit_hipStreamAddCallback_StrmSyncTiming (#2022)
* Review comments
* skip the 3 failing tests to merge hip-tests rocm-systems PR
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Co-authored-by: GunaShekar <agunashe@amd.com>
Co-authored-by: agunashe <ajay.gunashekar@amd.com>
Co-authored-by: Ethan Trinh <Ethan.Trinh@amd.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
Co-authored-by: Victor Zhang <111778801+victzhan@users.noreply.github.com>
Co-authored-by: German Andryeyev <56892148+gandryey@users.noreply.github.com>
Co-authored-by: usrihari123 <srihari.u@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: anujshuk-amd <anujshuk@amd.com>
Co-authored-by: itrowbri <Ian.Trowbridge@amd.com>
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: cadolphe-amd <chris.adolphe@amd.com>
Co-authored-by: Karthik Jayaprakash <54370791+kjayapra-amd@users.noreply.github.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Todd tiantuo Li <88386084+lttamd@users.noreply.github.com>
Co-authored-by: amilanov-amd <Aleksandar.Milanov@amd.com>
Co-authored-by: Adam Pryor <61172547+adam360x@users.noreply.github.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: AidanBeltonS <abeltons@amd.com>
Co-authored-by: Rahul Manocha <153310294+manocharahul@users.noreply.github.com>
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Co-authored-by: Shadi Dashmiz <94885391+shadidashmiz@users.noreply.github.com>
Co-authored-by: Ioannis Assiouras <38722728+iassiour@users.noreply.github.com>
Co-authored-by: Ajay GunaShekar <86270081+agunashe@users.noreply.github.com>
WSL uses the call only to wake up the thread; under Windows, however,
the KMD needs the actual value (SWDEV-568592). The interface is changed
to avoid programming a modified write_ptr value, which somewhat
changes the client's logic.
Changed the value type of the ipc_sock_server_conns_ map to size_t. The
previous type, int, overflowed for allocations larger than 2GB, so the
message len was stored as a negative value. This prevented the IPC
server from exporting dmabuf file descriptors, which led to hangs.
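
The overflow can be modelled with a small sketch. It is not the library code: it assumes `int` is 32 bits wide (represented here by `ctypes.c_int32`) and uses an arbitrary 3 GiB size as the example value.

```python
import ctypes

# Example allocation size above the 2 GiB limit of a signed 32-bit int.
THREE_GIB = 3 * 1024**3

# ctypes does no overflow checking, so the value wraps the way a
# 32-bit signed int would and becomes negative.
as_int32 = ctypes.c_int32(THREE_GIB).value

# size_t is unsigned and wide enough, so the length is preserved.
as_size_t = ctypes.c_size_t(THREE_GIB).value

print(as_int32)   # -1073741824
print(as_size_t)  # 3221225472
```
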
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
* Add host-side rocshmem_alltoallmem_on_stream function
Function signature:
rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
const void *source, size_t size,
hipStream_t stream)
- The function launches rocshmem_alltoallmem_kernel which calls
device-side alltoall<char> workgroup collective through default context.
- Uses dynamic block size determination via occupancy API.
- Implemented for all backends.
* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends
When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.
* Add functional test for team_alltoallmem_on_stream
This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.
* Add documentation for rocshmem_alltoallmem_on_stream
This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:
[ROCm/rocshmem commit: 5577feb70d]
* updated the libva requirements for 7.2
* updated with feedback from Aryan
* messed up the library reqs; fixed
* added a later
* made the changelog clearer
[ROCm/rocjpeg commit: 1a86352fdd]