* SWDEV-541623 - CUDA parity for hipLaunchCooperativeKernelMultiDevice and hipExtLaunchMultiKernelMultiDevice
when numDevices does not match the system devices
* SWDEV-541623 - enable Unit_hipExtLaunchMultiKernelMultiDevice_Negative_MultiKernelSameDevice
---------
Co-authored-by: agunashe <ajay.gunashekar@amd.com>
This prevents calling Catch2 macros from outside a Catch2 TEST_CASE,
which can lead to undefined behavior. This change also disables
hipGetProcAddress tests that are not supported on static builds.
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
* SWDEV-548838 Add local and global fence support for barrier function
The original barrier function did not distinguish between local and global scope. There was only __CLK_LOCAL_MEM_FENCE, which triggered both a local and a global fence. This commit introduces __CLK_LOCAL_MEM_FENCE and __CLK_GLOBAL_MEM_FENCE, which properly distinguish the scopes.
---------
Co-authored-by: Tim <Tim.Gu@Amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Tim Gu <timgu102@amd.com>
Updated: rocm_smi.py
- Removed all else: clauses from functions where rsmi_ret_ok is part of the if clause, as requested.
- The rsmi_ret_ok() function already handles unsuccessful return codes gracefully.
- Updated check_runtime_status() function to sweep through /sys/class/drm to find active runtime_status.
- Updated the message to 'AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status'
- This clarifies the status of the GPU and tells users where to check for more info.
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
* Added check in Queue::sync to verify that there is a callback for every dispatch
* Removed new atomic, using get_balanced_signal_slots() atomic with initial value of NUM_SIGNALS to verify dispatches complete
* rocr: Fix Incorrect Assertion Check
The wrong variable was used in the assertion statement; it should
check the value of paramEndLoc after it is modified by the call
to find().
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
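A minimal sketch of the pattern behind the assertion fix above, assuming a parser that locates a parameter list with two find() calls. Only paramEndLoc comes from the commit message; extract_params, paramStartLoc, and the parenthesis delimiters are hypothetical and are not the rocr code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the rocr function): copies the text between the
// first '(' and the following ')' into `out`. Returns false if either
// delimiter is missing.
bool extract_params(const std::string& text, std::string& out) {
  std::size_t paramStartLoc = text.find('(');
  if (paramStartLoc == std::string::npos) return false;

  std::size_t paramEndLoc = text.find(')', paramStartLoc);
  // Check the variable that find() just assigned. Re-checking paramStartLoc
  // here would pass even when the closing delimiter is absent.
  if (paramEndLoc == std::string::npos) return false;

  out = text.substr(paramStartLoc + 1, paramEndLoc - paramStartLoc - 1);
  return true;
}
```

The sketch returns false instead of asserting so that the failure path also stays checked in release builds, where assert() compiles away.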
* rocr: Fix Potential Undefined Behaviour
If the SvmProfileControl destructor is called while event == -1,
the call to close(event) is effectively close(-1), which is
undefined behaviour. close() is now only called on valid file
descriptors.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
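The guard described above can be sketched as a small owner type that skips close() when it holds no descriptor. FdOwner and its members are hypothetical names for illustration, not the SvmProfileControl implementation, and the sketch assumes a POSIX system where -1 marks "no descriptor".

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical RAII wrapper around a POSIX file descriptor.
class FdOwner {
 public:
  explicit FdOwner(int fd = -1) : fd_(fd) {}
  ~FdOwner() { reset(); }

  FdOwner(const FdOwner&) = delete;
  FdOwner& operator=(const FdOwner&) = delete;

  bool valid() const { return fd_ >= 0; }

  // Closes the descriptor if one is held. Returns true only when close()
  // was actually called, so close(-1) can never be issued.
  bool reset() {
    if (fd_ < 0) return false;
    ::close(fd_);
    fd_ = -1;
    return true;
  }

 private:
  int fd_;
};
```

Resetting fd_ to -1 after closing also makes a second reset() or the destructor a no-op, which avoids a double close.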
* rocr: Add Error Check on Bytes Read
If a read is incomplete, the call to copyTo() now returns an
error.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
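A minimal sketch of a bytes-read check, assuming the data comes from a POSIX read() call. read_checked is a hypothetical helper; the commit message does not show how copyTo() performs or reports its read, so this only illustrates comparing the returned byte count against the requested size.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: issues a single read() and reports success only if
// exactly `size` bytes arrived. A negative return (error) or a short count
// (incomplete read) both yield false.
bool read_checked(int fd, void* buf, std::size_t size) {
  ssize_t n = ::read(fd, buf, size);
  if (n < 0) return false;
  return static_cast<std::size_t>(n) == size;
}
```

A caller that can tolerate short reads would loop until the buffer is full instead; the single-call form here simply treats any shortfall as an error.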
* rocr: Fix Exception Error
Destructors are implicitly noexcept by default, so unless a
destructor is explicitly marked noexcept(false), any exception that
escapes it or the functions it calls will terminate the
program.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
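The destructor rule above can be illustrated with a short sketch. The commit message does not say whether the fix marks the destructor noexcept(false) or keeps exceptions from escaping, so both options are shown; all type and variable names here are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

int g_caught = 0;  // counts exceptions swallowed by SafeDtor

// User-provided destructor with no specifier: implicitly noexcept.
struct DefaultDtor {
  ~DefaultDtor() {}
};

// Option 1: opt out explicitly so an exception may propagate.
struct ThrowingDtor {
  ~ThrowingDtor() noexcept(false) {}
};

// Option 2: keep the destructor noexcept and catch inside it.
struct SafeDtor {
  static void cleanup() { throw std::runtime_error("cleanup failed"); }
  ~SafeDtor() {
    try {
      cleanup();
    } catch (...) {
      ++g_caught;  // swallowed here, so std::terminate is never reached
    }
  }
};

static_assert(std::is_nothrow_destructible<DefaultDtor>::value,
              "destructors are noexcept by default");
static_assert(!std::is_nothrow_destructible<ThrowingDtor>::value,
              "noexcept(false) opts out");
static_assert(std::is_nothrow_destructible<SafeDtor>::value,
              "catching inside keeps the destructor noexcept");
```

Catching inside the destructor is usually the safer of the two, because a destructor that throws during stack unwinding still terminates the program even when it is marked noexcept(false).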
---------
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
* Use queries instead of views in summary.py
* Export queries when created
* Remove HIP and HSA from output
* Fix domain query
* Export summary queries in the main function
* Fix comments and variable names
* Change syntax for old python versions
---------
Co-authored-by: Young Hui <young.hui@amd.com>
Legal Requirements:
For AMD software being released as open source, add copyright at the top of each new file.
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
* Fix grid_group::group_dim to return grid_dim and not block_dim
* Add unit test for grid_group.group_dim()
* Fix unit test errors
* Skip group_dim() assertions for base_type test
* Added Fortran (amdflang) openmp tests using the openmp-vv project
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* add more derived metrics for navi4.
* address review comments
* address review comments, and add more derived counters.
* EOF.
* misc.
* remove duplicate counter.
* misc.
* Remove gfx12 architecture definition for ldslatency
* remove extra architectures for gfx12.
* use wgp for normalization
* move these changes to another PR.
---------
Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
If a compressed changelog exists from a previous build, reconfiguring
the project fails with:
```
[rocm-core configure] CMake Error at utils.cmake:213 (message):
[rocm-core configure] Failed to compress: gzip:
[rocm-core configure] /home/ben/src/TheRock/build/base/rocm-core/build/DEBIAN/changelog.Debian.gz
[rocm-core configure] already exists; not overwritten
```
Add `-f` to the gzip invocation to force overwriting.