This will allow the default target list to be branch
specific.
Change-Id: If8ecc14e2b7fb5ed2eb25ab447480308d539b248
[ROCm/ROCR-Runtime commit: d699039284]
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported, and this required the compiler to pass in the signal pointer
in s[0:1].
The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.
Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b
[ROCm/ROCR-Runtime commit: ff8f439112]
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.
Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.
Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f
[ROCm/ROCR-Runtime commit: 6ed686ee29]
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.
Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
[ROCm/ROCR-Runtime commit: 299874f17d]
In several places AQL packets were written to the queue all at once
instead of writing the header atomically. These cases have been
fixed.
A few leaked hsa_signals have been fixed.
Some duplicated code has been removed.
Addresses ROCMOPS-456
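The atomic header write mentioned above can be sketched as follows. This is an illustration only, not the code from this change: Packet, kInvalidHeader and PublishPacket are hypothetical names, the struct is a stand-in for the real packet layouts in hsa.h, and __atomic_store_n is the GCC/Clang builtin.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 64-byte AQL packet; the real layouts live in hsa.h.
struct Packet {
  std::uint16_t header;  // packet type and fence bits; must be written last
  std::uint16_t setup;
  std::uint8_t body[60];
};
static_assert(sizeof(Packet) == 64, "AQL packets are 64 bytes");

// Assumed to match HSA_PACKET_TYPE_INVALID in the low header bits.
constexpr std::uint16_t kInvalidHeader = 1;

// Fills in everything except the header, then publishes the header with a
// single release store so a reader never sees a valid header on a
// partially written packet.
inline void PublishPacket(Packet* slot, const Packet& src) {
  assert(slot != nullptr);
  slot->setup = src.setup;
  std::memcpy(slot->body, src.body, sizeof(src.body));
  __atomic_store_n(&slot->header, src.header, __ATOMIC_RELEASE);
}
```

The slot is expected to hold kInvalidHeader until PublishPacket runs, so the packet processor skips it while the body is being filled in.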
Change-Id: Ia1869bc370f92e49ac560301df47741d5f76978e
[ROCm/ROCR-Runtime commit: 081a2cc875]
IPC was failing due to calling fork when HSA was open. The fix
was correcting incomplete cleanup in several other tests.
TestBase::Close (via CommonCleanUp) now checks that HSA is properly
closed between tests.
rocrtstPerf.Memory_Async_Copy uses hwloc which uses OpenCL which
has no shutdown routine. Consequently this test cannot clean up
properly. I added a hack to force the HSA refcount to the value
it should have if OpenCL were cleaning up, but this leaks resources
and potentially puts hwloc & OpenCL in a bad state.
OpenCL loads LLVM which installs some exit handlers. Those handlers
can't execute in a child process and can't be removed since OpenCL
doesn't clean up. IPC hacks around this by aborting rather than exiting
in the child process.
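A minimal sketch of the abort-in-child workaround described above, assuming a POSIX system. ChildAborted is a hypothetical helper, not the test's actual code; it only shows that a child ended with abort() never runs handlers registered through atexit.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

// Forks, runs work() in the child, then ends the child with abort() so that
// exit handlers inherited from the parent are never executed.
// Returns true if the child was terminated by SIGABRT.
inline bool ChildAborted(void (*work)()) {
  assert(work != nullptr);
  pid_t pid = fork();
  if (pid < 0) return false;  // fork failed
  if (pid == 0) {
    work();
    std::abort();  // skips atexit handlers, unlike exit()
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

The parent has to treat SIGABRT as the expected way for the child to end and report the child's result through some other channel.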
Change-Id: I92326a73d7b11632208717d99728e6dafdc7d3ca
[ROCm/ROCR-Runtime commit: bb980462e7]
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW), so it should be
better for measurements. However, it is implemented with a syscall, while
CLOCK_MONOTONIC is implemented via vDSO. The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors. NTP may adjust frequency by 500ppm, which limits us
to ~3 significant digits in elapsed time.
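The tradeoff can be sketched with two small helpers, assuming Linux and clock_gettime. ReadClockNs and SkewErrorNs are hypothetical names, not the runtime's CPUClockCounter code.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Reads the given POSIX clock and returns its value in nanoseconds.
inline std::uint64_t ReadClockNs(clockid_t id) {
  timespec ts{};
  int rc = clock_gettime(id, &ts);
  assert(rc == 0);
  (void)rc;
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<std::uint64_t>(ts.tv_nsec);
}

// Worst-case elapsed-time error for a frequency skew in parts per million.
inline double SkewErrorNs(double elapsed_ns, double skew_ppm) {
  return elapsed_ns * skew_ppm / 1e6;
}
```

With a 500ppm skew, a 1 ms interval can be off by about 500 ns, a relative error of 0.05%, which is where the limit of about three significant digits comes from. Timing many back-to-back ReadClockNs(CLOCK_MONOTONIC) and ReadClockNs(CLOCK_MONOTONIC_RAW) calls is one way to compare the per-call latency on a given system.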
Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76
[ROCm/ROCR-Runtime commit: 4b22d24346]
Description was inconsistent with itself and the code. Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only, and system memory pools do require hsa_amd_agents_allow_access.
Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5
[ROCm/ROCR-Runtime commit: bbb90bdfc9]
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.
Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e
[ROCm/ROCR-Runtime commit: 0016c6ce5b]
API is a stateless lookup of RO data and is needed to interpret
hsa_init error codes.
Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7
[ROCm/ROCR-Runtime commit: 22de0e7fb9]
Search the local src directories first. If using a system
installed hsakmt, this would pick the installed hsa headers.
Change-Id: I9746d6e9db1749a130e4d93e024556754a537083
[ROCm/ROCR-Runtime commit: 22d29b55a4]
Joined threads cannot be joined more than once, nor can they be detached.
The thread library's wait and close allow multiple waits and a separate
close, so this fixes the pthread implementation.
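One way to get these semantics on top of pthreads is to remember whether the thread has already been joined. The sketch below uses hypothetical names (ThreadHandle, CreateThread, WaitThread, CloseThread) and is not the runtime's actual implementation; it also assumes waits are not issued concurrently from several threads.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical handle that records whether pthread_join has already run.
struct ThreadHandle {
  pthread_t thread{};
  bool joined = false;
};

inline ThreadHandle* CreateThread(void* (*entry)(void*), void* arg) {
  ThreadHandle* h = new ThreadHandle;
  if (pthread_create(&h->thread, nullptr, entry, arg) != 0) {
    delete h;
    return nullptr;
  }
  return h;
}

// May be called any number of times; only the first call joins.
inline bool WaitThread(ThreadHandle* h) {
  assert(h != nullptr);
  if (h->joined) return true;
  if (pthread_join(h->thread, nullptr) != 0) return false;
  h->joined = true;
  return true;
}

// Releases the handle; detaches only if the thread was never joined.
inline void CloseThread(ThreadHandle* h) {
  assert(h != nullptr);
  if (!h->joined) pthread_detach(h->thread);
  delete h;
}
```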
Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65
[ROCm/ROCR-Runtime commit: a913549190]
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time. Because HSA APIs deal in absolute time, this
leads to large conversion offsets on the order of system uptime. Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.
This patch fixes the relative clock ratio used for times which predate
the call to hsa_init. This correlates errors in such times allowing
the elapsed time to be correctly computed.
The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months. GPU event timestamps are good for process uptime
of ~3.5 months. These are limited by double's mantissa precision.
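The ~3.5 month figure is consistent with a double's 53-bit mantissa if timestamps are counted in 1 ns ticks. The tick size is an assumption here, since the message does not state it, and ExactRangeDays is a hypothetical helper used only for the arithmetic.

```cpp
#include <cassert>

// Largest tick count a double can hold with unit resolution: 2^53.
constexpr double kExactLimit = 9007199254740992.0;

// Days until a counter running at ticks_per_second exceeds the range in
// which a double still resolves individual ticks.
inline double ExactRangeDays(double ticks_per_second) {
  assert(ticks_per_second > 0.0);
  return kExactLimit / ticks_per_second / 86400.0;
}
```

For 1 ns ticks, ExactRangeDays(1e9) gives about 104 days, or roughly 3.4 months. Beyond that point adjacent tick values start to round to the same double.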
Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445
[ROCm/ROCR-Runtime commit: 6e2a056e1b]
Exposed via agent info query. Only valid if fine grain PCIe memory is enabled.
Change-Id: Ib4770901592ec047276458926a947737f9b93bb5
[ROCm/ROCR-Runtime commit: 06376e726b]
At the moment it is not possible to build ROCr with Clang. This is
a spurious limitation. The present PR addresses it by guarding GCC
only flags and by fixing some additional warnings that Clang triggers;
one of said warnings did outline a rather interesting issue with math
being done on void*s. - AlexVlx
Void ptr arithmetic had already been fixed in amd-master branch.
Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462
[ROCm/ROCR-Runtime commit: e89f9807f1]
Non-paged memory can be IPC-shared even when HSA_USERPTR_FOR_PAGED_MEM
is enabled.
Change-Id: I8b1fa6d7a4a9327c78a77b3679697fbf55397093
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 0c6b9532d4]
KFD no longer reports MemoryAccessFault.Failure with the retry fault
implementation. ROCr ignores the memory event when Failure = 0.
Use the Flags field instead, which will be non-zero when the
event is triggered.
Change-Id: Ie90799a303b0b2f1b476b20ffafdde79ae137182
[ROCm/ROCR-Runtime commit: 56f280c8a7]
Makes malloc memory accessible to GPUs so that the memory has the
capabilities of the pool it is locked to.
This admits fine grained locked memory and reserves API space for any future
special CPU pools.
Change-Id: If8c3dd8582a43f19d3d36b3763c1a688cc419ef0
[ROCm/ROCR-Runtime commit: a535e18cc1]
GCC allows arithmetic on void*, treating void as char. Clang and
the language spec do not.
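A portable form of byte-offset arithmetic casts to char* first. OffsetPtr is a hypothetical helper shown for illustration, not necessarily how this change rewrote the affected code.

```cpp
#include <cassert>
#include <cstddef>

// Advances a pointer by a byte count without relying on the GCC extension
// that permits arithmetic directly on void*.
inline void* OffsetPtr(void* base, std::size_t bytes) {
  assert(base != nullptr);
  return static_cast<char*>(base) + bytes;
}
```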
Change-Id: I939f2432f276979bb81881406e10528597ac6001
[ROCm/ROCR-Runtime commit: e5de33dd9a]
Modify the system event handler to support multiple users.
Name memory fault reason codes.
Change-Id: I1b5979b36ab15637eb2be59a61e2d57e76d0a70e
[ROCm/ROCR-Runtime commit: 67376e06ab]
Part 1 of 2.
Enables fine grain VRAM over PCIe based on env flag.
Part 2 will extend to XGMI.
Change-Id: I8ad506e004b398d56d462b0200274eae2293a461
[ROCm/ROCR-Runtime commit: c56d86100b]
hsa_exceptions with empty what() strings will not report in debug builds.
Change-Id: I0d424d3b1d3044808ece1720a460a57d68bf878e
[ROCm/ROCR-Runtime commit: 344d964f9f]
Version is now a fixed string that matches previous internal builds.
This also matches released DEB/RPM builds (but not github versions).
Change-Id: Id4819b9de8c855250aadf1a1cebb187b5c031721
[ROCm/ROCR-Runtime commit: 400304aa10]
Both support dynamic scratch allocation so there is no reason
to preemptively allocate on APUs.
Change-Id: I22eaec01a83a091ee9dc1f594a1a9106e8dd81fc
[ROCm/ROCR-Runtime commit: 65d39cc476]
- Skip symbols that are STB_LOCAL and not STT_AMDGPU_HSA_KERNEL
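Read as "skip a symbol when it is local and is not a kernel symbol", the filter could look like the sketch below. It assumes Linux's <elf.h>; ShouldSkipSymbol is a hypothetical name, and the numeric value of STT_AMDGPU_HSA_KERNEL is an assumption rather than something taken from this change.

```cpp
#include <cassert>
#include <elf.h>

// Assumed value of STT_AMDGPU_HSA_KERNEL (an OS-specific symbol type).
constexpr unsigned char kSttAmdgpuHsaKernel = 10;

// Returns true for symbols that are local and are not HSA kernel symbols.
inline bool ShouldSkipSymbol(const Elf64_Sym* sym) {
  assert(sym != nullptr);
  const bool is_local = ELF64_ST_BIND(sym->st_info) == STB_LOCAL;
  const bool is_kernel = ELF64_ST_TYPE(sym->st_info) == kSttAmdgpuHsaKernel;
  return is_local && !is_kernel;
}
```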
Change-Id: I68567f58de9bf3f07dbd8020ef63f47667c86367
[ROCm/ROCR-Runtime commit: 8bee6e4976]
Decrease number of iterations and array sizes in some cases.
Change-Id: I1a0a43faa907b28662ff3a44c172950ed7b1500e
[ROCm/ROCR-Runtime commit: 6bca866e6c]
HW has limited bits for wave scratch base address stride. Enforcement
prevents programs with larger than supported scratch allocations from
running and clobbering neighboring scratch space.
Change-Id: I574da888e9d1d5e290a9c0025ba13b5ef9f1e5c0
[ROCm/ROCR-Runtime commit: 8e4177382a]