doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.
Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6
[ROCm/ROCR-Runtime commit: f9d3796db8]
SDMA end timestamps must be 256-bit aligned in OSS 3.0 and prior. Using
the timestamp pool requires copying the timestamp into the signal, which
is a significant performance penalty for small copies.
SharedSignal is 128 bytes due to alignment so can host the end ts.
Move sdma end ts into SharedSignal and remove ts pool and ts copy.
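The layout argument can be checked at compile time. The following is only a
sketch: SharedSignalSketch, its field names, the 64-byte placeholder block and
the 8-byte timestamp slot are assumptions made for illustration, not the real
SharedSignal definition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for SharedSignal; all names and sizes are illustrative.
struct alignas(64) SharedSignalSketch {
  std::uint8_t abi_block[64];             // placeholder for the signal ABI data
  void* owner;                            // placeholder back-pointer
  alignas(32) std::uint64_t sdma_end_ts;  // 256-bit (32-byte) aligned end timestamp
};

// Alignment padding already rounds the struct up to 128 bytes, so the
// timestamp slot fits in space that would otherwise be unused.
static_assert(sizeof(SharedSignalSketch) == 128, "struct should stay 128 bytes");
static_assert(offsetof(SharedSignalSketch, sdma_end_ts) % 32 == 0,
              "end timestamp must be 256-bit aligned");
```

With the timestamp stored in the signal itself, no separate pool slot or copy
is needed in this sketch.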
Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8
[ROCm/ROCR-Runtime commit: ec5ac95dce]
Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.
Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651
[ROCm/ROCR-Runtime commit: 5adb73fffd]
Remove agent lookup in time stamp translation for IPC signals. The copy
agent handle is not shared so does not need to be checked for cross
process use. Cross process copy-timestamp read is illegal and continues
to deliver garbage.
Store the copy agent properly when doing CPU-CPU copies.
Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd
[ROCm/ROCR-Runtime commit: ea8c99f452]
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev. Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.
Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749
[ROCm/ROCR-Runtime commit: 8133563a93]
Also removed an unnecessary cache flush in dependency barrier packet.
Change-Id: I573df3bdf0a10df0bcd78025672c44038f8091ff
[ROCm/ROCR-Runtime commit: 4647a5454d]
This is to allow allocations in system memory that exceed sizes
reported by a CPU device.
Change-Id: I3d10d192aafcefbe4107f69b7c5e30bf7f836619
[ROCm/ROCR-Runtime commit: 3201f68f72]
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.
Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.
Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606
[ROCm/ROCR-Runtime commit: ad717d2e98]
agentOwner from thunk reflects the GPU which holds the device alias.
We need to return a CPU to better reflect that the memory is system memory.
Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18
[ROCm/ROCR-Runtime commit: 6e07bc8dc4]
Allow user specified profiles if the HSAIL note is not found.
Konstantin reviewed and approved. HSAIL note is not generated by LLVM.
Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1
[ROCm/ROCR-Runtime commit: 465a8eb40b]
This will allow the default target list to be branch
specific.
Change-Id: If8ecc14e2b7fb5ed2eb25ab447480308d539b248
[ROCm/ROCR-Runtime commit: d699039284]
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported, and this required the compiler to pass in the signal pointer
in s[0:1].
The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.
Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b
[ROCm/ROCR-Runtime commit: ff8f439112]
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.
Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.
Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f
[ROCm/ROCR-Runtime commit: 6ed686ee29]
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.
Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
[ROCm/ROCR-Runtime commit: 299874f17d]
In several places AQL packets were written to the queue all at once
instead of writing the header atomically. These cases have been
fixed.
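A minimal sketch of the header-atomic pattern, assuming a simplified 64-byte
slot. PacketSlot and PublishPacket are hypothetical names used for
illustration; they are not the runtime's own types or helpers.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical packet slot; the field layout is illustrative only.
struct PacketSlot {
  std::atomic<std::uint16_t> header;  // read by the consumer to detect a valid packet
  std::uint16_t setup;
  std::uint8_t body[60];
};

// Write everything except the header first, then publish the header with a
// single release store so a consumer never observes a valid header in front
// of a partially written packet.
inline void PublishPacket(PacketSlot& slot, std::uint16_t header,
                          std::uint16_t setup, const std::uint8_t (&body)[60]) {
  slot.setup = setup;
  std::memcpy(slot.body, body, sizeof(slot.body));
  slot.header.store(header, std::memory_order_release);
}
```

Copying the whole packet at once gives no such ordering guarantee, since the
header may become visible before the rest of the packet.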
A few hsa_signal leaks have been fixed.
Some code duplication has been removed.
Addresses ROCMOPS-456
Change-Id: Ia1869bc370f92e49ac560301df47741d5f76978e
[ROCm/ROCR-Runtime commit: 081a2cc875]
IPC was failing due to calling fork when HSA was open. The fix
was correcting incomplete cleanup in several other tests.
TestBase::Close (via CommonCleanUp) now checks that HSA is properly
closed between tests.
rocrtstPerf.Memory_Async_Copy uses hwloc which uses OpenCL which
has no shutdown routine. Consequently this test cannot clean up
properly. I added a hack to force HSA refcount to the value
it should have if OpenCL were cleaning up but this leaks resources
and potentially puts hwloc & OpenCL in a bad state.
OpenCL loads LLVM which installs some exit handlers. Those handlers
can't execute in a child process and can't be removed since OpenCL
doesn't clean up. IPC hacks around this by aborting rather than exiting
in the child process.
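The abort-instead-of-exit workaround can be sketched as follows. This is an
illustration on a POSIX system only; ChildAbortsWithoutExitHandlers is a
hypothetical helper, not code from the test suite.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that leaves through abort(). abort() raises SIGABRT and does
// not run handlers registered with atexit, unlike exit().
// Returns true when the parent observes the child terminated by SIGABRT.
inline bool ChildAbortsWithoutExitHandlers() {
  pid_t pid = fork();
  if (pid < 0) return false;  // fork failed
  if (pid == 0) {
    std::abort();             // child: skip all exit handlers
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

The parent then has to treat SIGABRT termination of the child as the expected
outcome rather than as a failure.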
Change-Id: I92326a73d7b11632208717d99728e6dafdc7d3ca
[ROCm/ROCR-Runtime commit: bb980462e7]
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW) so should be
better for measurements. However, it is implemented with a syscall while
CLOCK_MONOTONIC is implemented via vDSO. The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors. NTP may adjust frequency by 500ppm, which limits
us to ~3 significant decimal digits in elapsed time.
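A small sketch of the trade-off, assuming a POSIX system. MonotonicNowNs and
SkewErrorNs are hypothetical helper names, not part of CPUClockCounter.

```cpp
#include <cstdint>
#include <time.h>

// Reads CLOCK_MONOTONIC and returns the value in nanoseconds.
inline std::uint64_t MonotonicNowNs() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<std::uint64_t>(ts.tv_nsec);
}

// Worst-case error of a measured interval for a given frequency skew in ppm.
constexpr double SkewErrorNs(double elapsed_ns, double skew_ppm) {
  return elapsed_ns * skew_ppm * 1e-6;
}
```

For example, a 1 ms interval measured under a 500ppm skew can be off by about
500 ns, a relative error of 5e-4, so only about three significant digits of
the elapsed time are meaningful.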
Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76
[ROCm/ROCR-Runtime commit: 4b22d24346]
Description was inconsistent with itself and code. Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only and system memory pools do require hsa_amd_agents_allow_access.
Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5
[ROCm/ROCR-Runtime commit: bbb90bdfc9]
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.
Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e
[ROCm/ROCR-Runtime commit: 0016c6ce5b]
API is a stateless lookup of RO data and needed to interpret
hsa_init error codes.
Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7
[ROCm/ROCR-Runtime commit: 22de0e7fb9]
Search the local src directories first. If using a system
installed hsakmt, this would pick the installed hsa headers.
Change-Id: I9746d6e9db1749a130e4d93e024556754a537083
[ROCm/ROCR-Runtime commit: 22d29b55a4]
Joined threads cannot be joined more than once, nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.
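One way to layer wait/close semantics on pthreads is sketched below.
ThreadHandle, CreateThreadSketch, WaitThreadSketch and CloseThreadSketch are
hypothetical names; the actual implementation may differ, and this sketch does
not handle concurrent waiters.

```cpp
#include <pthread.h>

// Hypothetical handle that remembers whether the thread was already joined.
struct ThreadHandle {
  pthread_t thread;
  bool joined;
};

inline ThreadHandle* CreateThreadSketch(void* (*entry)(void*), void* arg) {
  ThreadHandle* h = new ThreadHandle{};
  h->joined = false;
  if (pthread_create(&h->thread, nullptr, entry, arg) != 0) {
    delete h;
    return nullptr;
  }
  return h;
}

// May be called any number of times; only the first call joins.
inline bool WaitThreadSketch(ThreadHandle* h) {
  if (h->joined) return true;
  if (pthread_join(h->thread, nullptr) != 0) return false;
  h->joined = true;
  return true;
}

// Detaches only if the thread was never joined, then frees the handle.
inline void CloseThreadSketch(ThreadHandle* h) {
  if (!h->joined) pthread_detach(h->thread);
  delete h;
}
```

Tracking the joined state keeps pthread_join from being called twice and keeps
pthread_detach away from a thread that was already joined.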
Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65
[ROCm/ROCR-Runtime commit: a913549190]
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time. Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime. Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.
This patch fixes the relative clock ratio used for times which predate
the call to hsa_init. This correlates errors in such times allowing
the elapsed time to be correctly computed.
The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months. GPU event timestamps are good for process uptime
of ~3.5 months. These are limited by double's mantissa precision.
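The ~3.5 month figure is consistent with the 53-bit mantissa of a double,
assuming times are held in nanoseconds (the unit is an assumption here, and
the helper names are hypothetical):

```cpp
#include <cstdint>
#include <limits>

// Largest count below which every integer is exactly representable in a
// double: 2^53 for IEEE 754 binary64.
constexpr double MaxExactCount() {
  return static_cast<double>(1ull << std::numeric_limits<double>::digits);
}

// Converts a nanosecond count to days.
constexpr double NanosecondsToDays(double ns) {
  return ns / 1e9 / 86400.0;
}

// 2^53 ns is roughly 104 days, in line with the ~3.5 months quoted above.
constexpr double kMaxExactDays = NanosecondsToDays(MaxExactCount());
```

Beyond that range adjacent nanosecond values collapse to the same double, so
differences between nearby timestamps lose resolution.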
Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445
[ROCm/ROCR-Runtime commit: 6e2a056e1b]
Exposed via agent info query. Only valid if fine grain PCIe memory is enabled.
Change-Id: Ib4770901592ec047276458926a947737f9b93bb5
[ROCm/ROCR-Runtime commit: 06376e726b]
At the moment it is not possible to build ROCr with Clang. This is
a spurious limitation. The present PR addresses it by guarding GCC
only flags and by fixing some additional warnings that Clang triggers;
one of said warnings did outline a rather interesting issue with math
being done on void*s. - AlexVlx
Void ptr arithmetic had already been fixed in amd-master branch.
Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462
[ROCm/ROCR-Runtime commit: e89f9807f1]