Temporary workaround for 2.10 release. RCCL, compiler, or firmware
must be corrected and this code reverted before another ASIC release.
Change-Id: I27851353289b93df9acb72d28b8c6ccb9f7f7d7a
Debugger path is taken for (trap_id >= 3) and single step exceptions.
Other traps/exceptions behave as before.
Change-Id: I276c0eb69953709968353a57717ee017d22348a2
Strip should only apply to the output target library. Symlinks
with .so endings which will be relocated during install will cause
strip to fail, aborting the build.
Change-Id: Ieb598c2cec5277d9d14c8afa88b91ca2c7f4412d
Use the branch point for the count since the last change, because
our questions about tags have not been answered yet.
Removed unused CMake files.
Restructured CMake to use the cache rather than only commandline
and be ccmake & cmake-gui friendly.
Dependency search paths are added for the Repo tree layout.
Search paths still needed for install paths.
Simplified packaging. hsa-ext-rocr-dev package and contents now
build from the package CMake rather than being 3 separate projects.
Not applying new version number or new install paths!
Change-Id: Ibea50dc8a6ab091e91857f78833f5379a4511547
- Use new buffer resource descriptor layout
- Handle wave32 scratch allocation error from CP
- Make wavefront size a property of scratch allocation requests
- Repurpose wave64-specific amd_queue_t.scratch_workitem_byte_size field
- Clear index_stride field in V# on gfx10, calculated per-dispatch by CP
Change-Id: If2acdf6430772abd4d6a8c792fc8c11260764dda
doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.
Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6
SDMA end ts must be 256-bit aligned in OSS 3.0 and prior. Using
the ts pool requires copying into the signal, which is a significant
performance penalty for small copies.
SharedSignal is 128 bytes due to alignment so can host the end ts.
Move sdma end ts into SharedSignal and remove ts pool and ts copy.
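The alignment argument above can be checked with a small C11 sketch.
The struct below is a hypothetical stand-in, not the real SharedSignal
layout: the 64-byte payload size and the field names are assumptions
chosen only to show that a 128-byte, 128-byte-aligned object has room
for a 32-byte (256-bit) aligned timestamp slot.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 128-byte shared signal. Only the size and the
 * alignment rules come from the commit text; the fields are made up. */
typedef struct {
  alignas(128) uint8_t signal_payload[64]; /* stand-in for signal fields */
  alignas(32) uint64_t sdma_end_ts;        /* 256-bit aligned end timestamp */
} shared_signal_model_t;

/* Compile-time checks: padding rounds the struct up to 128 bytes and the
 * timestamp slot sits on a 32-byte boundary inside it. */
static_assert(sizeof(shared_signal_model_t) == 128, "model must be 128 bytes");
static_assert(offsetof(shared_signal_model_t, sdma_end_ts) % 32 == 0,
              "end ts must be 256-bit aligned");

/* Runtime check that a given instance really has an aligned slot. */
static int end_ts_is_aligned(const shared_signal_model_t* s) {
  return ((uintptr_t)&s->sdma_end_ts % 32u) == 0;
}
```

Because the slot lives inside the signal, the engine can write the end
timestamp in place and no later copy out of a separate pool is needed.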
Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8
Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.
Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651
Remove agent lookup in time stamp translation for IPC signals. The copy
agent handle is not shared so does not need to be checked for cross
process use. Cross process copy-timestamp read is illegal and continues
to deliver garbage.
Store the copy agent properly when doing CPU-CPU copies.
Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev. Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.
Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.
Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.
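As an illustration only, the bit test described above can be modeled
in C. The real change is in trap handler assembly; the helper names
here are hypothetical, and only the meaning of M0[23] is taken from
the text.

```c
#include <assert.h>
#include <stdint.h>

/* Per the commit text, bit 23 of M0 makes the driver treat the
 * interrupt as a debug event instead of a signal event. */
#define M0_DEBUG_EVENT_BIT (UINT32_C(1) << 23)

/* Hypothetical helper: how the driver side would classify an interrupt. */
static int is_debug_event(uint32_t m0) {
  return (m0 & M0_DEBUG_EVENT_BIT) != 0;
}

/* Hypothetical helper modeling the fix: M0 is cleared before the
 * interrupt is sent, so whatever value was left in it is discarded.
 * No save/restore is modeled because every path is terminal. */
static uint32_t m0_for_signal_interrupt(uint32_t stale_m0) {
  (void)stale_m0; /* the stale value is intentionally ignored */
  return 0;
}
```

With a stale M0 of 0x00800000 the unfixed path would be classified as a
debug event, while the cleared value is classified as a signal event.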
Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606
agentOwner from thunk reflects the GPU which holds the device alias.
We need to return a CPU to better reflect that the memory is system memory.
Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18
Allow user specified profiles if the HSAIL note is not found.
Konstantin reviewed and approved. HSAIL note is not generated by LLVM.
Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported and required the compiler to pass in the signal pointer
in s[0:1].
The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.
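The lookup described above can be sketched in C as a plain table
indexed by doorbell. This is a host-side model under assumptions, not
the trap handler code: queue_model_t stands in for amd_queue_t, the
table size is made up, and the table is statically allocated so that a
lookup never has to test whether the map itself exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DOORBELLS 1024u /* hypothetical table size */

/* Stand-in for amd_queue_t; only the field named in the text is modeled. */
typedef struct {
  uint64_t queue_inactive_signal;
} queue_model_t;

/* Always allocated, so accesses need no branch on the map pointer. */
static queue_model_t* doorbell_queue_map[MAX_DOORBELLS];

/* Record which queue owns a doorbell index. */
static void map_doorbell(uint32_t index, queue_model_t* queue) {
  assert(index < MAX_DOORBELLS);
  doorbell_queue_map[index] = queue;
}

/* Resolve doorbell index -> queue -> signal handle.
 * Returns 0 if the index is out of range or has no queue mapped. */
static uint64_t signal_for_doorbell(uint32_t index) {
  if (index >= MAX_DOORBELLS) return 0;
  const queue_model_t* queue = doorbell_queue_map[index];
  return queue ? queue->queue_inactive_signal : 0;
}
```

In the real handler the index comes from the doorbell fetched inside the
wavefront, which is why the compiler no longer has to pass the signal
pointer in registers.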
Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.
Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.
Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.
Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW) so should be
better for measurements. However, it is implemented with a syscall while
CLOCK_MONOTONIC is implemented via the vDSO. The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors. NTP may adjust frequency by up to 500 ppm, which
limits us to ~3 significant digits in elapsed time.
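The trade-off above can be explored with a short C sketch for Linux.
It uses only clock_gettime; the helper names are hypothetical and the
measured read cost depends on the kernel and clocksource, so no
numbers are claimed here. skew_error_ns shows the arithmetic behind
the 500 ppm bound: one second of elapsed time can be off by up to
500 microseconds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the given clock in nanoseconds; returns 0 if the read fails. */
static uint64_t read_clock_ns(clockid_t id) {
  struct timespec ts;
  if (clock_gettime(id, &ts) != 0) return 0;
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Worst-case elapsed-time error for a frequency skew given in ppm. */
static uint64_t skew_error_ns(uint64_t elapsed_ns, uint64_t ppm) {
  return elapsed_ns * ppm / 1000000ull;
}

/* Average cost of one clock read over `iters` back-to-back reads,
 * timed with the same clock that is being read. */
static uint64_t avg_read_cost_ns(clockid_t id, unsigned iters) {
  assert(iters > 0);
  uint64_t start = read_clock_ns(id);
  for (unsigned i = 0; i < iters; ++i) (void)read_clock_ns(id);
  uint64_t end = read_clock_ns(id);
  return (end - start) / iters;
}
```

Calling avg_read_cost_ns with CLOCK_MONOTONIC and then with
CLOCK_MONOTONIC_RAW on the target system gives a rough view of the
per-read latency difference this change is about.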
Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76
Description was inconsistent with itself and code. Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only and system memory pools do require hsa_amd_agents_allow_access.
Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5