On gfx11, with a sequence such as
    s_trap 2
    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
    s_endpgm
the s_sendmsg deallocates the VGPRs while the wave is supposed to be
stopped. As a result, the wave cannot perform the expected context
save.
To avoid this problem, park the wave in the trap handler for gfx11.
Note that gfx11 implements instruction cache prefetching. When the wave
is parked, the prefetcher tries to access memory past the end of the
trap handler, which causes memory violation exceptions to be reported.
To avoid this, add padding at the end of the trap handler. The padding
consists of `s_code_end` instructions. Given that the trap handler is
loaded at a 0x1000-aligned address, the maximum prefetch amount (in
bytes) is given by `256 - (trap_handler_size % 64)`.
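The padding arithmetic above can be sketched as follows. This is only
an illustration, not code from the runtime: the function and constant
names are hypothetical, and it assumes each `s_code_end` occupies
4 bytes.

```python
# Constants taken from the formula in the commit message; names are illustrative.
PREFETCH_LIMIT_BYTES = 256
PREFETCH_GRANULE_BYTES = 64
S_CODE_END_BYTES = 4  # assumption: s_code_end is a single 32-bit instruction


def max_prefetch_bytes(trap_handler_size: int) -> int:
    """Maximum number of bytes that may be prefetched past the handler."""
    return PREFETCH_LIMIT_BYTES - (trap_handler_size % PREFETCH_GRANULE_BYTES)


def padding_instruction_count(trap_handler_size: int) -> int:
    """Number of s_code_end instructions needed to cover that many bytes."""
    pad = max_prefetch_bytes(trap_handler_size)
    return -(-pad // S_CODE_END_BYTES)  # ceiling division
```

For a handler whose size is a multiple of 64 bytes, the full 256 bytes
of padding are needed.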
Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933
[ROCm/ROCR-Runtime commit: 2f2ba050f6]
gfx940 uses ttmp11 to hold the queue packet index so the first level
trap handler uses ttmp13 instead to save ib_sts.
Repurpose ttmp11[31] to mean that the ttmps are initialized. The issue
was that the debugger could not tell whether ttmp6 was written by the
trap handler when determining the stop reason.
If ttmp11[31]=0, then the trap handler has not been executed and ttmp6
should be assumed to be 0. If ttmp11[31]=1, then ttmp6 holds the
trap_id, if an s_trap instruction caused the exception.
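A debugger-side check of the ttmp11[31] flag might look like the sketch
below. The helper names are hypothetical; only the bit position and the
"treat ttmp6 as 0" rule come from the description above.

```python
TTMP11_TTMPS_INITIALIZED = 1 << 31  # ttmp11[31], as described above


def ttmps_initialized(ttmp11: int) -> bool:
    """True if the trap handler has run and initialized the ttmps."""
    return bool(ttmp11 & TTMP11_TTMPS_INITIALIZED)


def effective_ttmp6(ttmp11: int, ttmp6: int) -> int:
    """Value of ttmp6 to use when determining the stop reason.

    If the trap handler has not executed, ttmp6 holds stale data and
    is treated as 0.
    """
    return ttmp6 if ttmps_initialized(ttmp11) else 0
```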
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Lancelot Six <lancelot.six@amd.com>
Change-Id: I9af903abae044b9ec530306229caf3b883f3ee46
[ROCm/ROCR-Runtime commit: f31b312611]
Also fix hsaKmtRuntimeEnable error handling. Continue if ioctl fails.
Change-Id: I754ccba5910ccfef6f1ada1415593ef89ce33aba
[ROCm/ROCR-Runtime commit: 7e4088309d]
Park the wave, if it is stopped, to avoid halting it at an s_endpgm
instruction if the architecture does not support it.
Free ttmp6 by converting the dispatch_ptr into a queue packet index
(25-bit) and storing it in ttmp7[24:0].
Save the exception PC in ttmp11[22:7] and ttmp6[31:0].
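The register layout above could be decoded on the debugger side roughly
as follows. This is a sketch with hypothetical helper names; it assumes
ttmp11[22:7] carries the upper 16 bits of a 48-bit PC and ttmp6[31:0]
the lower 32 bits, which the text above implies but does not state.

```python
def queue_packet_index(ttmp7: int) -> int:
    """Extract the 25-bit queue packet index from ttmp7[24:0]."""
    return ttmp7 & ((1 << 25) - 1)


def exception_pc(ttmp11: int, ttmp6: int) -> int:
    """Rebuild the saved PC from ttmp11[22:7] and ttmp6[31:0].

    Assumption: ttmp11[22:7] is PC[47:32] and ttmp6 is PC[31:0].
    """
    pc_hi = (ttmp11 >> 7) & 0xFFFF
    pc_lo = ttmp6 & 0xFFFFFFFF
    return (pc_hi << 32) | pc_lo
```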
Change-Id: Iaa3c5baf5b488c0b534044d338f12bffa63ddce2
[ROCm/ROCR-Runtime commit: ea6ee0aa81]
Replace the stop reasons ttmp11.trap_raised and ttmp11.excp_raised
with ttmp11.wave_stopped which indicates that the trap handler has
halted the wave as the result of an event (trap, single-step or
exception).
If the wave is stopped because of a trap, also record the trap_id in
ttmp11.saved_trap_id[7:0].
Save status.halt in ttmp11.saved_status_halt, so that it can be
restored when resuming a wave (changing a wave's state from stopped to
running or single-stepping).
Change-Id: I7322f59b60e8cc1b92bf5f067dba606a3109ef49
[ROCm/ROCR-Runtime commit: 9ca79d072a]
To support single stepping the instruction preceding an s_endpgm,
unwind the PC by 8 bytes and set ttmp11[9] to notify the debugger
that the wave is halted with a modified PC.
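A debugger could compensate for the unwound PC along these lines. The
names are hypothetical, and the sketch assumes that "unwind by 8 bytes"
means the handler subtracted 8 from the PC, so the debugger adds 8 back
when ttmp11[9] is set.

```python
TTMP11_PC_UNWOUND = 1 << 9  # ttmp11[9], as described above
PC_UNWIND_BYTES = 8


def original_pc(halted_pc: int, ttmp11: int) -> int:
    """Return the PC the wave would have had without the unwind.

    Assumption: the trap handler subtracted PC_UNWIND_BYTES from the PC
    before halting, so the adjustment is undone by adding it back.
    """
    if ttmp11 & TTMP11_PC_UNWOUND:
        return halted_pc + PC_UNWIND_BYTES
    return halted_pc
```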
Bump the debug r_version for this new trap handler ABI.
Change-Id: I55e4e0d65576f92da14a336266c31c513baab547
[ROCm/ROCR-Runtime commit: 8aec53969f]
Code object V2 had the ability to support the following queries:
- HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE
- HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE
- HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT
- HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT
However, code object V3 and later cannot support these queries because
the kernel descriptor changed, so they need to be deprecated.
Until then, return more reasonable values:
- For kernarg alignment return 16 which is the minimum alignment
required by the HSA standard.
- For kernarg size return the field from the kernel descriptor which
is a hint. If it is 0 then the compiler is not specifying the kernarg
size, or the kernel has no kernarg.
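The interim behaviour described above amounts to the following sketch.
The type and function names are hypothetical and do not correspond to
the runtime's API; the input is assumed to be the kernarg size field
read from the kernel descriptor.

```python
from typing import NamedTuple

KERNARG_SEGMENT_ALIGNMENT = 16  # minimum alignment cited above


class KernargInfo(NamedTuple):
    size: int               # hint taken from the kernel descriptor
    alignment: int          # always the minimum alignment
    size_is_definite: bool  # False when the descriptor field is 0


def kernarg_segment_info(descriptor_kernarg_size: int) -> KernargInfo:
    """Answer the kernarg size/alignment queries for code object V3+.

    A size of 0 is ambiguous: either the compiler did not specify the
    size or the kernel has no kernarg.
    """
    return KernargInfo(
        size=descriptor_kernarg_size,
        alignment=KERNARG_SEGMENT_ALIGNMENT,
        size_is_definite=descriptor_kernarg_size != 0,
    )
```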
Change-Id: I19ce6cd0f3658a2bf62277492f39100ea5ab4256
[ROCm/ROCR-Runtime commit: ef755e4c82]
l_name is populated by strdup which requires using free rather
than delete.
Change-Id: I9d9bdcfaa3ef095502270f332b95a0ee5c0bbcfc
[ROCm/ROCR-Runtime commit: 9c20f0e649]
Adds the following:
- New factory method to create a code object reader from
file with offset and size.
- A pair of queries on a loaded code object to get the URI name/length.
- A bump to the AMD vendor loader extension API and its associated table.
Change-Id: I17c83e9c2447d29a43c438459395365f786a3611
[ROCm/ROCR-Runtime commit: 9eb735ec24]
New trap handler ABI: Record in ttmp11[8:7] the event that caused the
trap handler to be entered. We currently record 2 events, trap_raised
if an s_trap instruction was executed, or excp_raised if an exception
(MEM_VIOL or ILLEGAL_INST) was raised.
Change-Id: Ie278c8277437b3b67c2737dcd1a12fe6511df428
[ROCm/ROCR-Runtime commit: 00da82f951]
Iterate the loaded shared objects to see if the given elf image binary
is part of a loaded segment.
Change-Id: I074cacd99eb5b59f883f4ce2bd901e0e35a660b8
[ROCm/ROCR-Runtime commit: 5f783494f1]
Attribute optimize(0) doesn't appear to be helpful. It prevents
optimization in the function but not at call sites to the
function. The function may still be inlined since it has no side
effect (in some cases that we currently don't support).
Having a side effect prevents a call site optimization that allows
removal of a noinline function call with no side effect. Call site
optimization should only happen (in GCC at least) when using whole
program optimization so this may be stronger than we strictly need.
Also added _amdgpu_r_debug to the exported symbol list (global) and
switched to the standard macro for an exported symbol (HSA_API).
Without being in the global list the debugger will not find this
symbol if the binary has been stripped.
Change-Id: Ieb00175ccc55fda4491deee44711cd55b3f24aeb
[ROCm/ROCR-Runtime commit: 3e9aca0f34]
Allow user specified profiles if the HSAIL note is not found.
Konstantin reviewed and approved. HSAIL note is not generated by LLVM.
Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1
[ROCm/ROCR-Runtime commit: 465a8eb40b]
- Skip symbols that are STB_LOCAL and not STT_AMDGPU_HSA_KERNEL
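The filter can be expressed as a predicate over the ELF symbol's
`st_info` byte. This is a sketch with a hypothetical function name; the
value 10 for STT_AMDGPU_HSA_KERNEL is taken from LLVM's ELF definitions
and should be treated as an assumption here.

```python
STB_LOCAL = 0               # standard ELF symbol binding
STT_AMDGPU_HSA_KERNEL = 10  # assumption: value from LLVM's ELF.h


def should_skip_symbol(st_info: int) -> bool:
    """True for local symbols that are not HSA kernels."""
    binding = st_info >> 4    # ELF64_ST_BIND
    sym_type = st_info & 0xF  # ELF64_ST_TYPE
    return binding == STB_LOCAL and sym_type != STT_AMDGPU_HSA_KERNEL
```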
Change-Id: I68567f58de9bf3f07dbd8020ef63f47667c86367
[ROCm/ROCR-Runtime commit: 8bee6e4976]
- Process a dynamic relocation even if no symbol is
associated with it.
Change-Id: Iaefee682ee52f5acda8280e5764e6d5fd992774a
[ROCm/ROCR-Runtime commit: a447d79430]
This includes the changes provided by Konstantin, "Add xnack from elf header" (Change 136389).
Change-Id: I95e51141caa0d7c21903b09212c02e4906ec54a3
[ROCm/ROCR-Runtime commit: 8e3d26c617]
- Add support for R_AMDGPU_RELATIVE64 relocation record.
- Return status error if any unsupported relocation record encountered.
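As a rough illustration of both points, the sketch below applies an
R_AMDGPU_RELATIVE64 record (computed as B + A, base plus addend, per
the LLVM AMDGPU relocation table) and rejects any other type. The
function and exception names are hypothetical, and the segment is
modelled as a little-endian byte buffer.

```python
import struct

R_AMDGPU_RELATIVE64 = 13  # assumption: value listed in the LLVM AMDGPU usage guide


class UnsupportedRelocation(Exception):
    """Raised for relocation types this sketch does not handle."""


def apply_relocation(segment: bytearray, r_type: int, r_offset: int,
                     r_addend: int, load_base: int) -> None:
    """Patch one relocation record into a loaded segment image."""
    if r_type != R_AMDGPU_RELATIVE64:
        raise UnsupportedRelocation(f"relocation type {r_type}")
    value = (load_base + r_addend) & 0xFFFFFFFFFFFFFFFF  # B + A, 64-bit
    struct.pack_into("<Q", segment, r_offset, value)
```

Note that no symbol lookup is involved: the patched value depends only
on the load base and the addend.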
Change-Id: Icbb5dcb81109a70c1f2195412a0df58a11be9da1
[ROCm/ROCR-Runtime commit: d472b24d05]
1. Add hsa ext api hsa_amd_register_vmfault_handler for debugger to register callback in case of VM fault.
2. Extend hsa_ven_amd_loader API to:
(1) iterate loaded code objects in executable:
hsa_ven_amd_loader_executable_iterate_loaded_code_objects
(2) get loaded code object info:
hsa_ven_amd_loader_loaded_code_object_get_info
3. Make the id of hsa_queue the same as the one used in communication with thunk (for amd_aql_queue)
Change-Id: I68910809e59e24297350d262606f00e96c14bcbd
[ROCm/ROCR-Runtime commit: ce6aee01ed]
- Includes Sean's latest changes
- Cleanups/improvements
- Fixes for a few bugs that crept in from previous releases
Change-Id: I839dc4895bf13ebd0afc8843424387a9fef667b0
[ROCm/ROCR-Runtime commit: c2c993e0d8]
HSA Finalizer: Add dumping of code object, ISA and executable to loader.
This is controlled by the loader options -dump-all, -dump-isa, -dump-code and -dump-exec.
The options can now also be set with the LOADER_OPTIONS_APPEND environment variable.
Added tests to finalizer_offline
Testing: smoke, dumping on hardware
Reviewed by: Konstantin Zhuravlyov
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1255351]
[ROCm/ROCR-Runtime commit: a795909bca]