- Correct defintion of HSA_QUEUE_TYPE_COOPERATIVE to be a queue type
and not a bit mask.
- Correct implementation of hsa_queue_type_t to treat is as an
enumeration type and not a bit mask. In particular
HSA_QUEUE_TYPE_COOPERATIVE is a distinct queue type that uses the
multi producer protocol, and is not a bit set value.
Change-Id: I9415be8853671e5511e16e306caf16020e8c84af
There are a couple problems with this. First, llvm-dis is an unstable
llvm development tool and 3rd party users should generally not rely on
it. The text format is unstable, and the regex here isn't even
explicitly looking for the target triple field, so it could
accidentally find something else. Second, picking the target to
compile based on the library you are linking is a fundamentally
backwards decision. The target you're compiling for changes the
library you would want to link. The device libraries are only ever
compiled with amdgcn-amd-amdhsa. If we had a second triple, this
should be explicitly building for any it cares about.
Change-Id: I3bae8398f60f78df61ab2177aa9e83f47ec6dea4
The ROCR trap handler should check for all end program instructions
and not halt on them. Mask off the imm16 before comparing the
instruction to the s_endpgm opcode.
Change-Id: I669ffc7f5b699d7daf0c8ec5761ed7bb193f07a7
New trap handler ABI: Record in ttmp11[8:7] the event that caused the
trap handler to be entered. We currently record 2 events, trap_raised
if an s_trap instruction was executed, or excp_raised if an exception
(MEM_VIOL or ILLEGAL_INST) was raised.
Change-Id: Ie278c8277437b3b67c2737dcd1a12fe6511df428
Remove the hard-coding of "SHARED" as the lib type, and move any
SO-specific linking to only happen if the .so exists in the first place
Change-Id: I3f0bfd5c03f19b2425423b4dc8eed8fd87acc1d6
Changes in the compiler are being made to add controls for XNACK and SRAM ECC
for all targets which can support these features. By default the conservatively
correct settings of XNACK on and SRAM ECC on will be used. This change is to
facilitate these backend updates.
Change-Id: I2fd6b6bc1d32937737e7f56d8e08c70fe781c745
IPC create must only be used on whole ROCr allocations.
Fragments were allowing handle creation with offsets.
Change-Id: I1faa96d36bc7a6199bdc2e3ff1b8871d1a36a2fa
This has been the default mode for a while now since we don't
distribute or build the finalizer. Removing the attempt cleans
up debug mode messages that are causing confusion.
Change-Id: I8162c95abd5bbedaa22b90191f7a384a34c388ae
Lock API suceeds but the GPU still faults on the address.
This should be fixed in Thunk and/or KFD as well.
Change-Id: I8b2fbcae61ab181e4fe7f0b64e43a5f0772efb24
Iterate the loaded shared objects to see if the given elf image binary
is part of a loaded segment.
Change-Id: I074cacd99eb5b59f883f4ce2bd901e0e35a660b8
- Update the documentation comment in hsa_ext_amd.h, which contained
contradictory and incorrect information about an argument to the
hsa_amd_agents_allow_access function.
Change-Id: I60b0dbbdc761078cd81906bc2c63a27d7e6b53e1
- Symlink creation is corrected only for deb packages
- It is follow up package of http://git.amd.com:8080/c/hsa/ec/hsa-runtime/+/334403
- configure_file() is called to update the scripts with proper cmake variable values
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I0e833ead265166411e83593fd57265a9ab356904
CPack now incorrectly adds two copies of directory symlinks when
building Debian packages. This causes dpkg to see a file conflict
and fail installing.
The correct long term solution is to remove the symlink and use a
flat directory structure. This patch adds the symlink in the post
install script as a workaround until we can switch to flat layout.
Change-Id: I879b6cbc2661c19df3db639cb42fba0972fddb93
HDP will now be used for coarse grain kernarg so needs to be
reported without consideration of fine grain vram over pcie.
Change-Id: I648167299faa583876a3d8685c3b3c4d8d31ebf9
Setting to 1 prevents the scratch handler from reducing peak occupancy.
Scratch allocations that would normally reduce peak occupancy will
instead fail.
Diagnostic for TF and PyTorch.
Change-Id: I2d7ea47077eb5cf708251c8aa3fd183ad4261be0
scratch_used_large_ was uninitialized leading to the observed hang.
DynamicScratchHandler would wait for a large scratch release despite no
large scratch having yet been allocated. Fixes .
The patch also removes a potential race between AddScratchNotifier and
ReleaseQueueScratch. The race condition does not exist today since both
scratch alloc and release run on the same thread. The changes will
prevent this potential race from manifesting if the async event handler
is ever updated to use multiple threads.
Also enhances scratch occupancy reduction reporting. Reporting now
prints the initial request size as well as the allocated size and the
effect on occupancy this has. Occupancy is computed in terms of the
requesting dispatch grid size so may be >100%.
Change-Id: I0fc5ee01467ff4c29bdd25d545177c97862c3bd9
Ensures that all CPU agents will have a pool handle to allocate
system memory. These pools will have no numa binding since the
node their owning Agent represents has no installed memory.
Change-Id: I9f72b455d633646839753c6719ff7f6a4c41f7c4
- This new path is required when libhsaruntime.so is referred
from the top level ROCm lib directory.
- Once ROCm stack lib/lib64 structure is flatten, RUNPATH in all
the libraries needs to be updated.
Change-Id: I369131ce93e14958ec57a54701671f2bfd8d522a
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Attribute optimize(0) doesn't appear to be helpful helpful. This
prevents optimization in the function but not at call sites to the
function. The function may still be inlined since it has no side
effect (in some cases that we currently don't support).
Having a side effect prevents a call site optimization that allows
removal of a noinline function call with no side effect. Call site
optimization should only happen (in GCC at least) when using whole
program optimization so this may be stronger than we strictly need.
Also added _amdgpu_r_debug to the exported symbol list (global) and
switched to the standard macro for an exported symbol (HSA_API).
Without being in the global list the debugger will not find this
symbol if the binary has been stripped.
Change-Id: Ieb00175ccc55fda4491deee44711cd55b3f24aeb
Adding patch number based on ROCM build/release to have unique
file name for libraries across multiple versions of ROCM.
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I58d665b0e7d577b5bd7a6000d1202a0242672727
Lack of cache controls only allow operating SDMA at
agent scope. All copy APIs are defined at system scope so may
result in data errors.
Change-Id: I9cd10007defddcbf8feb14a2e3daa1ba17c0489f
Queues should transition to ref counting for all queues eventually.
That cleanup will be part of shared queue pooling support.
Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617
Temporary workaround for 2.10 release. RCCL, compiler, or firmware
must be corrected and this code reverted before another ASIC release.
Change-Id: I27851353289b93df9acb72d28b8c6ccb9f7f7d7a
Debugger path is taken for (trap_id >= 3) and single step exceptions.
Other traps/exceptions behave as before.
Change-Id: I276c0eb69953709968353a57717ee017d22348a2
Strip should only apply to the output target library. Symlinks
with .so endings which will be relocated during install will cause
strip to fail, aborting the build.
Change-Id: Ieb598c2cec5277d9d14c8afa88b91ca2c7f4412d
Using branch point for count since last change since we don't
have questions answered on tags yet.
Removed unused CMake files.
Restructured CMake to use the cache rather than only commandline
and be ccmake & cmake-gui friendly.
Dependency search paths are added for the Repo tree layout.
Search paths still needed for install paths.
Simplified packaging. hsa-ext-rocr-dev package and contents now
build from the package CMake rather than being 3 separate projects.
Not applying new version number or new install paths!
Change-Id: Ibea50dc8a6ab091e91857f78833f5379a4511547