- Correct definition of HSA_QUEUE_TYPE_COOPERATIVE to be a queue type
and not a bit mask.
- Correct implementation of hsa_queue_type_t to treat it as an
enumeration type and not a bit mask. In particular
HSA_QUEUE_TYPE_COOPERATIVE is a distinct queue type that uses the
multi producer protocol, and is not a bit set value.
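
The enumeration-style handling described above can be sketched as
follows. This is a minimal illustration, not the runtime's code: the
QueueType enum, its numeric values, and UsesMultiProducerProtocol are
hypothetical stand-ins for hsa_queue_type_t and its users.

```cpp
#include <cassert>

// Hypothetical stand-in for hsa_queue_type_t. The numeric values are
// illustrative assumptions and are not copied from hsa.h.
enum QueueType {
  kQueueTypeMulti = 0,
  kQueueTypeSingle = 1,
  kQueueTypeCooperative = 2,
};

// Treat the type as an enumeration: match each enumerator explicitly
// instead of testing individual bits of the value.
inline bool UsesMultiProducerProtocol(QueueType type) {
  switch (type) {
    case kQueueTypeMulti:
    case kQueueTypeCooperative:  // distinct type, multi producer protocol
      return true;
    case kQueueTypeSingle:
      return false;
  }
  assert(false && "unknown queue type");
  return false;
}
```

Matching on enumerators means a value such as kQueueTypeCooperative is
never decomposed into other types by a bit test.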
Change-Id: I9415be8853671e5511e16e306caf16020e8c84af
[ROCm/ROCR-Runtime commit: bccb25fc33]
There are a couple of problems with this. First, llvm-dis is an unstable
llvm development tool and 3rd party users should generally not rely on
it. The text format is unstable, and the regex here isn't even
explicitly looking for the target triple field, so it could
accidentally find something else. Second, picking the target to
compile based on the library you are linking is a fundamentally
backwards decision. The target you're compiling for changes the
library you would want to link. The device libraries are only ever
compiled with amdgcn-amd-amdhsa. If we had a second triple, this
should be explicitly building for any it cares about.
Change-Id: I3bae8398f60f78df61ab2177aa9e83f47ec6dea4
[ROCm/ROCR-Runtime commit: 96d4140609]
The ROCR trap handler should check for all end program instructions
and not halt on them. Mask off the imm16 before comparing the
instruction to the s_endpgm opcode.
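
A host-side sketch of the comparison, for illustration only (the trap
handler itself is written in shader assembly). It assumes the SOPP
layout in which bits [15:0] hold imm16, and assumes 0xBF810000 is
s_endpgm with a zero immediate. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: in the SOPP format the low 16 bits carry imm16.
constexpr uint32_t kImm16Mask = 0x0000FFFFu;
// Assumption: s_endpgm with imm16 == 0 encodes as 0xBF810000.
constexpr uint32_t kSEndpgmOpcode = 0xBF810000u;

// Clear imm16 first so every s_endpgm variant compares equal.
constexpr bool IsEndProgram(uint32_t instruction) {
  return (instruction & ~kImm16Mask) == kSEndpgmOpcode;
}

// Convenience wrapper for a pointer to the instruction at the saved PC.
inline bool IsEndProgramAt(const uint32_t* pc) {
  assert(pc != nullptr);
  return IsEndProgram(*pc);
}
```

Without the mask, an s_endpgm carrying a nonzero immediate would not
match the opcode constant and would be treated as a halting case.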
Change-Id: I669ffc7f5b699d7daf0c8ec5761ed7bb193f07a7
[ROCm/ROCR-Runtime commit: df03a377f5]
Image swizzle mode will be set by the preferred surface info
function.
Change-Id: I41e639be53cafbb4db6cf15c159aa2bd457ec5be
[ROCm/ROCR-Runtime commit: 1440da3e15]
New trap handler ABI: Record in ttmp11[8:7] the event that caused the
trap handler to be entered. We currently record 2 events, trap_raised
if an s_trap instruction was executed, or excp_raised if an exception
(MEM_VIOL or ILLEGAL_INST) was raised.
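
A minimal sketch of packing and reading the two-bit field at
ttmp11[8:7]. The field position comes from the description above. The
TrapEvent enumerator values are hypothetical, since the message does not
give the encodings.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kEventShift = 7;      // field starts at bit 7
constexpr uint32_t kEventMask = 0x3u;    // two bits wide: [8:7]

// Hypothetical encodings; the actual values are not stated in the message.
enum TrapEvent : uint32_t {
  kEventNone = 0,
  kEventTrapRaised = 1,  // an s_trap instruction was executed
  kEventExcpRaised = 2,  // MEM_VIOL or ILLEGAL_INST was raised
};

// Return ttmp11 with the event field replaced and all other bits kept.
inline uint32_t RecordTrapEvent(uint32_t ttmp11, TrapEvent event) {
  assert(static_cast<uint32_t>(event) <= kEventMask);
  const uint32_t cleared = ttmp11 & ~(kEventMask << kEventShift);
  return cleared | (static_cast<uint32_t>(event) << kEventShift);
}

// Extract the recorded event from ttmp11[8:7].
inline TrapEvent DecodeTrapEvent(uint32_t ttmp11) {
  return static_cast<TrapEvent>((ttmp11 >> kEventShift) & kEventMask);
}
```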
Change-Id: Ie278c8277437b3b67c2737dcd1a12fe6511df428
[ROCm/ROCR-Runtime commit: 00da82f951]
Remove the hard-coding of "SHARED" as the lib type, and move any
SO-specific linking to only happen if the .so exists in the first place
Change-Id: I3f0bfd5c03f19b2425423b4dc8eed8fd87acc1d6
[ROCm/ROCR-Runtime commit: 33133ebd07]
Changes in the compiler are being made to add controls for XNACK and SRAM ECC
for all targets which can support these features. By default the conservatively
correct settings of XNACK on and SRAM ECC on will be used. This change is to
facilitate these backend updates.
Change-Id: I2fd6b6bc1d32937737e7f56d8e08c70fe781c745
[ROCm/ROCR-Runtime commit: 87202d4408]
IPC create must only be used on whole ROCr allocations.
Fragments were allowing handle creation with offsets.
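
The whole-allocation rule can be sketched as a check that the requested
range matches the allocation exactly. The Allocation struct and
CoversWholeAllocation are hypothetical and only illustrate the
condition. They are not the runtime's data structures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record of one runtime allocation.
struct Allocation {
  const void* base;  // start address returned by the allocator
  size_t size;       // full size of the allocation in bytes
};

// An IPC handle is only valid for the entire allocation: the pointer
// must be the base (no offset) and the length must be the full size.
inline bool CoversWholeAllocation(const Allocation& alloc, const void* ptr,
                                  size_t len) {
  assert(alloc.base != nullptr);
  return ptr == alloc.base && len == alloc.size;
}
```

A request that starts at an offset, or that covers only part of the
allocation, fails the check and would be rejected before a handle is
created.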
Change-Id: I1faa96d36bc7a6199bdc2e3ff1b8871d1a36a2fa
[ROCm/ROCR-Runtime commit: 7712c7e743]
This has been the default mode for a while now since we don't
distribute or build the finalizer. Removing the attempt cleans
up debug mode messages that are causing confusion.
Change-Id: I8162c95abd5bbedaa22b90191f7a384a34c388ae
[ROCm/ROCR-Runtime commit: 3fe891d5da]
Pool size was being used where alloc_max_size should be.
Changes are necessary on NUMA systems where not all nodes have
installed memory.
Change-Id: If8f507cae50a8dfeae8572d4e39df757abe28599
[ROCm/ROCR-Runtime commit: a9470e3563]
Lock API succeeds but the GPU still faults on the address.
This should be fixed in Thunk and/or KFD as well.
Change-Id: I8b2fbcae61ab181e4fe7f0b64e43a5f0772efb24
[ROCm/ROCR-Runtime commit: 9fe44ed675]
Iterate the loaded shared objects to see if the given elf image binary
is part of a loaded segment.
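
One way to perform such a lookup on Linux is glibc's dl_iterate_phdr.
The sketch below assumes that mechanism, which the message does not
name. IsInLoadedSegment, SegmentQuery, and VisitObject are hypothetical
names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <link.h>  // dl_iterate_phdr, dl_phdr_info, ElfW (glibc)

// Address range being searched for, plus the result flag.
struct SegmentQuery {
  uintptr_t begin;
  uintptr_t end;
  bool found;
};

// Called once per loaded object; a nonzero return stops the iteration.
static int VisitObject(struct dl_phdr_info* info, size_t, void* data) {
  SegmentQuery* query = static_cast<SegmentQuery*>(data);
  for (int i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr)& phdr = info->dlpi_phdr[i];
    if (phdr.p_type != PT_LOAD) continue;  // only loadable segments
    const uintptr_t seg_begin = info->dlpi_addr + phdr.p_vaddr;
    const uintptr_t seg_end = seg_begin + phdr.p_memsz;
    if (query->begin >= seg_begin && query->end <= seg_end) {
      query->found = true;
      return 1;
    }
  }
  return 0;
}

// True if [image, image + size) lies inside a PT_LOAD segment of any
// object currently loaded in this process.
bool IsInLoadedSegment(const void* image, size_t size) {
  assert(image != nullptr);
  const uintptr_t begin = reinterpret_cast<uintptr_t>(image);
  SegmentQuery query = {begin, begin + size, false};
  dl_iterate_phdr(VisitObject, &query);
  return query.found;
}
```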
Change-Id: I074cacd99eb5b59f883f4ce2bd901e0e35a660b8
[ROCm/ROCR-Runtime commit: 5f783494f1]
- Update the documentation comment in hsa_ext_amd.h, which contained
contradictory and incorrect information about an argument to the
hsa_amd_agents_allow_access function.
Change-Id: I60b0dbbdc761078cd81906bc2c63a27d7e6b53e1
[ROCm/ROCR-Runtime commit: 6d5781bb14]
- Symlink creation is corrected only for deb packages
- This is a follow-up to http://git.amd.com:8080/c/hsa/ec/hsa-runtime/+/334403
- configure_file() is called to update the scripts with proper cmake variable values
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I0e833ead265166411e83593fd57265a9ab356904
[ROCm/ROCR-Runtime commit: 241cdfdd01]
CPack now incorrectly adds two copies of directory symlinks when
building Debian packages. This causes dpkg to see a file conflict
and fail installing.
The correct long term solution is to remove the symlink and use a
flat directory structure. This patch adds the symlink in the post
install script as a workaround until we can switch to flat layout.
Change-Id: I879b6cbc2661c19df3db639cb42fba0972fddb93
[ROCm/ROCR-Runtime commit: f3b532b42d]
Checks for an IPC memory error and updates comments relevant
to rocr_visible_devices.
Change-Id: I9d2f2dd27f3fa04881d17387cce2692bc046edb2
[ROCm/ROCR-Runtime commit: a1c2439213]
HDP will now be used for coarse grain kernarg so needs to be
reported without consideration of fine grain vram over pcie.
Change-Id: I648167299faa583876a3d8685c3b3c4d8d31ebf9
[ROCm/ROCR-Runtime commit: 9c35780836]
Setting to 1 prevents the scratch handler from reducing peak occupancy.
Scratch allocations that would normally reduce peak occupancy will
instead fail.
Diagnostic for TF and PyTorch.
Change-Id: I2d7ea47077eb5cf708251c8aa3fd183ad4261be0
[ROCm/ROCR-Runtime commit: dc165c92bc]
scratch_used_large_ was uninitialized leading to the observed hang.
DynamicScratchHandler would wait for a large scratch release despite no
large scratch having yet been allocated. Fixes .
The patch also removes a potential race between AddScratchNotifier and
ReleaseQueueScratch. The race condition does not exist today since both
scratch alloc and release run on the same thread. The changes will
prevent this potential race from manifesting if the async event handler
is ever updated to use multiple threads.
Also enhances scratch occupancy reduction reporting. Reporting now
prints the initial request size as well as the allocated size and the
effect on occupancy this has. Occupancy is computed in terms of the
requesting dispatch grid size so may be >100%.
Change-Id: I0fc5ee01467ff4c29bdd25d545177c97862c3bd9
[ROCm/ROCR-Runtime commit: 6c556002d8]
Ensures that all CPU agents will have a pool handle to allocate
system memory. These pools will have no numa binding since the
node their owning Agent represents has no installed memory.
Change-Id: I9f72b455d633646839753c6719ff7f6a4c41f7c4
[ROCm/ROCR-Runtime commit: d53fe07687]
- This new path is required when libhsaruntime.so is referred
from the top level ROCm lib directory.
- Once the ROCm stack lib/lib64 structure is flattened, RUNPATH in all
the libraries needs to be updated.
Change-Id: I369131ce93e14958ec57a54701671f2bfd8d522a
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
[ROCm/ROCR-Runtime commit: e931fd424b]
Attribute optimize(0) doesn't appear to be helpful. This
prevents optimization in the function but not at call sites to the
function. The function may still be inlined since it has no side
effect (in some cases that we currently don't support).
Having a side effect prevents a call site optimization that allows
removal of a noinline function call with no side effect. Call site
optimization should only happen (in GCC at least) when using whole
program optimization so this may be stronger than we strictly need.
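
A minimal sketch of the side-effect approach, assuming GCC or Clang.
DebuggerHook, DebugState, and NotifyDebugger are hypothetical names and
do not reflect the runtime's actual symbols or layout.

```cpp
#include <cassert>

// Hypothetical state block that a debugger would inspect.
struct DebugState {
  int version;
  int state;
};
DebugState g_debug_state = {1, 0};

// Hypothetical breakpoint target. noinline keeps it out of line, and
// the empty volatile asm gives the body a side effect so that calls to
// it cannot be removed as dead code.
__attribute__((noinline)) void DebuggerHook() {
  asm volatile("" ::: "memory");
}

// Publish the new state, then call the hook so a breakpoint placed on
// DebuggerHook observes the updated value.
void NotifyDebugger(int state) {
  assert(state >= 0);
  g_debug_state.state = state;
  DebuggerHook();
}
```

The "memory" clobber also stops the compiler from moving the store to
g_debug_state past the call.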
Also added _amdgpu_r_debug to the exported symbol list (global) and
switched to the standard macro for an exported symbol (HSA_API).
Without being in the global list the debugger will not find this
symbol if the binary has been stripped.
Change-Id: Ieb00175ccc55fda4491deee44711cd55b3f24aeb
[ROCm/ROCR-Runtime commit: 3e9aca0f34]
Adding patch number based on ROCM build/release to have unique
file name for libraries across multiple versions of ROCM.
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I58d665b0e7d577b5bd7a6000d1202a0242672727
[ROCm/ROCR-Runtime commit: 54d94d02bd]
Lack of cache controls only allows operating SDMA at
agent scope. All copy APIs are defined at system scope, so this may
result in data errors.
Change-Id: I9cd10007defddcbf8feb14a2e3daa1ba17c0489f
[ROCm/ROCR-Runtime commit: 22a601292d]
Queues should transition to ref counting for all queues eventually.
That cleanup will be part of shared queue pooling support.
Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617
[ROCm/ROCR-Runtime commit: 0a43a107b1]
Temporary workaround for 2.10 release. RCCL, compiler, or firmware
must be corrected and this code reverted before another ASIC release.
Change-Id: I27851353289b93df9acb72d28b8c6ccb9f7f7d7a
[ROCm/ROCR-Runtime commit: 35c1ffa863]