The access type for extended scope fine grained memory was being returned as never
allowed by default
Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5
[ROCm/ROCR-Runtime commit: 82e7979c61]
On gfx11, with a sequence such as
s_trap 2
s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
s_endpgm
the s_sendmsg does deallocate registers while the wave is supposed to be
stopped. As a result, the wave cannot do the expected context save
operations, and cannot context save.
To avoid this problem, park the wave in the trap handler for gfx11.
Note that gfx11 has implemented an instruction cache prefetch. When
parked, the prefetch tries to access memory past the end of trap handler
which causes memory violation exceptions to be reported. To avoid this,
we need to add padding at the end of the trap handler. The padding
consists of `s_code_end` instructions Given that the trap handler is
loaded at a 0x1000 aligned address the maximum prefetch amount (in
bytes) is given by `256 - (trap_handler_size % 64)`.
Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933
[ROCm/ROCR-Runtime commit: 2f2ba050f6]
Thread yield doesn't drop the scoped acquired mutex so drop it around
yield to prevent a multithread deadlock.
Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a
[ROCm/ROCR-Runtime commit: 70f0a44910]
Fix a condition where we can get a divide-by-zero in the
TranslateTime(tick) function if the GPU tick predates HSA
startup and we did not do a SyncClocks since initialization.
Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb
[ROCm/ROCR-Runtime commit: bc585bd8de]
In ASAN builds, the compiler used is clang. The initialization of
variable sized array using assignment operator is causing compilation
failure in ASAN builds. Used memset to fix the same.
Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e
[ROCm/ROCR-Runtime commit: cd4632ccbc]
I've just reverted some code what it was in 5.5 by wrapping new x86
specific bits with #if's, e.g.:
- CPUID is x86 specific
- mwait is x86 specific
Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
[ROCm/ROCR-Runtime commit: 132a19e9c3]
This fixes a segfault error in cases where the linking order of
compilation unit varies. Reason behind the segfault is that one
global variable in one compilation unit depends on another global
variable in another compilation unit, but there is no guarantee that
this other compilation unit is initialized first. The fix forces a
reinitialization at the first invocation of the library.
Change-Id: I1428592c6898bca13a330c4588941de260ff0370
[ROCm/ROCR-Runtime commit: d220e16000]
Unless SDMA blits have actually been used for copies, prevent the DMA
copy status from querying the blit's pending byte status to avoid
creating an unnecessary HW queue.
Change-Id: Ied1fbed73c08f0408f0e3583f9b56f2768c71708
[ROCm/ROCR-Runtime commit: 92467fd282]
Querying pending bytes on a blit kernel is unnecessary when runtime
runs out of SDMA resource since we are returning an SDMA availabilty
mask.
Change-Id: I347efba0c85b70ea3ba8749d76a499afc23909e8
[ROCm/ROCR-Runtime commit: 8c60f04a99]
Added HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXT_SCOPE_FINE_GRAINED flag to enable extended scope memory region
where the device-scope atomics act as system-scope atomics
Change-Id: I79fc3207cb630dfc68bed2f8aabd75f35fe80b12
[ROCm/ROCR-Runtime commit: 77bf357647]
Enable sleep for all waiters with event age tracking support kernel.
Change-Id: Icd4e1e8d83b4a54e9f6aaa99691a6573211b3337
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 36666f5895]
KFD kernel version 1.13 starts to support event age
tracking which help elimating unncessary busy wait.
Change-Id: Ib447ed6e0350f3110a4d6b9b80a0388000dd0e72
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 5871b28503]
Compiler behavior is undefined if the right operand is negative,
or greater than or equal to the width of the promoted left operand.
For release builds with address sanitizer enabled, this compiler
optimization behavior leads to unsupported queue size value since
current method shifts till 128 bits on a 64 bit value.
Change-Id: Iddcc15b43d2331bc8bf5fc3aa4725f76844655ec
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
[ROCm/ROCR-Runtime commit: ea2f832a43]
Copy on engine API still needs to respect HSA_ENABLE_SDMA settings.
Change-Id: I26038b1e3082d62687c2e279615557583d20f229
[ROCm/ROCR-Runtime commit: 3e3e11bc5a]
Earlier, hsa-runtime was unable to find symbols from a stripped ELF-image becasue
no support to find symbols from ".dynsym" section.
Looking for symbols in .dynsym is enabled by LOADER_USE_DYNSYM=1
environment variable
Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a
[ROCm/ROCR-Runtime commit: 4142a77375]
Explicitly mention that IPC handles can only be created on GPU agents.
Change-Id: I19bc3578d6e5243c795bf6fbf981ea4bd3bfc2e8
[ROCm/ROCR-Runtime commit: 5e4490f180]
GFX11 and up including some GFX9 devices will not support
old trap handling without the new exception handling.
Instead of a hard assert failure that runs into a core dump,
let ROCr initialization continue instead.
Change-Id: I309becdc72ef4fb2fafd118c1faf0801407e658e
[ROCm/ROCR-Runtime commit: bfb94b3b6e]
status.priv may be read after returning from the trap handler, which
causes sq_interrupt_word_wave.priv to be 0 even though the s_sendmsg
instruction was initiated when status.priv was 1.
To work around this, added a s_waitcnt lgkmcnt(0) after s_sendmsg
to make sure the message is sent before continuing.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com>
Change-Id: Ieb75005ca1559ef03d0efac80e966f521e41fcb7
[ROCm/ROCR-Runtime commit: 6a82b0a038]
Removing this definition as this should already be defined by compiler.
This is causing compile errors on newer versions of llvm because the
macro is being redefined.
Change-Id: Ica6a06f46a14e16d3f52e83b9b5ee8cfd7359510
[ROCm/ROCR-Runtime commit: e4fffa140a]
A patch was made in gfx940 npi branch to move the kernel object file
loading to outside the rocrtstNeg.Queue_Validation_* main queue creation
and submission loops, and added a clear_code_object() after the loop.
Another patch was made to the non-npi branch which adds a
clear_code_object() inside the loop. When the npi branch patch was
merged, this was causing the code object to be cleared at the end of
the first loop. Remove these clear_code_object() calls.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id4188e78411e81c5071bf715c1f02491f571ab79
[ROCm/ROCR-Runtime commit: dbe2a82e35]
Adding support for gfx941 and gfx942 ISAs.
gfx940 ISA will use sc0:1 sc1:1 on load/store operations
gfx942 ISA will use default load/store operations
Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579
[ROCm/ROCR-Runtime commit: 41f6d0426d]
When compiling in ASAN mode, remap the first page of device allocations
to system memory. ASAN's memory allocator uses a small amount of extra
memory to store data for housekeeping purpose. But because this memory
is from the GPU memory pool, it might have uncommon memory type for host
to access. Mapping this section of memory to the host makes this memory
accessible to ASAN.
Change-Id: I36f659d616a4d15558372592439a8723c5c84a69
Signed-off-by: Bing Ma <Bing.Ma@amd.com>
[ROCm/ROCR-Runtime commit: 50e754d08b]
Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to
disable use of SDMA engines for device-to-device transfers. Note that
setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override
HSA_ENABLE_PEER_SDMA values.
Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a
[ROCm/ROCR-Runtime commit: a397373cea]
RUNPATH in libraries will be : $ORIGIN
RUNPATH in binaries will be : $ORIGIN/../lib
Change-Id: Iafa66a8e02cc8c5783903d40927b63652042d2f1
[ROCm/ROCR-Runtime commit: ad002f1e7b]
Update documentation for hsa_amd_pointer_info to clarify which fields
are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN.
Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985
[ROCm/ROCR-Runtime commit: 39feb83b88]
Some workloads running on multi-GPU create 1 process per GPU. So each
process creates a GPU agent on every GPU, but will only create queues on
one GPU. This would cause un-necessary scratch reservation.
Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22
[ROCm/ROCR-Runtime commit: 38e832a682]
If the required scratch allocation is too large, ROCr will attempt to
reduce it by lowering the dispatch's targeted occupancy. The reduction
loop however was prone to overflow if waves_per_cu was not a multiple of
waves_per_group. Ensure no overflow by aligning waves_per_cu to
waves_per_group.
On GC 9.4.3 dGPU, dispatches with a large grid size and a
waves_per_group of e.g. 16 may require to reduce occupancy such that
waves_per_cu is less than waves_per_group to ensure the allocation size
is small enough. Allow this while also ensuring the tmpring scratch wave
count is kept divisible by the number of SEs per XCC.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480
[ROCm/ROCR-Runtime commit: bd63e5045c]
Scratch cache reserved memory is only available for scratch memory use
so do not report this memory as available to the user via the
HSA_AMD_AGENT_INFO_MEMORY_AVAIL api.
Change-Id: I52f96e62536458bcaa52b9f4be5de856d5680dc4
[ROCm/ROCR-Runtime commit: 3477fbc661]
Temporarily disabling rocrtstNeg.Queue_Validation_InvalidGroupMemory
until it is fixed.
Change-Id: Ifc1973a960c8d0bae27e2628e4bfddc60f70325d
[ROCm/ROCR-Runtime commit: 7b74271d5e]
Using wrapper header files will result in #warning message by default
Change-Id: I87739cabb365b9370b1182cf23ca9b54d99149c3
[ROCm/ROCR-Runtime commit: fbcbcd9e73]
Negative queue validation tests were doing many redundant from-file
kernel object loads in a loop. This was creating many simulataneous open
file handles within many dynamically allocated CodeObject objects. While
the CodeObject class implements RAII on the file handles to cleanup on
destruction, clear_code_object() only gets called on the destruction of
the TestBase-derived test objects (these being a suite abstraction).
Due to this we were hitting file open() EMFILE errors (too many open
files) in gfx94x CPX mode. Move LoadKernelFromObjFile outside of the
test loops and clear_code_object() for each test on each agent.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I6f9d23fd122720c49a58c22698f097906d2fc97c
[ROCm/ROCR-Runtime commit: 7a4c9273d7]
Add HSA_ENABLE_SRAMECC environment variable that can be used to
override SRAM ECC mode reported by KFD
Change-Id: I2b95511820a2d3d146a76b03070659c0695b61fd
[ROCm/ROCR-Runtime commit: a180c9ee78]