An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.
Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749
[ROCm/ROCR-Runtime commit: 395ad3b77b]
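The acquire/release pairing described in the header fix above can be sketched as follows. This is a minimal illustration, not the runtime's code: ExamplePacket, PublishPacket and ReadHeader are hypothetical names, and the packet layout is reduced to a 16-bit header plus one payload word.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an AQL packet: a 16-bit header plus a payload.
struct ExamplePacket {
  std::atomic<std::uint16_t> header{0};
  std::uint32_t payload = 0;
};

// Writer side: fill in the body first, then publish the header with release
// ordering so the body is visible to any thread that observes the header.
inline void PublishPacket(ExamplePacket& pkt, std::uint16_t header,
                          std::uint32_t payload) {
  pkt.payload = payload;
  pkt.header.store(header, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store above, so a
// payload read that follows cannot observe data older than the header.
inline std::uint16_t ReadHeader(const ExamplePacket& pkt) {
  return pkt.header.load(std::memory_order_acquire);
}
```

A reader that sees the published header value through ReadHeader may then safely read the payload written before the release store.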
Define the AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to the intercept queue to invoke a callback for these packets.
Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149
[ROCm/ROCR-Runtime commit: 23b4ce501d]
When the intercept queue copies packets from the proxy queue to the
wrapped queue, it should not attempt to copy packets that are outside
the proxy queue. This could happen if the user of the proxy queue
advances the write pointer beyond the number of free slots and the
packet rewriter reduces the number of packets.
Change-Id: Id02f5df8aee0ed7269f4de813731d507cf2126b3
[ROCm/ROCR-Runtime commit: b020f66d39]
If an intercept queue is created and multiple packet rewriters are
registered, and if one of the rewriters invokes the packet writer
multiple times, then on returning from the packet writer the packet
rewriter index needs to be restored. Otherwise the next packet writer
call will start with an index of 0, which will be decremented and result
in an out-of-bounds vector access.
Change-Id: Icb3f6a81ea04f1f7b91551b974a1f48c4f32db60
[ROCm/ROCR-Runtime commit: b64a845105]
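The index restore described in the rewriter fix above can be illustrated with a small sketch. It assumes a simplified model in which packets are plain integers and the chain keeps a single member index; ExampleRewriterChain and its members are hypothetical and do not mirror the runtime's classes.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class ExampleRewriterChain {
 public:
  using Writer = std::function<void(int)>;
  using Rewriter = std::function<void(int, const Writer&)>;

  void AddRewriter(Rewriter rewriter) {
    rewriters_.push_back(std::move(rewriter));
  }

  // Runs a packet through every rewriter (last registered first) and returns
  // the packets that reach the wrapped queue.
  std::vector<int> Submit(int packet) {
    output_.clear();
    index_ = rewriters_.size();
    Write(packet);
    return output_;
  }

 private:
  void Write(int packet) {
    if (index_ == 0) {  // No rewriters left: the packet reaches the queue.
      output_.push_back(packet);
      return;
    }
    const std::size_t saved = index_;  // Remember the position in the chain.
    --index_;
    rewriters_[index_](packet, [this](int p) { Write(p); });
    index_ = saved;  // Restore so a repeated writer call starts from here.
  }

  std::vector<Rewriter> rewriters_;
  std::vector<int> output_;
  std::size_t index_ = 0;
};
```

Without the restore on the last line of Write, a rewriter that calls the writer twice would find the index already at 0 on its second call, so the remaining rewriters would not be visited correctly.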
It is possible that rewriting an initial packet for the intercept
queue produces more packets than the size of the wrapped queue. The code
would never submit such a set of packets as it attempted to submit
all or none. This can result in an infinite loop.
This is corrected to submit what will fit if the rewrite is larger than
the wrapped queue.
Change-Id: I8f03228c2e15151287e25de46eaee998f829c62a
[ROCm/ROCR-Runtime commit: 9f4d651d14]
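The "submit what will fit" behaviour described above might look like the following sketch. It assumes integer packets and standard containers in place of the real queues; SubmitWhatFits is a hypothetical helper, not a runtime function.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Moves as many pending packets as currently fit into the wrapped queue and
// returns how many were moved. An all-or-none policy would never make
// progress when pending.size() exceeds capacity.
inline std::size_t SubmitWhatFits(std::deque<int>& pending,
                                  std::vector<int>& wrapped,
                                  std::size_t capacity) {
  const std::size_t free_slots =
      capacity > wrapped.size() ? capacity - wrapped.size() : 0;
  const std::size_t count = std::min(free_slots, pending.size());
  for (std::size_t i = 0; i < count; ++i) {
    wrapped.push_back(pending.front());
    pending.pop_front();
  }
  return count;
}
```

Calling the helper again after the wrapped queue drains submits the remainder, so a rewrite larger than the queue is delivered in several batches.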
The intercept queue submit needs to be obstruction free as it can be
invoked by the runtime async handler helper thread. The code had a busy
wait loop waiting for a free slot to be available to add the retry
barrier packet. Blocking that thread prevents it from servicing other async
handlers which may need to execute in order to allow packets on the
hardware queue to be processed to free up a slot.
Change the code to always leave one free slot unless there is a retry
barrier packet already on the queue.
Change-Id: If901c865550258b790b995d58037b0f99f1968cc
[ROCm/ROCR-Runtime commit: d16c392338]
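The slot reservation rule described above can be expressed as a small helper. This is only a sketch of the rule as stated in the message; UsableSlots and its parameters are hypothetical names.

```cpp
#include <cstddef>

// Returns how many free slots may be filled with ordinary packets. One slot
// is held back for a retry barrier packet unless such a packet is already on
// the queue, so adding the barrier never requires waiting for space.
inline std::size_t UsableSlots(std::size_t free_slots,
                               bool retry_barrier_queued) {
  const std::size_t reserved = retry_barrier_queued ? 0 : 1;
  return free_slots > reserved ? free_slots - reserved : 0;
}
```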
Describe the assumption being made when checking if there is a retry
barrier packet on the queue. Also enforce the consequential requirement
of the minimum queue size.
Change-Id: I0efaffc5a79b9e2fdab3655b8b74270118a5c2ff
[ROCm/ROCR-Runtime commit: ca99795c58]
The intercept queue was processing all the packets on the proxy queue.
This could result in the rewrite of more than one packet being put on
the overflow queue. If there are a lot of packets on the intercept
queue this could result in the overflow queue having more packets than
the size of the hardware queue. The code to submit the overflow queue
fails if it is unable to put all the packets of the overflow on the
hardware queue. This resulted in an infinite loop. It also resulted in
an assert being reported that packets are being added to the overflow
queue when it is not empty.
Correct this by checking if the overflow queue is non-empty after
rewriting each packet. If it is non-empty then stop processing
additional packets. The additional packets will be processed when the
barrier packet added to the hardware queue is executed due to its async
handler. This barrier packet is added to the hardware queue whenever
packets are saved on the overflow queue.
Change-Id: I2537911d3c3ba1aac61a0a35f1ab97426a66b5a2
[ROCm/ROCR-Runtime commit: be6b8bb055]
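The early stop described above can be sketched as a loop that checks the overflow queue after each rewritten packet. The sketch assumes integer packets and a rewrite callback that returns the replacement packets; ProcessProxyQueue is a hypothetical name, and the barrier packet that the real code adds when it spills into overflow is omitted.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Rewrites proxy packets one at a time and stops as soon as a rewrite spills
// into the overflow queue, so overflow never holds more than one rewrite.
// Returns the number of proxy packets consumed.
inline std::size_t ProcessProxyQueue(
    std::deque<int>& proxy, std::vector<int>& hardware,
    std::size_t hardware_capacity, std::deque<int>& overflow,
    const std::function<std::vector<int>(int)>& rewrite) {
  std::size_t processed = 0;
  while (!proxy.empty() && overflow.empty()) {
    const int packet = proxy.front();
    proxy.pop_front();
    for (int out : rewrite(packet)) {
      if (hardware.size() < hardware_capacity) {
        hardware.push_back(out);
      } else {
        overflow.push_back(out);
      }
    }
    ++processed;
  }
  return processed;
}
```

Packets left on the proxy queue are picked up by a later call, which in the scheme described above happens from the barrier packet's async handler.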
When forcing SDMA copies, the engine ID specified by the requester should
still be used since the requester has a hint of engine availability.
Change-Id: Idefa9494e407e31da510aa4c7c1fa283c85a4f6e
[ROCm/ROCR-Runtime commit: a36856b02a]
The vendor-specific header is only 8 bits and this would break the
behavior on big-endian machines. Rename the field to amd_format to match
the name in the spec sheets.
Change-Id: I65559757657565d3d3ff489d2663a0be42cf8ba5
[ROCm/ROCR-Runtime commit: 22be526230]
For large memory allocations (>2MB) the thunk should pass the
MADV_HUGEPAGE flag to the madvise call to optimize allocation performance
on operating systems that rely on the madvise hint when Transparent
Huge Pages is not set to always.
Suggested-by: Joseph Greathouse <joseph.greathouse@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Change-Id: Ic0c753f89a177b0f715942d6e2a7108b08a85f20
[ROCm/ROCR-Runtime commit: 5047eb161f]
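The madvise hint described above can be sketched for Linux as follows. The 2MB threshold comes from the message; the helper names WantsHugePageHint and AdviseHugePages are hypothetical, and the thunk's actual call site is not shown here.

```cpp
#include <sys/mman.h>

#include <cstddef>

// Threshold taken from the commit text: allocations larger than 2MB.
constexpr std::size_t kHugePageThreshold = 2u * 1024u * 1024u;

inline bool WantsHugePageHint(std::size_t size) {
  return size > kHugePageThreshold;
}

// Returns 0 when no hint is needed or the kernel accepted it, and -1 with
// errno set when madvise rejects the request (for example when Transparent
// Huge Pages support is not configured in the kernel).
inline int AdviseHugePages(void* addr, std::size_t size) {
  if (!WantsHugePageHint(size)) return 0;
  return madvise(addr, size, MADV_HUGEPAGE);
}
```

A caller would typically treat a failure as non-fatal, since the hint only affects performance.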
The provided pkgconfig file contains an interface description that is
arch dependent. In such cases .pc files should be installed in libdir.
Signed-off-by: Tomasz Kłoczko <kloczek@github.com>
Change-Id: Ibbc85ad4aee1ef014c409dfa63313873b590464b
[ROCm/ROCR-Runtime commit: a226542fc3]
Set the CWSR SVM range granularity to 0xff so that KFD migrates the entire
CWSR range from VRAM back to system memory when recovering from a CPU page
fault caused by rocgdb accessing the CWSR area. This avoids partial CWSR
range migration and the stale CWSR GPU mapping issue.
This is a temporary workaround; it should be reverted once KFD is
fixed.
Change-Id: I80a7248244574edba25b13858b7ebcf1c77b8930
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 85a47fa66b]
Some new CPUs have a different cache reporting structure, causing the
thunk to leave the cache information empty. Allow the cache information
for CPU agents to be empty as it is not used by language runtimes.
Change-Id: Ic5e880171ab20aa114b4b62bdb4479eb54066f7b
[ROCm/ROCR-Runtime commit: 96b3c4a0aa]
IOMMUv2 is removed from AMDGPU/KFD.
Change-Id: Ia00f9aa879a5f32a42bec914936d105d6845bc60
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 693e686c4d]
This shared resource is for IOMMUv2, which has been removed from
AMDGPU/KFD.
Change-Id: Ia6e9311f1adc56fac2c9e8fa05b24c5ec8c272a5
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: d195deeec4]
IOMMUv2 is removed from AMDGPU/KFD.
Change-Id: I9fcf20ae9288cb40bb4b696284fc70534fb6484b
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 277d5e27ff]
IOMMUv2 is removed from AMDGPU/KFD.
Change-Id: Ib87f501c07d9de90e6b83b98f98daacd5913e98a
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 274b5b51ca]
Use the new ExtendedCoherent KFD HSA memory flag to achieve system
scope coherence on atomic instructions. Non-compliant systems may
need to perform explicit HDP flushes to achieve system scope
coherence using this flag.
Change-Id: Ic6b47c0e97285086fa1f52bbfa4597b81cadafeb
[ROCm/ROCR-Runtime commit: 4eb6ed7799]
Add support for a new memory allocation flag that provides
system-scope coherent atomics.
Change-Id: I426d66223e8d2b570f69b4c0e61145ce9b2290d2
[ROCm/ROCR-Runtime commit: 8e06dce573]
Some negative tests can trigger C++ exceptions, which cause the code
to leave the ref counts in an inconsistent state.
Change-Id: Ifa6d8be986941efcdf20d7ac8b86eb15a8fe9932
[ROCm/ROCR-Runtime commit: 06eefdeb1b]
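The message above does not say how the ref counts were made exception safe. One common technique is an RAII guard that undoes the increment unless the operation completes; the sketch below shows that technique only as an illustration, and ExampleRefGuard and RetainExample are hypothetical names.

```cpp
#include <stdexcept>

// Increments a counter on construction and rolls the increment back on
// destruction unless Commit() was called, so an exception thrown in between
// cannot leave the count too high.
class ExampleRefGuard {
 public:
  explicit ExampleRefGuard(int& count) : count_(count) { ++count_; }
  ~ExampleRefGuard() {
    if (!committed_) --count_;
  }
  ExampleRefGuard(const ExampleRefGuard&) = delete;
  ExampleRefGuard& operator=(const ExampleRefGuard&) = delete;
  void Commit() { committed_ = true; }

 private:
  int& count_;
  bool committed_ = false;
};

// Takes a reference only if the argument is valid. The throw stands in for
// any failure that a negative test might provoke.
inline bool RetainExample(int& ref_count, bool valid_argument) {
  try {
    ExampleRefGuard guard(ref_count);
    if (!valid_argument) throw std::invalid_argument("bad argument");
    guard.Commit();
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```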
Modify hsa_amd_vmem_get_access to handle pointers that are within the VA
range of an existing memory mapping.
Change-Id: I9f806ec39f6e9a33da8d86dd65d9a472438fa8ed
[ROCm/ROCR-Runtime commit: dd61f54171]
The debug address watch test will hang when running with the
entire KFD test.
Disable it for now.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I1d0479fa2717d2f398cc32e0605ca6dcc17ebcd5
[ROCm/ROCR-Runtime commit: 986e82d677]
Silence warnings on more stringent compile checks for lack of override
declaration.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Iaa54dfc3dd74f5ee55763cafbbcf2db73493bb21
[ROCm/ROCR-Runtime commit: 6b4365ae4c]
Debug test shaders should use camel case and the *Isa suffix to match
the naming convention of the other test shaders.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I64e14183ba1c7c9664b13a742a0e5683866e8223
[ROCm/ROCR-Runtime commit: fcec22716a]
On busy systems, memory allocation can take a long time and increase
the number of calls to hsa_signal_create/hsa_amd_signal_create. This
change mitigates the issue.
Change-Id: Ib7640273262ebc3dbf1f07049ce5da10b1d6b158
[ROCm/ROCR-Runtime commit: 9a127193a8]
MCPU is a const char * that always evaluates to true, so the pointer
test never fires; check the pointed-to value instead.
Before: if (!MCPU) {
After: if (!*MCPU) {
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I414e091ca764095937311648c534351d6abf30e6
[ROCm/ROCR-Runtime commit: 5f117f7608]
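The difference between the two conditions in the MCPU fix above can be seen in a short sketch. PointerTest and ValueTest are hypothetical helper names that wrap the "before" and "after" expressions.

```cpp
// "Before": true only for a null pointer. A non-null pointer to an empty
// string still yields false, so the empty case goes undetected.
inline bool PointerTest(const char* mcpu) { return !mcpu; }

// "After": true when the first character is the terminator, i.e. the string
// is empty. The pointer must be non-null for this dereference to be valid.
inline bool ValueTest(const char* mcpu) { return !*mcpu; }
```

For an empty string literal, PointerTest returns false while ValueTest returns true, which is the case the fix targets.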
For some reason, non-Ubuntu builds hit some sort of memory
corruption when running this test, which affects tests that run
afterwards. Disable it for now.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I5f54ee4c63286a33c6948bc818aa1501c4a6751e
[ROCm/ROCR-Runtime commit: 6ec529fe68]
Generate the shader bytecode stream in amd_blit_shaders_v2.h at build time.
Change-Id: I5228ec5442a78d074fd85ca9cd7f7a156dd84da3
[ROCm/ROCR-Runtime commit: 4e675ce730]
Add compile-time asserts to force incrementing the API table STEP version
each time a new function is added to a table. This is required for the
profiler team to be able to add preprocessor macros to determine which
versions contain the new APIs.
Also increment the major versions to 2 to indicate the new numbering
scheme.
Change-Id: I148a436a5ceab6be3906f8263b40ea9b07841577
[ROCm/ROCR-Runtime commit: 03f2f69d16]
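A compile-time check of the kind described above might be built on static_assert, tying the table size to the step version. The sketch below is an assumption about the general shape, not the runtime's actual asserts; EXAMPLE_TABLE_STEP_VERSION, ExampleApiTable and ExpectedTableSize are hypothetical.

```cpp
#include <cstddef>

// Hypothetical step version macro that tools could test with #if.
#define EXAMPLE_TABLE_STEP_VERSION 1

// Hypothetical API table holding two entry points.
struct ExampleApiTable {
  void (*example_fn_a)();
  void (*example_fn_b)();
};

// Expected table size for each known step version; 0 for unknown versions.
constexpr std::size_t ExpectedTableSize(int step_version) {
  return step_version == 1 ? 2 * sizeof(void (*)()) : 0;
}

// Adding a function pointer changes sizeof(ExampleApiTable), so the build
// fails until the step version and the expected size are updated together.
static_assert(
    sizeof(ExampleApiTable) == ExpectedTableSize(EXAMPLE_TABLE_STEP_VERSION),
    "ExampleApiTable changed: bump EXAMPLE_TABLE_STEP_VERSION");
```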
Use memset to avoid general 0 set padding issues and ASAN compile issues
for debug tests.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I0a5aca5b7b631083599573b47f1ae87d5d0d5d71
[ROCm/ROCR-Runtime commit: f9e20c8a93]
Some GFX9 devices will drop commands if a ring buffer submission is less
than 64 DWORDs. Pad the submission with a NOP head and trailing null
DWORDs in this case.
Change-Id: I850af490fb699f7efe8aef96d97c600a8e76516b
[ROCm/ROCR-Runtime commit: cdd0728d9b]
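The padding rule above can be sketched as follows. Two assumptions are made here that the message does not confirm: the NOP is placed at the start of the padding, and kExampleNopDword is a placeholder value rather than a real packet encoding. PadSubmission is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimum submission size from the commit text.
constexpr std::size_t kMinSubmissionDwords = 64;
// Placeholder only: NOT a real NOP encoding for any device.
constexpr std::uint32_t kExampleNopDword = 0xFFFFFFFFu;

// Pads a short submission to 64 DWORDs with one NOP DWORD followed by
// null DWORDs. Submissions that are already long enough are left untouched.
inline void PadSubmission(std::vector<std::uint32_t>& cmds) {
  if (cmds.size() >= kMinSubmissionDwords) return;
  cmds.push_back(kExampleNopDword);
  cmds.resize(kMinSubmissionDwords, 0u);
}
```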
Also changed enum value to leave gap between enums that only exist in
hsa_region_info_t and enums that exist in both hsa_amd_memory_pool_info_t
Change-Id: I8f9f31200de66648e9328e4203ab283068c993f0
[ROCm/ROCR-Runtime commit: 4317f8dece]
We don't need to keep track of specific blit engines in gang for
submission anymore as ganging early exits on pending bytes.
So tidy up the fluff.
Change-Id: I77e80bf1ad8f561a03fff77bce33aa09d02760c6
[ROCm/ROCR-Runtime commit: 132815bcfb]
In ASAN builds, the compiler used is clang. Initializing a
variable-sized array with the assignment operator causes a compilation
failure in ASAN builds. Use memset instead.
Change-Id: I02aef3b99a6cad0cce3a378210a48732e07a88fb
[ROCm/ROCR-Runtime commit: 65911e8368]
Add test to catch trap on wave start or end override event.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Icb57af64475fbd2d8a6c0af9a2ee5db5d1a169c6
[ROCm/ROCR-Runtime commit: a3f8085025]
The address watch test now covers read and write operations.
It also checks that the operation is precise when precise
address watch is available.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I7ef835790e26bf6345682755d7dd26a35853bcd5
[ROCm/ROCR-Runtime commit: 8311ca5bfa]
For GFX11 debugger testing, waves need to start in non-priv mode for
some test cases, so allow the tester to set this.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Iee93fda926bfd336d51c79c086f1f75bc35b70e5
[ROCm/ROCR-Runtime commit: 6c5121faff]