The PM4 IB must have executable permission.
A second part of this fix concerns robustness when this is not the case.
This remains under investigation.
This fix will shortly be cleaned up in a refactoring pass to consolidate
calls to hsaKmtAllocMemory.
Change-Id: I326fe01949a77669e0b07c3cadc9fd44b8065055
EventId is needed to call hsaKmtDestroyEvent() when mmap fails,
so assign it before the mmap call.
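
The ordering issue can be sketched as follows. All names here (FakeEvent,
DestroyEventStub, CreateEventSketch) are hypothetical stand-ins and not the
libhsakmt code; the sketch only shows that the ID has to be recorded before
the step that can fail, so the cleanup path receives a valid ID.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not the libhsakmt API.
struct FakeEvent {
  uint32_t event_id = 0;  // 0 means "not yet assigned" in this sketch
};

// Records every ID handed to the cleanup routine so it can be inspected.
inline std::vector<uint32_t>& DestroyedIds() {
  static std::vector<uint32_t> ids;
  return ids;
}

inline void DestroyEventStub(const FakeEvent& e) {
  DestroyedIds().push_back(e.event_id);
}

// Simulates event creation where the mapping step may fail.
// kernel_id plays the role of the ID returned by the driver.
inline bool CreateEventSketch(FakeEvent& e, uint32_t kernel_id, bool map_fails) {
  // Assign the ID first so the failure path below can use it.
  e.event_id = kernel_id;

  if (map_fails) {
    DestroyEventStub(e);  // sees the real ID, not the unassigned value
    return false;
  }
  return true;
}
```

If the assignment came after the failing step, the cleanup call in this
sketch would receive 0 and the driver-side event would leak.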
Change-Id: I5f4288b953611799a02b0e988d6b2e48104466a0
Due to a misinterpretation of the HSA specification, the microcode has,
until now, been responsible for ensuring a coherent view of the
amd_kernel_code_t object when acquire_fence_scope is set to agent or system.
To correct this, the runtime must instead assume this responsibility.
Introduce GpuAgentInt::InvalidateCodeCaches to perform this operation
on-demand. Invoke this after code object allocation. Extend the Queue
implementations to support PM4 command submission, through which the
PM4 command ACQUIRE_MEM can be submitted to perform cache invalidation.
Submit through a runtime-managed queue shared with the blit implementation.
This change depends on microcode support and this is checked against the
running version. Older microcode builds will perform cache invalidation
themselves, so it is acceptable for this change to do nothing in that case.
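
The version gate described above might look roughly like this. The threshold
value and the names (kAssumedMinUcodeVersion, ChooseInvalidateAction) are
assumptions for illustration; the actual minimum microcode version and the
GpuAgent code are not shown here.

```cpp
#include <cstdint>

// Hypothetical threshold; the real minimum microcode version is not stated.
constexpr uint32_t kAssumedMinUcodeVersion = 100;

enum class InvalidateAction { kSubmitAcquireMem, kNothing };

// Decide whether the runtime should invalidate code caches itself.
inline InvalidateAction ChooseInvalidateAction(uint32_t running_ucode_version) {
  // Newer microcode leaves invalidation to the runtime; older builds
  // still invalidate on their own, so the runtime does nothing.
  return running_ucode_version >= kAssumedMinUcodeVersion
             ? InvalidateAction::kSubmitAcquireMem
             : InvalidateAction::kNothing;
}
```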
Change-Id: I268dd2b83af3decdd9ad07430a81df8a2ecb6bd2
This option was disabled by default to address issues writing to stderr
in Windows applications. The lack of an error message for memory access
faults is confusing to users, however.
Enable the error message by default on Linux only.
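
A platform-dependent default of this kind could be sketched as below. The
helper name and the "0"/"1" override convention are hypothetical; how the
runtime actually reads the option is not described here.

```cpp
#include <cstring>

// Platform default: on for Linux, off elsewhere (per the description above).
#if defined(__linux__)
constexpr bool kFaultMessageDefault = true;
#else
constexpr bool kFaultMessageDefault = false;
#endif

// Hypothetical helper: an explicit "0"/"1" override wins, otherwise
// the platform default applies. `override_value` may be null.
inline bool FaultMessageEnabled(const char* override_value,
                                bool platform_default = kFaultMessageDefault) {
  if (override_value == nullptr) return platform_default;
  if (std::strcmp(override_value, "1") == 0) return true;
  if (std::strcmp(override_value, "0") == 0) return false;
  return platform_default;  // unrecognized values fall back to the default
}
```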
Change-Id: I1f44ba42362f8874abdc7c8e63ddd54a855b5394
The runtime needs a queue on which to submit cache management commands.
Device-to-device blit copy already creates a queue unconditionally.
We can share this queue for both purposes.
This change restructures the BlitKernel interface to accept, rather than
create, a queue. GpuAgent creates queues as needed for both cache
management and blit compute.
Fix queue full detection in AcquireWriteIndex (<= vs <).
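
To show why the boundary comparison matters, here is a minimal ring-index
sketch. It is not the AcquireWriteIndex implementation, and RingSketch and
TryAcquire are hypothetical names; it assumes monotonically increasing read
and write indices and a fixed slot count.

```cpp
#include <cstdint>

// Minimal ring-index sketch with monotonically increasing indices.
struct RingSketch {
  uint64_t read_index = 0;
  uint64_t write_index = 0;
  uint64_t capacity = 0;  // number of slots

  // Returns true and reserves `count` slots if they fit.
  bool TryAcquire(uint64_t count, uint64_t* first_slot) {
    uint64_t in_flight = write_index - read_index;
    // `<=`: a request that exactly fills the ring is still valid.
    // With `<` here the last slot could never be used.
    if (in_flight + count <= capacity) {
      *first_slot = write_index;
      write_index += count;
      return true;
    }
    return false;
  }
};
```

An off-by-one in this check either wastes a slot or lets the writer
overrun unread entries, depending on which way the comparison is wrong.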
Change-Id: I61d0c6b9d04f2dba74872f0676ad791435778ba4
get_block_properties uses the complete DID to identify the GPU. This list
grows too long as more devices are added. Reading the 12 most
significant bits is good enough to identify the GPU.
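
As a sketch of the idea, assuming a 16-bit device ID of which the upper
12 bits are compared, the lookup key can be derived with a mask. The names
below are hypothetical and this is not the get_block_properties code.

```cpp
#include <cstdint>

// Assumption: a 16-bit device ID whose low 4 bits vary within a family.
constexpr uint16_t kFamilyMask = 0xFFF0;

// Keeps the 12 most significant bits and clears the rest.
inline uint16_t DeviceFamilyKey(uint16_t device_id) {
  return static_cast<uint16_t>(device_id & kFamilyMask);
}

// Two IDs match when their upper 12 bits are equal.
inline bool SameFamily(uint16_t a, uint16_t b) {
  return DeviceFamilyKey(a) == DeviceFamilyKey(b);
}
```

With this, one table entry covers up to 16 device IDs that differ only in
the low nibble.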
Change-Id: Ieebb05402bbe08af12eb7289dfeb5bbf1f515b0f
This is the first part of transitioning to the LLVM-based assembler.
SP3 is deprecated and all references to the library are removed.
Pending LLVM support, relevant shaders have been precompiled.
Change-Id: I7d44cef5ded1836c4a74b77881af5bea8803d2c1
On multi-node systems only the first CPU node was recognized in the
signal consumer list, causing fallback to non-interrupt signals.
Change-Id: I9bd0706bafbe046be9d7f210d05fa4cf1fcd16fa
Before this change, the runtime hard-coded the device name. This commit
queries the name from KFD instead. The UTF-16 to UTF-8 conversion will
switch to codecvt once GCC supports it.
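
Without codecvt, the conversion can be done by hand. The function below is a
minimal sketch of such a converter, not the code in this commit; the name
Utf16ToUtf8 and the choice to replace unpaired surrogates with U+FFFD are
assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Converts UTF-16 code units to UTF-8. Unpaired surrogates become U+FFFD.
inline std::string Utf16ToUtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: combine with the following low surrogate if present.
      if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
        ++i;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // stray low surrogate
    }

    if (cp < 0x80) {  // 1-byte sequence
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {  // 2-byte sequence
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3-byte sequence
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {  // 4-byte sequence
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```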
Change-Id: I7c4dc32ef857296296c810d083888c5ba1c808b6
Have amd::MemoryRegion::Lock use the host_ptr instead of asserting
when the alternate_va is null. When the src/dst memory pointer is
allocated via KFD, the host_ptr is already a GPUVA, so no alternate
address is needed.
Change-Id: If44368cc2854d4c0c477ae56e4eeabc37e54c1a5
Reduce the number of blit queues from 3 to 2 when SDMA is unavailable,
improving the availability of queue slots for applications.
Change-Id: I8860d2b6c6d6527494b9fc35d164099e1313886a
for the kernel args.
Most image-related HSA conformance tests now pass.
Many more ocltst/oclperf image tests pass as well.
Change-Id: I3f28d4ee7369f0ebc7af5128d3ffe1390957db98
Add performance counters for gfx70x. The reference is the gfx7 register spec.
The register being looked at is SQ_PERFCOUNTER0_SELECT.
Change-Id: I344bfb7452f6148f4dc268163d12c553c6be8424
Stepping 1 indicates higher double-precision float performance and,
potentially, that other runtime workarounds are needed for the lack of
PCIe atomics on gfx70x.
Change-Id: I97185c1233e7d24caaf20a1eadea931d5a2bc664
In a NUMA system, topology should report NumCaches as the number of caches
within the node, but the current code reports the total number of caches in
the system. This patch fixes the error. It also uses cpuid to get cache
information instead of reading sysfs files. See Intel Corporation, "Intel 64
and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B & 2C):
Instruction Set Reference", page 3-179, for the cpuid instruction features
used in this patch.
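
For orientation, the sketch below decodes the register layout of CPUID leaf
04H (deterministic cache parameters) as documented in the Intel manual. That
this patch uses leaf 04H is an assumption, and CacheInfo and DecodeCacheLeaf
are hypothetical names. The function takes raw register values so it does
not depend on executing cpuid itself.

```cpp
#include <cstdint>

struct CacheInfo {
  uint32_t type;   // 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
  uint32_t level;  // cache level, starting at 1
  uint64_t size_bytes;
};

// Decodes EAX/EBX/ECX as returned by CPUID leaf 04H for one sub-leaf.
inline CacheInfo DecodeCacheLeaf(uint32_t eax, uint32_t ebx, uint32_t ecx) {
  CacheInfo info;
  info.type = eax & 0x1F;         // EAX[4:0]
  info.level = (eax >> 5) & 0x7;  // EAX[7:5]

  // Each field is stored as (value - 1).
  uint64_t line_size = (ebx & 0xFFF) + 1;           // EBX[11:0]
  uint64_t partitions = ((ebx >> 12) & 0x3FF) + 1;  // EBX[21:12]
  uint64_t ways = ((ebx >> 22) & 0x3FF) + 1;        // EBX[31:22]
  uint64_t sets = static_cast<uint64_t>(ecx) + 1;   // ECX[31:0]

  info.size_bytes = ways * partitions * line_size * sets;
  return info;
}
```

A caller would iterate sub-leaves until type is 0 and count only the caches
that belong to the node being reported.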
Change-Id: I8ecece6c2b230741822620b44e66ddc201ff5112