Add vm_find_object_by_userptr_range so QueryPointerInfo can also find the
object when the pointer is not the starting address but lies inside the
memory range. Also rename the vm_find_object_xxx functions to
_by_address and _by_address_range for consistency.
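The difference between the exact-address and the range lookup can be
sketched as follows. This is a minimal illustration, not the thunk's code:
the vm_object struct, the linear scan, and both function names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tracked VM object; not the thunk's real type. */
typedef struct {
    uintptr_t start; /* first byte of the range */
    size_t    size;  /* length of the range in bytes */
} vm_object;

/* Exact match: addr must equal the start address of an object. */
const vm_object *find_by_address(const vm_object *objs, size_t n,
                                 uintptr_t addr)
{
    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (objs[i].start == addr)
            return &objs[i];
    return NULL;
}

/* Range match: addr may lie anywhere in [start, start + size). */
const vm_object *find_by_address_range(const vm_object *objs, size_t n,
                                       uintptr_t addr)
{
    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (addr >= objs[i].start && addr - objs[i].start < objs[i].size)
            return &objs[i];
    return NULL;
}
```

A pointer into the middle of an allocation is missed by the exact lookup
but found by the range lookup; the end address itself is outside the range.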
Change-Id: I5c2b3a05b41493e32b7fd9154665bf078b043606
[ROCm/ROCR-Runtime commit: 4911c91389]
Introduce tiling format for images; still using LINEAR for now.
Use the new KFD/Thunk API hsaKmtGetTileConfig for the address library.
Change-Id: Ic0677429dd320eef09ab62dddaf9b2dd94c4f904
[ROCm/ROCR-Runtime commit: 538736a660]
Add CPUVM aperture to keep track of memory allocations that are not known
to the GPU driver. Together with GPUVM, this patch adds pointer attributes
support to APU.
Change-Id: If13f9cf01ff8b9f709b99b66661e7505246adf4c
[ROCm/ROCR-Runtime commit: 19f2676ea7]
Add two pointer attributes APIs:
hsaKmtQueryPointerInfo - allow the user to query the memory information
using a pointer. This pointer can point to any address inside the
range known to HSA.
hsaKmtSetMemoryUserData - allow the user to attach data to a pointer to
add memory tracking information. This pointer must match the start
address of a memory allocation or registration.
TODO: This patch implements support on dGPU only. APU support still needs to be added.
Change-Id: I4711809274248434901f0794f50ebfa13a7371a8
[ROCm/ROCR-Runtime commit: 51e4d27c37]
C11 atomics are not statically guaranteed to be lock free and so
may not be atomic with respect to atomic operations originating
outside the standard library, such as platform atomics.
C11 macros to statically discover always lock free operations
(ATOMIC_*_LOCK_FREE) do not cover uint64_t in GCC and
std::atomic<uint64_t> is not a type alias of any covered type.
All use of __atomic by atomic_helpers.h is statically checked to be
always lock free.
GCC builtin fencing does not appear to be strong enough for WC memory.
Added an option (enabled) to enforce consistency for WC memory on x64.
__sync builtins were not used as GCC has declared them legacy.
Added a strongly conservative option (ALWAYS_CONSERVATIVE) to enable
use of full memory fences in place of partial fences and compiler
driven processor specific optimization.
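A static lock-free check of the kind described above might look like the
sketch below. It assumes GCC or Clang (for the __atomic builtins) and a
C11 compiler; the helper names store_u64 and load_u64 are hypothetical and
are not taken from atomic_helpers.h.

```c
#include <assert.h> /* static_assert (C11) */
#include <stdint.h>

/* Fails the build if a naturally aligned 64-bit object is not always
 * lock free on the target. A null pointer argument means "typical
 * alignment for an object of this size". */
static_assert(__atomic_always_lock_free(sizeof(uint64_t), 0),
              "64-bit atomic operations must be lock free");

/* Release store on a plain uint64_t through the builtin. */
void store_u64(uint64_t *p, uint64_t v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

/* Acquire load on a plain uint64_t through the builtin. */
uint64_t load_u64(uint64_t *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}
```

Because the check is a static assertion, an unsupported target is rejected
at compile time rather than silently falling back to a locking
implementation.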
Change-Id: Id7aaaca626144070f58759f6a348cbee4612bbc0
[ROCm/ROCR-Runtime commit: 1bc15bbf79]
For APUs, use /proc/cpuinfo to get Marketing name.
Change-Id: I4a17516d26a092683f36631032be00ad44f7e7fe
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
[ROCm/ROCR-Runtime commit: df593aa076]
Change hsa_code_object_serialize and hsa_code_object_deserialize to use
memcpy instead of hsa_memory_copy, since it is a system->system copy.
Change-Id: I329e270ae4e2fc25e177dc8080d93662ffb261ab
[ROCm/ROCR-Runtime commit: 73ed2116d5]
Compiling in 32-bit mode is broken, and we have no intention of
restoring compatibility with 32-bit apps.
Change-Id: I5524b5b63fe62e6026aa04d84c4510e290a86106
[ROCm/ROCR-Runtime commit: e0c77a38cb]
Route all device-visible system memory allocations through system_allocator.
Change-Id: I5e90a1bf491e432678a6d8ab1f9f3770734cbda1
[ROCm/ROCR-Runtime commit: 74f5aca93d]
The HSA thunk API currently reports the engineering name as MarketingName
and returns NULL when queried for AMDName.
- Report the engineering name as AMDName instead of MarketingName.
- Use libpci to get MarketingName.
Change-Id: I819a6de7b067a2e724a6695e7d800274b83a71f8
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
[ROCm/ROCR-Runtime commit: 9cbbf30be7]
The thunk spec requires that CUMaskCount be divisible by 32. Check this
and return INVALID_PARAMETER if it is not.
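The check amounts to a single modulo test. The sketch below uses a
hypothetical status enum and function name; the real thunk types and
entry point are not shown in this message.

```c
#include <stdint.h>

/* Hypothetical status codes standing in for the thunk's return values. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_INVALID_PARAMETER = 1
} sketch_status;

/* The mask is consumed in 32-bit words, so the bit count must be a
 * multiple of 32. */
sketch_status check_cu_mask_count(uint32_t cu_mask_count)
{
    if (cu_mask_count % 32 != 0)
        return SKETCH_INVALID_PARAMETER;
    return SKETCH_SUCCESS;
}
```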
Change-Id: I4e0c8502d996d3da31224b817a5d4ff2c6054e13
[ROCm/ROCR-Runtime commit: 70b1b5b17e]
- Includes Sean's latest changes
- Cleanups/improvements
- Fixes for a few bugs that crept in from previous releases
Change-Id: I839dc4895bf13ebd0afc8843424387a9fef667b0
[ROCm/ROCR-Runtime commit: c2c993e0d8]
The PM4 IB must have executable permission.
A second part of this fix concerns robustness when this is not the case.
This remains under investigation.
This fix will shortly be cleaned up in a refactoring pass to consolidate
calls to hsaKmtAllocMemory.
Change-Id: I326fe01949a77669e0b07c3cadc9fd44b8065055
[ROCm/ROCR-Runtime commit: f71de56c79]
EventId is needed to call hsaKmtDestroyEvent() when mmap fails,
so move its assignment ahead of the mmap call.
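The ordering issue can be illustrated with stubs. Everything in this
sketch is hypothetical (the struct, the stub functions, and the use of
malloc in place of mmap); it only shows why the ID has to be stored before
the step that can fail.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Records which ID the destroy stub was called with. */
uint32_t last_destroyed_id;

/* Stub standing in for the kernel-side destroy call. */
void destroy_event_stub(uint32_t id) { last_destroyed_id = id; }

/* Stub standing in for the mapping step; returns NULL to simulate failure. */
void *map_stub(bool fail) { return fail ? NULL : malloc(64); }

typedef struct {
    uint32_t event_id;
    void    *page;
} sketch_event;

/* Returns NULL if allocation or mapping fails. */
sketch_event *create_event_sketch(uint32_t kernel_id, bool map_fails)
{
    sketch_event *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;

    e->event_id = kernel_id;       /* set before the fallible map step */

    e->page = map_stub(map_fails);
    if (e->page == NULL) {
        /* The error path needs a valid ID to release the kernel event. */
        destroy_event_stub(e->event_id);
        free(e);
        return NULL;
    }
    return e;
}
```

If the assignment came after the map step, the error path would pass the
zero-initialized ID to the destroy call instead of the real one.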
Change-Id: I5f4288b953611799a02b0e988d6b2e48104466a0
[ROCm/ROCR-Runtime commit: 9c9bfa30c0]
Counter IDs in SQ_PERFCOUNTER0_SELECT are identical on gfx803 10 and
gfx803 11.
Change-Id: I5cfefd44b52989efd1d89311cf8c70c84ea2b230
[ROCm/ROCR-Runtime commit: 0b5c65a903]
- Doxygenify comments
- Match order of implementation with order of declaration
Change-Id: I3c7e486c4dd3616f4b10b2f3e69532a4b5fb9e8e
[ROCm/ROCR-Runtime commit: 01dc3a8ff3]
Due to a misinterpretation of the HSA specification the microcode has,
until now, been responsible for ensuring a coherent view of the
amd_kernel_code_t object when acquire_fence_scope is set to agent or system.
To correct this the runtime must instead assume this responsibility.
Introduce GpuAgentInt::InvalidateCodeCaches to perform this operation
on-demand. Invoke this after code object allocation. Extend the Queue
implementations to support PM4 command submission, through which the
PM4 command ACQUIRE_MEM can be submitted to perform cache invalidation.
Submit through a runtime-managed queue shared with the blit implementation.
This change depends on microcode support and this is checked against the
running version. Older microcode builds will perform cache invalidation
themselves, so it is acceptable for this change to do nothing in that case.
Change-Id: I268dd2b83af3decdd9ad07430a81df8a2ecb6bd2
[ROCm/ROCR-Runtime commit: f76577ae43]
The default optimization level may interfere with debugging.
Change-Id: Ie694ef35b05e4cf2bf4f68bc346e8d60a2d27bc8
[ROCm/ROCR-Runtime commit: d2a4629c55]
This option was disabled by default to address issues writing to stderr
in Windows applications. The lack of an error message for memory access
faults is confusing to users, however.
Enable the error message by default on Linux only.
Change-Id: I1f44ba42362f8874abdc7c8e63ddd54a855b5394
[ROCm/ROCR-Runtime commit: acc5f15e4c]
The runtime needs a queue on which to submit cache management commands.
Device-to-device blit copy already creates a queue unconditionally.
We can share this queue for both purposes.
This change restructures the BlitKernel interface to accept, rather than
create, a queue. GpuAgent creates queues as needed for both cache
management and blit compute.
Fix queue full detection in AcquireWriteIndex (<= vs <).
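The message does not say which comparison was wrong, so the following is
only a general sketch of the off-by-one hazard, assuming monotonically
increasing read and write indices. The sketch_queue type and the function
body are hypothetical and do not reproduce the runtime's AcquireWriteIndex.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring-buffer bookkeeping with monotonically increasing
 * indices; slot = index % size. */
typedef struct {
    uint64_t read_index;  /* next packet the consumer will read */
    uint64_t write_index; /* next packet slot to hand out */
    uint32_t size;        /* number of slots in the ring */
} sketch_queue;

/* Reserve one slot. Returns true and stores the reserved index in *out
 * when there is room, false when the queue is full. */
bool acquire_write_index(sketch_queue *q, uint64_t *out)
{
    /* Packets in flight = write - read. The queue is full when that
     * equals size, so a slot is available only while it is strictly
     * less than size. Using <= here would hand out one slot too many. */
    if (q->write_index - q->read_index >= q->size)
        return false;
    *out = q->write_index++;
    return true;
}
```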
Change-Id: I61d0c6b9d04f2dba74872f0676ad791435778ba4
[ROCm/ROCR-Runtime commit: f7ab361347]
get_block_properties uses the complete DID to identify the GPU. This list
is getting too long as more devices are added. Reading the 12 most
significant bits is good enough to identify the GPU.
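Assuming the DID is a 16-bit PCI device ID and that "most significant"
refers to its upper 12 bits, the comparison key can be sketched as below.
The function name is hypothetical and the device IDs in the usage note are
arbitrary sample values.

```c
#include <stdint.h>

/* Keep the upper 12 bits of a 16-bit device ID and clear the low 4, so
 * that IDs differing only in the last hex digit compare equal. */
uint16_t did_family(uint16_t did)
{
    return (uint16_t)(did & 0xFFF0u);
}
```

With this key, 0x67D0 and 0x67DF map to the same entry (0x67D0), while
0x67EF maps to a different one (0x67E0).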
Change-Id: Ieebb05402bbe08af12eb7289dfeb5bbf1f515b0f
[ROCm/ROCR-Runtime commit: 6c4d19a9d2]
This is the first part of transitioning to the LLVM-based assembler.
SP3 is deprecated and all references to the library are removed.
Pending LLVM support, relevant shaders have been precompiled.
Change-Id: I7d44cef5ded1836c4a74b77881af5bea8803d2c1
[ROCm/ROCR-Runtime commit: 712ea75377]
On multi-node systems only the first CPU node was recognized in the
signal consumer list, causing fallback to non-interrupt signals.
Change-Id: I9bd0706bafbe046be9d7f210d05fa4cf1fcd16fa
[ROCm/ROCR-Runtime commit: b44417043b]
Before this change, the runtime hard-coded the device name. With this
commit, the name is queried from KFD. codecvt will be used for the UTF-16
to UTF-8 conversion once GCC supports it.
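Until codecvt is available, the conversion can be done by hand. The
function below is a sketch of a generic UTF-16 to UTF-8 converter and is
not the code from this commit; the name, signature, and error convention
are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-16 string to NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if dst is too small or the input contains an unpaired surrogate. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    while (*src) {
        uint32_t cp = *src++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {         /* high surrogate */
            uint32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  /* stray low surrogate */
            return (size_t)-1;
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dst_size)                 /* keep room for NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    if (out >= dst_size)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

ASCII-only names pass through one byte per character, which is the
common case for device names.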
Change-Id: I7c4dc32ef857296296c810d083888c5ba1c808b6
[ROCm/ROCR-Runtime commit: 88708b8e5a]
Have amd::MemoryRegion::Lock use the host_ptr instead of asserting when
the alternate_va is null, because when the src/dst memory pointer is
allocated via KFD, the host_ptr is already a GPUVA.
Change-Id: If44368cc2854d4c0c477ae56e4eeabc37e54c1a5
[ROCm/ROCR-Runtime commit: 4e93bdc99c]
Reduces the number of blit queues from 3 to 2 when SDMA is unavailable,
improving the availability of queue slots for applications.
Change-Id: I8860d2b6c6d6527494b9fc35d164099e1313886a
[ROCm/ROCR-Runtime commit: 38fddca9fe]
for the kernel args.
Most image-related HSA conformance tests pass now.
Many more ocltst/oclperf image tests pass too.
Change-Id: I3f28d4ee7369f0ebc7af5128d3ffe1390957db98
[ROCm/ROCR-Runtime commit: c64f646711]