Non-paged allocation for queue memory necessary for binding wptr to
GART. Required to support usermode queue oversubscription with MES for
GFX11.
Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity;
aliases AllocateIPC.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4
[ROCm/ROCR-Runtime commit: 061aa04147]
Adding new ImageManager class for GFX11 GPUs
ImageManagerGfx11 functions copied from ImageManagerNv.
Register descriptions in resource_gfx11.h updated for gfx11.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I48b39f6a633aef14aa829f7240a43fe0feb1c290
[ROCm/ROCR-Runtime commit: 907e05c1b3]
GPUs excluded by RVD are not expected to have scratch, memory, trap
handling or memory regions set up. Now that these GPUs are added to
a new list, return early on agent destruction to prevent bad function
calls on destroy.
Also fix up broken memory releases between the gpu lists and ugly braces.
Change-Id: I52fc6e86ceba0a0383cedc63310eb409515eaf9f
[ROCm/ROCR-Runtime commit: 9d2fe1ac2a]
Fix the rocrtst test failure "The runtime failed to allocate the necessary resources".
Change-Id: Ie4ffeb939fb322db068f3132a7973a359c204176
[ROCm/ROCR-Runtime commit: 8a0fe6a832]
Atomic memory operations on these memory buffers are not guaranteed
to be visible at system scope
Change-Id: I4cccde114632071a000384502a83bc191e77e85b
[ROCm/ROCR-Runtime commit: 364715cbc6]
hsa-rocr currently does NOT require the thunk lib as a dependency,
so pulling in the thunk package while installing rocr is
unnecessary. This patch corrects the package dependency.
Change-Id: Id98ede8b66ffd9aaf4a47da96ba2f981f4c3da73
[ROCm/ROCR-Runtime commit: a229f5c320]
The maintainer distribution list field had wrong information.
Replace it with the DL newly formed by the component team.
Change-Id: I61651e429375cdc512d0fe4b0768f917506b5392
[ROCm/ROCR-Runtime commit: 23f908708a]
A work group processor (WGP) requires both of its CUs to be enabled
in order to be enabled itself.
The KFD distributes round-robin by even-indexed pairs, so
enforce this requirement for runtime set mask calls.
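
An illustrative check for the pairing rule, as a minimal Python sketch
rather than the runtime's implementation. The function names, the integer
mask representation and the choice to clear partially enabled pairs
(instead of rejecting the mask) are assumptions; an even CU count is assumed.

```python
def is_wgp_aligned(cu_mask: int, cu_count: int) -> bool:
    """Return True if every even-indexed CU pair is fully on or fully off."""
    for even in range(0, cu_count - 1, 2):
        pair = (cu_mask >> even) & 0b11  # bits for CU `even` and CU `even + 1`
        if pair not in (0b00, 0b11):
            return False
    return True


def drop_partial_wgps(cu_mask: int, cu_count: int) -> int:
    """Keep only CU pairs where both CUs are enabled (hypothetical policy)."""
    out = 0
    for even in range(0, cu_count - 1, 2):
        if (cu_mask >> even) & 0b11 == 0b11:
            out |= 0b11 << even
    return out
```

For example, with four CUs the mask 0b0111 enables only one CU of the
second pair, so the sketch reduces it to 0b0011.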
Change-Id: Ic46661b01f398aa1fe24d96b5c9c31f122f967a3
[ROCm/ROCR-Runtime commit: f600687537]
Discovered agent handles should only apply to copy routing, not to
copy device selection. The user may not have mapped all allocations
to all GPUs so we must ensure that the copying device is one passed
by the user.
Change-Id: I2532e66d30e6842624e594f235dd144a186220d4
[ROCm/ROCR-Runtime commit: a8603b9397]
Allows easy building on platforms without native hwloc v1 support.
Change-Id: I20d711f914d176decb1b64381fd4b51ccc4262b5
[ROCm/ROCR-Runtime commit: 33e8919743]
Mostly a demo at this point. Logs SVM (aka HMM) info to
HSA_SVM_PROFILE if set.
Example: HSA_SVM_PROFILE=log.txt SomeApp
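
The opt-in behaviour can be sketched as below. This is a Python
illustration only, not the runtime's code: the helper name
open_svm_profile_log is hypothetical, and opening the file in append mode
is an assumption.

```python
import os


def open_svm_profile_log(env=None):
    """Return a writable log file if HSA_SVM_PROFILE is set, else None."""
    env = os.environ if env is None else env
    path = env.get("HSA_SVM_PROFILE")
    if not path:
        # Variable unset or empty: logging stays disabled.
        return None
    # Assumption: append so repeated runs do not overwrite earlier logs.
    return open(path, "a", encoding="utf-8")
```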
Change-Id: Ib6fd688f661a21b2c695f586b833be93662a15f4
[ROCm/ROCR-Runtime commit: 965df6eef7]
During the HSA initialization stage, ROCr now searches all loaded libraries
for the symbol "HSA_AMD_TOOL_PRIORITY" and adds those libraries to
the tools library init list. Tool libraries listed in the HSA_TOOLS_LIB
environment variable are also loaded in the given order and take priority
over HSA_AMD_TOOL_PRIORITY.
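
The resulting ordering can be sketched as follows. This is an illustrative
Python sketch, not ROCr code: the function name build_tool_init_list is
hypothetical, and the whitespace-separated parsing of HSA_TOOLS_LIB, the
removal of duplicates and the ordering among discovered libraries are
assumptions.

```python
import os


def build_tool_init_list(discovered, env=None):
    """Order tool libraries: HSA_TOOLS_LIB entries first, then discovered ones.

    `discovered` stands in for the loaded libraries that export the
    HSA_AMD_TOOL_PRIORITY symbol.
    """
    env = os.environ if env is None else env
    # Assumption: entries in HSA_TOOLS_LIB are separated by whitespace.
    listed = env.get("HSA_TOOLS_LIB", "").split()
    init_list = list(listed)
    for lib in discovered:
        # Assumption: a library named in both places is initialized once.
        if lib not in init_list:
            init_list.append(lib)
    return init_list
```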
Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82
[ROCm/ROCR-Runtime commit: e7fc301aa7]
Adds a script to run clang-format on the latest patch so we don't
need to remember the command line.
Also applies missing formatting to the prior commit,
"Add API for available GPU memory".
Change-Id: Ida51aedc38af229f6a26e275072654860748fa93
[ROCm/ROCR-Runtime commit: e7152c8b16]
Add support for an AMD agent to return the amount of memory available.
Change-Id: I5c32e2cebbaa2993b044250aefe434e4cc02d8c2
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
[ROCm/ROCR-Runtime commit: 4ac840269c]
Discards all user provided async copy agent info and relies on
pointer info discovery.
Change-Id: Ife3e708a49ffccbede4983ab47d5ed0032970857
[ROCm/ROCR-Runtime commit: 3ebe99f96d]
Only block info can return an agent which is disabled in the
process.
Change-Id: I34cb1f9eea9217e10a484726c90d930e3414e769
[ROCm/ROCR-Runtime commit: 13a0cdfa77]
Physical owning agent may not be visible to the current process
due to RVD.
Change-Id: Ib463336a5ed73a479f3aa74eb140932b9e0435fb
[ROCm/ROCR-Runtime commit: 247606c455]
IPC use cases with RVD set can't convey proper agent handles.
Runtime discovery is required to properly route the copy in this
case.
Change-Id: I4c97e132fb4b6ac1040de1cb17fe5a3e36d6be48
[ROCm/ROCR-Runtime commit: c289a43e88]
Simplifies behavior. A memory type now either always generates an
entry or never does.
Change-Id: Ie98cddea01e801308ac0ba650795fdef92b7e47d
[ROCm/ROCR-Runtime commit: 0ba9b162db]
DeviceLibs is still needed but is found and included by clang now.
Change-Id: I03ff7dc91c028d2ee6747aa1779d223a9ba13915
[ROCm/ROCR-Runtime commit: 7f370dd84c]
This is consistent with KFD and has significantly better latency.
KFD is taking this as the definition of the SystemClockCounter.
Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5
[ROCm/ROCR-Runtime commit: 0ee82742a7]
Eliminates the need to manually assemble the source of the
second level trap handler to produce the shader binary. Also
separates the blit shaders' binary source and the version one second
level trap handler binary source into different header files.
Change-Id: If29a18ee06dc083ec880ea962f234c6b5cac806a
[ROCm/ROCR-Runtime commit: 1b0440e7b3]
Host to device SDMA copies do not require an HDP cache flush when
connected by xGMI since data is copied over the data fabric and not HDP.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: I78d73a47edcc1a9c0ba59f33cf91485f13f1c45b
[ROCm/ROCR-Runtime commit: 658b053943]
Declare the type of HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT
and add a missing break statement.
Change-Id: I86ce8a2e620438e046b60cee991ce1fbe07a3e88
[ROCm/ROCR-Runtime commit: 64dae113b1]
On gfx10+ we need to issue a minimum count of active lanes or
groups before ADC moves on. Ensure that scratch allocations
attempt to reach this limit.
Occupancy throttling due to OOM condition may still drop below this
limit.
Change-Id: I0edf2e40fbe1a95e9a262564cebd2b6a82501a0b
[ROCm/ROCR-Runtime commit: 2eedf953f3]
__x86_64__ and __AMD64__ should already be defined by the compiler to
specify the compilation target and shouldn't be defined manually.
Two x86_64 checks now also include the VS variables, as removing the
manual definitions might otherwise cause a compile failure on that compiler.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917
[ROCm/ROCR-Runtime commit: 178a7a5cfa]
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.
The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c
[ROCm/ROCR-Runtime commit: ddf4edcafc]
Image support does not compile on other architectures, since it relies on
the x86-only header "x86intrin.h".
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b
[ROCm/ROCR-Runtime commit: a0931f4a3c]
Each time the delay is grown we need to reset elapsed. We want to take
the most accurate sample from the set at fixed delay.
Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.
Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb
[ROCm/ROCR-Runtime commit: 4e9849034d]
Release staging buffers after loading has completed. The debugger
no longer uses this copy.
Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d
[ROCm/ROCR-Runtime commit: 03a52655a8]