When compiling in ASAN mode, remap the first page of device allocations
to system memory. ASAN's memory allocator uses a small amount of extra
memory to store housekeeping data. Because this memory comes from the
GPU memory pool, it might have a memory type that is uncommon for host
access. Mapping this section of memory to the host makes it accessible
to ASAN.
Change-Id: I36f659d616a4d15558372592439a8723c5c84a69
Signed-off-by: Bing Ma <Bing.Ma@amd.com>
[ROCm/ROCR-Runtime commit: 50e754d08b]
Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to
disable use of SDMA engines for device-to-device transfers. Note that
setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override
HSA_ENABLE_PEER_SDMA values.
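
As a rough illustration of the precedence described above, the sketch
below resolves the two variables into a single decision. The names
FlagEnabled and PeerSdmaAllowed are hypothetical, and treating an unset
variable as "enabled" is an assumption; the actual parsing and defaults
in ROCr may differ.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of ROCr. The argument is the raw value of an
// environment variable, or nullptr when the variable is unset.
// Assumption: an unset variable leaves the feature enabled.
inline bool FlagEnabled(const char* value) {
  return value == nullptr || std::strcmp(value, "0") != 0;
}

// Peer (device-to-device) SDMA is used only when SDMA as a whole is enabled
// and HSA_ENABLE_PEER_SDMA does not disable it.
inline bool PeerSdmaAllowed(const char* hsa_enable_sdma,
                            const char* hsa_enable_peer_sdma) {
  if (!FlagEnabled(hsa_enable_sdma)) return false;  // HSA_ENABLE_SDMA=0 wins
  return FlagEnabled(hsa_enable_peer_sdma);
}
```

A caller could pass the results of std::getenv("HSA_ENABLE_SDMA") and
std::getenv("HSA_ENABLE_PEER_SDMA") directly.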
Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a
[ROCm/ROCR-Runtime commit: a397373cea]
For aqua_vanjaram APU mode, KFDEvictTest and KFDSVMEvictTest are
skipped. Those tests passed on dGPU mode with memory reporting partition
support on GFX 9.4.3.
Change-Id: I56357843c6743b01b807359dbb37b32391fd9a25
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 5df82e3d14]
Add support functions to remap the first page of device memory (GPU/GTT)
to share host ASAN logic.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: I4c27d5417ba80a172dccb0a079a597c5dc1c8f85
[ROCm/ROCR-Runtime commit: 1e6d728730]
RUNPATH in libraries will be : $ORIGIN
RUNPATH in binaries will be : $ORIGIN/../lib
Change-Id: Iafa66a8e02cc8c5783903d40927b63652042d2f1
[ROCm/ROCR-Runtime commit: ad002f1e7b]
Update documentation for hsa_amd_pointer_info to clarify which fields
are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN.
Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985
[ROCm/ROCR-Runtime commit: 39feb83b88]
Some multi-GPU workloads create one process per GPU. Each process
creates a GPU agent on every GPU but only creates queues on one GPU,
which would cause unnecessary scratch reservation.
Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22
[ROCm/ROCR-Runtime commit: 38e832a682]
If the required scratch allocation is too large, ROCr will attempt to
reduce it by lowering the dispatch's targeted occupancy. The reduction
loop however was prone to overflow if waves_per_cu was not a multiple of
waves_per_group. Ensure no overflow by aligning waves_per_cu to
waves_per_group.
On GC 9.4.3 dGPU, dispatches with a large grid size and a
waves_per_group of e.g. 16 may require reducing occupancy such that
waves_per_cu is less than waves_per_group to ensure the allocation size
is small enough. Allow this while also ensuring the tmpring scratch wave
count is kept divisible by the number of SEs per XCC.
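
The alignment fix from the first paragraph can be sketched as follows.
This is a simplified model, not the ROCr code: FitWavesPerCu, AlignDown
and all parameter names are hypothetical, and the GC 9.4.3 special case
from the second paragraph is not modelled. The point is only that an
unsigned counter stepped down by waves_per_group cannot wrap once it is
a multiple of waves_per_group.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round value down to a multiple of `multiple`.
constexpr uint32_t AlignDown(uint32_t value, uint32_t multiple) {
  return value - (value % multiple);
}

// Returns the largest waves-per-CU count (a multiple of waves_per_group)
// whose total scratch footprint fits in max_bytes, or 0 if none fits.
inline uint32_t FitWavesPerCu(uint32_t waves_per_cu, uint32_t waves_per_group,
                              uint64_t bytes_per_wave, uint32_t num_cus,
                              uint64_t max_bytes) {
  assert(waves_per_group != 0);
  uint32_t waves = AlignDown(waves_per_cu, waves_per_group);
  while (waves != 0 &&
         static_cast<uint64_t>(waves) * num_cus * bytes_per_wave > max_bytes) {
    // Cannot wrap: waves is a non-zero multiple of waves_per_group here.
    waves -= waves_per_group;
  }
  return waves;
}
```

Without the AlignDown step, a start value such as 40 with a group size
of 16 would reach 8 and then wrap around on the next subtraction.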
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480
[ROCm/ROCR-Runtime commit: bd63e5045c]
When we merge thunk into ROCr, kfdtest will be in a different folder
structure. Add the new location to ensure that we can build now and in
the future with no disruptions.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I6517e061cb0da7137d903abbc380bfc7126f40d4
[ROCm/ROCR-Runtime commit: d966243783]
Scratch cache reserved memory is only available for scratch memory use
so do not report this memory as available to the user via the
HSA_AMD_AGENT_INFO_MEMORY_AVAIL api.
Change-Id: I52f96e62536458bcaa52b9f4be5de856d5680dc4
[ROCm/ROCR-Runtime commit: 3477fbc661]
Temporarily disabling rocrtstNeg.Queue_Validation_InvalidGroupMemory
until it is fixed.
Change-Id: Ifc1973a960c8d0bae27e2628e4bfddc60f70325d
[ROCm/ROCR-Runtime commit: 7b74271d5e]
Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
Change-Id: I10e30fdc2bec587cef9427faa4874957988c34b3
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
[ROCm/ROCR-Runtime commit: d319660838]
If MES is enabled, wptr has to be in non-paged memory.
Add an API to check this condition.
Change-Id: I53af1f6687d5332d102e7062c3d760e33b96e722
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
[ROCm/ROCR-Runtime commit: 53ed978c3d]
Using wrapper header files will result in #warning message by default
Change-Id: I8301e433d39f3e5d39384ede6f0e4464d0eb20a6
[ROCm/ROCR-Runtime commit: b487f87363]
Using wrapper header files will result in #warning message by default
Change-Id: I87739cabb365b9370b1182cf23ca9b54d99149c3
[ROCm/ROCR-Runtime commit: fbcbcd9e73]
If Dev0 and Dev1 are not the same gfx, we should temporarily
set the target ASIC for compiling Shader code.
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Signed-off-by: Shikai Guo <shikai.guo@amd.com>
Change-Id: I5836beb16ade519f5a148d3d2b9c2875554f0c35
[ROCm/ROCR-Runtime commit: 5d6f900353]
Overload Assembler::RunAssembleBuf to take in an extra Gfxv parameter.
Using this overload will temporarily set the target ASIC to Gfxv before
calling RunAssemble, and copy back the original MCPU literal upon
completion. The copy to reset the original MCPU in this case is safe as
the MCPU length is always known.
This will be useful in multi-device test cases whereby the devices are
not necessarily the same gfx version. The overload is explicitly for the
RunAssembleBuf wrapper rather than RunAssemble to ensure the default
MCPU is always reset independent of errors in RunAssemble.
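
The save/set/restore pattern described above can be sketched with a toy
class. ToyAssembler, its fixed-size MCPU buffer and its fake RunAssemble
are all hypothetical stand-ins and not the kfdtest Assembler; the sketch
only shows why restoring in the wrapper makes the reset independent of
the assemble result.

```cpp
#include <cassert>
#include <cstring>
#include <string>

class ToyAssembler {
 public:
  explicit ToyAssembler(const char* mcpu) { SetMcpu(mcpu); }

  const char* mcpu() const { return mcpu_; }
  const std::string& last_target() const { return last_target_; }

  // Stand-in for the real assemble step: records the target in use and
  // fails (-1) on empty input.
  int RunAssemble(const std::string& source) {
    last_target_ = mcpu_;
    return source.empty() ? -1 : 0;
  }

  int RunAssembleBuf(const std::string& source) { return RunAssemble(source); }

  // Overload: assemble for `target`, then restore the default MCPU whether
  // or not RunAssemble succeeded.
  int RunAssembleBuf(const std::string& source, const char* target) {
    char saved[kMcpuLen];
    std::memcpy(saved, mcpu_, kMcpuLen);  // buffer length is always known
    SetMcpu(target);
    const int rc = RunAssemble(source);
    std::memcpy(mcpu_, saved, kMcpuLen);
    return rc;
  }

 private:
  static constexpr std::size_t kMcpuLen = 16;

  void SetMcpu(const char* mcpu) {
    assert(std::strlen(mcpu) < kMcpuLen);
    std::memset(mcpu_, 0, kMcpuLen);
    std::strncpy(mcpu_, mcpu, kMcpuLen - 1);
  }

  char mcpu_[kMcpuLen];
  std::string last_target_;
};
```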
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7fe5a962876314b6df32e4b7160174949d98f9e3
[ROCm/ROCR-Runtime commit: 54136f60a0]
Negative queue validation tests were doing many redundant from-file
kernel object loads in a loop. This was creating many simultaneous open
file handles within many dynamically allocated CodeObject objects. While
the CodeObject class implements RAII on the file handles to cleanup on
destruction, clear_code_object() only gets called on the destruction of
the TestBase-derived test objects (these being a suite abstraction).
Due to this we were hitting file open() EMFILE errors (too many open
files) in gfx94x CPX mode. Move LoadKernelFromObjFile outside of the
test loops and clear_code_object() for each test on each agent.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I6f9d23fd122720c49a58c22698f097906d2fc97c
[ROCm/ROCR-Runtime commit: 7a4c9273d7]
Add HSA_ENABLE_SRAMECC environment variable that can be used to
override SRAM ECC mode reported by KFD
Change-Id: I2b95511820a2d3d146a76b03070659c0695b61fd
[ROCm/ROCR-Runtime commit: a180c9ee78]
The gfx940 does not support IMAGE instructions. Any get_info with
IMAGE attributes should return failure.
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I12005628f92780f551ab6f8b41526c66b54c6a59
[ROCm/ROCR-Runtime commit: 46b667e530]
The function IDs used to be 0 on previous asics but on gfx94x and newer
asics, these bits are set. These bits are used by user applications to
uniquely identify the locations of GPU nodes. These extra bits break
hwloc and are not needed for rocrtst.
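
A minimal sketch of dropping the function bits, assuming the location
ID follows the conventional PCI bus/device/function packing (bus in
bits 15:8, device in bits 7:3, function in bits 2:0). The helper name
StripPciFunction is hypothetical; the mask actually used in rocrtst may
differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: [15:8] bus, [7:3] device, [2:0] function.
constexpr uint32_t kPciFunctionMask = 0x7u;

// Hypothetical helper: clear the function bits so the ID matches what
// earlier ASICs reported (function 0).
constexpr uint32_t StripPciFunction(uint32_t location_id) {
  return location_id & ~kPciFunctionMask;
}
```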
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: I1202f504645b0662d009b9c0926eebb7ddc08d73
[ROCm/ROCR-Runtime commit: d7fa654338]
gfx940 uses ttmp11 to hold the queue packet index so the first level
trap handler uses ttmp13 instead to save ib_sts.
Repurpose ttmp11[31] to mean that the ttmps are initialized. The issue
was that the debugger could not tell whether ttmp6 was written by the
trap handler when determining the stop reason.
If ttmp11[31]=0, then the trap handler has not been executed and ttmp6
should be assumed to be 0. If ttmp11[31]=1, then ttmp6 holds the
trap_id, if an s_trap instruction caused the exception.
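
From the debugger's side, the rule above amounts to the following
check. The helper names are hypothetical, and the sketch returns ttmp6
unmodified because the position of trap_id inside ttmp6 is not given
here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kTtmpsInitializedBit = 1u << 31;  // ttmp11[31]

// True once the trap handler has run and initialized the ttmps.
constexpr bool TtmpsInitialized(uint32_t ttmp11) {
  return (ttmp11 & kTtmpsInitializedBit) != 0;
}

// Value a debugger should use for ttmp6: the register content if the
// ttmps are initialized, otherwise 0.
constexpr uint32_t EffectiveTtmp6(uint32_t ttmp11, uint32_t ttmp6) {
  return TtmpsInitialized(ttmp11) ? ttmp6 : 0u;
}
```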
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Lancelot Six <lancelot.six@amd.com>
Change-Id: I9af903abae044b9ec530306229caf3b883f3ee46
[ROCm/ROCR-Runtime commit: f31b312611]
LLVM MC does not seem to accept multi-line conditionals. This may be
fixable in the future with macros. The Aqua Vanjaram shader spec states
that while buffer_invl2 has been replaced by buffer_inv, the former may
still be used for compatibility. However, this does not seem to be
implemented. For now, fix conditional.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7f8b64c96055371d7e0090b758d2cfd2a37ecd3c
[ROCm/ROCR-Runtime commit: 92f3d4a458]
The previous code might fail to get the correct ln node and trigger an
extra walk through the tree. Fix it.
While walking through the tree, it is better to search from right to
left, as node->start is likely close to *address*.
Change-Id: If86ddf73e59a1eb88225d1ea90797818e8165488
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
[ROCm/ROCR-Runtime commit: 77761836ae]
These tests should also pass on Aqua Vanjaram, so enable them
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ibbb9cd43d653c63b08c39efd1d7326cfac1f8411
[ROCm/ROCR-Runtime commit: eed5518e4c]
Aqua Vanjaram is intended to have fine-grained coherency
from anywhere to anywhere else using read-acquire and
write-release primitives.
Add a test that writes to memory covered by five
different cache lines, then write-releases, while
another thread read-acquires, then reads those
five locations in memory.
There are nine variations of the test to cover
CPU-GPU, same-GPU and across-GPU, vector instructions and
scalar instructions, and data local to the
acquirer or receiver.
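
For readers unfamiliar with the pattern, here is a host-only analogue
using C++ atomics. It is not the kfdtest shader code: the function name,
the 64-byte cache line size and the test values are assumptions, and
both threads run on the CPU.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// One value per (assumed) 64-byte cache line.
struct alignas(64) Line {
  uint32_t value = 0;
};

// Writer fills five locations on different cache lines, then does a
// write-release on a flag. Reader does a read-acquire on the flag, then
// checks all five locations. Returns true if every value was visible.
inline bool RunAcquireReleaseCheck() {
  std::array<Line, 5> data{};
  std::atomic<uint32_t> flag{0};
  bool ok = false;

  std::thread writer([&] {
    for (uint32_t i = 0; i < data.size(); ++i) data[i].value = 0x1000u + i;
    flag.store(1, std::memory_order_release);
  });

  std::thread reader([&] {
    while (flag.load(std::memory_order_acquire) == 0) std::this_thread::yield();
    ok = true;
    for (uint32_t i = 0; i < data.size(); ++i) {
      ok = ok && (data[i].value == 0x1000u + i);
    }
  });

  writer.join();
  reader.join();
  return ok;
}
```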
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I20d2db5c53bd280e971479aad7e61df6ed5d3623
[ROCm/ROCR-Runtime commit: 30b1f23f7a]
In a vector iterator loop, access the current node directly; there is
no need for gpuNodesAll.at(i), which also causes out-of-range access.
Change the vector index loop to an iterator loop to simplify the code.
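
A small sketch of the iterator-style loop, using a hypothetical
OtherNodes helper rather than the actual kfdtest code. Because the loop
variable is the element itself, there is no index that could be passed
to at() and run out of range.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns every node ID except skip_node.
inline std::vector<int> OtherNodes(const std::vector<int>& gpu_nodes_all,
                                   int skip_node) {
  std::vector<int> result;
  for (int node : gpu_nodes_all) {  // node is the element, not an index
    if (node != skip_node) result.push_back(node);
  }
  return result;
}
```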
Change-Id: I2627ef8d13b5d2c9cd8c51cf4dacc3e8a97fcfb0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 0696f06c16]
tempnam has been marked as obsolete.
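
The message does not say which call replaces tempnam. One common POSIX
alternative is mkstemp, which creates and opens the file atomically; the
sketch below assumes that choice, and CreateTempFile is a hypothetical
helper name.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: creates a uniquely named empty file in dir and
// returns its path, or an empty string on failure.
inline std::string CreateTempFile(const std::string& dir,
                                  const std::string& prefix) {
  std::string path = dir + "/" + prefix + "XXXXXX";
  // mkstemp rewrites the trailing XXXXXX in place and returns an open fd.
  const int fd = mkstemp(&path[0]);
  if (fd < 0) return std::string();
  close(fd);
  return path;
}
```

The caller is responsible for removing the file, for example with
unlink(path.c_str()).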
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ie64d9a351bf386da00a96ceff059f685e11f2cca
[ROCm/ROCR-Runtime commit: e82025bffa]
AppAPU VRAM is part of system memory managed by the Linux kernel, so no
VRAM eviction and restore is needed between VRAM and system memory.
The Evict tests currently fail on AppAPU, so skip them on AppAPU.
There is no page migration between VRAM and system memory on AppAPU, and
HMMProfilingEvent depends on migration events, so skip it on AppAPU too.
Change-Id: I4c809b97c947e809d136c1f88db2278cf74f5b47
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 21abaef3f8]
If there is a connection between GPU and CPU with weight 13
(KFD_CRAT_INTRA_SOCKET_WEIGHT), then this is an AppAPU.
This will be used to skip tests not suitable for AppAPU.
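
The detection rule can be sketched as a scan over GPU-to-CPU links. The
IoLink struct and IsAppApu function are hypothetical simplifications of
the topology data kfdtest actually queries; only the weight value 13
comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kIntraSocketWeight = 13;  // KFD_CRAT_INTRA_SOCKET_WEIGHT

// Hypothetical, simplified description of one GPU-to-CPU link.
struct IoLink {
  uint32_t gpu_node;
  uint32_t cpu_node;
  uint32_t weight;
};

// A GPU node is treated as an AppAPU if any of its links to a CPU node
// has the intra-socket weight.
inline bool IsAppApu(uint32_t gpu_node, const std::vector<IoLink>& links) {
  for (const IoLink& link : links) {
    if (link.gpu_node == gpu_node && link.weight == kIntraSocketWeight) {
      return true;
    }
  }
  return false;
}
```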
Change-Id: If6fad81528b52afd4ac4cefa508d787b0f6637ca
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: e2df2c21af]
On Linux, the os_thread abstraction is built on top of pthread. Many of
the pthread calls might fail and return error codes. The error
conditions are only checked via assertions (if ever checked) which means
that when doing a release build, no error condition is checked. The
same goes for dlsym/dlinfo and clock_gettime.
This commit improves the situation by checking the error conditions
and acting accordingly. When the error condition is detected in a
function with a means to indicate an error to its caller, this
patch prints an error message and returns. If there is no way to
propagate the error up the call stack, it prints an error message and
aborts the process.
For the os_info::os_info ctor, the only user is CreateThread, which
checks that the built thread is Valid(). If not, nullptr is returned to
the caller.
It could be possible to use exceptions when functions cannot pass
errors, but for now I only use abort as it is what assert would do with
a debug build.
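
The two strategies can be illustrated with a pair of hypothetical
wrappers around real pthread calls. InitMutexChecked and LockOrAbort are
not ROCr functions; they only show "report and return" versus "report
and abort", both of which remain active in release builds.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

// The caller can observe failure, so print a message and return false.
inline bool InitMutexChecked(pthread_mutex_t* mutex) {
  const int err = pthread_mutex_init(mutex, nullptr);
  if (err != 0) {
    std::fprintf(stderr, "pthread_mutex_init failed: %s\n", std::strerror(err));
    return false;
  }
  return true;
}

// No way to report failure to the caller, so print a message and abort.
inline void LockOrAbort(pthread_mutex_t* mutex) {
  const int err = pthread_mutex_lock(mutex);
  if (err != 0) {
    std::fprintf(stderr, "pthread_mutex_lock failed: %s\n", std::strerror(err));
    std::abort();
  }
}
```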
Change-Id: I815703c3b95777cc29bb89a7d654ac879c14a759
[ROCm/ROCR-Runtime commit: 183f5d90aa]
When building with g++-11.3.0, I have the following warning:
/home/.../core/runtime/runtime.cpp: In member function ‘hsa_status_t rocr::core::Runtime::GetSystemInfo(hsa_system_info_t, void*)’:
/home/.../core/runtime/runtime.cpp:693:56: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
693 | kfd_version.KernelInterfaceMajorVersion == 1 &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
694 | kfd_version.KernelInterfaceMinorVersion >= 12)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This patch adds the parentheses as suggested. This silences the
compiler warning.
No functional change expected.
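
For illustration, a version check of the same shape with the explicit
parentheses. The left-hand operand of || is not visible in the warning
excerpt, so "major > 1" and the function name are assumptions. Since &&
already binds tighter than ||, the parentheses only document the intent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: true for interface versions 1.12 and newer.
constexpr bool KernelInterfaceAtLeast1_12(uint32_t major, uint32_t minor) {
  return major > 1 || (major == 1 && minor >= 12);
}
```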
Change-Id: I69c1a73a432b0f2393dbaf36d4424cf0056c535f
[ROCm/ROCR-Runtime commit: 72219b8237]
We should check compute core instead of cpu core,
in order to exclude the case of APU.
Signed-off-by: Jesse zhang <jesse.zhang@amd.com>
Change-Id: I2ec2a6807f51f49f80e0e500f5d9af81c2efae37
[ROCm/ROCR-Runtime commit: 4d54d6e706]
For GC 9.4.0, modifications were made to various shaders since certain
flat_ instructions no longer support glc/slc modifiers (replaced with
nt/sc1/sc0). Instead of repeating conditionals inside various shader
bodies, we can make use of LLVM AMDGCN macros.
This patch modularizes the shader macros into separate defines. Prior
to the core raw-string literal, each shader now starts with the
SHADER_START literal (".text\n") plus any number of SHADER_MACRO_*
literals. This allows us to separate the macro definitions logically and
use the pre-processor to only include the required macro groups on a
per-shader basis.
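
The layout relies on the compiler concatenating adjacent string
literals. A minimal sketch: SHADER_START matches the literal quoted
above, while SHADER_MACRO_NOPS, the NOP_TWICE assembler macro and the
shader body are made up for illustration.

```cpp
#include <cassert>
#include <string>

#define SHADER_START ".text\n"

// Hypothetical macro group; the real SHADER_MACRO_* groups differ.
#define SHADER_MACRO_NOPS \
  ".macro NOP_TWICE\n"    \
  "    v_nop\n"           \
  "    v_nop\n"           \
  ".endm\n"

// Adjacent literals are merged into one string at compile time, so only
// the macro groups named here end up in this shader's source.
static const char* kExampleShader =
    SHADER_START
    SHADER_MACRO_NOPS
    R"(
        NOP_TWICE
        s_endpgm
)";
```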
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I19eb3fd14252a0601bb7509249051b68e7fdb02a
[ROCm/ROCR-Runtime commit: e2435d9e93]
Previously, KFDEvictTest.QueueTest and KFDSVMEvictTest.QueueTest
would create a variable number of wavefronts, one for each 64MB
of memory under test. This ran into limits on the buffers used
by the wavefronts, and may at some point have exceeded the
wavefront limit.
Restrict the number of wavefronts to 512, and adjust the shader
to accommodate a variable buffer size.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I2ec292e2900e2efa62a08313bca3d2f4bdabca8b
[ROCm/ROCR-Runtime commit: 680c8ca5a9]
GC 9.4.3 sets the gfx target version to 9.4.x depending on revision and
capabilities. Due to this, where applicable, mask off the gfx target
stepping version and only check major/minor version (9.4). There are no
collisions due to this change since GC 9.4.3 is the only ASIC that uses
gfx target version 9.4.x.
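
A sketch of the masking, assuming a packed layout of
(major << 16) | (minor << 8) | stepping. That layout and all helper
names are assumptions made for illustration; the representation used in
the code may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed packing: [23:16] major, [15:8] minor, [7:0] stepping.
constexpr uint32_t PackGfxVersion(uint32_t major, uint32_t minor,
                                  uint32_t stepping) {
  return (major << 16) | (minor << 8) | stepping;
}

// Drop the stepping so only major/minor take part in comparisons.
constexpr uint32_t MajorMinorOnly(uint32_t packed) { return packed & ~0xFFu; }

// Hypothetical check: matches any 9.4.x target regardless of stepping.
constexpr bool IsGfx94x(uint32_t packed) {
  return MajorMinorOnly(packed) == PackGfxVersion(9, 4, 0);
}
```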
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I72803e594c421f054d18ccfa7e92c507128fa5be
[ROCm/ROCR-Runtime commit: 831d1ad352]
KFDMemoryTest.DeviceHdpFlush requires device node 0 to be a large-BAR
GPU in order to check VRAM content from the CPU. Run the test only if
device 0 is a large-BAR GPU.
Change-Id: I874b153219550c50b724625e971e3ed3a84dc652
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 598e3e8d86]
Nodes with XGMI have no HDP, so DriverHDPFlush should skip.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: If5a87e660712e51d03e750d8e044786036b2e603
[ROCm/ROCR-Runtime commit: e32278a612]
Even with the restriction to only compile on gfx90a, this
shader still fails CompileShaders test.
There don't seem to be any systems that actually use it.
Leave it in the shader store, but remove it otherwise
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I41bec6ba10363d42b163ac101c3a92edaad6d6df
[ROCm/ROCR-Runtime commit: 16c6530330]
A gfx940 code path was erroneously added to this shader.
It's unnecessary; without this path, the shader uses
the scalar store, which works just fine on gfx940 without changes.
Remove it.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I825cbbebbdb25c4a7c2f16e228c2bea6a6bcc30c
[ROCm/ROCR-Runtime commit: 2a01e5c33b]