Eliminates the need to manually assemble the source of the
second-level trap handler to produce the shader binary. Also
separates the blit shaders' binary source and the version-one
second-level trap handler binary source into different header files.
Change-Id: If29a18ee06dc083ec880ea962f234c6b5cac806a
[ROCm/ROCR-Runtime commit: 1b0440e7b3]
Host to device SDMA copies do not require an HDP cache flush when
connected by xGMI since data copies over the data fabric and not HDP.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: I78d73a47edcc1a9c0ba59f33cf91485f13f1c45b
[ROCm/ROCR-Runtime commit: 658b053943]
[WHY]
These tests force HW exceptions in the GPU driver. Some of these
exceptions might print page fault error messages at kernel level.
These are expected errors due to the nature of the tests, but still
might cause confusion to users.
[HOW]
Add log message to warn the user about these kernel error messages.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I8eef87b83939e37230da0c374c2f77d2d484baa9
[ROCm/ROCR-Runtime commit: b9e8bc1f52]
Declare the type of HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT
and add a missing break statement.
Change-Id: I86ce8a2e620438e046b60cee991ce1fbe07a3e88
[ROCm/ROCR-Runtime commit: 64dae113b1]
On gfx10+ we need to issue a minimum count of active lanes or
groups before ADC moves on. Ensure that scratch allocations
attempt to reach this limit.
Occupancy throttling due to OOM condition may still drop below this
limit.
Change-Id: I0edf2e40fbe1a95e9a262564cebd2b6a82501a0b
[ROCm/ROCR-Runtime commit: 2eedf953f3]
Transition KFDTest to use the open source LLVM compiler instead of the SP3
compiler.
Change-Id: I26fff6a958bc48cb1f5509a11ec194d2ececf0ce
[ROCm/ROCR-Runtime commit: b9651d3118]
Includes a simple AssembleShader test which loops through all shaders
for all supported targets, dispatching a RunAssemble call for each
shader.
Also adds extra safety on a couple shaders that only work on
gfx9/gfx90a.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I3ca1c92136f3871eb62fcb9645694f22287aaeec
[ROCm/ROCR-Runtime commit: 7eeba830f8]
With LLVM-based assembly these shaders are now valid for GFX10, with the
exception of KFDSVMEvictTest.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Idc872139176bbc1cc8d7ae61a8e4572360ecb5d5
[ROCm/ROCR-Runtime commit: 025c6146d9]
KFDDBGTest is deprecated, so just remove its references to IsaGen.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I9f094d847a8ae43cb3793253b34a7d7ed2179ac1
[ROCm/ROCR-Runtime commit: ac48163885]
Use ReadMemoryIsa transferred and updated from KFDEvictTest.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I566f9ec36398bc4d08ab90231688600356df4d6a
[ROCm/ROCR-Runtime commit: 097b11abad]
Makes use of macros to simplify shader code with instruction-level
differences depending on GFX version. These macros are extensible and
are prepended to every shader so that they are usable everywhere.
This patch introduces three macros used within IterateIsa and
ReadMemoryIsa shaders.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: If954e1b6d2027e9f55bf7e99bd9df2668d1da524
[ROCm/ROCR-Runtime commit: 5ceb35f428]
Initial commit for ShaderStore.hpp. It will contain the const char*'s for
all shaders used within KFDTest.
The LLVM assembler now takes care of the correct instructions to be used
for various GFX versions using directives embedded into the shader assembly.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2887a03b33d5c2cc382e4f96c2bc3e067715ab54
[ROCm/ROCR-Runtime commit: 34ca37d9e8]
- Reformat shaders for legibility
- Move assembly processes from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id1eb3856bc74bf0da46685c5dc08e91f5df66d4f
[ROCm/ROCR-Runtime commit: a7b85fdb08]
- Reformat shaders for legibility
- Move assembly processes from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7333d0e45ccd3f43690a2a01227f89a6e04fcecb
[ROCm/ROCR-Runtime commit: b44d6762bd]
- Reformat shaders for legibility
- Move assembly processes from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I174f1ea5332c499440b30d9bcf06836274428a0f
[ROCm/ROCR-Runtime commit: c845b976d0]
- Reformat shaders for legibility
- Move assembly processes from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I669f076b5c34eb90349865eeca1b29e17c9e80d6
[ROCm/ROCR-Runtime commit: 08d38fb140]
- Reformat shaders for legibility
- Move assembly processes from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- LLVM syntax change on ScratchCopyDwordIsa_gfx10:
hwreg(HW_REG_SHADER_FLAT_SCRATCH_LO/HI) -> hwreg(HW_REG_FLAT_SCR_LO/HI)
- Fix bug in CopyOnSignalIsa_gfx10 and PollMemoryIsa_gfx10 whereby
flat_store_dword used vector reg format v[n,n]. Changed to v[n:n]
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id182cfb8aeb7372366c59affb5cbdd145909ee96
[ROCm/ROCR-Runtime commit: 039bce94a6]
Instantiate in KFDBaseComponentTest::SetUp() and destroy in TearDown().
This ensures m_pAsm is available for all tests.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I8b98a5350a9739d71455f14552c9879bdb1c475d
[ROCm/ROCR-Runtime commit: 235636d598]
Initial commit for transition from IsaGenerator/SP3 assembler model to
the LLVM AMDGPU (AMDGCN) assembler backend:
- Add Assembler class, may be instantiated for assembly similar to
IsaGenerator.
- Add Assembler and LLVM archive dependencies to build process.
- CXX bumped to gnu++14 as required for LLVM compilation.
- Compatible with LLVM 7.0 and greater (latest Lightning/llvm-git
version should be used for up-to-date gfx support). Note that this is
just a build dependency and *not* a runtime dependency. LLVM does not
need to be installed on the host machine to run kfdtest.
- CMake will first look for a Lightning build. Lightning itself does not
need to be installed system-wide, just built. If this fails, it will
attempt to find a system-wide LLVM install.
General Assembler usage and notes:
- Similar to IsaGenerator, applicable test classes will contain an
Assembler object pointer which may be instantiated in the test
constructor.
- Instantiation requires the GFXIP version in order to find the
appropriate LLVM AMDGPU Target ID.
- The RunAssemble() member func takes in a standard const char* shader and
fills the TextData member with the output binary; TextSize with the size
of TextData. These may be accessed via GetInstrStream() and
GetInstrStreamSize(), or the output binary may be copied into an
IsaBuffer via CopyInstrStream(). RunAssembleBuf() combines RunAssemble()
and CopyInstrStream() and additionally takes an optional BufSize
parameter to specify the size of the output buffer (defaults to
PAGE_SIZE).
- Assembler object deletion is to be done in the base test destructor.
Assembler-specific memory allocation is freed in the Assembler
destructor.
- For debug, one can call PrintTextHex() to print out a formatted hex
representation of the output binary, or PrintELFHex() to print out the
intermediate ELF object. Note that PrintTextHex() is public whereas
PrintELFHex() is private.
- Prints use the LLVM outs() call as that allows for use of the LLVM
format_hex() func in the aforementioned debug prints. This is subject to
change if the LOG() call would be preferred.
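The call sequence described in the usage notes can be sketched with a stand-in class. FakeAssembler below is hypothetical and is not the KFDTest Assembler: it does not touch LLVM and simply copies the source bytes as if they were the output binary. The method names come from the notes above, but the exact signatures, the 4096 stand-in for PAGE_SIZE, and the failure conditions are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for illustration only: NOT the KFDTest Assembler. The real class
// drives the LLVM AMDGPU backend; this fake copies the source bytes so the
// call sequence can be exercised without LLVM.
class FakeAssembler {
 public:
    // Returns 0 on success and -1 on error, mirroring the convention above.
    int RunAssemble(const char* shaderSource) {
        m_text.clear();  // drop the output of any previous run
        if (shaderSource == nullptr || *shaderSource == '\0') return -1;
        m_text.assign(shaderSource, shaderSource + std::strlen(shaderSource));
        return 0;
    }

    const char* GetInstrStream() const { return m_text.data(); }
    std::size_t GetInstrStreamSize() const { return m_text.size(); }

    // Copies the stored output into outBuf. Assumption: fails when nothing
    // is stored or when the output does not fit into bufSize bytes.
    int CopyInstrStream(char* outBuf, std::size_t bufSize) const {
        if (outBuf == nullptr || m_text.empty() || m_text.size() > bufSize) return -1;
        std::memcpy(outBuf, m_text.data(), m_text.size());
        return 0;
    }

    // RunAssemble() followed by CopyInstrStream(); 4096 stands in for PAGE_SIZE.
    int RunAssembleBuf(const char* shaderSource, char* outBuf, std::size_t bufSize = 4096) {
        if (RunAssemble(shaderSource) != 0) return -1;
        return CopyInstrStream(outBuf, bufSize);
    }

 private:
    std::vector<char> m_text;  // plays the role of TextData/TextSize
};
```

A test would typically call RunAssembleBuf() once with its IsaBuffer and check the return code before dispatching the shader.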
RunAssemble control flow:
- Ensure correct Assembler initialization and clear previous run
TextData (if necessary).
- Initialize LLVM AMDGPU target, required interfaces, and buffers.
- Set parser to specified target/subtarget and assemble into ELF code
object.
- Extract .text section from ELF, allocate space for TextData and store.
- On success, returns 0 (HSAKMT_STATUS_SUCCESS). On error, returns -1
(subject to change to be in line with HSAKMT_STATUS enum).
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1d96230824db651d3ffbaa46eb68fc274e7066b5
[ROCm/ROCR-Runtime commit: 65b1e0c058]
According to the environment setting HSA_XNACK=1 or 0, set XNACK mode ON or
OFF to run KFDSVMRangeTest and KFDSVMEvictTest. If HSA_XNACK is not defined,
use the system boot-time XNACK mode setting.
Restore the original XNACK mode when the test finishes.
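A minimal sketch of the environment handling, assuming a tri-state result. The enum and both function names are hypothetical, and treating values other than "0" and "1" as "use the system default" is an assumption that the message above does not spell out.

```cpp
#include <cstdlib>
#include <cstring>

// Tri-state request: the environment either forces XNACK on or off, or
// leaves the boot-time default untouched.
enum class XnackRequest { UseSystemDefault, ForceOff, ForceOn };

// Hypothetical helper: interprets a raw HSA_XNACK value.
inline XnackRequest ParseXnackValue(const char* value) {
    if (value == nullptr) return XnackRequest::UseSystemDefault;  // not defined
    if (std::strcmp(value, "1") == 0) return XnackRequest::ForceOn;
    if (std::strcmp(value, "0") == 0) return XnackRequest::ForceOff;
    // Assumption: unrecognized values fall back to the system default.
    return XnackRequest::UseSystemDefault;
}

// Reads HSA_XNACK from the process environment.
inline XnackRequest ReadXnackRequest() {
    return ParseXnackValue(std::getenv("HSA_XNACK"));
}
```

A test fixture could save the current mode before applying the request and put it back in its teardown step, which matches the restore behavior described above.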
Change-Id: Ia896a1b0a90854646c8a79acca38a7d46098efde
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 23ec6e880e]
AQL firmware can sometimes send invalid signal interrupts with 0 context
ID. This test simulates this by submitting similar events using PM4
packets and measures the performance of signaling a normal event after
that.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I69028dc6dd98a5a93f18daad4efbe1b16b6098f9
[ROCm/ROCR-Runtime commit: e738e57fc4]
The KFD patch "drm/amdkfd: Ignore bogus signals from MEC efficiently" will
reserve one signal slot that user mode cannot use any more. Update
the maximum event number in KFDEventTest to match that change.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ic789e16b6d73dfea66ab51c5bbc075c8e8e2d052
[ROCm/ROCR-Runtime commit: 347bf6a03c]
On some platforms there is only 256MB of VRAM, so allocating 256MB of
VRAM fails. Limit the test to a smaller VRAM allocation to ensure
that the allocation succeeds.
Change-Id: Iba4c469de56925675e5624b300a6153e24ab19b3
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
[ROCm/ROCR-Runtime commit: c86a0b8332]
It is not possible to allocate 3/4 of the VRAM size with granularityMB
set to 128 when the VRAM size is < 512MB, and decreasing granularityMB to
16 has no significant impact on the ROCt test on other systems. So decrease
granularityMB on small-VRAM systems to handle LargestVramBufferTest().
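The arithmetic behind this can be sketched as follows. Both helpers are hypothetical, and the sketch assumes that the test only probes sizes that are multiples of granularityMB; how LargestVramBufferTest() actually searches is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: picks the step size described above.
inline std::uint64_t PickGranularityMB(std::uint64_t vramSizeMB) {
    return vramSizeMB < 512 ? 16 : 128;
}

// Largest multiple of granularityMB that does not exceed 3/4 of VRAM.
inline std::uint64_t LargestReachableSizeMB(std::uint64_t vramSizeMB,
                                            std::uint64_t granularityMB) {
    assert(granularityMB > 0);
    const std::uint64_t targetMB = vramSizeMB * 3 / 4;
    return targetMB / granularityMB * granularityMB;
}
```

With 256MB of VRAM the 3/4 mark is 192MB. A 128MB step can only reach 128MB, while a 16MB step reaches 192MB exactly.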
Change-Id: Iea7c29abfd382a20761b653730fd09a220ad2fd0
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
[ROCm/ROCR-Runtime commit: 6c103877dd]
__x86_64__ and __AMD64__ should be already defined by the compiler to
specify the compilation target and shouldn't be defined manually.
Two x86_64 checks now also test the Visual Studio variables, since removing
the manual definitions could otherwise break compilation with that compiler.
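A check of this kind might look like the sketch below. __x86_64__ is taken from the message above; _M_X64 and _M_AMD64 are the macros MSVC predefines for x64 targets, and using them here is an assumption about which Visual Studio variables are meant. SKETCH_TARGET_X86_64 and kTargetIsX86_64 are hypothetical names.

```cpp
// Rely on compiler-provided macros only; nothing here defines __x86_64__.
#if defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64)
#define SKETCH_TARGET_X86_64 1
#else
#define SKETCH_TARGET_X86_64 0
#endif

// Compile-time constant for code that prefers `if` over `#if`.
constexpr bool kTargetIsX86_64 = (SKETCH_TARGET_X86_64 == 1);
```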
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917
[ROCm/ROCR-Runtime commit: 178a7a5cfa]
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.
The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c
[ROCm/ROCR-Runtime commit: ddf4edcafc]
Tested on Talos II with Vega 64.
POWER systems allocate NUMA nodes on multiples of 8 to allow CPU
onlining / offlining.
Set the correct NUMA mask bits when requesting node-bound memory
allocations.
This is a cleanup/squash/rebase of:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/47
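Setting the mask bit by NUMA node ID rather than by a dense node index can be sketched as below. MakeNodeMask is a hypothetical helper, and the layout (an array of unsigned long words, one bit per node ID) is an assumption modeled on the usual Linux nodemask convention, not a copy of the thunk code.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

constexpr std::size_t kBitsPerWord = sizeof(unsigned long) * CHAR_BIT;

// Builds a node mask large enough for maxNodeId with only the bit for
// numaNodeId set. With sparse IDs (0, 8, 16, ...) the bit position must be
// the node ID itself, not the position of the node in a dense list.
inline std::vector<unsigned long> MakeNodeMask(unsigned numaNodeId, unsigned maxNodeId) {
    assert(numaNodeId <= maxNodeId);
    std::vector<unsigned long> mask(maxNodeId / kBitsPerWord + 1, 0UL);
    mask[numaNodeId / kBitsPerWord] |= 1UL << (numaNodeId % kBitsPerWord);
    return mask;
}
```

For a system whose second node has ID 8, the mask carries bit 8 (0x100) instead of bit 1.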
Change-Id: Id4af6dff7e66e9d464d6b17a1e99087eb3ac8e51
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
[ROCm/ROCR-Runtime commit: 5fd3c868b2]
Some VRAM access tests in MMBandWidth can be very slow on systems with
complicated PCIe topology. Skip tests that take a long time to avoid
excessively long running tests with little benefit.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I2950237347fc2f764f6aa3292ab819051472bf37
[ROCm/ROCR-Runtime commit: 3ecd54f098]
Map failures happen in AllocBuffers function when there
isn't enough space to move BO to vram. In such cases, the
function retries allocation/map until successful to continue
testing eviction and restore.
Print a message in KFDEvictTest when this happens to correlate
to the message seen in the kernel log.
amdgpu 0000:c1:00.0: amdgpu: Failed to map peer:0000:c1:00.0 mem_domain:4
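The retry-and-report behavior can be sketched as follows. MapWithRetry, its callback, and the message text are hypothetical. The attempt cap is an assumption added so that the sketch always terminates, whereas the description above retries until the map succeeds.

```cpp
#include <functional>
#include <iostream>

// Calls tryMap until it succeeds or maxAttempts failures have occurred.
// Returns the number of failed attempts and prints one line per failure so
// the test output can be matched against the kernel log.
inline unsigned MapWithRetry(const std::function<bool()>& tryMap, unsigned maxAttempts) {
    unsigned failures = 0;
    while (failures < maxAttempts) {
        if (tryMap()) {
            return failures;
        }
        ++failures;
        std::cerr << "Map failed (attempt " << failures
                  << "), retrying; a matching kernel error message is expected\n";
    }
    return failures;
}
```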
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0475d8d9521a07612182e54fc7cddb9bd44353e6
[ROCm/ROCR-Runtime commit: 0d07b3477b]
Image support does not compile on other architectures, since it relies on
the x86-only header "x86intrin.h".
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b
[ROCm/ROCR-Runtime commit: a0931f4a3c]
If PCIe Atomics aren't supported, we shouldn't try to run a test that
tests PCIe Atomics. Check for support, and bail early if it's not there.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ie9aa0fed3ece07fb83a33e6cacef2961626afab4
[ROCm/ROCR-Runtime commit: f62e9b9821]
While this is currently only used in one subtest, it's useful to have
this separated into the test utilities. This will also allow us to check
for PCI Atomics support before trying to run them.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I9704d151bfaa627eceae8399cc46c15babde6ff1
[ROCm/ROCR-Runtime commit: 8b54459e12]