Initial commit for ShaderStore.hpp. Will contain consts char*'s for
all shaders used within KFDTest.
The LLVM assembler now takes care of the correct instructions to be used
for various GFX versions using directives embedded into the shader assembly.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2887a03b33d5c2cc382e4f96c2bc3e067715ab54
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id1eb3856bc74bf0da46685c5dc08e91f5df66d4f
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- Change gds:1 modifier to gds
- Change offset0:0 modifier to offset:0
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2a863695bcf7344cf184a809704948ba3a0d230f
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7333d0e45ccd3f43690a2a01227f89a6e04fcecb
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I174f1ea5332c499440b30d9bcf06836274428a0f
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I669f076b5c34eb90349865eeca1b29e17c9e80d6
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- LLVM syntax change on ScratchCopyDwordIsa_gfx10:
hwreg(HW_REG_SHADER_FLAT_SCRATCH_LO/HI) -> hwreg(HW_REG_FLAT_SCR_LO/HI)
- Fix bug in CopyOnSignalIsa_gfx10 and PollMemoryIsa_gfx10 whereby
flat_store_dword used vector reg format v[n,n]. Changed to v[n:n]
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id182cfb8aeb7372366c59affb5cbdd145909ee96
Instantiate in KFDBaseComponentTest::SetUp() and destroy in TearDown().
This ensures m_pAsm is available for all tests.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I8b98a5350a9739d71455f14552c9879bdb1c475d
Initial commit for transition from IsaGenerator/SP3 assembler model to
the LLVM AMDGPU (AMDGCN) assembler backend:
- Add Assembler class, may be instantiated for assembly similar to
IsaGenerator.
- Add Assembler and LLVM archive dependencies to build process.
- CXX bumped to gnu++14 as required for LLVM compilation.
- Compatible with LLVM 7.0 and greater (latest Lightning/llvm-git
version should be used for up-to-date gfx support). Note that this is
just a build dependency and *not* a runtime dependency. LLVM does not
need to be installed on the host machine to run kfdtest.
- CMake will first look for a Lightning build. Lightning itself does not
need to be installed system-wide, just built. If this fails, it will
attempt to find a system-wide LLVM install.
General Assembler usage and notes:
- Similar to IsaGenerator, applicable test classes will contain an
Assembler object pointer which may be instantiated in the test
constructor.
- Instantiation requires the GFXIP version in order to find the
appropriate LLVM AMDGPU Target ID.
- The RunAssemble() member func takes in a standard const char* shader and
fills the TextData member with the output binary; TextSize with the size
of TextData. These may be accessed via GetInstrStream() and
GetInstrStreamSize(), or the output binary may be copied into an
IsaBuffer via CopyInstrStream(). RunAssembleBuf() combines RunAssemble()
and CopyInstrStream() and additionally takes an optional BufSize
parameter to specify the size of the output buffer (defaults to
PAGE_SIZE).
- Assembler object deletion is to be done in the base test destructor.
Assembler-specific memory allocation is freed in the Assembler
destructor.
- For debug, one can call PrintTextHex() to print out a formatted hex
representation of the output binary, or PrintELFHex() to print out the
intermediate ELF object. Note that PrintTextHex() is public whereas
PrintELFHex() is private.
- Prints use the LLVM outs() call as that allows for use of the LLVM
format_hex() func in the aforementioned debug prints. This is subject to
change if the LOG() call would be preferred.
RunAssemble control flow:
- Ensure correct Assembler initialization and clear previous run
TextData (if necessary).
- Initialize LLVM AMDGPU target, required interfaces, and buffers.
- Set parser to specified target/subtarget and assemble into ELF code
object.
- Extract .text section from ELF, allocate space for TextData and store.
- On success, returns 0 (HSAKMT_STATUS_SUCCESS). On error, returns -1
(subject to change to be in line with HSAKMT_STATUS enum).
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1d96230824db651d3ffbaa46eb68fc274e7066b5
According env setting HSA_XNACK=1 or 0, set XNACK mode ON or OFF to run
KFDSVMRangeTest and KFDSVMEvictTest. If HSA_XNACK is not defined, use
system boot-time XNACK mode setting.
Restore to the original XNACK mode when test finished.
Change-Id: Ia896a1b0a90854646c8a79acca38a7d46098efde
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
AQL firmware can sometime send invalid signal interrupts with 0 context
ID. This test simulates this by submitting similar events using PM4
packets and measures the performance of signaling a normal event after
that.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I69028dc6dd98a5a93f18daad4efbe1b16b6098f9
The KFD patch "drm/amdkfd: Ignore bogus signals from MEC efficiently" will
reserve one signal slot that user mode cannot use any more. Update
the maximum event number in KFDEventTest to match that change.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ic789e16b6d73dfea66ab51c5bbc075c8e8e2d052
On the some platform there's only 256MB vram and then will fail to
allocate 256MB vram. So let's limit a small vram allocation for
ensuring vram allocated successfully.
Change-Id: Iba4c469de56925675e5624b300a6153e24ab19b3
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
It's not possible to allocate the 3/4 vram size with granularityMB
being 128 when vram size < 512MB and decrease granularityMB to 16 has
no significant impact on ROCt test on other system. So let's decrease
granularityMB on small vram system for handling LargestVramBufferTest().
Change-Id: Iea7c29abfd382a20761b653730fd09a220ad2fd0
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
__x86_64__ and __AMD64__ should be already defined by the compiler to
specify the compilation target and shouldn't be defined manually.
I fixed two x86_64 checks to include VS variables, as removing this
might cause it to fail to compile on that compiler.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.
The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c
Tested on Talos II with Vega 64
POWER systems allocate NUMA nodes on multiples of 8 to allow CPU
onlining / offlining
Set the correct NUMA mask bits when requesting node-bound memory
allocations
This is a cleanup/squash/rebase of:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/47
Change-Id: Id4af6dff7e66e9d464d6b17a1e99087eb3ac8e51
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Some VRAM access tests in MMBandWidth can be very slow on systems with
complicated PCIe topology. Skip tests that take a long time to avoid
excessively long running tests with little benefit.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I2950237347fc2f764f6aa3292ab819051472bf37
Map failures happen in AllocBuffers function when there
isn't enough space to move BO to vram. In such cases, the
function retries allocation/map until successful to continue
testing eviction and restore.
Print a message in KFDEvictTest when this happens to correlate
to the message seen in the kernel log.
amdgpu 0000:c1:00.0: amdgpu: Failed to map peer:0000:c1:00.0 mem_domain:4
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0475d8d9521a07612182e54fc7cddb9bd44353e6
Image support does not compile on other archectures, since it relies on
the x86 only header "x86intrin.h".
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b
If PCIe Atomics aren't supported, we shouldn't try to run a test that
tests PCIe Atomics. Check for support, and bail early if it's not there
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ie9aa0fed3ece07fb83a33e6cacef2961626afab4
While this is currently only used in one subtest, it's useful to have
this separated into the test utilities. This will also allow us to check
for PCI Atomics support before trying to run them.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I9704d151bfaa627eceae8399cc46c15babde6ff1
Each time delay is grown we need to reset elapsed. We want to take
the most accurate sample from the set at fixed delay.
Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.
Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb
The loader must use internal interfaces to access page allocation
flags. Code pages should also ensure use of cached memory.
Also relocate i-cache flush after code page copy.
Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab
VM faults should not report via the queue error handler.
The system event contains much more useful information.
Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3
Import the latest version from the kernel tree.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If5f998ad55085ebd5020adaa382181204d834e3e
Hive ID is used during copy path selection to locate an optimal
pool of SDMA engines. However, for CPU-GPU connections we always
want to use the host port facing engines, known generally as the
PCIe optimzed engines. We want this selection even when the
connection is XGMI hence dropping the hive id for CPUs.
Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9
This error messages should be handled by the caller.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I68d879d6d41835f47b8ac138c2218eaa6b86a512
Excessive scratch allocations can normally trigger occupancy
reduction. This breaks cooperative groups so if occupancy
reduction is required on a cooperative dispatch fail with OOM.
Change-Id: I64612a2e38bf1286f3b74c1c2a68ab0c85452771
With asym. harvest hw does not issue groups equally to each SE,
occasionally hw will skip an SE so that the distribution reflects
each SE's CU count. Scratch resources must be allocated to reflect
this asymmetric distribution of groups.
Change-Id: I65e26206500483ea18e6e8796e65ecba5354b029
HW does not ignore low bits of the scratch wave count and will
stride beyond the end of the allocation if the wave count is
ever indivisible by SE count. Rather than returning the allocation
size for cached large scratch allocations, use the requested
scratch size in scratch setup. Scratch cache will retain the
cached allocation's size.
Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d
Currently, context save area size passed to KFD includes the
size of the debug area. Change this to report the actual size
of the context save area to KFD.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I5d440ae802255a97ade046775f6a000bae79d5d5
Include the upgrade operation check in the prerm and postun scripts
in package.
Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ib95ea72f15bfbf4141b69b0a8ca4d3a71fe1c093
Include the upgrade operation check in the prerm and postun scripts
in package.
Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.
Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62