Commit Graph

2959 Commits

Author SHA1 Message Date
Graham Sider ad5f98814f kfdtest: Move KFDCWSRTest shaders to ShaderStore
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7c89fca94e92145a4115d1089348380807a868ee
2022-04-26 13:14:33 -04:00
Graham Sider aced779f1b kfdtest: Move KFDQMTest shaders to ShaderStore
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id50aea16528c4bed4530f95644a02f59efddae3e
2022-04-26 13:14:33 -04:00
Graham Sider c926d83b5a kfdtest: Move KFDMemoryTest shaders to ShaderStore
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I3335ca1f9dbe849233cf85253e0e92b56a20b8c9
2022-04-26 13:14:33 -04:00
Graham Sider 34ca37d9e8 kfdtest: Add ShaderStore.cpp/hpp
Initial commit for ShaderStore.hpp. It will contain const char* strings for
all shaders used within KFDTest.

The LLVM assembler now takes care of the correct instructions to be used
for various GFX versions using directives embedded into the shader assembly.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2887a03b33d5c2cc382e4f96c2bc3e067715ab54
2022-04-26 13:14:33 -04:00
Graham Sider a7b85fdb08 kfdtest: Update KFDSVMEvictTest to LLVM Asm
- Reformat shaders for legibility
- Move assembly process from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id1eb3856bc74bf0da46685c5dc08e91f5df66d4f
2022-04-26 13:14:33 -04:00
Graham Sider ba9ccd32a1 kfdtest: Update KFDGWSTest to LLVM Asm
- Reformat shaders for legibility
- Move assembly process from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- Change gds:1 modifier to gds
- Change offset0:0 modifier to offset:0

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2a863695bcf7344cf184a809704948ba3a0d230f
2022-04-26 13:14:33 -04:00
Graham Sider b44d6762bd kfdtest: Update KFDEvictTest to LLVM Asm
- Reformat shaders for legibility
- Move assembly process from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7333d0e45ccd3f43690a2a01227f89a6e04fcecb
2022-04-26 13:14:33 -04:00
Graham Sider c845b976d0 kfdtest: Update KFDCWSRTest to LLVM Asm
- Reformat shaders for legibility
- Move assembly process from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I174f1ea5332c499440b30d9bcf06836274428a0f
2022-04-26 13:14:33 -04:00
Graham Sider 08d38fb140 kfdtest: Update KFDQMTest to LLVM Asm
- Reformat shaders for legibility
- Move assembly process from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I669f076b5c34eb90349865eeca1b29e17c9e80d6
2022-04-26 13:14:33 -04:00
Graham Sider 039bce94a6 kfdtest: Update KFDMemoryTest to LLVM Asm
- Reformat shaders for legibility
- Move assembly process from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- LLVM syntax change on ScratchCopyDwordIsa_gfx10:
hwreg(HW_REG_SHADER_FLAT_SCRATCH_LO/HI) -> hwreg(HW_REG_FLAT_SCR_LO/HI)
- Fix bug in CopyOnSignalIsa_gfx10 and PollMemoryIsa_gfx10 whereby
flat_store_dword used vector reg format v[n,n]. Changed to v[n:n]

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id182cfb8aeb7372366c59affb5cbdd145909ee96
2022-04-26 13:14:33 -04:00
Graham Sider 235636d598 kfdtest: Instantiate Assembler in KFDBaseComponentTest
Instantiate in KFDBaseComponentTest::SetUp() and destroy in TearDown().
This ensures m_pAsm is available for all tests.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I8b98a5350a9739d71455f14552c9879bdb1c475d
2022-04-26 13:14:33 -04:00
Graham Sider 2f73db8fb0 kfdtest: Add GetGfxVersion to KFDTestUtil
Required to derive LLVM AMDGPU target ASIC (MCPU).

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: If8f139b3858c9bf42feba23ae9210e14625dc08b
2022-04-26 13:14:33 -04:00
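Note on the commit above: LLVM AMDGPU processor names such as gfx90c or gfx1036 are built from a major version in decimal followed by the minor version and stepping in hexadecimal. A minimal sketch of that mapping follows. The function name McpuName and its three-integer signature are hypothetical and are not taken from kfdtest; the actual GetGfxVersion return format is not shown in this log.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not the kfdtest implementation): builds an LLVM
// AMDGPU processor name from a GFX IP version split into three fields.
// Major is printed in decimal; minor and stepping are printed in hex,
// which is why stepping 12 yields "c" in "gfx90c".
std::string McpuName(unsigned major, unsigned minor, unsigned stepping) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "gfx%u%x%x", major, minor, stepping);
    return std::string(buf);
}
```

For instance, McpuName(9, 0, 12) gives "gfx90c" and McpuName(10, 3, 6) gives "gfx1036", the two targets named elsewhere in this log.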
Graham Sider 65b1e0c058 kfdtest: Add LLVM AMDGPU assembler components
Initial commit for transition from IsaGenerator/SP3 assembler model to
the LLVM AMDGPU (AMDGCN) assembler backend:

- Add Assembler class, may be instantiated for assembly similar to
IsaGenerator.
- Add Assembler and LLVM archive dependencies to build process.
- CXX bumped to gnu++14 as required for LLVM compilation.
- Compatible with LLVM 7.0 and greater (latest Lightning/llvm-git
version should be used for up-to-date gfx support). Note that this is
just a build dependency and *not* a runtime dependency. LLVM does not
need to be installed on the host machine to run kfdtest.
- CMake will first look for a Lightning build. Lightning itself does not
need to be installed system-wide, just built. If this fails, it will
attempt to find a system-wide LLVM install.

General Assembler usage and notes:

- Similar to IsaGenerator, applicable test classes will contain an
Assembler object pointer which may be instantiated in the test
constructor.
- Instantiation requires the GFXIP version in order to find the
appropriate LLVM AMDGPU Target ID.
- The RunAssemble() member func takes in a standard const char* shader and
fills the TextData member with the output binary; TextSize with the size
of TextData. These may be accessed via GetInstrStream() and
GetInstrStreamSize(), or the output binary may be copied into an
IsaBuffer via CopyInstrStream(). RunAssembleBuf() combines RunAssemble()
and CopyInstrStream() and additionally takes an optional BufSize
parameter to specify the size of the output buffer (defaults to
PAGE_SIZE).
- Assembler object deletion is to be done in the base test destructor.
Assembler-specific memory allocation is freed in the Assembler
destructor.
- For debug, one can call PrintTextHex() to print out a formatted hex
representation of the output binary, or PrintELFHex() to print out the
intermediate ELF object. Note that PrintTextHex() is public whereas
PrintELFHex() is private.
- Prints use the LLVM outs() call as that allows for use of the LLVM
format_hex() func in the aforementioned debug prints. This is subject to
change if the LOG() call would be preferred.

RunAssemble control flow:

- Ensure correct Assembler initialization and clear previous run
TextData (if necessary).
- Initialize LLVM AMDGPU target, required interfaces, and buffers.
- Set parser to specified target/subtarget and assemble into ELF code
object.
- Extract .text section from ELF, allocate space for TextData and store.
- On success, returns 0 (HSAKMT_STATUS_SUCCESS). On error, returns -1
(subject to change to be in line with HSAKMT_STATUS enum).

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1d96230824db651d3ffbaa46eb68fc274e7066b5
2022-04-26 13:14:33 -04:00
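Note on the commit above: the relationship between RunAssemble(), CopyInstrStream() and RunAssembleBuf() can be sketched with a stand-in class. This is an illustration only. MockAssembler is hypothetical, uses no LLVM, and "assembles" by copying the source bytes; the real class writes into an IsaBuffer, whereas this sketch uses a plain char buffer. Only the method names, the 0 / -1 return convention and the PAGE_SIZE default come from the commit message; the 4096 value and all parameter lists are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed page size; the commit only says the default is PAGE_SIZE.
const std::size_t kAssumedPageSize = 4096;

// Stand-in for illustration: no LLVM involved, signatures are assumptions.
class MockAssembler {
 public:
    // Clears the previous output, then stores the "binary".
    // Returns 0 on success and -1 on error, as the commit describes.
    int RunAssemble(const char* shader) {
        text_.clear();
        if (shader == nullptr || *shader == '\0') return -1;
        text_.assign(shader, shader + std::strlen(shader));
        return 0;
    }

    const char* GetInstrStream() const { return text_.data(); }
    std::size_t GetInstrStreamSize() const { return text_.size(); }

    // Copies the last output into a caller-provided buffer.
    int CopyInstrStream(char* out, std::size_t bufSize) const {
        if (out == nullptr || text_.empty() || text_.size() > bufSize) return -1;
        std::memcpy(out, text_.data(), text_.size());
        return 0;
    }

    // Combines RunAssemble() and CopyInstrStream(); bufSize is optional.
    int RunAssembleBuf(const char* shader, char* out,
                       std::size_t bufSize = kAssumedPageSize) {
        int ret = RunAssemble(shader);
        return ret != 0 ? ret : CopyInstrStream(out, bufSize);
    }

 private:
    std::vector<char> text_;  // plays the role of TextData / TextSize
};
```

A caller would typically use RunAssembleBuf() alone and check the return value before dispatching the buffer.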
Philip Yang 23ec6e880e kfdtest: Set XNACK mode according to HSA_XNACK env setting
According to the env setting HSA_XNACK=1 or 0, set XNACK mode ON or OFF to run
KFDSVMRangeTest and KFDSVMEvictTest. If HSA_XNACK is not defined, use the
system boot-time XNACK mode setting.

Restore the original XNACK mode when the test finishes.

Change-Id: Ia896a1b0a90854646c8a79acca38a7d46098efde
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2022-04-25 18:22:08 -04:00
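Note on the commit above: the three-way decision it describes (ON, OFF, or fall back to the boot-time setting) can be sketched as a small parser. The names XnackRequest and ParseXnackEnv are hypothetical and not taken from kfdtest, and treating unrecognized values like an unset variable is an assumption, since the commit message does not say how they are handled.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical tri-state result; not a kfdtest type.
enum class XnackRequest { UseSystemDefault, Off, On };

// envValue is what getenv("HSA_XNACK") would return, so it may be nullptr.
XnackRequest ParseXnackEnv(const char* envValue) {
    if (envValue == nullptr) return XnackRequest::UseSystemDefault;
    if (std::strcmp(envValue, "1") == 0) return XnackRequest::On;
    if (std::strcmp(envValue, "0") == 0) return XnackRequest::Off;
    // Assumption: anything else is treated like an unset variable.
    return XnackRequest::UseSystemDefault;
}
```

A test fixture would save the current mode before applying the result and restore it on teardown, matching the last sentence of the commit message.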
Felix Kuehling e738e57fc4 kfdtest: Add test for invalid signal interrupts
AQL firmware can sometimes send invalid signal interrupts with 0 context
ID. This test simulates this by submitting similar events using PM4
packets and measures the performance of signaling a normal event after
that.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I69028dc6dd98a5a93f18daad4efbe1b16b6098f9
2022-04-21 16:26:36 -04:00
Shweta Khatri 539ec6a87d Fix heap-buffer-overflow error in Memory access test. Also reverted most of first array element from 0 to 1 changes.
Change-Id: I62dee9bab379210a322848132e2846dc153724d9
2022-04-21 12:09:58 -04:00
Felix Kuehling 347bf6a03c kfdtest: Reduce maximum number of events to 4095
The KFD patch "drm/amdkfd: Ignore bogus signals from MEC efficiently" will
reserve one signal slot that user mode cannot use any more. Update
the maximum event number in KFDEventTest to match that change.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ic789e16b6d73dfea66ab51c5bbc075c8e8e2d052
2022-04-20 14:00:25 -04:00
Prike Liang c86a0b8332 kfdtest: limit vram allocation size for MigrateAccessInPlaceTest
On some platforms there is only 256MB of VRAM, so allocating 256MB of VRAM
fails. Limit the test to a smaller VRAM allocation to ensure the
allocation succeeds.

Change-Id: Iba4c469de56925675e5624b300a6153e24ab19b3
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
2022-04-19 23:28:45 -04:00
Prike Liang 6c103877dd kfdtest: decrease granularityMB for handling small vram system
It is not possible to allocate 3/4 of the VRAM size with granularityMB
set to 128 when the VRAM size is < 512MB, and decreasing granularityMB to 16
has no significant impact on ROCt tests on other systems. So decrease
granularityMB on small-VRAM systems to handle LargestVramBufferTest().

Change-Id: Iea7c29abfd382a20761b653730fd09a220ad2fd0
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
2022-04-19 23:28:26 -04:00
Jeremy Newton 178a7a5cfa Drop some unnecessary definitions
__x86_64__ and __AMD64__ should be already defined by the compiler to
specify the compilation target and shouldn't be defined manually.

I fixed two x86_64 checks to include VS variables, as removing this
might cause it to fail to compile on that compiler.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917
2022-04-19 12:22:42 -04:00
Jeremy Newton ddf4edcafc Use CMAKE_INSTALL_*
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.

The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c
2022-04-19 12:22:42 -04:00
Timothy Pearson 5fd3c868b2 Initial support for POWER platforms
Tested on Talos II with Vega 64

POWER systems allocate NUMA nodes on multiples of 8 to allow CPU
onlining / offlining
Set the correct NUMA mask bits when requesting node-bound memory
allocations

This is a cleanup/squash/rebase of:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/47

Change-Id: Id4af6dff7e66e9d464d6b17a1e99087eb3ac8e51
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2022-04-19 12:19:15 -04:00
Felix Kuehling 3ecd54f098 kfdtest: Skip slow tests in MMBandWidth
Some VRAM access tests in MMBandWidth can be very slow on systems with
complicated PCIe topology. Skip tests that take a long time to avoid
excessively long running tests with little benefit.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I2950237347fc2f764f6aa3292ab819051472bf37
2022-04-15 23:03:41 -04:00
Divya Shikre 0d07b3477b kfdtest: Add log message in KFDEvictTest
Map failures happen in AllocBuffers function when there
isn't enough space to move BO to vram. In such cases, the
function retries allocation/map until successful to continue
testing eviction and restore.

Print a message in KFDEvictTest when this happens to correlate
to the message seen in the kernel log.
amdgpu 0000:c1:00.0: amdgpu: Failed to map peer:0000:c1:00.0 mem_domain:4

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0475d8d9521a07612182e54fc7cddb9bd44353e6
2022-04-14 18:14:03 -04:00
Jeremy Newton a0931f4a3c Only default IMAGE_SUPPORT=ON for x86
Image support does not compile on other architectures, since it relies on
the x86-only header "x86intrin.h".

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b
2022-04-12 09:24:45 -04:00
Konstantin Zhuravlyov 9265409f08 Add code object v5 support
Change-Id: I03522765056e99ed49e6c5e213ee3753852de27b
2022-04-12 08:53:27 -04:00
Sean Keely b3caf6782b Revert "Release host buffers after segment freeze."
This reverts commit 03a52655a8.

Change-Id: Idc7e568b2b54a226dbe4d189b25a78be3bd16eea
2022-04-11 20:43:07 -05:00
Kent Russell f62e9b9821 kfdtest: Check for Atomic Ops support before running Atomics test
If PCIe Atomics aren't supported, we shouldn't try to run a test that
tests PCIe Atomics. Check for support, and bail early if it's not there.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ie9aa0fed3ece07fb83a33e6cacef2961626afab4
2022-04-05 12:34:26 -04:00
Kent Russell 8b54459e12 kfdtest: Add function to check for PCI Atomic Ops support
While this is currently only used in one subtest, it's useful to have
this separated into the test utilities. This will also allow us to check
for PCI Atomics support before trying to run them.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I9704d151bfaa627eceae8399cc46c15babde6ff1
2022-04-05 11:03:36 -04:00
Sean Keely 4e9849034d Correct inf loop defect in fast clock init.
Each time delay is grown we need to reset elapsed.  We want to take
the most accurate sample from the set at fixed delay.

Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.

Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb
2022-04-01 16:15:37 -04:00
Sean Keely 03a52655a8 Release host buffers after segment freeze.
Release staging buffers after loading has completed.  The debugger
no longer uses this copy.

Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d
2022-03-23 23:53:02 -05:00
Sean Keely 048700f2e7 Correct loader memory interfaces.
The loader must use internal interfaces to access page allocation
flags.  Code pages should also ensure use of cached memory.

Also relocate i-cache flush after code page copy.

Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab
2022-03-23 23:52:56 -05:00
Sean Keely fbc48521dc Correct queue error reporting.
VM faults should not report via the queue error handler.
The system event contains much more useful information.

Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3
2022-03-23 23:37:53 -05:00
Felix Kuehling f88aaa933b libhsakmt: Update kfd_ioctl.h
Import the latest version from the kernel tree.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If5f998ad55085ebd5020adaa382181204d834e3e
2022-03-21 14:41:18 -04:00
Sean Keely af0f90800d Ignore hive id for CPUs when selecting copy paths.
Hive ID is used during copy path selection to locate an optimal
pool of SDMA engines.  However, for CPU-GPU connections we always
want to use the host port facing engines, known generally as the
PCIe optimized engines.  We want this selection even when the
connection is XGMI hence dropping the hive id for CPUs.

Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9
2022-03-18 18:48:44 -05:00
Sean Keely 7e73760cd0 Revert "add gfx1036 support"
Compiler is not promoted to mainline yet.

This reverts commit 2f97f17df9.

Change-Id: I7256aeb3698ee3ae640a9f457a929abe24d5ef17
2022-03-18 02:35:01 -05:00
Sean Keely 7ab0d786c2 Disable warnings as errors for rocrtst.
Change-Id: Ibe76c4c7f20fc0273dd02038477e7f9fc7800a3d
2022-03-11 17:55:55 -05:00
Alex Sierra dc33a092c0 kfdtest: remove log message at hsaKmtSVMSetAttr failure
This error message should be handled by the caller.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I68d879d6d41835f47b8ac138c2218eaa6b86a512
2022-03-08 12:15:59 -06:00
Yifan Zhang 2f97f17df9 add gfx1036 support
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I075779b1369fde759c29572fa2027a3748d6ed4c
2022-03-05 13:16:19 +08:00
Sean Keely 8a6954c63c Do not allow occupancy restriction on cooperative groups.
Excessive scratch allocations can normally trigger occupancy
reduction.  This breaks cooperative groups so if occupancy
reduction is required on a cooperative dispatch fail with OOM.

Change-Id: I64612a2e38bf1286f3b74c1c2a68ab0c85452771
2022-03-02 19:59:30 -06:00
Sean Keely 552dcead93 Correct scratch allocation logic to account for asymmetric harvest.
With asym. harvest hw does not issue groups equally to each SE,
occasionally hw will skip an SE so that the distribution reflects
each SE's CU count.  Scratch resources must be allocated to reflect
this asymmetric distribution of groups.

Change-Id: I65e26206500483ea18e6e8796e65ecba5354b029
2022-03-02 19:59:30 -06:00
Sean Keely cedc3e80a8 Do not bump up total scratch size for large cached allocations.
HW does not ignore low bits of the scratch wave count and will
stride beyond the end of the allocation if the wave count is
ever indivisible by SE count.  Rather than returning the allocation
size for cached large scratch allocations, use the requested
scratch size in scratch setup.  Scratch cache will retain the
cached allocation's size.

Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d
2022-03-02 20:48:19 -05:00
Mukul Joshi b8dc875b3c libhsakmt: Update context save area size calculations
Currently, context save area size passed to KFD includes the
size of the debug area. Change this to report the actual size
of the context save area to KFD.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I5d440ae802255a97ade046775f6a000bae79d5d5
2022-03-02 15:28:38 -05:00
Saravanan Solaiyappan 046f2e9116 Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ib95ea72f15bfbf4141b69b0a8ca4d3a71fe1c093
2022-02-24 12:01:39 -05:00
Saravanan Solaiyappan a496adafaa Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be
2022-02-24 10:26:04 -05:00
Sean Keely b9a0c1d313 Do not discard fragment allocator blocks multiple times.
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.

Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62
2022-02-10 18:39:46 -06:00
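Note on the commit above: one common way to make a discard operation safe to call repeatedly is a per-block flag that lets the accounting update run only once. The sketch below shows that pattern under stated assumptions. Block, Allocator, Use and inUseBytes are hypothetical names, and this is not the fragment allocator's actual code, which this log does not show.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block record; not the runtime's real data structure.
struct Block {
    std::size_t size = 0;
    bool discarded = false;
};

// Hypothetical allocator that only tracks in-use bytes.
struct Allocator {
    std::size_t inUseBytes = 0;

    void Use(Block& b) {
        b.discarded = false;
        inUseBytes += b.size;
    }

    // Safe to call more than once on the same block: the second call is a
    // no-op, so inUseBytes is never decremented twice for one block.
    void DiscardBlock(Block& b) {
        if (b.discarded) return;
        b.discarded = true;
        inUseBytes -= b.size;
    }
};
```

Without the guard, a second call would subtract the block size again and the unsigned counter would wrap, which is one form the corrupted accounting could take.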
Sean Keely 266cd68524 Add fallback case for cache line size.
KFD sometimes returns 0 for cache line sizes.

Change-Id: If82de0068318bbc138f0d1d4692ff908359174ad
2022-02-10 18:39:46 -06:00
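Note on the commit above: a fallback of this kind usually amounts to substituting a default when the reported value is zero. A minimal sketch follows. The function name and the 64-byte default are assumptions, since the commit message does not state which value the runtime falls back to.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default; the commit does not say which value is used.
const std::uint32_t kAssumedDefaultCacheLine = 64;

// Hypothetical helper: returns the reported cache line size, or the
// assumed default when the driver reports 0.
std::uint32_t EffectiveCacheLineSize(std::uint32_t reported) {
    return reported != 0 ? reported : kAssumedDefaultCacheLine;
}
```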
Lang Yu 052b7957ea libhsakmt: Add another pci device id for cyan skillfish
Add PCI DID for cyan skillfish.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I1d06936cccdf99af76fe5ca3ff323538fac76c9c
2022-01-27 01:41:00 -05:00
Aaron Liu 7cdf38f6c0 libhsakmt: correct the gfx version for gfx90c
The gfx version of gfx90c is 90C instead of 902.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: Id009c9357f816b8ccab605090df47626f1a579ef
2022-01-26 01:25:58 -05:00
Sean Keely 21291b48c6 Retrieve cache line size from KFD topology.
Change-Id: I16ddd9d9888bb973eccf3c562619894c88c7df15
2022-01-16 08:44:44 -06:00