Aqua Vanjaram is intended to have fine-grained coherency
from anywhere to anywhere else using read-acquire and
write-release primitives.
Add a test that writes to memory covered by five
different cache lines, then write-releases, while
another thread read-acquires, then reads those
five locations in memory.
There are nine variations of the test to cover
CPU-GPU, same-GPU and across-GPU, vector instructions and
scalar instructions, and data local to the
acquirer or receiver.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I20d2db5c53bd280e971479aad7e61df6ed5d3623
KFDMemoryTest.DeviceHdpFlush requires device node 0 is large bar to
check VRAM content from CPU, run the test only if device 0 is large
bar GPU.
Change-Id: I874b153219550c50b724625e971e3ed3a84dc652
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Nodes with XGMI have no HDP, so DriverHDPFlush should skip.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: If5a87e660712e51d03e750d8e044786036b2e603
Even with the restriction to only compile on gfx90a, this
shader still fails CompileShaders test.
There don't seem to be any systems that actually use it.
Leave it in the shader store, but remove it otherwise
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I41bec6ba10363d42b163ac101c3a92edaad6d6df
Register and map userptrs through Shared Virtual Memory(SVM) API at
the Kernel level when available. Using this approach, performance
will be improve as register/unregister memory will not trigger any
system call to KFD driver.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I3726b4b5e1c6a52a83786fbe0af6322eb29ae7c9
The MemoryAllocAll test in kfdtests exercises the new KFD memory
availability API by trying to allocate a single buffer object that
exactly fills all of vram. Desired object size is determined using the
memory availility KFD ioctl via libhsakmt, then an object is allocated
slightly larger than that size. If the allocation attempt fails then
the test tries to allocate a slightly smaller object, and continues
trying with smaller sizes until the allocation succeeds. The test
succeeds if the successfully allocated object is within some specified
tolerance of the available memory reported.
There are a number of known issues that can cause the successfully
allocated object to be significantly smaller than reported availability.
Until these issues are addressed, we should not fail the test, but just
log the actual divergence between the size of the object we thought we
could allocate, and what was actually possible.
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: I165a30865ffbb2353286dcc896ad8e24af124615
This reverts commit 178a619b80.
There are some openMP issues that were introduced after SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I7ef87c5232a3bcbe594c743fa4b4958601845ba5
Detect under-reporting of available memory by initially attempting to
allocate substantially more than reported available memory, and ensure
that the allocation fails. Continue shrinking the attempted allocation
until it succeeds, then fail the test if the successful allocation is
either too much more than or too much less than reported available.
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: Ib418f0aa26e8db80590a6c5f2578da56a4b60f2b
Modifier scc is disabled from gfx90a's asm, so remove the
shader for gfx90a A+A and keep it for newer asics with scc
support.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Iec3c7ccd5156a855adb2b02feb3db0761876aa2f
Register and map userptrs through Shared Virtual Memory(SVM) API at
the Kernel level when available. Using this approach, performance
will be improve as register/unregister memory will not trigger any
system call to KFD driver.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I20723cbeb340bf48b95e1115f0102c031397bc14
In DeviceHdpFlush, isaBuffer was accidentally used instead of isaBuffer0
during LLVM re-work--revert.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I98d2322e772f821a39505bb336ceb4e6cd8722ef
In the LLVM rework, a line was accidentally changed from
isaBuffer1 to isaBuffer, causing VramCacheCoherenceWithRemoteGPU
to fail. Change it back.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: Ie1f3465b5c46556f18682d1b3d1f086bb790c648
- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- LLVM syntax change on ScratchCopyDwordIsa_gfx10:
hwreg(HW_REG_SHADER_FLAT_SCRATCH_LO/HI) -> hwreg(HW_REG_FLAT_SCR_LO/HI)
- Fix bug in CopyOnSignalIsa_gfx10 and PollMemoryIsa_gfx10 whereby
flat_store_dword used vector reg format v[n,n]. Changed to v[n:n]
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id182cfb8aeb7372366c59affb5cbdd145909ee96
It's not possible to allocate the 3/4 vram size with granularityMB
being 128 when vram size < 512MB and decrease granularityMB to 16 has
no significant impact on ROCt test on other system. So let's decrease
granularityMB on small vram system for handling LargestVramBufferTest().
Change-Id: Iea7c29abfd382a20761b653730fd09a220ad2fd0
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Some VRAM access tests in MMBandWidth can be very slow on systems with
complicated PCIe topology. Skip tests that take a long time to avoid
excessively long running tests with little benefit.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I2950237347fc2f764f6aa3292ab819051472bf37
Total VRAM size on APU is 512M usually,
Framebuffer also is allocated from VRAM.
There is no enough memory for this case.
/home/ruiliji2/p5/libhsakmt/tests/kfdtest/src/KFDMemoryTest.cpp:1285: Failure
Value of: (hsaKmtMapMemoryToGPUNodes(bufs[i], bufSize, &altVa, mapFlags, 1, &defaultGPUNode))
[ FAILED ] KFDMemoryTest.MMBench (1034 ms)
Change-Id: Ib4201291122d85f6512a85859aea9a4713fb4f5c
(cherry picked from commit a9f924484e7022a2d53ee02811b080f0833eba55)
skip HDP flush test when remap feature is not supported.
Backgroud:
the HDP register remap is skipped in sriov mode,
it will cause mmio base is nullPtr.
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ib9aea1900931e30571656397a485ee4db051ec0a
Test if query userptr pointer info return correct alloc flags,
CoarseGrain by default.
Test if query hsaKmtAllocMemory pointer info return correct alloc
CoarseGrain flags.
Change-Id: If3a1175645717e5d7c475d6ff35b02d6876a1f7c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Test Thunk multiple threads register and deregister same userptr race
condition, to emulate application register same userptr to multiple
GPUs using multiple threads.
Use thread barrier to sync the threads, to start register userptr at
same time.
Change-Id: I6723dc39f75908026fa14a490e39e1fe49a13a1b
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
If the signal arrives too late, it interrupts waitpid. Handle this
situation gracefully.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If4925c352c81ba7fef8a940460b91f5e720b451e
It is to address gfx90a HW memory model changes.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Ie5c5c5ee5ddfb75c0b4f625baf59ce37b4cc7c31
Three kfd subtests are added to verify new XGMI connection with
cache coherence HW link on A+A.
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Change-Id: I6960ec91cbfb696c4e6acb3b79fd83107003acdd
In A+A all system memory is mapped as NC. So add a new function
gfx9_PollNCMemory which will support NC memory.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I097b95fb156f73d6f480cd4fd262cc6fa5933f69
Refer to commit "Mark buffers accessed by CP as UC"
A+A buffers are mapped as NC. CP (PM4Writes) need ReleaseMem function to
ensure the write go through to the memory
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4ee55a6e40fba078f5950d95c8fee7ee076260bf
Add updated SP3 static library with support for gfx90a and
also add initial corresponding changes in kfdtest.
Change-Id: I71bc6404ace7f9bf0dd74e712287136aa2b8a03d
s_store_* instruction set was retired from gfx10.3
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: Ibe41a3fe7e053fb345b1af6ad4abc22a0885bc81
On gfx1012, allocating 1/4 of the system memory on a 32G RAM machine
could fail, resulting in this test to fail. Limit the maximum buffer
to allocate to be smaller than 3G to accommodate this situation.
Change-Id: I38b0a0f7da1f0b9ca851e04d2d0a51767858c801
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
The previous BigBufferStressTest has too much stuff and takes a long
time to run. By separating largest*BufferTest out into other
tests, we dramatically reduce the time to run BigBufferStressTest and
therefore make reproducing issues much easier.
Meanwhile, rename the test to BigSysBufferStressTest to express more
information.
Change-Id: I5911f113c0bd50627ee6d84bbb4f2972cbed8886
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
In order to accommodate the flaky queue submission under memory
shortage scenarios, BigBufferStressTest has become very much a hack,
undermining its purpose of testing the basic memory related operations.
Therefore, remove the queue submission part. The EvictTest should serve
the purpose of testing the memory and queue submission functionalities
when memory eviction happens.
Change-Id: I3c3603a0e834267eccb72f46efeabe1e053c8fc5
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
If NUMA system no available memory on node 0, mbind will fail on node
0, so set flag NoNUMABind=1 to bypass mbind for all test cases which use
node 0 and allocate system memory.
Change-Id: I7962938ad2bed5a293ca5e6a8500c7f7e15ff453
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
When large bar is not available, we can use
xgmi to do p2p tests.
Change-Id: Ib7b59fb8a4d41f605739a0428973f6b2f1a3450f
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
The memory need to be mapped for both local and remote GPU access
Change-Id: I4aeaffc0851b6107fc91e9eaa6150764b06f5ca9
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Allocate system memory from node id 0 will fail on NUMA system which has
no memory on node 0. Change to use new flag NoNUMABind to allocate
system memory from NUMA nodes which have free memory.
Change-Id: I8ef9ca28fc2ab5dd31d07a2d3eaf1d5886e798a0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
A number of tests are no longer broken on gfx802.
Change-Id: If70c77423f8f14de59490ab8ca156b0c4e7b5cf1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reserve half of dma32 zone for non-NUMA system.
Change-Id: Id7aea7b6ff6cc1cc7983ecd95f8078b7f1be630c
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
We should use the SE number reflected by NumShaderBanks of the node
rather than hardcoding it.
Change-Id: I945fb001f81ce506249cf485a7ce25aee8219bc7
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Remove the CHIP name from the shader ISA and add wave_size(32) to make the same
shader can be used for both GFX9 and GFX10
Change-Id: I16ea72f87980c3d9c11298e20c06a0a073fe9a28
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>