- Reformat shaders for legibility
- Move assembly processes to from IsaGen (CompileShader) to Assembler
(RunAssembleBuf)
- LLVM syntax change on ScratchCopyDwordIsa_gfx10:
hwreg(HW_REG_SHADER_FLAT_SCRATCH_LO/HI) -> hwreg(HW_REG_FLAT_SCR_LO/HI)
- Fix bug in CopyOnSignalIsa_gfx10 and PollMemoryIsa_gfx10 whereby
flat_store_dword used vector reg format v[n,n]. Changed to v[n:n]
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id182cfb8aeb7372366c59affb5cbdd145909ee96
It's not possible to allocate the 3/4 vram size with granularityMB
being 128 when vram size < 512MB and decrease granularityMB to 16 has
no significant impact on ROCt test on other system. So let's decrease
granularityMB on small vram system for handling LargestVramBufferTest().
Change-Id: Iea7c29abfd382a20761b653730fd09a220ad2fd0
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Some VRAM access tests in MMBandWidth can be very slow on systems with
complicated PCIe topology. Skip tests that take a long time to avoid
excessively long running tests with little benefit.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I2950237347fc2f764f6aa3292ab819051472bf37
Total VRAM size on APU is 512M usually,
Framebuffer also is allocated from VRAM.
There is no enough memory for this case.
/home/ruiliji2/p5/libhsakmt/tests/kfdtest/src/KFDMemoryTest.cpp:1285: Failure
Value of: (hsaKmtMapMemoryToGPUNodes(bufs[i], bufSize, &altVa, mapFlags, 1, &defaultGPUNode))
[ FAILED ] KFDMemoryTest.MMBench (1034 ms)
Change-Id: Ib4201291122d85f6512a85859aea9a4713fb4f5c
(cherry picked from commit a9f924484e7022a2d53ee02811b080f0833eba55)
skip HDP flush test when remap feature is not supported.
Backgroud:
the HDP register remap is skipped in sriov mode,
it will cause mmio base is nullPtr.
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ib9aea1900931e30571656397a485ee4db051ec0a
Test if query userptr pointer info return correct alloc flags,
CoarseGrain by default.
Test if query hsaKmtAllocMemory pointer info return correct alloc
CoarseGrain flags.
Change-Id: If3a1175645717e5d7c475d6ff35b02d6876a1f7c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Test Thunk multiple threads register and deregister same userptr race
condition, to emulate application register same userptr to multiple
GPUs using multiple threads.
Use thread barrier to sync the threads, to start register userptr at
same time.
Change-Id: I6723dc39f75908026fa14a490e39e1fe49a13a1b
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
If the signal arrives too late, it interrupts waitpid. Handle this
situation gracefully.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If4925c352c81ba7fef8a940460b91f5e720b451e
It is to address gfx90a HW memory model changes.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Ie5c5c5ee5ddfb75c0b4f625baf59ce37b4cc7c31
Three kfd subtests are added to verify new XGMI connection with
cache coherence HW link on A+A.
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Change-Id: I6960ec91cbfb696c4e6acb3b79fd83107003acdd
In A+A all system memory is mapped as NC. So add a new function
gfx9_PollNCMemory which will support NC memory.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I097b95fb156f73d6f480cd4fd262cc6fa5933f69
Refer to commit "Mark buffers accessed by CP as UC"
A+A buffers are mapped as NC. CP (PM4Writes) need ReleaseMem function to
ensure the write go through to the memory
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4ee55a6e40fba078f5950d95c8fee7ee076260bf
Add updated SP3 static library with support for gfx90a and
also add initial corresponding changes in kfdtest.
Change-Id: I71bc6404ace7f9bf0dd74e712287136aa2b8a03d
s_store_* instruction set was retired from gfx10.3
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: Ibe41a3fe7e053fb345b1af6ad4abc22a0885bc81
On gfx1012, allocating 1/4 of the system memory on a 32G RAM machine
could fail, resulting in this test to fail. Limit the maximum buffer
to allocate to be smaller than 3G to accommodate this situation.
Change-Id: I38b0a0f7da1f0b9ca851e04d2d0a51767858c801
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
The previous BigBufferStressTest has too much stuff and takes a long
time to run. By separating largest*BufferTest out into other
tests, we dramatically reduce the time to run BigBufferStressTest and
therefore make reproducing issues much easier.
Meanwhile, rename the test to BigSysBufferStressTest to express more
information.
Change-Id: I5911f113c0bd50627ee6d84bbb4f2972cbed8886
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
In order to accommodate the flaky queue submission under memory
shortage scenarios, BigBufferStressTest has become very much a hack,
undermining its purpose of testing the basic memory related operations.
Therefore, remove the queue submission part. The EvictTest should serve
the purpose of testing the memory and queue submission functionalities
when memory eviction happens.
Change-Id: I3c3603a0e834267eccb72f46efeabe1e053c8fc5
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
If NUMA system no available memory on node 0, mbind will fail on node
0, so set flag NoNUMABind=1 to bypass mbind for all test cases which use
node 0 and allocate system memory.
Change-Id: I7962938ad2bed5a293ca5e6a8500c7f7e15ff453
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
When large bar is not available, we can use
xgmi to do p2p tests.
Change-Id: Ib7b59fb8a4d41f605739a0428973f6b2f1a3450f
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
The memory need to be mapped for both local and remote GPU access
Change-Id: I4aeaffc0851b6107fc91e9eaa6150764b06f5ca9
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Allocate system memory from node id 0 will fail on NUMA system which has
no memory on node 0. Change to use new flag NoNUMABind to allocate
system memory from NUMA nodes which have free memory.
Change-Id: I8ef9ca28fc2ab5dd31d07a2d3eaf1d5886e798a0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
A number of tests are no longer broken on gfx802.
Change-Id: If70c77423f8f14de59490ab8ca156b0c4e7b5cf1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reserve half of dma32 zone for non-NUMA system.
Change-Id: Id7aea7b6ff6cc1cc7983ecd95f8078b7f1be630c
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
We should use the SE number reflected by NumShaderBanks of the node
rather than hardcoding it.
Change-Id: I945fb001f81ce506249cf485a7ce25aee8219bc7
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Remove the CHIP name from the shader ISA and add wave_size(32) to make the same
shader can be used for both GFX9 and GFX10
Change-Id: I16ea72f87980c3d9c11298e20c06a0a073fe9a28
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Modified shaders in KFD memory test to support gfx10.
There is no gprs register for flat_scratch on gfx10.
Use s_setreg_b32 instruction to set flat scratch base
address register
Change-Id: I505156a046056b61ce2d873343feb50ce635274a
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Some SDMA packet format might be different among asic versions
Change-Id: Ic7eda7554c23e3972e168480874ca67a92677346
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
use family ID as parameter when construct the packets
Change-Id: I6c1706954ab7b8cbb8bef2aab16edf21f5e1abf0
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Currently the test only covers relatively small buffers sizes. It's
useful to test buffer sizes up to 1GB to see the impact of features
that target the efficiency of large buffer allocations and mappings.
Change-Id: I2e8d5afd482894dbe2166f32d38091199b9c15e6
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
1. Use s_mov_b32 to move 0xcafe to s18. s_movk_i32 is a sign extention move
instruction. Oxcafe will be extended to 0xffffcafe which is not desired
2. Add wait to s_load_dword instruction to make sure memory read finish before
the next store instruction.
Change-Id: I665d1d471019edfaba5693e07cdc567d4103573f
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
If TTM eviction and restore happens, it may takes very long time if
retry, the longest time is 5 minutes during my test. There is chance
packet is submited to queue while eviction, we have to increase the
Wait4PacketConsumption timeout.
The queue will continue to execute after eviction and restore. If we
upmap the memory from GPU while queue is evicted, this will cause VM
fault. Change to unmap memory after queue is destroyed.
Change-Id: I1b44e2274ea7b83398b2e3293578dad6947cb5af
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Because dma32 zone is on node 0, use all system memory on node 0 will
cause TTM eviction to free dma32 zone for other devices which only
work with 32bit physical address. The TTM eviction and restore may take
too long and cause queue timeout.
Running on other NUMA nodes, the NUMA default memory policy is
MPOL_PREFERRED, means TTM will get pages from local node first, and then
get remaining pages from other nodes. Check /proc/buddyinfo can confirm
this.
Reset NUMA bind to all after the test.
Change-Id: I39b373c07a2d5aa396f5c7602bffabab0481930f
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
On small bar multi-gpu system, hsaKmtMemoryMapToGPU will fail due to latest
kernel P2P sanity check. Swith to use hsaKmtMemoryMapToGPUNodes to fix
the failure
Change-Id: Id8b6329d1243df0e908cc9a171b5c7f9156f4a8b
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
For following test cases:
- KFDQMTest.QueueLatency
- KFDQMTest.BasicCuMaskingLinear
- KFDQMTest.BasicCuMaskingEven
- KFDMemoryTest.MMBandWidth
- KFDMemoryTest.MMapLarge
- KFDMemoryTest.MMBench
v2: xml element cannot start with a number, so change the key name of
MMBandWidth and MMBench accordingly
xml element cannot contain whitespaces, so trim whitespaces in "VRAM "
v3: introduce KFDLog-like way to use KFDRecord
Change-Id: Ifc3ed5657621252a7b39dccf1ef4f50a92593f77
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
As it will alloc as much as small system memory to reach the allocation limit.
We can try to alloc memory several times to see if any allocation in
the previous step cause memory leak.
Also we test if GPU can access these memory correctly or not.
Change-Id: I309f9821b6bc99c212a6bfbc21fe3086ab589fd3
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
It should have PASS/FAIL report for the vram allocated size.
Change-Id: I546c02c2ed02f1cfb5278e0dfd7b18ade39faafb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
ASSERT failures result in immediate termination of the test. EXPECT
returns a failure but continues execution. Reserve ASSERT for required
functionality (node initialization, queue creation, etc) where the rest
of the test cannot run if that call fails. Use EXPECT everywhere else
Change-Id: I1c11326fc3ae22b50fa83b07b3b49af1e1f4e69e
The flag makes EXPECT_* to behave like ASSERT_*, which actually work against
our favor, so disable the flag.
Change-Id: I2ea1dfeaf916b396593a504d081148abdac0fc70
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
When skipping a test, the output should be:
Skipping test: <reason>.
This will allow for easier identification, automation and general readability
Change-Id: I98bda1c068f9dbc83aeea74f642b6101121f234d