Possibly because of moving to gart table for vram access from Kernel.
This test failure shouldn't be a blocker. Temporarily blacklist till a
solution is found.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I99725f368aced863188e30f619288ad4d033b9a6
Set the KFD_IOC_ALLOC_MEM_FLAGS_COHERENT flag and
KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED flag to allocate
uncached coherent memory when HSA_DISABLE_CACHE
environment variable is set. At KFD driver,
Single KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED flag is
not sufficient to allocate uncached memory. We
have to use both two flags to allocate uncached
memory.
Change-Id: Ie490f37b2e696314e60048f5b1b57442431696e9
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
This will create a deb and an rpm for rocrtst to make installing and
running it easier for non-ROCr devs.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I506baedc1471482e5808139cab5c28ae07ac8fb1
APU doesn't have non-KERNARG memory pool for cpu agent or
a global memory pool for gpu agent. Current setup check
fails as below. Change to a APU specific check method.
[==========] Running 45 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 1 test from rocrtst
[ RUN ] rocrtst.Test_Example
#### TEST NAME ####
Test Case Example
#### TEST DESCRIPTION ####
Put a description of the test case here. Line breaks will be taken care of
on output, not here.
#### TEST SETUP ####
The gpu device name is gfx902
Target HW Profile is HSA_PROFILE_FULL
Test can run on any profile. OK.
/home/jenkins/hsa/runtime/rocrtst/common/base_rocr_utils.cc:180: Failure
Value of: rocrtst::ProcessIterateError(err)
Actual: 4096
Expected: HSA_STATUS_SUCCESS
Which is: 0
HSA_STATUS_ERROR: A generic error has occurred.
/home/jenkins/hsa/runtime/rocrtst/suites/test_common/test_case_template.cc:195: Failure
Value of: HSA_STATUS_SUCCESS
Actual: 0
Expected: err
Which is: 4096
rocrtst64: /home/jenkins/hsa/runtime/rocrtst/common/base_rocr_utils.cc:416: hsa_kernel_dispatch_packet_t* rocrtst::WriteAQLToQueue(rocrtst::BaseRocR*, uint64_t*): Assertion `test->main_queue()' failed.
../shunit2: line 977: 1382 Aborted (core dumped) ./rocrtst$ROCRTST_BLD_BITS "$ROCRTST_ARGS" --gtest_output=xml:"$gtest_xml"
failed (failed to run rocrtst)
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I03691bd4171b6e622231baf3dce4db2211eb47e7
Three kfd subtests are added to verify new XGMI connection with
cache coherence HW link on A+A.
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Change-Id: I6960ec91cbfb696c4e6acb3b79fd83107003acdd
In A+A all system memory is mapped as NC. So add a new function
gfx9_PollNCMemory which will support NC memory.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I097b95fb156f73d6f480cd4fd262cc6fa5933f69
destBuf is mapped as cached, the intruction flat_atomic_add
operates on cache that cause test failed. Adding scc modifier
in the instruction will fix the issue.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I8e138f93ae4f5e23020e3ac1549ef924968a74c5
The three kfdtests have been fixed, so remove them from
filter list.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I101a72476970a9d105e8c0b5c022847757fdd316
Before gfx90a, coherent memory is uncached. So it was reasonable
when environment variable HSA_DISABLE_CACHE is set, memory is mapped
as coherent. On gfx90a, coherent memory can be cached, so mapping
memory as coherent can't guarantee memory is uncached. When
HSA_DISABLE_CACHE is set, we have to map memory as uncached.
Change-Id: Ia5ed4cf0ad6aef5644dc8c9e6632b52d606f06f4
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Refer to commit "Mark buffers accessed by CP as UC"
A+A buffers are mapped as NC. CP (PM4Writes) need ReleaseMem function to
ensure the write go through to the memory
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4ee55a6e40fba078f5950d95c8fee7ee076260bf
Refer to commit: " Mark buffers accessed by CP as UC"
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I1816e035dbb3178f28f5e34b050c20ecca282060
This change might be redundant if ROCr takes care of it
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I7b67143a8ad21baa61b7eda7b8e5fe0ac1e33830
This change is for the A+A bring-up branch as it needs to made more
generic to handle all ASICs.
For A+A all the system buffers are mapped as NC (non coherent) unless
explicitly marked as UC (uncached). The coherency is then expected to be
handled by shader by explicitly using acquire/release instructions.
However, CP doesn't have same feature. The buffers used by CP thus have
to UC. For now queue buffer and Signal handler memory is marked as UC.
This change shouldn't affect other ASICs since Uncached flag is not used
in those. However, this change still need to be made more generic.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I56c37a809913f7f08c94d01b0572d0f4864939aa
On gfx9, the maximum number of wavefronts per queue is the minimum of
40 waves per compute units, or 512 waves per shader engine. On gfx10,
there can only be 32 waves per compute units.
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Change-Id: I148d1a4fe6c07cdbfaa1f77939eb29311c81c008
Reserve some space in the context save area for the debugger's
use. There should be 32 bytes per wave for a given queue.
Change-Id: I65ddb6123d0f6afd3149844617ad19023009101d
Temporarily blacklist some tests on gfx90a until they are solved.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I87cc3a996ea7d55ed8f20f5b4eecfd8bb691effd
On every new asic with new stepping, we need to manually relax this
checking. This check is not very helpful. Delete it.
Change-Id: I11f813023ca2566d82f6d11121d4be38c296674b
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Due to cache coherence change, the remote vram mapping is changed
to cached, the written value by remote shader will not be read by
local shader. So the test will fail.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I2b64e8a30bed0066e159bad9bb7febae5ebe84aa
XNACK API for GPUs that support this mode. This API
makes calls to amdgpu driver to configure xnack mode.
It supports set xnack mode and query the current mode used.
Change-Id: If865fd0e3f900f008243dc49504e1a0694e1791a
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Add function definitions to support SVM (shared virtual memory)
and xnack set.
Change-Id: Ia97ad9d0c449d8d500d799f702e1a58e87d65a56
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Add svm (shared virtual memory) range and xnack mode
APIs.
Change-Id: Ibd8d7fe566dc200730da0c892caa71aad7589ebd
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
strlen(src) should not be used as the length in strncpy. Use memcpy
since we know the length of the string, and ensure that we
NULL-terminate regardless of length
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I21cc6d106510c69464e7ac9d3fc7da3a1e6d1a68
The largebar check will exit exceptionally from test
when destination node is not set.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I8bf0fed613250cc71468208e645fc562fb1a8757
Update build script and CMakeLists_sp3.txt file as SP3 directory
structure has changed.
The SP3 source code with gfx90a suport is merged into a
new branch mukjoshi/sp3_gfx90a.
Please make sure to checkout this branch before using the
build script to generate the static library.
Change-Id: I2bf0ade8b2d254cd7648cc8a6d69a83ee51344cd
Add updated SP3 static library with support for gfx90a and
also add initial corresponding changes in kfdtest.
Change-Id: I71bc6404ace7f9bf0dd74e712287136aa2b8a03d
Given the chance of local memory breakage is so high on emulators, we
should use this simple test to check the local memory function.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Change-Id: Ifc48c12e11d75cc777ed7ea13e03bf54c2458e12
PKG_CONFIG_PATH environment variable should be set to
<rocm_path>/lib/pkgconfig, because the *.pc file is located there.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Change-Id: Iec503b1c2409987e52fd88fea160c70762686a28
It is to provide an option to map specific memory as
uncached on A+A HW platform.
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Change-Id: Ib665cb306a0e78aba3ea5ee2f0e46cb62ae139f8
Test for HQD preemption during stalled context restore. Added for
regression testing against new microcode.
Change-Id: I13eb7d1c598062390e12cf8a5237e53b6489f232
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Replace the stop reasons ttmp11.trap_raised and ttmp11.excp_raised
with ttmp11.wave_stopped which indicates that the trap handler has
halted the wave as the result of an event (trap, single-step or
exception).
If the wave is stopped because of a trap, also record the trap_id in
ttmp11.saved_trap_id[7:0].
Save status.halt in ttmp11.saved_status_halt, so that it can be
restored when resuming a wave (changing a wave's state from stopped to
running or single-stepping).
Change-Id: I7322f59b60e8cc1b92bf5f067dba606a3109ef49
The purpose of this patch is to add a missing device ID for gfx1030.
The missing ID "0x73A1" is now added to the "topology.c" file.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I05a8a55e2c46f941a039fa72a6a5e76bf2a52736