Commit Graph

2959 commits

Author SHA1 Message Date
shaoyunl 6cad92de6f Added family ID for gfx1010
Change-Id: I1b9a2b5270e70d12f066906f4e6cfea2cbfc2110
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
2019-07-09 11:38:57 -04:00
Oak Zeng 3b014adccc Device HDP flush test
Change-Id: I1c19e44caeee4a6e59200dceb718896fcff9bf82
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-07-07 21:59:37 -04:00
Chris Freehill d699039284 Make build_rocrtst.sh build all target kernels by default
This will allow the default target list to be branch
specific.

Change-Id: If8ecc14e2b7fb5ed2eb25ab447480308d539b248
2019-07-05 19:30:07 -04:00
shaoyunl 664c6617ad Added SP3 assembler support for gfx10
Change-Id: I31c1df0f6d5243089e2ec3db381a19362be18d6c
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
2019-07-05 10:40:54 -04:00
Yong Zhao c27704ded9 kfdtest: Add core test category
This will facilitate ASIC bringup, including under a simulation environment.

Change-Id: Ie027a77a2498cba739fea51f404d9843ce8dbeae
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-07-02 22:28:23 -04:00
Jay Cornwall ff8f439112 Handle traps, illegal instruction, memory violations through queue signal
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported and required the compiler to pass in the signal pointer
in s[0:1].

The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.

Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b
2019-07-01 22:59:41 -04:00
Jay Cornwall 6ed686ee29 Replace gfx9 SP3 trap handler with LLVM, fix IB_STS restore
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.

Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.

Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f
2019-07-01 22:59:27 -04:00
Chris Freehill 8caa6c0b01 Temporarily disable Debug test
Change-Id: Iabb238fcd78b9c2eb0c085b19ab93b8c9e538140
2019-06-29 04:55:35 -04:00
Yong Zhao b507911ccd kfdtest: Use SDMA engine information directly from the node
Change-Id: Icd391c8e821fb0ff5a1094f21b880a97e6d417a3
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-06-28 00:47:15 -04:00
Kent Russell be6ff2cdff Remove failing tests due to gfx1010 kernel merge
BasicAddressWatch causes KFDEvictTest and
KFDQMTest.OverSubscribeCpQueues to fail, resulting in a GPU hang/reset.
PM4EventInterrupt just hangs indefinitely. Remove them for now to allow
the kernel merges to resume, and figure out what in the nv10 merge
caused this.

Change-Id: I418f9561ecb3e71bc52ac48ea363fcbde82a8e2b
2019-06-27 10:19:46 -04:00
Sean Keely 299874f17d Initial support for deallocation callbacks.
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.

Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
2019-06-26 04:12:17 -05:00
Chris Freehill 081a2cc875 rocrtst fixes for hsa_signal cleanup and aql packet dispatch
In several places aql packets were written to queue all at once
instead of doing the header atomically. These cases have been
fixed.

A few hsa_signal leaks have been addressed.

There was some duplication of code that has been addressed.

Addresses ROCMOPS-456

Change-Id: Ia1869bc370f92e49ac560301df47741d5f76978e
2019-06-21 17:34:10 -05:00
Felix Kuehling 62ee7b4112 Restore SDMA blacklist
The SDMA blacklist should contain all tests that use SDMA. It will
be applied to all ASICs that are known to have SDMA stability issues.

Change-Id: I53e723382c12f99bddf9c535000e27737a7ea1f6
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-06-21 16:08:22 -04:00
Oak Zeng be9ac578ef Re-enable HostHdpFlush test
The bus error bug was fixed in the kfd driver and Thunk.

Change-Id: Id02617fdc26f1c49307f90a0a939e05f22d739e7
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-06-21 11:52:07 -04:00
Oak Zeng 5d163cd821 Fix HostHdpFlush shader
1. Use s_mov_b32 to move 0xcafe to s18. s_movk_i32 is a sign-extension move
instruction, so 0xcafe would be extended to 0xffffcafe, which is not desired.
2. Add a wait after the s_load_dword instruction to make sure the memory read
finishes before the next store instruction.

Change-Id: I665d1d471019edfaba5693e07cdc567d4103573f
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-06-21 11:51:51 -04:00
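The sign-extension problem described in the HostHdpFlush shader fix can be illustrated with a small Python sketch. The helper names below are hypothetical and only model the arithmetic of a 16-bit immediate being widened to a 32-bit register; they are not part of the shader or of any AMD tooling.

```python
def sign_extend_16(imm16: int) -> int:
    """Model a move that sign-extends a 16-bit immediate to 32 bits."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # top bit set: value is negative as int16
        return imm16 | 0xFFFF0000   # fill the upper 16 bits with ones
    return imm16


def move_32(imm32: int) -> int:
    """Model a plain 32-bit move: the immediate is used as-is."""
    return imm32 & 0xFFFFFFFF


print(hex(sign_extend_16(0xCAFE)))  # 0xffffcafe
print(hex(move_32(0xCAFE)))         # 0xcafe
```

Because bit 15 of 0xcafe is set, the sign-extending form yields 0xffffcafe, which is why the commit switches to the 32-bit move.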
Evgeny 6c0aaa2773 aqlprofile api fix
Change-Id: I2a710040422c7853ece5472ea776442b25d69dcb
2019-06-19 23:14:27 -04:00
Sean Keely bb980462e7 Fix IPC related hangs/faults in rocrtst.
IPC was failing due to calling fork when HSA was open.  The fix
was correcting incomplete cleanup in several other tests.

TestBase::Close (via CommonCleanUp) now checks that HSA is properly
closed between tests.

rocrtstPerf.Memory_Async_Copy uses hwloc, which uses OpenCL, which
has no shutdown routine.  Consequently this test cannot clean up
properly.  I added a hack to force the HSA refcount to the value
it should have if OpenCL were cleaning up, but this leaks resources
and potentially puts hwloc & OpenCL in a bad state.

OpenCL loads LLVM, which installs some exit handlers.  Those handlers
can't execute in a child process and can't be removed since OpenCL
doesn't clean up.  IPC hacks around this by aborting rather than exiting
in the child process.

Change-Id: I92326a73d7b11632208717d99728e6dafdc7d3ca
2019-06-19 01:03:52 -04:00
Philip Yang 4066dcd542 kfdtest: increase BigBufStressTest timeout and avoid VM fault
If TTM eviction and restore happen, they may take a very long time when
retried; the longest observed during my testing was 5 minutes. A packet
may be submitted to the queue during eviction, so the
Wait4PacketConsumption timeout has to be increased.

The queue will continue to execute after eviction and restore. If we
unmap the memory from the GPU while the queue is evicted, this will cause
a VM fault. Change to unmap memory after the queue is destroyed.

Change-Id: I1b44e2274ea7b83398b2e3293578dad6947cb5af
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18 09:28:43 -04:00
Philip Yang 36776e9917 kfdtest: avoid BigBufStressTest run on NUMA node 0
Because the dma32 zone is on node 0, using all system memory on node 0 will
cause TTM eviction to free the dma32 zone for other devices that only
work with 32-bit physical addresses. The TTM eviction and restore may take
too long and cause a queue timeout.

When running on other NUMA nodes, the default NUMA memory policy is
MPOL_PREFERRED, which means TTM will get pages from the local node first
and then get the remaining pages from other nodes. Checking
/proc/buddyinfo confirms this.

Reset NUMA bind to all after the test.

Change-Id: I39b373c07a2d5aa396f5c7602bffabab0481930f
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18 09:28:20 -04:00
Sean Keely 0c0e634458 PTHREAD_STACK_MIN may differ from system parameters.
Restrict stack adjustment to non-default stack requests and allow
stack growth within reason (20MB cutoff).

Change-Id: I320280c711402ac29683e94c7246b7c32c797611
2019-06-17 21:04:17 -05:00
Sean Keely 4b22d24346 Revert to SystemClockCounter for HSA system time.
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW) so should be 
better for measurements.  However, it is implemented with syscall while
CLOCK_MONOTONIC is implemented via vDSO.  The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors.  NTP may adjust frequency by 500ppm so limits us
to ~3 decimals in elapsed time.

Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76
2019-06-17 21:07:26 -04:00
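The "~3 decimals" figure in the SystemClockCounter commit follows from the 500 ppm frequency adjustment it quotes. A rough sketch of that arithmetic, with hypothetical helper names and the 500 ppm value taken from the commit message rather than measured:

```python
import math

NTP_MAX_SLEW_PPM = 500  # figure quoted in the commit message


def worst_case_error(elapsed_seconds: float, ppm: float = NTP_MAX_SLEW_PPM) -> float:
    """Largest absolute error a frequency skew of `ppm` adds to an interval."""
    return elapsed_seconds * ppm * 1e-6


def trustworthy_digits(ppm: float = NTP_MAX_SLEW_PPM) -> float:
    """Approximate number of significant decimal digits left in an elapsed time."""
    return -math.log10(ppm * 1e-6)


print(worst_case_error(1.0))           # error bound for a 1 s interval, in seconds
print(round(trustworthy_digits(), 1))  # roughly 3 significant digits
```

A relative error of 5e-4 leaves a little over three significant digits, which matches the limit stated in the commit.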
Cole Nelson 3f2d2e67c9 kfdtest: Blacklist multiple tests on gfx900/20
PSDB and other jenkins jobs are currently failing on several kfd tests.
This is blocking user throughput for screening patches by PSDB.
Blacklist multiple tests and submit JIRAs.

KFDIPCTest.BasicTest (ROCMOPS-459) .CMABasicTest (ROCMOPS-460) .CrossMemoryAttachTest (ROCMOPS-461)
KFDMemoryTest.BigBufferStressTest (ROCMOPS-462)
KFDQMTest.MultipleSdmaQueues (ROCMOPS-463) (ROCMOPS-416)
KFDEvictTest.BurstyTest (ROCMOPS-464)

Change-Id: I2c7cdeabc26654f39823201ce86d4113b3a98a0e
Signed-off-by: Cole Nelson <cole.nelson@amd.com>
2019-06-16 19:24:22 -04:00
Chris Freehill 259a1bac18 Temporarily disable some failing tests
Change-Id: Iee713bb963db812c36ce2568aee2a4f8409c52e5
2019-06-14 08:36:11 -05:00
Ori Messinger fe4db33875 Remove passing blacklisted kfd tests
This relates to the following commits:

1. commit aa7c13264a
2. commit 54807526b9
3. commit 6df62c78b8

Change-Id: I3d0d3214baba403b4709b358132b6756a15f42d7
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2019-06-12 06:14:46 -04:00
Sean Keely bbb90bdfc9 Fix description of HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL.
Description was inconsistent with itself and code.  Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only and system memory pools do require hsa_amd_agents_allow_access.

Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5
2019-06-11 01:45:22 -04:00
Oak Zeng 888e1a7ae7 Use kfd fd to mmap mmio
Change-Id: Iadd2e1ea46d0951aaa5a6cefbc7d42d1b2c1f653
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-06-10 21:07:45 -05:00
Oak Zeng 65d554f5e4 Thunk API to allocate queue GWS
Change-Id: I6c5b109e2567cb71aed9245923cfcbeee6295ab2
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-06-10 21:07:45 -05:00
Oak Zeng 45d717d860 Add node property to report number of GWS
Change-Id: I81263ca7ebfa3c0f9f1be78acfa0920e47d551b1
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-06-10 21:07:45 -05:00
Felix Kuehling 396a85e97b kfdtest: Allocate PM4 queue and dispatch earlier KFDEvictTest.QueueTest
Allocating these before the big memory allocations minimizes the chances
of spurious out of memory errors.

Change-Id: I94aff9ec7ea34d4dc98ae08ac4cf9dc335b3df7f
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-06-07 16:54:28 -04:00
Felix Kuehling f474cf21cd kfdtest: Reduce libdrm VRAM usage in eviction tests
This reduces thrashing due to graphics submissions only and
significantly speeds up the BasicTest when keeping idle compute
processes evicted. In the BasicTest  compute is always idle, so
only one compute eviction and no restore is triggered. Then
graphics submissions complete quickly without thrashing each other.

Change-Id: Iae6da98903b20424a5097f235e1d09cf13e4b41b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-06-07 16:54:28 -04:00
Felix Kuehling 6984f3e3b4 kfdtest: Add KFDEvictionTest.BurstyTest
Change-Id: I748603b0b204ffc3ea33399ecbc022233a7447d3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-06-07 16:54:28 -04:00
Felix Kuehling 6f5379d315 kfdtest: Pass timeout parameter to BaseQueue::Wait4PacketConsumption
Change-Id: I0e88db5ca8e6712e9efc419a10eb4c49cedb6f62
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-06-07 16:54:28 -04:00
Evgeny a06d96cef8 aqlprofile API: sdma blocks
Change-Id: I619af8adc17706f808644180cdd5a5c785e052ec
2019-06-05 18:54:08 -05:00
Felix Kuehling f5a094bc96 libhsakmt: Update kfd_ioctl.h
Change-Id: Ibf165023b98787fdf295f50324e19aa062f2421d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-06-03 19:15:49 -04:00
Evgeny 1be9298f72 adding new trace API
Change-Id: I6c83b5789f5a6cdbb574d041c40d5a47229c7f1a
2019-06-01 14:33:59 -04:00
Eric Huang 47d1c17592 kfdtest: fix error injection failure in RAS test
1. umc error injection only accepts parameter "0 0".
2. flush output to file in order to make writing happen
   immediately.

Change-Id: I8d3bde287caee6b90b6eec56c760f5a228be7595
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
2019-05-30 16:38:15 -04:00
Eric Huang d278b2579e kfdtest: fix debugfs path bug in RAS test
The path was wrong because it assumed that the GPU DRI render
node starts from 0. If there is a VGA device on board, node 0
will be VGA and node 1 will be the GPU. The fix looks at the
name of the GPU minor node and finds the correct primary node
on which the RAS debugfs entry exists.

Change-Id: Icc5e63ce48698d5d29105c0417e3bec8afa0a7c8
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
2019-05-29 11:14:22 -04:00
Matt Arsenault 0016c6ce5b Don't check VERSION_BUILD is defined
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.

Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e
2019-05-29 11:09:31 -04:00
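The VERSION_BUILD fix is the difference between "is defined" and "is non-empty". The actual change is in the build scripts; the following is only a Python analogy with hypothetical function names, showing how a defined-but-empty build field leaves a trailing dash in the version string.

```python
def version_with_defined_check(base: str, build=None) -> str:
    """Buggy variant: appends the build field whenever it is defined at all."""
    if build is not None:
        return f"{base}-{build}"
    return base


def version_with_truthy_check(base: str, build=None) -> str:
    """Fixed variant: appends the build field only when it is non-empty."""
    if build:
        return f"{base}-{build}"
    return base


print(version_with_defined_check("1.2.3", ""))  # 1.2.3-
print(version_with_truthy_check("1.2.3", ""))   # 1.2.3
print(version_with_truthy_check("1.2.3", "42")) # 1.2.3-42
```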
Sean Keely 22de0e7fb9 Allow hsa_status_string when HSA is closed.
API is a stateless lookup of RO data and needed to interpret
hsa_init error codes.

Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7
2019-05-24 22:40:03 -04:00
Sean Keely 9f81bdfbe1 Add exception and error safety for CreateThread.
Change-Id: I82aaf64e039ca9614b4948deec1f87147f56279a
2019-05-24 22:39:55 -04:00
Matt Arsenault 22d29b55a4 Change include flag order
Search the local src directories first. If using a system
installed hsakmt, this would pick the installed hsa headers.

Change-Id: I9746d6e9db1749a130e4d93e024556754a537083
2019-05-22 16:43:18 -07:00
Felix Kuehling 64b90261d9 libhsakmt: Enable invisible debug VRAM mappings by default
Remove the HSA_DEBUG environment variable that controlled the
creation of these mappings.

This should allow the debugger to attach to a running process and
access VRAM buffers through ptrace without having to do anything
special.

On processes that create many small VRAM mappings, this may cause
regressions due to the per-process mmap limit. However, the
sub-allocator in ROCr should consolidate most small allocations
into 2MB blocks nowadays, for good TLB efficiency. So this is
unlikely to cause problems.

Change-Id: I929da1be0f6cb51ec00a02f3f241d16083e4d95f
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-05-17 18:28:14 -04:00
Sean Keely a913549190 Correct pthread join/detach handling.
Joined threads can not be joined more than once nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.

Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65
2019-05-16 12:14:06 -05:00
Philip Cox 608bc7c3a0 Fix type mismatch passed to queue suspend/resume
The queue IDs passed over to the kernel via kfd_ioctl_dbg_trap_args->ptr
should be a list of uint32_t's.  Need to convert from the passed in
64 bit HSA_QUEUEID to 32 bit uint32_t's.

Change-Id: I8718566d9f9ffc90ce0b2ecc129b10c49d73186a
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2019-05-15 07:33:47 -04:00
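The type mismatch in the queue suspend/resume fix can be sketched with Python's struct module. The buffer layout and helper names here are assumptions made for illustration; they model a reader that expects an array of 32-bit IDs and do not reproduce the actual kfd ioctl structures.

```python
import struct


def pack_queue_ids_u32(queue_ids):
    """Pack queue IDs as little-endian uint32 values, rejecting IDs that do not fit."""
    for qid in queue_ids:
        if not 0 <= qid <= 0xFFFFFFFF:
            raise ValueError(f"queue id {qid} does not fit in 32 bits")
    return struct.pack(f"<{len(queue_ids)}I", *queue_ids)


def read_u32_list(buf, count):
    """Read `count` little-endian uint32 values, as a 32-bit consumer would."""
    return list(struct.unpack_from(f"<{count}I", buf))


ids = [3, 7]
wrong = struct.pack("<2Q", *ids)     # 64-bit IDs passed through unchanged
right = pack_queue_ids_u32(ids)      # IDs narrowed to 32 bits first

print(read_u32_list(wrong, 2))  # [3, 0]
print(read_u32_list(right, 2))  # [3, 7]
```

Reading the 64-bit buffer as 32-bit entries picks up the upper half of the first ID as if it were the second ID, which is the kind of mismatch the commit corrects.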
Sean Keely 6e2a056e1b Correlate errors for time stamps which predate process start.
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time.  Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime.  Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.

This patch fixes the relative clock ratio used for times which predate
the call to hsa_init.  This correlates errors in such times allowing
the elapsed time to be correctly computed.

The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months.  GPU event timestamps are good for process uptime
of ~3.5 months.  These are limited by double's mantissa precision.

Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445
2019-05-14 17:35:06 -04:00
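The ~3.5 month limit in the time stamp commit is attributed to the precision of a double's mantissa. A back-of-the-envelope check, assuming timestamps are counted in nanoseconds (the commit message does not state the unit):

```python
MANTISSA_BITS = 53                 # significand precision of an IEEE 754 double
NS_PER_SECOND = 1_000_000_000
SECONDS_PER_DAY = 86_400

max_exact_ns = 2 ** MANTISSA_BITS  # largest range where every integer is exact
days = max_exact_ns / NS_PER_SECOND / SECONDS_PER_DAY

# Past 2**53 a double can no longer represent every integer tick.
loses_precision = float(max_exact_ns) + 1.0 == float(max_exact_ns)

print(round(days))        # number of days of exact nanosecond counts
print(loses_precision)    # True
```

Under that assumption the exact range is a little over 100 days, which is in line with the roughly 3.5 months quoted in the commit.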
Kent Russell 54e042eee1 Add missing gfx803 ID
Change-Id: I9eca81f0f149ea924c3b81bd80680d7fd1ad7a6c
2019-05-13 09:03:06 -04:00
Sean Keely 06376e726b Expose HDP flush registers.
Exposed via agent info query.  Only valid if fine grain PCIe memory is enabled.

Change-Id: Ib4770901592ec047276458926a947737f9b93bb5
2019-05-11 00:04:47 -04:00
Oak Zeng 78e4ef17c2 Temporarily disable HostHdpFlush test
Change-Id: I070cb3523a33b4efbfa7041fa2623059e1ff37bb
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-05-10 09:34:40 -04:00
Felix Kuehling 8f10c9375d libhsakmt: Disable -Werror by default
This can cause build failures on unknown or future compiler versions.
Only enable it if explicitly enabled by an environment variable. This
allows us to continue building with -Werror in internal builds with
known compiler versions.

Change-Id: Ic1cd9d223218cc4e4cddba49df93bb357c1cbd40
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-05-07 16:06:51 -04:00
Philip Cox b0d23aee16 fix suspend/resume logic in debug_trap code
There was a mistake: RESUME was used where SUSPEND should
have been used in two places in the suspend/resume
code.  This fixes that error.

Change-Id: I69be733d7ae7c14ce5ee8af57a307976e4212d62
2019-05-07 06:56:00 -04:00