Minor changes to instructions for GFX12.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iac5be900e3755099d83010fb1a2066b4dbb52dda
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: bde8e7a212]
Updated ShaderStore shader (used by CWSR test) for GFX12.
The workgroup ID is now passed in a different register.
Minor changes for new scope syntax.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I6fdabc8b62cba201d7777a736d3d43cfae28ca4c
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: e086c383fe]
New watchpoint exception status bits have been assigned to the first 4 least
significant bits, so change the test verification mask to check against the
first watchpoint ID accordingly.
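The mask change can be sketched as follows. This assumes one status bit per watchpoint ID in bits 0..3, with ID 0 in bit 0. The commit message only says the 4 least significant bits are used, so the exact ID-to-bit order and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: watchpoint status lives in bits 0..3, one bit per
 * watchpoint ID, with watchpoint ID 0 in bit 0 (hypothetical). */
#define WATCH_STATUS_MASK 0xFull

/* Bit that reports the given watchpoint ID. */
static uint64_t watch_id_mask(unsigned watch_id) {
  assert(watch_id < 4);
  return 1ull << watch_id;
}

/* True if the status word reports that watchpoint watch_id triggered.
 * Bits outside the low 4 are ignored. */
static int watch_triggered(uint64_t status, unsigned watch_id) {
  return (status & WATCH_STATUS_MASK & watch_id_mask(watch_id)) != 0;
}
```

A test that sets only the first watchpoint would then verify against `watch_id_mask(0)`.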
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: If83950207ea9f66cd230c23e7386a97b3893c2eb
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 3b842c39f1]
Fix traphandler for KFD debugger testing.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ib8f5aac3d1b99e4463ac56b5f6d5dee2c367c447
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: a2e9226784]
Set max size needed for VGPR when doing a CWSR for GFX12 and GFX12.1.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iddefc62f1ad419c6f5ab6a872048457a1dc24037
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 259a724e21]
Since PC Sampling is not upstream yet, use 1.16 for
contiguous VRAM allocation and 1.17 for PC Sampling.
Change-Id: Ib5d22e8f386ce7fe3f7111485b9632b61227e539
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 5786dbbb76]
Skip test when PC Sampling is not supported by ASIC.
Change-Id: I6f9be0bdaed66e51052723b6df6908079470cefb
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 1087dea925]
C error codes are positive in user space, so check against errno
instead.
Fix the declaration of the return value to type HSAKMT_STATUS.
The KFD IOCTL should handle the size return when querying capabilities,
so return the size to the caller unconditionally.
Clean up error translations per function so that they are stylistically
clear.
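A minimal sketch of the errno point. In user space, `ioctl()` reports failure by returning -1 and storing a positive code in `errno`, so a check like `ret == -EINVAL` never matches. `fake_ioctl`, `do_query` and the `EX_STATUS_*` values are hypothetical stand-ins for the real ioctl wrapper and HSAKMT_STATUS codes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status codes standing in for HSAKMT_STATUS values. */
typedef enum {
  EX_STATUS_SUCCESS,
  EX_STATUS_INVALID_PARAMETER,
  EX_STATUS_NO_MEMORY,
  EX_STATUS_NOT_SUPPORTED,
  EX_STATUS_ERROR
} ex_status;

/* Stand-in for ioctl(): like the real call, it reports failure by returning
 * -1 and storing a positive error code in errno. */
static int fake_ioctl(int fail_with) {
  if (fail_with == 0)
    return 0;
  errno = fail_with;
  return -1;
}

/* Translate the result of one call into a status code. */
static ex_status do_query(int fail_with) {
  int ret = fake_ioctl(fail_with);
  if (ret == 0)
    return EX_STATUS_SUCCESS;
  assert(ret == -1);
  /* Comparing ret against -EINVAL would never match in user space;
   * the positive error code is in errno. */
  switch (errno) {
  case EINVAL:
    return EX_STATUS_INVALID_PARAMETER;
  case ENOMEM:
    return EX_STATUS_NO_MEMORY;
  case EOPNOTSUPP:
    return EX_STATUS_NOT_SUPPORTED;
  default:
    return EX_STATUS_ERROR;
  }
}
```

Keeping one `switch` per function keeps each translation visible where the call is made.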
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ic37390425f370c7ad88f9ed014444decf19383a3
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 206db80a56]
Each subtest entry must end with ':', except for the last entry.
Change-Id: I9515d90703c9679e06a4acd124883540c1d5b832
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 371d078226]
This test may fail when run on non-upstream versions of KFD as this
feature will not be upstreamed.
Change-Id: I7131e1f50984739c0df12e4c9afe790bd7e4cdfa
[ROCm/ROCR-Runtime commit: d2d95a8948]
It seems the Release() function is not reliable and can cause segfaults.
This is a temporary work-around until the Release() function is fixed.
Change-Id: I95470a800c6153673e4b8f4fe46a646903325074
[ROCm/ROCR-Runtime commit: ac5fb8be9e]
If the pthread_attr_setaffinity_np function exists, use it instead of
pthread_setaffinity_np, as pthread_setaffinity_np seems to fail to set
the affinity on some systems.
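A sketch of the attribute-based approach on glibc. The affinity is attached to the `pthread_attr_t` before `pthread_create`, so the thread starts with it rather than having it applied afterwards. The existence check mentioned above is assumed to happen at build time and is not shown. `spawn_pinned`, `worker` and `first_allowed_cpu` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Thread body: returns 1 if the thread runs with exactly the requested CPU. */
static void *worker(void *arg) {
  int cpu = *(int *)arg;
  cpu_set_t actual;
  CPU_ZERO(&actual);
  if (pthread_getaffinity_np(pthread_self(), sizeof(actual), &actual) != 0)
    return (void *)(intptr_t)0;
  return (void *)(intptr_t)(CPU_ISSET(cpu, &actual) && CPU_COUNT(&actual) == 1);
}

/* Create a thread whose affinity is set through the attribute at creation.
 * Returns the worker's result (1 or 0), or -1 on a setup error. */
static int spawn_pinned(int cpu) {
  pthread_attr_t attr;
  cpu_set_t set;
  pthread_t tid;
  void *result = NULL;
  int err;

  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  if (pthread_attr_init(&attr) != 0)
    return -1;
  err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
  if (err == 0)
    err = pthread_create(&tid, &attr, worker, &cpu);
  pthread_attr_destroy(&attr); /* safe once the thread has been created */
  if (err != 0)
    return -1;
  pthread_join(tid, &result);
  return (int)(intptr_t)result;
}

/* First CPU this process is allowed to run on, or -1. */
static int first_allowed_cpu(void) {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    return -1;
  for (int i = 0; i < CPU_SETSIZE; i++)
    if (CPU_ISSET(i, &set))
      return i;
  return -1;
}
```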
Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
[ROCm/ROCR-Runtime commit: 57b93e02a4]
Bumping HSA_AMD_INTERFACE_VERSION_MINOR version to 5 to account for
previously added GPU agent query: HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES
Change-Id: Ic8cfdcfb7bad6f3d1e0b3d68f505a62074fc26b9
[ROCm/ROCR-Runtime commit: b6829f7a72]
Support contiguous physical memory allocation flag. Allocations with
this flag will have contiguous physical memory. This depends on KFD
support for the flag; the AllocateKfdMemory(..) function call fails
when the flag is not supported.
Change-Id: I6c51c8b061f7b026fdcc2aa2c37c74ecc13d95b6
[ROCm/ROCR-Runtime commit: 9af225e1b1]
On systems with more than 1 TB of memory per NUMA region, this triggers
unnecessary errors.
Change-Id: I1bc7f209b9c1739b516c9f6b0acf434488ac7b8d
[ROCm/ROCR-Runtime commit: e539c8dce2]
Headings should start at H1.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Id11a2599c4609255a1a9916f70b58adc41cdddb4
[ROCm/ROCR-Runtime commit: f94c1794bb]
Add HsaMemFlags Contiguous bit for hsaKmtAllocMemory to allocate
contiguous VRAM, to support RDMA devices with limited scatter-gather
capability.
Check KFD ioctl minor version >= 17.
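The version gate can be sketched as a major/minor comparison. The struct, the helpers and the "compare major first, then minor" rule are assumptions for illustration, not the thunk's actual code. The required minor is a parameter because a later commit in this log moves contiguous VRAM to 1.16.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the KFD ioctl interface version. */
struct kfd_iface_version {
  unsigned major;
  unsigned minor;
};

/* True if v is at least req_major.req_minor. */
static bool kfd_version_at_least(struct kfd_iface_version v,
                                 unsigned req_major, unsigned req_minor) {
  if (v.major != req_major)
    return v.major > req_major;
  return v.minor >= req_minor;
}

enum alloc_check { ALLOC_OK, ALLOC_NOT_SUPPORTED };

/* Reject a contiguous-VRAM request when the kernel interface is too old. */
static enum alloc_check check_contiguous_request(struct kfd_iface_version v,
                                                 bool contiguous,
                                                 unsigned req_minor) {
  if (contiguous && !kfd_version_at_least(v, 1, req_minor))
    return ALLOC_NOT_SUPPORTED;
  return ALLOC_OK;
}
```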
Change-Id: I0db00dad125b2b7be523f343082641f59b850423
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 97497c7efc]
New flags were added to HsaMemFlag, reducing the number of reserved bits,
so initializing the reserved field now generates a value-overflow
compilation error. The reserved bits are not used, so remove the
initialization.
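The overflow can be illustrated with a simplified flags union. Each new named bit shrinks the `Reserved` field, so a constant written to `Reserved` that fit the old width can overflow the new one. Zeroing the whole word through `Value` and setting only named bits avoids touching `Reserved` at all. The layout below is hypothetical and much smaller than the real HsaMemFlags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified layout: named bits followed by a Reserved field
 * that absorbs the remaining bits of a 32-bit word. */
typedef union {
  uint32_t Value;
  struct {
    unsigned int NonPaged   : 1;
    unsigned int HostAccess : 1;
    unsigned int Contiguous : 1;  /* new bit, taken from Reserved */
    unsigned int Reserved   : 29; /* was 30 before Contiguous existed */
  } ui32;
} ExampleMemFlags;

/* Clear every bit through Value, then set only named flags.
 * Reserved is never written directly, so shrinking it cannot
 * produce an out-of-range constant. */
static ExampleMemFlags make_contiguous_flags(void) {
  ExampleMemFlags f;
  f.Value = 0;
  f.ui32.NonPaged = 1;
  f.ui32.Contiguous = 1;
  return f;
}
```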
Change-Id: I603596977dfd558ce31ead03711d7c5ce5ee5b71
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: e2d742ac6f]
Fix lazy pointer initialization for dedicated PC Sampling queue.
Previous implementation would always create a queue on GPU agent
creation instead of creating the queue on first use.
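The intended lazy behavior can be sketched as follows. The agent starts with no queue, and the first caller creates it under a lock. Later callers get the same queue. The structs, the mutex-based approach and the creation counter are hypothetical; the runtime itself is C++ and may use a different mechanism.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware queue. */
struct queue { int id; };

struct gpu_agent {
  pthread_mutex_t lock;
  struct queue *pcs_queue; /* NULL until first use */
  int queues_created;      /* bookkeeping for the example */
};

/* Agent creation does not create the PC Sampling queue. */
static void agent_init(struct gpu_agent *a) {
  pthread_mutex_init(&a->lock, NULL);
  a->pcs_queue = NULL;
  a->queues_created = 0;
}

/* Create the dedicated queue on the first call; return the same one after. */
static struct queue *agent_get_pcs_queue(struct gpu_agent *a) {
  pthread_mutex_lock(&a->lock);
  if (a->pcs_queue == NULL) {
    a->pcs_queue = malloc(sizeof *a->pcs_queue);
    if (a->pcs_queue != NULL)
      a->pcs_queue->id = ++a->queues_created;
  }
  struct queue *q = a->pcs_queue;
  pthread_mutex_unlock(&a->lock);
  return q;
}

static void agent_destroy(struct gpu_agent *a) {
  free(a->pcs_queue);
  a->pcs_queue = NULL;
  pthread_mutex_destroy(&a->lock);
}
```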
Change-Id: Icf300f2b162e59143ba61ba182d9bee6e1308fc1
[ROCm/ROCR-Runtime commit: f2751b7030]
Fix musl libc NULL errors and unsupported pthread functions for compatibility.
Also ensure cleanup and error handling regardless of CPU affinity override.
Fix submitted by GitHub developer AngryLoki:
https://github.com/ROCm/ROCR-Runtime/issues/181
Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
[ROCm/ROCR-Runtime commit: bc9cac97fe]
New hsa_amd_queue_get_info API to support:
- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue
- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.
Change-Id: I98842131bcbdd08552649791a5d43e578a615808
[ROCm/ROCR-Runtime commit: d6d5786051]
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.
Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671
[ROCm/ROCR-Runtime commit: 3443fdf665]
Flush is used by the client to retrieve the data currently stored in
the buffers, so the client can get current data even when the buffers
are not full.
Change-Id: Ib8304dcdfb2797cb060ec72df4970d95cf6be348
[ROCm/ROCR-Runtime commit: 8abbf9475b]
Each time there is enough data to fill the client session buffer, call
the client's data-ready callback to transfer the buffer contents to the
client.
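The data-ready callback, together with the flush operation from the preceding commit message, can be sketched on the host side as follows. Samples accumulate in a fixed-size session buffer. A full buffer is handed to the client callback, and a flush hands over a partial buffer. The sizes, types and `record` callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SESSION_BUF_SAMPLES 4 /* hypothetical client buffer capacity */

typedef void (*data_ready_fn)(const int *samples, size_t count, void *user);

struct session {
  int buf[SESSION_BUF_SAMPLES];
  size_t used;
  data_ready_fn data_ready;
  void *user;
};

/* Append samples; each time the session buffer fills, hand it to the client. */
static void session_push(struct session *s, const int *samples, size_t n) {
  for (size_t i = 0; i < n; i++) {
    s->buf[s->used++] = samples[i];
    if (s->used == SESSION_BUF_SAMPLES) {
      s->data_ready(s->buf, s->used, s->user);
      s->used = 0;
    }
  }
}

/* Deliver whatever is buffered even though the buffer is not full. */
static void session_flush(struct session *s) {
  if (s->used > 0) {
    s->data_ready(s->buf, s->used, s->user);
    s->used = 0;
  }
}

/* Example client callback: counts deliveries and samples received. */
struct recorder { int calls; size_t total; };
static void record(const int *samples, size_t count, void *user) {
  struct recorder *r = user;
  (void)samples;
  r->calls++;
  r->total += count;
}
```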
Change-Id: Id79775426fa6d22e00dc2ef6f55c439eacb9b2af
[ROCm/ROCR-Runtime commit: 5177d17f5d]
Retrieve data from the buffers previously set in the 2nd level trap
handler TMA. We use a double buffering mechanism to allow the 2nd level
trap handler to write to one buffer while we are copying data from the
other.
Co-authored by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Co-authored by: James Zhu <James.Zhu@amd.com>
Change-Id: I252c381ea06b8cf927c4f9af6ea59dedc3717fbb
[ROCm/ROCR-Runtime commit: 855e454671]
Allocate required device and host buffers to be able to interact with
the 2nd level trap handler.
Change-Id: If99de5aacf956ca57ecafc7b04b797be9c9decaa
[ROCm/ROCR-Runtime commit: 8d666dea01]
Code is valid for gfx9 GPUs excluding gfx94x.
1st level trap handler will use TTMP13[22] to indicate host trap and
TTMP13[21] to indicate stochastic trap.
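For illustration only, here is a host-side C view of how a TTMP13 value would be classified using the two bit positions above. The real check happens in the GPU trap handler, not in C. The enum and function are hypothetical, and whether both bits can be set at once is not stated in the source.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the commit message. */
#define TTMP13_HOST_TRAP_BIT       22u
#define TTMP13_STOCHASTIC_TRAP_BIT 21u

enum pcs_trap_kind {
  PCS_TRAP_NONE,
  PCS_TRAP_HOST,
  PCS_TRAP_STOCHASTIC,
  PCS_TRAP_BOTH
};

/* Decode which PC Sampling trap kinds a TTMP13 value indicates. */
static enum pcs_trap_kind classify_ttmp13(uint32_t ttmp13) {
  int host = (ttmp13 >> TTMP13_HOST_TRAP_BIT) & 1u;
  int stoch = (ttmp13 >> TTMP13_STOCHASTIC_TRAP_BIT) & 1u;
  if (host && stoch)
    return PCS_TRAP_BOTH;
  if (host)
    return PCS_TRAP_HOST;
  if (stoch)
    return PCS_TRAP_STOCHASTIC;
  return PCS_TRAP_NONE;
}
```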
For each PC sampling method (hosttrap and stochastic), we use a double
buffering mechanism to transfer data between GPU and host.
The GPU will dump data into one buffer while CPU may be reading data
from the other buffer. There are 2 separate signals, one for each
buffer.
When signal != 0, the buffer belongs to the GPU and the GPU can write
to it. Once the buffer has reached the high watermark, the GPU will
set the signal to 0 to wake up the host so that the host can try to
switch the buffers and read the data.
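The ownership protocol above can be simulated single-threaded. An `int` stands in for each HSA signal: nonzero means the GPU owns the buffer, 0 means it has been handed to the host. The simulated GPU stops writing to a handed-over buffer. The host switches the GPU to the other buffer, copies the data out, and returns ownership. Capacities, the watermark and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_CAPACITY   8
#define HIGH_WATERMARK 6

struct sample_buffer {
  int data[BUF_CAPACITY];
  size_t count;
  int signal; /* stands in for an HSA signal: != 0 GPU-owned, 0 host-owned */
};

struct double_buffer {
  struct sample_buffer buf[2];
  int gpu_index; /* buffer the GPU currently writes to */
};

static void db_init(struct double_buffer *db) {
  for (int i = 0; i < 2; i++) {
    db->buf[i].count = 0;
    db->buf[i].signal = 1; /* both buffers start out owned by the GPU */
  }
  db->gpu_index = 0;
}

/* Simulated GPU: store one sample, or return -1 (dropped) if the current
 * buffer has been handed to the host or is full. */
static int gpu_write(struct double_buffer *db, int sample) {
  struct sample_buffer *b = &db->buf[db->gpu_index];
  if (b->signal == 0 || b->count == BUF_CAPACITY)
    return -1;
  b->data[b->count++] = sample;
  if (b->count >= HIGH_WATERMARK)
    b->signal = 0; /* wake the host */
  return 0;
}

/* Host: if the current buffer was handed over, switch the GPU to the other
 * buffer, copy the full one out, and give it back. Returns samples copied. */
static size_t host_drain(struct double_buffer *db, int *out, size_t out_cap) {
  int full = db->gpu_index;
  struct sample_buffer *b = &db->buf[full];
  if (b->signal != 0)
    return 0; /* still owned by the GPU */
  db->gpu_index = 1 - full;
  assert(out_cap >= b->count);
  size_t n = b->count;
  for (size_t i = 0; i < n; i++)
    out[i] = b->data[i];
  b->count = 0;
  b->signal = 1; /* return ownership to the GPU */
  return n;
}
```

In the real design, the GPU fills the other buffer while the host is still copying. Here the two sides simply take turns.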
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: If3eb0913e52fb4788059a71e5feca334612f3d5d
[ROCm/ROCR-Runtime commit: 431a70471e]
Create a dedicated CP queue with the highest priority for PC Sampling.
Reduce the highest priority that LRTs can set through the existing API so
that the PC Sampling queue always has higher priority than any other CP
queue.
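The priority split can be sketched as a clamp. The top level is reserved for the PC Sampling queue, and any client request is capped one level below it. The 0..15 range and the helper names are assumptions for illustration.

```c
#include <assert.h>

/* Assumed priority range, modeled on a 0..15 scale. */
#define EX_QUEUE_PRIORITY_MIN 0
#define EX_QUEUE_PRIORITY_MAX 15

/* Priority reserved for the dedicated PC Sampling queue. */
static int pcs_queue_priority(void) { return EX_QUEUE_PRIORITY_MAX; }

/* Priority granted to any other queue: the client's request, clamped so it
 * can never reach the level reserved for PC Sampling. */
static int client_queue_priority(int requested) {
  if (requested < EX_QUEUE_PRIORITY_MIN)
    return EX_QUEUE_PRIORITY_MIN;
  if (requested >= EX_QUEUE_PRIORITY_MAX)
    return EX_QUEUE_PRIORITY_MAX - 1;
  return requested;
}
```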
Change-Id: Ia70d74415edc83b4862a3e18dbdbd7cebe73ab47
[ROCm/ROCR-Runtime commit: a83f872a23]
Create PC Sampling APIs for the start and stop functions, and create a
stub for the flush function.
Change-Id: I7a093b29dc87e34ac06faaae6cac2be50e4663e1
[ROCm/ROCR-Runtime commit: a842247482]
Implement PC Sampling session create and destroy APIs.
Change-Id: I93370d3d01b74ee15e71b8b0e20feb8f0066a3dc
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Vladimir Indic <Vladimir.Indic@amd.com>
Change-Id: Ib0c64356a1a4616b12d5dbeebe16273fe2a84abe
[ROCm/ROCR-Runtime commit: 632f9e60f7]
Add new PC Sampling API to list the supported PC Sampling methods and
options on a specific agent. If there is already a PC Sampling session
active on this agent, the list of methods returned will be reduced to
methods that can be run simultaneously with the current active session.
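The list reduction can be sketched as a bitmask filter. Each method has a set of methods it may run alongside, and every active method narrows the result. The method bits, the table shape and the function name are hypothetical. The real compatibility rules come from the hardware and KFD, so the table is a parameter here rather than fixed data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method identifiers, one bit each. */
enum pcs_method {
  PCS_METHOD_HOSTTRAP   = 1u << 0,
  PCS_METHOD_STOCHASTIC = 1u << 1,
};

/* Return the subset of supported methods that can be offered.
 * active: bitmask of methods in currently running sessions (0 if none).
 * compat[i]: methods allowed to run alongside method bit i. */
static unsigned available_methods(unsigned supported, unsigned active,
                                  const unsigned *compat, size_t n_methods) {
  unsigned result = supported;
  for (size_t i = 0; i < n_methods; i++) {
    if (active & (1u << i))
      result &= compat[i];
  }
  return result;
}
```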
Change-Id: I42ac2b8f30d5c368faf8ed4cf37ca4134db22985
[ROCm/ROCR-Runtime commit: 295acf6b27]
Create allocator helper function to provide fine-grained memory on
a specific agent.
Change-Id: I32ba9aceb9c9dc708b140a0c45158e6e7a018844
[ROCm/ROCR-Runtime commit: 71f1a6726c]
The ExecutePM4() function can optionally accept extra arguments for the
acquire fence scope, release fence scope, and completion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.
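The wait/no-wait split can be sketched in C, using a NULL pointer for "no completion signal provided". The fence-scope arguments are omitted. The signal struct and the simulated submit and wait helpers are hypothetical stand-ins; the real ExecutePM4() is a C++ method using HSA signals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion signal: nonzero while work is pending. */
struct completion_signal { int value; };

static int internal_waits; /* how many times execute_pm4 blocked */

/* Simulated submit: a real implementation would write packets to a queue
 * and ring its doorbell; here the work is only marked as pending. */
static void submit_commands(struct completion_signal *s) { s->value = 1; }

/* Simulated wait: a real implementation would block until the GPU
 * decrements the signal; here completion is applied immediately. */
static void wait_for_completion(struct completion_signal *s) {
  internal_waits++;
  s->value = 0;
}

/* With a caller-supplied signal, return right after submission and let the
 * caller wait. Without one, use an internal signal and wait here. */
static void execute_pm4(struct completion_signal *completion) {
  struct completion_signal internal = { 0 };
  struct completion_signal *s = completion != NULL ? completion : &internal;
  submit_commands(s);
  if (completion == NULL)
    wait_for_completion(s);
}
```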
Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
[ROCm/ROCR-Runtime commit: 721e56ef5c]