2445 Commits

Author SHA1 Message Date
David Belanger 09744e4959 kfdtest: Added gfx1200 filter.
Initial template for GFX12.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I552374bfcc0dd6272d170df85d36d0dbca0196d5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
James Zhu 5786dbbb76 libhsakmt: update KFD ioctl minor version
Since PC Sampling is not upstream yet, use 1.16 for
contiguous VRAM allocation and 1.17 for PC Sampling.
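The following is a minimal sketch of the kind of version gate this commit implies. It assumes the caller already has the KFD ioctl major and minor version from the driver's version query. The helper `kfd_supports` and the `KFD_MINOR_*` macro names are hypothetical. Only the 1.16 and 1.17 thresholds come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thresholds from the commit message; the macro names are hypothetical. */
#define KFD_MINOR_CONTIGUOUS_VRAM 16u /* 1.16 */
#define KFD_MINOR_PC_SAMPLING     17u /* 1.17 */

/* Hypothetical helper: true when the running KFD interface (major.minor)
   is at least 1.<required_minor>. Assumes the major version stays at 1,
   so any other major version is treated as unsupported. */
bool kfd_supports(uint32_t major, uint32_t minor, uint32_t required_minor) {
  return major == 1u && minor >= required_minor;
}
```

With these thresholds, a 1.16 driver accepts contiguous VRAM allocations but still reports PC Sampling as unsupported.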

Change-Id: Ib5d22e8f386ce7fe3f7111485b9632b61227e539
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
James Zhu 1087dea925 kfdtest: skip test when PC Sampling is not supported by ASIC
Skip test when PC Sampling is not supported by ASIC.

Change-Id: I6f9be0bdaed66e51052723b6df6908079470cefb
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Jonathan Kim 206db80a56 libhsakmt: fix pc sampling return of functions
C error returns are positive in user space, so the code should check
against errno instead.
Fix the return declaration to use type HSAKMT_STATUS.
The KFD IOCTL should handle the size return when querying capabilities,
so return the size to the caller unconditionally.
Clean up the per-function error translations so that they are
stylistically clear.
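Below is a minimal sketch of checking errno after an ioctl()-style call instead of comparing the return value against negative error codes. `sketch_status` is a local stand-in for HSAKMT_STATUS, and its names and values are illustrative. The specific errno-to-status mapping is also an assumption, not the library's actual translation table.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-in for libhsakmt's HSAKMT_STATUS; names and values are
   illustrative only, not the real enum. */
typedef enum {
  SKETCH_STATUS_SUCCESS = 0,
  SKETCH_STATUS_ERROR,
  SKETCH_STATUS_INVALID_PARAMETER,
  SKETCH_STATUS_NO_MEMORY,
  SKETCH_STATUS_NOT_SUPPORTED
} sketch_status;

/* Translate the result of an ioctl()-style call. `ret` is the call's
   return value (-1 on failure) and `err` is errno captured immediately
   after the call. errno values are positive (EINVAL, not -EINVAL), so
   they are compared directly. The mapping below is hypothetical. */
sketch_status translate_ioctl_result(int ret, int err) {
  if (ret >= 0)
    return SKETCH_STATUS_SUCCESS;
  switch (err) {
  case EINVAL:
    return SKETCH_STATUS_INVALID_PARAMETER;
  case ENOMEM:
    return SKETCH_STATUS_NO_MEMORY;
  case EOPNOTSUPP:
    return SKETCH_STATUS_NOT_SUPPORTED;
  default:
    return SKETCH_STATUS_ERROR;
  }
}
```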

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ic37390425f370c7ad88f9ed014444decf19383a3
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Kent Russell 371d078226 kfdtest.exclude: Fix blacklist
Each subtest entry must end with ':', except for the last entry.
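The separator rule above is sketched here in C. The sketch assumes the exclude list is eventually passed to GoogleTest's `--gtest_filter`, whose patterns are separated by ':'. The function `join_filter` and the test names used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join test-name patterns into one filter string,
   writing ':' after every entry except the last.
   Returns 0 on success, or -1 if `out` is too small. */
int join_filter(const char *const *entries, size_t n, char *out,
                size_t out_size) {
  size_t len = 0;
  if (out_size == 0)
    return -1;
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    size_t elen = strlen(entries[i]);
    int last = (i + 1 == n);
    size_t need = elen + (last ? 0 : 1); /* entry plus optional ':' */
    if (len + need + 1 > out_size)       /* +1 for the terminating NUL */
      return -1;
    memcpy(out + len, entries[i], elen);
    len += elen;
    if (!last)
      out[len++] = ':';
    out[len] = '\0';
  }
  return 0;
}
```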

Change-Id: I9515d90703c9679e06a4acd124883540c1d5b832
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Chris Freehill 79e4eda0b6 Merge 'thunk/integrate-into-rocr' into integrate-libhsakmt 2024-05-02 21:52:49 -05:00
David Yat Sin d2d95a8948 rocrtst: add test for contiguous mem allocations
This test may fail when run on non-upstream versions of KFD as this
feature will not be upstreamed.

Change-Id: I7131e1f50984739c0df12e4c9afe790bd7e4cdfa
2024-04-30 17:42:15 -04:00
David Yat Sin ac5fb8be9e Temporary: Do not early release mutex when not ganging
It seems the Release() function is not reliable and can cause segfaults.
This is a temporary work-around until the Release() function is fixed.

Change-Id: I95470a800c6153673e4b8f4fe46a646903325074
2024-04-30 17:07:39 -04:00
Chris Freehill 11fd5c2562 Prepare for integration into rocr
Change-Id: I6102b9910dbb9d09e09bb262a03c5c0ad4ce66f4
2024-04-30 09:01:09 -05:00
David Yat Sin 57b93e02a4 Use pthread_attr_setaffinity_np when available
If the pthread_attr_setaffinity_np function exists, use it instead of
pthread_setaffinity_np, as pthread_setaffinity_np seems to fail to apply
the affinity settings on some systems.
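This minimal sketch shows the "use it when available" pattern. It relies only on the GNU extensions `pthread_attr_setaffinity_np` and `pthread_setaffinity_np`. The feature macro `HAVE_PTHREAD_ATTR_SETAFFINITY_NP` and the `worker` thread body are hypothetical. A real build would set such a macro from a configure or CMake check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical feature macro; a real build would derive it from a
   configure or CMake check for pthread_attr_setaffinity_np. */
#ifndef HAVE_PTHREAD_ATTR_SETAFFINITY_NP
#define HAVE_PTHREAD_ATTR_SETAFFINITY_NP 1
#endif

static void *worker(void *arg) {
  (void)arg;
  return NULL;
}

/* Start `worker` restricted to the CPUs in `cpus`. Returns 0 on success or
   the error number from the failing pthread call. If the fallback affinity
   call fails, the thread already exists and *tid must still be joined. */
int create_pinned_thread(pthread_t *tid, const cpu_set_t *cpus) {
  assert(tid != NULL && cpus != NULL);
  pthread_attr_t attr;
  int err = pthread_attr_init(&attr);
  if (err != 0)
    return err;
#if HAVE_PTHREAD_ATTR_SETAFFINITY_NP
  /* Affinity is part of the creation attributes, so it is in effect
     from the moment the thread starts. */
  err = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), cpus);
  if (err == 0)
    err = pthread_create(tid, &attr, worker, NULL);
#else
  /* Fallback: create the thread first, then change its affinity. */
  err = pthread_create(tid, &attr, worker, NULL);
  if (err == 0)
    err = pthread_setaffinity_np(*tid, sizeof(cpu_set_t), cpus);
#endif
  pthread_attr_destroy(&attr);
  return err;
}
```

Passing the process's current affinity mask, as returned by `sched_getaffinity`, keeps the example valid on machines with restricted CPU sets.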

Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
2024-04-29 15:02:54 +00:00
David Yat Sin b6829f7a72 Bump HSA_AMD_INTERFACE_VERSION_MINOR
Bump HSA_AMD_INTERFACE_VERSION_MINOR to 5 to account for the
previously added GPU agent query HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES.

Change-Id: Ic8cfdcfb7bad6f3d1e0b3d68f505a62074fc26b9
2024-04-29 12:55:18 +00:00
Kent Russell 5e1f24f305 .github: Add CODEOWNERS file
Change-Id: Ia763b91177f1ae09d16e5968bed17b0dba62cbe5
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-04-26 09:21:39 -04:00
amd-jmacaran 587e4287f4 Change token name to match IT-created token
Change-Id: Ic9189c012024c59cf5bad9daf25f6c2575a100fd
2024-04-25 12:23:28 -04:00
David Yat Sin 3d999a1adf Perform HDP flush for SDMA copies gfx10/gfx11
Perform HDP flush on gfx10/gfx11 PCIe devices.

Exclude gfx101x devices.

Change-Id: Ief76c34634b09b0a7942cb71519d4082ca8b4fad
2024-04-24 18:07:34 -04:00
David Yat Sin 9af225e1b1 Add support for contiguous memory allocations
Support the contiguous physical memory allocation flag. Allocations with
this flag are backed by physically contiguous memory. This depends on KFD
support for the flag; the AllocateKfdMemory(..) function call fails when
it is not supported.

Change-Id: I6c51c8b061f7b026fdcc2aa2c37c74ecc13d95b6
2024-04-24 14:02:07 -04:00
David Yat Sin e539c8dce2 Remove assert for physical vs virtual memory size
On systems with more than 1 TB of memory per NUMA region, this assert
triggers unnecessary errors.

Change-Id: I1bc7f209b9c1739b516c9f6b0acf434488ac7b8d
2024-04-24 08:43:23 -04:00
amd-jmacaran 8a893ea0b8 Add support for external CI builds using Azure Pipelines
Change-Id: I8f4de331f00317a959b86f7e5b7a1025ba03564b
2024-04-23 21:10:49 -04:00
David Galiffi a8bd453243 Fixed MD linting issue regarding code blocks.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Id0467b332bf033642a2d403090ffe598e41689f5
2024-04-23 16:48:15 -04:00
David Galiffi 975c5dd24a Fixed broken link to ROCm documentation
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: I9a6bc0ad08a060d83fdc3a0589dfc81c68ce2b0e
2024-04-23 16:47:50 -04:00
David Galiffi f94c1794bb Fixed MD linting issue regarding headers
Headers should start at H1.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Id11a2599c4609255a1a9916f70b58adc41cdddb4
2024-04-23 16:47:19 -04:00
David Galiffi c9103a00ef Update GitHub links to point to the new organization.
i.e., RadeonOpenCompute --> ROCm

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: I6724cbcbbb525f767af297e3986cd61fa69cd49f
2024-04-23 16:46:42 -04:00
Philip Yang 97497c7efc libhsakmt: Support contiguous VRAM allocation flag
Add the HsaMemFlags Contiguous bit for hsaKmtAllocMemory to allocate
contiguous VRAM, to support RDMA devices with limited scatter-gather
capability.

Check KFD ioctl minor version >= 17.

Change-Id: I0db00dad125b2b7be523f343082641f59b850423
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2024-04-23 14:27:12 -04:00
Your Name e2d742ac6f libhsakmt: Remove HsaMemFlag reserved bit init
New HsaMemFlag flags were added, which reduced the number of reserved
bits, so initializing the reserved field now generates a value-overflow
compilation error.

The reserved bits are not used, so remove the initialization.

Change-Id: I603596977dfd558ce31ead03711d7c5ce5ee5b71
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2024-04-23 14:27:12 -04:00
David Yat Sin f2751b7030 Fix queue creation for PC Sampling
Fix lazy pointer initialization for the dedicated PC Sampling queue.
The previous implementation always created the queue when the GPU agent
was created, instead of creating it on first use.
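Here is a minimal sketch of create-on-first-use for a per-agent queue, guarded by a mutex so that concurrent first callers do not create two queues. The types `gpu_agent` and `pcs_queue`, the `create_pcs_queue` helper, and the creation counter are hypothetical stand-ins, not the runtime's classes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dedicated PC Sampling queue. */
typedef struct {
  int id;
} pcs_queue;

static int queues_created = 0; /* counts creations, for illustration */

static pcs_queue *create_pcs_queue(void) {
  pcs_queue *q = malloc(sizeof *q);
  if (q != NULL)
    q->id = ++queues_created;
  return q;
}

/* Hypothetical GPU agent holding a lazily created queue. */
typedef struct {
  pthread_mutex_t lock;
  pcs_queue *queue; /* NULL until first requested */
} gpu_agent;

/* Agent creation only initializes the pointer; no queue is created. */
int gpu_agent_init(gpu_agent *agent) {
  agent->queue = NULL;
  return pthread_mutex_init(&agent->lock, NULL);
}

/* Return the agent's PC Sampling queue, creating it on first use.
   The mutex keeps concurrent first callers from creating two queues. */
pcs_queue *gpu_agent_pcs_queue(gpu_agent *agent) {
  pthread_mutex_lock(&agent->lock);
  if (agent->queue == NULL)
    agent->queue = create_pcs_queue();
  pcs_queue *q = agent->queue;
  pthread_mutex_unlock(&agent->lock);
  return q;
}
```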

Change-Id: Icf300f2b162e59143ba61ba182d9bee6e1308fc1
2024-04-22 19:00:48 +00:00
Shweta.Khatri bc9cac97fe Fixing compilation errors related to MUSL libc
Fix musl libc NULL errors and unsupported pthread functions for compatibility.
Also ensure cleanup and error handling irrespective of the CPU affinity override.

Fix submitted by GitHub developer AngryLoki:
https://github.com/ROCm/ROCR-Runtime/issues/181

Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
2024-04-17 07:14:15 -04:00
David Yat Sin d6d5786051 Adding queue information queries
New hsa_amd_queue_get_info API to support:

- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue

- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.

Change-Id: I98842131bcbdd08552649791a5d43e578a615808
2024-04-11 12:53:48 -04:00
David Yat Sin 3443fdf665 PC Sampling: Disable coredump when sessions active
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.

Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671
2024-04-11 12:53:43 -04:00
David Yat Sin 49e56ce782 PC Sampling: Convert timestamps to system time
Convert timestamps inside samples to system time

Change-Id: I5fad9a6887fa27c0ded9aa9b5f251cba2868f88f
2024-04-11 12:53:37 -04:00
David Yat Sin 547c9cb143 PC Sampling: Implement lost sample count
Change-Id: Idfdfbac71c1813dd7a97c301619cf8ce83713c53
2024-04-11 12:53:31 -04:00
David Yat Sin 8abbf9475b PC Sampling: Implement flush
Flush lets the client retrieve the data currently stored in the
buffers, so that current data can be obtained before the buffers
are full.

Change-Id: Ib8304dcdfb2797cb060ec72df4970d95cf6be348
2024-04-11 12:53:24 -04:00
David Yat Sin 5177d17f5d PC Sampling: Push data to PC Sampling client
Each time there is enough data to fill the client session buffer,
call the client's data-ready callback to transfer the buffer contents
to the client.

Change-Id: Id79775426fa6d22e00dc2ef6f55c439eacb9b2af
2024-04-11 12:53:17 -04:00
David Yat Sin 855e454671 PC Sampling: Retrieve data from trap handler
Retrieve data from the buffers previously set in the 2nd level trap
handler TMA. We use a double buffering mechanism to allow the 2nd level
trap handler to write to one buffer while we are copying data from the
other.

Co-authored by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Co-authored by: James Zhu <James.Zhu@amd.com>

Change-Id: I252c381ea06b8cf927c4f9af6ea59dedc3717fbb
2024-04-11 12:53:12 -04:00
David Yat Sin efdb72fd71 PC Sampling: Update 2nd level trap handler
Update 2nd level trap handler when PC Sampling is enabled

Change-Id: I95bf2bca8057d2f8313923c7f012f033e12ccc3a
2024-04-11 12:53:06 -04:00
David Yat Sin 8d666dea01 PC Sampling: Allocate resources to retrieve data from trap handler
Allocate required device and host buffers to be able to interact with
the 2nd level trap handler.

Change-Id: If99de5aacf956ca57ecafc7b04b797be9c9decaa
2024-04-11 12:53:00 -04:00
Joseph Greathouse 431a70471e PC Sampling: Add gfx9 2nd trap handler for PC Sampling
Code is valid for gfx9 GPUs excluding gfx94x.

The 1st level trap handler uses TTMP13[22] to indicate a host trap and
TTMP13[21] to indicate a stochastic trap.
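The real trap handler is GPU shader code. This host-side C sketch only illustrates the bit layout stated above. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the commit message; the macro names are hypothetical. */
#define TTMP13_HOST_TRAP_BIT  22u
#define TTMP13_STOCHASTIC_BIT 21u

/* True if the 1st level handler flagged this trap as a host trap. */
static bool is_host_trap(uint32_t ttmp13) {
  return ((ttmp13 >> TTMP13_HOST_TRAP_BIT) & 1u) != 0;
}

/* True if the 1st level handler flagged this trap as a stochastic trap. */
static bool is_stochastic_trap(uint32_t ttmp13) {
  return ((ttmp13 >> TTMP13_STOCHASTIC_BIT) & 1u) != 0;
}
```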

For each PC sampling method (hosttrap and stochastic), we use a double
buffering mechanism to transfer data between the GPU and the host.
The GPU dumps data into one buffer while the CPU may be reading data
from the other buffer. There are 2 separate signals, one for each
buffer.
When a signal != 0, the buffer belongs to the GPU and the GPU can write
to it. Once the buffer reaches the high watermark, the GPU sets the
signal to 0 to wake up the host, so that the host can try to switch
the buffers and read the data.
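The signal handoff described above can be modeled as follows. This is a single-threaded sketch, not the runtime's implementation. It assumes plain C11 atomics in place of HSA signals and a host-controlled `gpu_index` that selects the active buffer. The buffer capacity, the watermark value, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_CAPACITY   8 /* hypothetical samples per buffer */
#define HIGH_WATERMARK 6 /* hypothetical fill level that wakes the host */

typedef struct {
  uint64_t samples[BUF_CAPACITY];
  size_t count;
  atomic_long signal; /* != 0: GPU owns the buffer; == 0: host may read it */
} sample_buffer;

typedef struct {
  sample_buffer buf[2];
  int gpu_index; /* buffer the producer currently writes into */
} double_buffer;

void db_init(double_buffer *db) {
  for (int i = 0; i < 2; i++) {
    db->buf[i].count = 0;
    atomic_init(&db->buf[i].signal, 1); /* both start out owned by the GPU */
  }
  db->gpu_index = 0;
}

/* Producer side, standing in for the trap handler.
   Returns 0 if the sample was stored, -1 if it had to be dropped. */
int db_produce(double_buffer *db, uint64_t pc) {
  sample_buffer *b = &db->buf[db->gpu_index];
  if (atomic_load(&b->signal) == 0 || b->count == BUF_CAPACITY)
    return -1; /* host has not switched buffers yet: sample is lost */
  b->samples[b->count++] = pc;
  if (b->count >= HIGH_WATERMARK)
    atomic_store(&b->signal, 0); /* high watermark reached: wake the host */
  return 0;
}

/* Consumer side (host). If the active buffer has been handed over, point
   the producer at the other buffer, copy the samples out (`out` must hold
   at least BUF_CAPACITY entries), and give the buffer back to the GPU.
   Returns the number of samples copied. */
size_t db_consume(double_buffer *db, uint64_t *out) {
  sample_buffer *b = &db->buf[db->gpu_index];
  if (atomic_load(&b->signal) != 0)
    return 0; /* GPU still owns it */
  db->gpu_index ^= 1; /* switch buffers */
  size_t n = b->count;
  for (size_t i = 0; i < n; i++)
    out[i] = b->samples[i];
  b->count = 0;
  atomic_store(&b->signal, 1); /* hand the drained buffer back to the GPU */
  return n;
}
```

Samples that arrive while the full buffer is still waiting for the host are dropped in this model, which is one way a lost-sample count could arise.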

Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: If3eb0913e52fb4788059a71e5feca334612f3d5d
2024-04-11 12:52:54 -04:00
David Yat Sin a83f872a23 PC Sampling: Create dedicated CP queue
Create a dedicated CP queue with the highest priority for PC Sampling.
Reduce the highest priority that LRTs can set through the existing API so
that the PC Sampling queue always has priority over any other CP
queues.

Change-Id: Ia70d74415edc83b4862a3e18dbdbd7cebe73ab47
2024-04-11 12:52:48 -04:00
David Yat Sin a842247482 PC Sampling: Add start stop and flush APIs
Create PC Sampling APIs for the start and stop functions, and create a
stub for the flush function.

Change-Id: I7a093b29dc87e34ac06faaae6cac2be50e4663e1
2024-04-11 12:52:42 -04:00
David Yat Sin 632f9e60f7 PC Sampling: Add create and destroy APIs
Implement PC Sampling session create and destroy APIs.

Change-Id: I93370d3d01b74ee15e71b8b0e20feb8f0066a3dc

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Vladimir Indic <Vladimir.Indic@amd.com>
Change-Id: Ib0c64356a1a4616b12d5dbeebe16273fe2a84abe
2024-04-11 12:52:35 -04:00
David Yat Sin 295acf6b27 PC Sampling: API to list supported configurations
Add new PC Sampling API to list the supported PC Sampling methods and
options on a specific agent. If there is already a PC Sampling session
active on this agent, the list of methods returned will be reduced to
methods that can be run simultaneously with the current active session.

Change-Id: I42ac2b8f30d5c368faf8ed4cf37ca4134db22985
2024-04-11 12:52:30 -04:00
David Yat Sin 0bc244e10a PC Sampling: Create PC Sampling interfaces
Create new interface group for PC Sampling

Change-Id: I59b4cfe9f8d1ae313dc28be1d2ed49f750d8212b
2024-04-11 12:52:23 -04:00
David Yat Sin 6a7122b183 PC Sampling: Update public headers for new APIs
Change-Id: Ib9987efdb41d5f6d203e7e86f9b26809d020e04e
2024-04-11 12:52:16 -04:00
David Yat Sin 71f1a6726c Create fine-grained allocator
Create allocator helper function to provide fine-grained memory on
a specific agent.

Change-Id: I32ba9aceb9c9dc708b140a0c45158e6e7a018844
2024-04-11 12:52:10 -04:00
David Yat Sin 721e56ef5c Extend ExecutePM4() to accept completion signal and fences
The ExecutePM4() function can optionally accept extra arguments for the
acquire fence scope, release fence scope, and completion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.

Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
2024-04-11 12:51:52 -04:00
David Yat Sin d7adc94e3f Add limit checks for HSA_SINGLE_SCRATCH_LIMIT
The hard limit for scratch is 4GB per XCC; add checks in case the user
specifies values exceeding this limit.
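This is a minimal sketch of such a check. It assumes two things the commit does not state: that the HSA_SINGLE_SCRATCH_LIMIT value is a device-wide byte count, and that out-of-range values are clamped rather than rejected. It also takes "4GB" to mean 4 GiB. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCRATCH_LIMIT_PER_XCC (4ULL << 30) /* 4 GiB hard limit per XCC */

/* Hypothetical check: cap a user-supplied HSA_SINGLE_SCRATCH_LIMIT value,
   treated here as a device-wide byte count, at 4 GiB per XCC. */
uint64_t clamp_single_scratch_limit(uint64_t requested, uint32_t num_xcc) {
  assert(num_xcc > 0);
  uint64_t hard_limit = SCRATCH_LIMIT_PER_XCC * num_xcc;
  return requested > hard_limit ? hard_limit : requested;
}
```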

Change-Id: Ib3cade762ff66c7e7d6a2d311e482cacbcf2b0de
2024-04-11 14:03:25 +00:00
Konstantin Zhuravlyov 08c94463de loader: allow but skip static relocations for code object v2+
Change-Id: I4ae14cb5e740d7d45810b75038b15a0b94d2bf0b
2024-04-09 11:39:18 -04:00
Konstantin Zhuravlyov b983c19729 Switch to per-executable contexts in the loader
- Per-executable contexts should be used from now on
  - Global contexts are left as is for now for backwards
    compatibility and will be phased out in follow up
    patches.

Change-Id: I6291abf865c7ed24ee71f5065e539afc23f5ce64
2024-04-09 10:31:51 -04:00
Shweta Khatri 244ad319ac Revert "Use HybridMutex for IPC locks"
This reverts commit 5c520f4544c654e5f18e05cabd1c63d64473cfab.

Reason for revert: This patch introduces a synchronization-related bug in the Unit_hipGetSetDevice_MultiThreaded test case.

Change-Id: I367e4d4f1d75b21658ac1127c58982894a97cedb
2024-04-02 12:27:55 -04:00
Stella Laurenzo a180fea5ad Properly nest build time headers to match arrangement at install time.
The build tree was missing a level of nesting, causing divergences based
on in-tree/out-of-tree use.
KR: Also fixed kfdtest paths

Change-Id: I8638b6d6227daabddd8eaa2aa387ba578b8dfab8
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-04-01 17:10:40 -04:00
David Yat Sin efe455c2fa Temporary: Set AllocateGTTAccess and node_id for MES
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.

Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6
2024-03-29 19:38:19 +00:00
David Yat Sin 541d0dbbae Set NUMA region to 0 when using GTTAccess flag
When allocating memory for the MES AQL queue structure, PreferredNode
is set to the GPU's device index to hint at the location where the BO
needs to be created. However, the device index must be ignored when
calling bind_mem_to_numa.

Change-Id: Iae69fe02bfd48c5a3bd495319f6f2706d6e8aea2
2024-03-29 17:17:56 +00:00