Commit graph

996 commits

Author SHA1 Message Date
David Belanger 13c3f06dfe Fix overflow in max_slice variable for GFX12
Change max_slice type to uint64_t and calculation to 64-bit, otherwise
value overflows to 0.

Problem triggered only on GFX12 as field size was increased.

Change-Id: If26451224538743dabc41bdc1b327c6ef021bc24
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-17 22:32:23 -05:00
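The overflow described in 13c3f06dfe can be illustrated with a minimal C++ sketch. The helper names and the field widths used here are hypothetical and are not taken from the ROCr source; they only show how a 32-bit calculation can wrap to 0 where a 64-bit one keeps the value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of values representable by a field of
// `field_bits` bits, computed in 32-bit arithmetic. Unsigned overflow
// wraps modulo 2^32, so a 32-bit-wide field yields 0.
uint32_t field_range_32(uint32_t field_bits) {
  uint32_t range = 1;
  for (uint32_t i = 0; i < field_bits; ++i) range *= 2u;
  return range;
}

// The same calculation carried out in 64 bits.
uint64_t field_range_64(uint32_t field_bits) {
  assert(field_bits < 64);  // shifting by 64 or more would be undefined
  return uint64_t{1} << field_bits;
}
```

Under these assumptions, field_range_32(32) wraps to 0 while field_range_64(32) returns 4294967296, which matches the symptom named in the commit message.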
David Belanger 4f453f3bd4 Fix image issue on GFX12
Fix encoding of pitch in SRD (1 bit missing).
Issue affects images with pitch > 8192.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Id0b431f51ab3984d1a47d3e8c13d35e28a6009cf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-17 22:32:23 -05:00
Chris Freehill 5820fa37d7 Set PARENT_SCOPE for HSA_DEP_ROCPROFILER_REG
This variable is now in a sub-project, but needs to be visible
in the super-project.

Change-Id: I14d307646253df8f0a8a50d01b8ca677b904234c
2024-07-17 17:52:59 -05:00
David Yat Sin 08c44fbda6 Add hsa_amd_vmem_address_reserve_align API
New API to support alignment parameter when reserving virtual addresses.
If the alignment is 0, then the default size is used. Otherwise the
alignment needs to be a power of 2 and greater than or equal to page
size.

Existing hsa_amd_vmem_address_reserve marked for future deprecation.

Change-Id: I17cee75420183dea5842fc1ecc2514cdcd760bac
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:57:22 -05:00
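The alignment rule stated in 08c44fbda6 can be sketched as a small check. This is an illustration only: the function name and the explicit page_size parameter are assumptions, and the real signature of hsa_amd_vmem_address_reserve_align is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical validation helper, not the ROCr implementation.
// Returns true when `alignment` satisfies the rule in the commit message:
// 0 selects the default, otherwise a power of 2 that is >= the page size.
bool is_valid_reserve_alignment(uint64_t alignment, uint64_t page_size) {
  // The page size itself is assumed to be a non-zero power of 2.
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  if (alignment == 0) return true;
  const bool power_of_two = (alignment & (alignment - 1)) == 0;
  return power_of_two && alignment >= page_size;
}
```

With a 4096-byte page, 0, 4096 and 2097152 pass the check, while 2048 (below page size) and 12288 (not a power of 2) are rejected.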
Yifan Zhang 71494a920b Add support for GC 11.5.2
Change-Id: Iad8604881dc66108933ac2155fef3b74bca9ac3f
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:50:03 -05:00
Ranjith Ramakrishnan 14ed20e0cc Update elf library search path with lib64 path as well
The elf libraries are installed in /usr/lib64 on RHEL.
Removed invalid paths.

Change-Id: I8c2b5525c1e3b62a2bd4e31a442d9931005c2f30
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:50:03 -05:00
Vladimir Indic c15e5d0e9d PC Sampling: Add s_nop prior to s_sendmsg
Add s_nop before s_sendmsg. This is required because the HW does not
check for dependencies for SALU writes to M0.

Section 4.5: Manually Inserted Wait States (NOPs)
"AMD Instinct MI200" Instruction Set Architecture
https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/instinct-mi200-cdna2-instruction-set-architecture.pdf

Change-Id: I90f503e3cc80cd29eab8bafa2565699461654055
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:50:03 -05:00
Lancelot SIX cb8705627f trap_handler_gfx12: fix-math-excp-size
The current trap handler defined:

    .set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SHIFT    , 0
    .set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SIZE     , 6
    .set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SHIFT         , 0
    .set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SIZE          , 6

However, the ALU exception bits in EXCP_FLAG_USER go from bit 0 (alu_invalid)
to bit 6 (alu_int_div0), making it a total of 7 bits, not 6.  Similarly,
the corresponding bits in TRAP_CTRL go from bit 0 to bit 6 as well.

Fix the incorrect size to be sure to properly detect the int_div0
exception.

Change-Id: I60c2d94a447b71ca0ce26a87b7f55b055b9aef8e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00
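The effect of the size fix in cb8705627f can be checked with a short C++ sketch. The constant names below are hypothetical C++ stand-ins for the assembler symbols; only the shift of 0, the old size of 6, the new size of 7, and the position of alu_int_div0 at bit 6 come from the commit message.

```cpp
#include <cstdint>

// Builds a mask covering `size` bits starting at bit `shift` (size < 32).
constexpr uint32_t field_mask(uint32_t shift, uint32_t size) {
  return ((uint32_t{1} << size) - 1u) << shift;
}

constexpr uint32_t kMathExcpShift = 0;
constexpr uint32_t kOldMathExcpSize = 6;               // value before the fix
constexpr uint32_t kNewMathExcpSize = 7;               // bits 0..6 inclusive
constexpr uint32_t kAluIntDiv0Bit = uint32_t{1} << 6;  // bit 6 per the commit message

// A 6-bit mask (0x3F) misses bit 6, so int_div0 would go undetected.
static_assert((field_mask(kMathExcpShift, kOldMathExcpSize) & kAluIntDiv0Bit) == 0,
              "old size does not cover alu_int_div0");
// A 7-bit mask (0x7F) covers it.
static_assert((field_mask(kMathExcpShift, kNewMathExcpSize) & kAluIntDiv0Bit) != 0,
              "new size covers alu_int_div0");
```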
Yifan Zhang 1d1a32d725 GFX1150: remove duplicated definition of GFX1150
This patch is to remove duplicated definition of GFX1150.

Change-Id: I4a8b8bce5c2721748c4d64e1da13b59feae2139a
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00
David Yat Sin f1a13b6d87 Move addrlib into rocr namespace
This avoids conflicts if the application loads another copy of
addrlib.

Change-Id: Ifb4a10270c867366d5eed0a8c015257b415189a5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00
David Yat Sin beb9a42998 VMM: return error if memory-only handle alloc fail
Return HSA_STATUS_ERROR_OUT_OF_RESOURCES if the thunk call to allocate
a memory handle returns NULL.

Change-Id: I6cf74f93f7d606416414ea7c2354db86aeef3137
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00
Lancelot SIX 9e625307d2 trap_handler_gfx12: Do not override STATE_PRIV.BARRIER_COMPLETE
The value of STATE_PRIV is captured by the 1st level trap handler, and
passed on to the second level trap handler.  The value is to be restored
before exit.  However, it is possible for the value of
STATE_PRIV.BARRIER_COMPLETE to change while the wave is in the trap
handler (all the other waves in the workgroup have signaled the
workgroup barrier), and in this case restoring STATE_PRIV in full would
result in STATE_PRIV.BARRIER_COMPLETE being cleared.

Restore every bit of STATE_PRIV except for BARRIER_COMPLETE before
returning to prevent this race.

Change-Id: I76c875bced7d23c58670b28f257d22c933f99fc5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
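The restore strategy described in 9e625307d2 amounts to a masked merge of the saved and the live register values. The C++ sketch below shows that merge; the bit position chosen for BARRIER_COMPLETE is a placeholder, since the real STATE_PRIV layout is not given in the commit message, and the actual handler is written in GFX12 assembly rather than C++.

```cpp
#include <cstdint>

// Placeholder bit position (assumption): the real layout of STATE_PRIV
// is not stated in the commit message.
constexpr uint32_t kBarrierCompleteMask = uint32_t{1} << 2;

// Restores every bit from `saved` except BARRIER_COMPLETE, which keeps
// whatever value the hardware currently holds in `live`.
constexpr uint32_t restore_state_priv(uint32_t saved, uint32_t live) {
  return (saved & ~kBarrierCompleteMask) | (live & kBarrierCompleteMask);
}
```

If the barrier completes while the wave is in the trap handler, the live bit is 1 and survives the restore even though the saved copy had it cleared.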
Jonathan Kim b8aae52404 Disable large copies for gfx94x
GFX94x runs into a performance regression when doing large packet
enqueues.

Drop back to legacy packet sizes for now.

Change-Id: I595838ebada66c6c5143bfdb2f56c83ee71654a9
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin e721eb509b Remove debug bits set in forbiddenBlock
Removing extra bits set in forbiddenBlock that seemed to be set for
debugging and are causing unexpected image formats to be used.

Change-Id: I29c9e319907027a2b0b6bf7c1c0c8558eb6a36f4
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin cf4b5e1598 Update Addrlib gfx10 files
Update gfx10 addrlib files with changes from:
https://gitlab.freedesktop.org/mesa/mesa.git

mesa top commit:
4d298673da9b05d826b960eece2e715a6b187330

Change-Id: I6015c827d3e9b1fbde034686432670958f424a1d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger 6d147dd3b1 Implement SDMA_PKT_COPY_LINEAR_RECT for GFX12
Packet for GFX12 is incompatible with pre-GFX12 as some fields changed
location.  Implement code path and packet specific to GFX12.
This fixes some issues with SDMA blits and 3D images.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I56c204aaa12160e563ec960bd3b226cfa94e142d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger f8a015f53e Implement AddrLib support for GFX12
Add new files image_manager_gfx12.{h,cpp}.

Implement BUF/IMG/SAMP desc changes for GFX12.

Implement compute surface info code using AddrLib3 API (new starting
from GFX12).

Implement algorithm for choosing the "best" swizzle mode (starting
from AddrLib3/GFX12, AddrLib provides only a list of suitable swizzle modes;
it is up to the client, ROCr, to choose the best).  The algorithm implemented
follows the behaviour in GFX11 and the behaviour for GFX12 on other platforms.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ib344c86228a98bbac5acdab421ee2ef9b1e84eef
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger def4a6c326 Updated amd_aql_queue for GFX12
Added GFX12 implementation for InitScratchSRD and for compute_tmpring.
Implementation for compute_tmpring could be combined with GFX11 with some
refactoring as a possible future improvement.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I8013cbe4438786bf41bbfd03f6a5d3b9ef51e7bf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger 8165da63cc Added/Updated header files for AddrLib support (GFX12)
Updated struct definitions, field size changes and new fields in
registers.h.

Added resource_gfx12.h and updated fields in BUF/IMG/SAMP descriptor
structs based on documentation.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I08f05ba30f54c40e7b823a6a105829a1e8590b3d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin 7dd90f8361 Disable extended-scope memory on gfx120x
Do not allow extended-scope fine-grain memory on gfx120x devices.

Change-Id: I1e6e6c1860de00160cca9d8137b129c7e32c0526
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger 288dea4c71 Updated makefile for GFX12 addrlib
Added GFX12 and AddrLib3 files, updated include paths.

Change-Id: I4880eadfd627b79ebcf2fe26b91649642911b050
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger bb02f4e9a7 GFX12: Update addrlib
Updated address lib to mesa amd-temp-gfx12 branch.
Commit: 6e5244bd3184f0720197270a10e031b5ecd5fe75

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Icaead4f38c5f3019c375116070b1f97a927f09b0
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Lancelot SIX 7a3bf30769 trap_handler_gfx12.s: Fix access to EXCP_FLAG_PRIV
There is an issue in the gfx12 trap handler where the EXCP_FLAG_PRIV
is only fetched under certain conditions (trap_id != 0) while it should
have been fetched unconditionally.  As a consequence, the interrupt
payload might contain invalid data, leading to incorrect exceptions
being reported by the runtime.  The debugger is mostly unaffected as it
will inspect the wave's state to figure out what exception(s) have been
reported for each wave.

Also, it is not necessary to check for the host trap bit if trap_id is
!= 0 in gfx12, as there is no trap ID anymore for host trap.

This patch implements those fixes.

Co-Authored-By: Laurent Morichetti <laurent.morichetti@amd.com>
Change-Id: Ib72cd8cc5d935ca643e241da7fccd3f96201b09d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Lancelot SIX ff9b11fd89 trap_handler_gfx12.s: re-order constant declarations
The constant declarations in trap_handler_gfx12.s have been sorted
alphabetically, which causes inconsistencies.  Fix the order of
declarations where it makes sense.

Change-Id: I5b05d87a5afbe1ff3362746801a1c9373537b49e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Lancelot SIX 855015377c Add GFX12 trap handler
Given the differences between previous architectures and gfx12, this
patch implements the gfx12 2nd level trap handler in a separate source
file, and adjusts the build system.

Change-Id: I65192ffbbcd66a4f78d2d0c3fb1739a92cac95d4
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Sreekant Somasekharan 24463635f9 Initial GFX1201 changes.
Add target gfx1201 to several files.

Change-Id: I5cae7dba00ed58f8fbfa6e7147275bd7d5feaed0
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger 40cc6559f1 Add Blit shaders for GFX12
For GFX12, the workgroup id is passed in ttmp9 (trap temp register) instead of the scalar register.
Normal shader code (i.e. not priv, not trap handler) can only read the ttmp registers.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I42404d8c8c0ee9c746e23879fd30b2d16cfa1787
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Shweta.Khatri 4e9647704d Fix soft hang on AQLQueue destruction with a timeout
Add a timeout to the AQLQueue destructor's signal wait to prevent an indefinite hang.

Change-Id: I6c6c98a7bdd27d39569af1d667aa9aa7e9596535
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin 2f05c2a273 Revert "Use pthread_setaffinity_np"
This reverts commit 1df7a44112e45b7fb447926778490f741601219a.

Change-Id: Ib386c8f944b6da0ef68ddd2be3f26013cd36ef5b
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin 1cee8656df Revert "Use pthread_attr_setaffinity_np when available"
This reverts commit ef95ccf81e59b8608861e8f2f256d981eee19df7.

Reason for revert: Causing performance regressions on some systems

Change-Id: I82951350cafbd57c495852d6f90023a3373f04f6
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Belanger 2f14acd9c1 Initial GFX12 changes.
Add target gfx1200 to several files.
Add cases for GFX12 in a few switch statements.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ib90032f5b9d5a3306060f13a43d970108a1399df
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Ranjith Ramakrishnan 696d8fae9e Static package generation for hsa
Generate a static package by combining the binary and dev components.
Binary and dev component dependencies are added to the static package dependencies.
There is no dependency on rocprofiler-register.
The package name will have the suffix static-dev/devel.

Change-Id: I2f9680f13dbffc9eb7ced9fa9b28e360c47ebcca
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
Tony Gutierrez 69ba32fa95 driver: Add a core driver interface component
Add a new driver interface as a core ROCr component.

The driver component provides an interface for ROCr to interact with
agent kernel-model drivers in a generic way. This interface will be used
to interact with the XDNA NPU driver. Eventually, the ROCt library's
functionality should be implemented behind this interface.

For now the interface provides basic queue and memory allocation
for supporting HSA queues and signals and matches the thunk API
closely.

Change-Id: I37ac9f2dcbadc86ce45999f76b0e9ce753fd0c06
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:16:40 -05:00
Lang Yu 2f50b35daa Simplify APU query
Query APU from thunk instead of parsing device id.

Change-Id: I95efa9e2a94fb979eaa88042991ee6921abbed7f
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:16:40 -05:00
Chris Freehill 662f6817d7 Changes to build ROCr & thunk (optionally tests) in rocr-runtime repo
Create a new top-level CMakeLists.txt file to control building thunk
and ROCr. kfdtest and rocrtst are built separately.

Most of the cmake code that existed for thunk, ROCr, rocrtst and kfdtest
still resides in the respective CMakeLists.txt files, except the
CPack packaging directives which have been moved to the top-level
CMakeLists.txt.

Change-Id: I1a537359029504af8b1abb324bc6f0d75d98471e
2024-06-24 14:26:21 -05:00
David Yat Sin ac5fb8be9e Temporary: Do not early release mutex when not ganging
It seems the Release() function is not reliable and can cause segfaults.
This is a temporary work-around until the Release() function is fixed.

Change-Id: I95470a800c6153673e4b8f4fe46a646903325074
2024-04-30 17:07:39 -04:00
David Yat Sin 57b93e02a4 Use pthread_attr_setaffinity_np when available
If pthread_attr_setaffinity_np function exists use it instead of
pthread_setaffinity_np as pthread_setaffinity_np seems to fail to set
the affinity settings on some systems.

Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
2024-04-29 15:02:54 +00:00
David Yat Sin b6829f7a72 Bump HSA_AMD_INTERFACE_VERSION_MINOR
Bumping HSA_AMD_INTERFACE_VERSION_MINOR version to 5 to account for
previously added GPU agent query: HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES

Change-Id: Ic8cfdcfb7bad6f3d1e0b3d68f505a62074fc26b9
2024-04-29 12:55:18 +00:00
David Yat Sin 3d999a1adf Perform HDP flush for SDMA copies gfx10/gfx11
Perform HDP flush on gfx10/gfx11 PCIe devices.

Exclude gfx101x devices

Change-Id: Ief76c34634b09b0a7942cb71519d4082ca8b4fad
2024-04-24 18:07:34 -04:00
David Yat Sin 9af225e1b1 Add support for contiguous memory allocations
Support contiguous physical memory allocation flag. Allocations with
this flag will have contiguous physical memory. This is dependent on KFD
support for this flag and the AllocateKfdMemory(..) function call will
fail when it is not supported.

Change-Id: I6c51c8b061f7b026fdcc2aa2c37c74ecc13d95b6
2024-04-24 14:02:07 -04:00
David Yat Sin e539c8dce2 Remove assert for physical vs virtual memory size
On systems with more than 1 TB of memory per NUMA region, this triggers
unnecessary errors.

Change-Id: I1bc7f209b9c1739b516c9f6b0acf434488ac7b8d
2024-04-24 08:43:23 -04:00
David Yat Sin f2751b7030 Fix queue creation for PC Sampling
Fix lazy pointer initialization for dedicated PC Sampling queue.
Previous implementation would always create a queue on GPU agent
creation instead of creating the queue on first use.

Change-Id: Icf300f2b162e59143ba61ba182d9bee6e1308fc1
2024-04-22 19:00:48 +00:00
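The lazy initialization that f2751b7030 restores can be sketched as follows. The Agent and Queue types here are hypothetical stand-ins, not the ROCr classes, and the sketch is single-threaded; a real implementation would need synchronization.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a hardware queue.
struct Queue {
  int id;
};

// Hypothetical stand-in for a GPU agent that owns a dedicated queue.
class Agent {
 public:
  // Returns the dedicated queue, creating it on first use only.
  Queue& pc_sampling_queue() {
    if (!queue_) {
      queue_ = std::make_unique<Queue>(Queue{++queues_created_});
    }
    assert(queue_ != nullptr);
    return *queue_;
  }

  int queues_created() const { return queues_created_; }

 private:
  std::unique_ptr<Queue> queue_;  // stays empty until first requested
  int queues_created_ = 0;
};
```

Constructing an Agent creates no queue; the first call to pc_sampling_queue() creates one, and later calls reuse it.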
Shweta.Khatri bc9cac97fe Fixing compilation errors related to MUSL libc
Fix Musl libc NULL errors and unsupported pthread funcs for compatibility.
Also ensures cleanup and error handling irrespective of CPU affinity override.

Fix submitted by GitHub developer AngryLoki:
https://github.com/ROCm/ROCR-Runtime/issues/181

Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
2024-04-17 07:14:15 -04:00
David Yat Sin d6d5786051 Adding queue information queries
New hsa_amd_queue_get_info API to support:

- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue

- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.

Change-Id: I98842131bcbdd08552649791a5d43e578a615808
2024-04-11 12:53:48 -04:00
David Yat Sin 3443fdf665 PC Sampling: Disable coredump when sessions active
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.

Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671
2024-04-11 12:53:43 -04:00
David Yat Sin 49e56ce782 PC Sampling: Convert timestamps to system time
Convert timestamps inside samples to system time

Change-Id: I5fad9a6887fa27c0ded9aa9b5f251cba2868f88f
2024-04-11 12:53:37 -04:00
David Yat Sin 547c9cb143 PC Sampling: Implement lost sample count
Change-Id: Idfdfbac71c1813dd7a97c301619cf8ce83713c53
2024-04-11 12:53:31 -04:00
David Yat Sin 8abbf9475b PC Sampling: Implement flush
Flush lets the client retrieve the data currently stored in the
buffers, even when the buffers are not full.

Change-Id: Ib8304dcdfb2797cb060ec72df4970d95cf6be348
2024-04-11 12:53:24 -04:00
David Yat Sin 5177d17f5d PC Sampling: Push data to PC Sampling client
Each time there is enough data to fill the client session buffer,
call the client's data-ready callback to transfer the buffer contents
to the client.

Change-Id: Id79775426fa6d22e00dc2ef6f55c439eacb9b2af
2024-04-11 12:53:17 -04:00
David Yat Sin 855e454671 PC Sampling: Retrieve data from trap handler
Retrieve data from the buffers previously set in the 2nd level trap
handler TMA. We use a double buffering mechanism to allow the 2nd level
trap handler to write to one buffer while we are copying data from the
other.

Co-authored by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Co-authored by: James Zhu <James.Zhu@amd.com>

Change-Id: I252c381ea06b8cf927c4f9af6ea59dedc3717fbb
2024-04-11 12:53:12 -04:00
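The double buffering mechanism mentioned in 855e454671 can be sketched in C++ as below. This is a simplified, single-threaded illustration with a hypothetical DoubleBuffer class; the real producer is the 2nd level trap handler running on the GPU, and the synchronization between it and the host is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: one buffer is written while the other is drained.
class DoubleBuffer {
 public:
  // Producer side: append a sample to the buffer currently being written.
  void write(uint64_t sample) { buffers_[write_index_].push_back(sample); }

  // Consumer side: redirect the producer to the other buffer, then hand
  // back the contents of the buffer it was filling and leave it empty.
  std::vector<uint64_t> swap_and_drain() {
    const std::size_t full = write_index_;
    write_index_ = 1 - write_index_;
    // The buffer being switched to was emptied by the previous drain.
    assert(buffers_[write_index_].empty());
    return std::exchange(buffers_[full], {});
  }

 private:
  std::array<std::vector<uint64_t>, 2> buffers_;
  std::size_t write_index_ = 0;
};
```

Each call to swap_and_drain() returns only the samples written since the previous call, while new samples go to the other buffer.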