Commit graph

679 Commits

Author SHA1 Message Date
David Yat Sin c2a60a4d5d Fix scratch memory alignment on GFX11
GFX11 requires scratch memory alignment of 256 Bytes instead of 1024.

Change-Id: I103de1c12f3a4877d7d36f13254301166c66e11f
2022-08-04 11:23:28 -04:00
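As a rough illustration of what the alignment change in the commit above means for a size or offset, here is a minimal sketch of power-of-two round-up. The function and constant names are hypothetical and are not taken from the ROCr sources; only the values 256 and 1024 come from the commit message.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size + alignment - 1) & ~(alignment - 1)


# Hypothetical names; the values are the ones quoted in the commit message.
SCRATCH_ALIGN_GFX11 = 256
SCRATCH_ALIGN_PREVIOUS = 1024

requested = 1300
print(align_up(requested, SCRATCH_ALIGN_GFX11))     # 1536
print(align_up(requested, SCRATCH_ALIGN_PREVIOUS))  # 2048
```

With the smaller alignment the same request is padded less, which is the practical effect of using the wrong constant.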
David Yat Sin 90322899fe Update scratch register definitions for GFX11
Update scratch register definitions for GFX11 ASICs.

Change-Id: I6195e04b0a099fe84d1015c2f34ca3756a8175ef
2022-08-04 11:23:28 -04:00
Graham Sider 061aa04147 Make queue memory allocation non-paged
Non-paged allocation for queue memory is necessary for binding wptr to
GART. It is required to support usermode queue oversubscription with MES
for GFX11.

Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity;
aliases AllocateIPC.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4
2022-08-04 11:21:00 -04:00
Graham Sider db1a13aa05 Clean up includes in queue.h
Formatting.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I141c8308d6b283b376035e21344629dc665289bb
2022-08-03 10:57:17 -04:00
David Yat Sin 907e05c1b3 Add new ImageManager for GFX11
Adding new ImageManager class for GFX11 GPUs

ImageManagerGfx11 functions copied from ImageManagerNv.
Register descriptions in resource_gfx11.h updated for gfx11.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I48b39f6a633aef14aa829f7240a43fe0feb1c290
2022-08-03 10:57:09 -04:00
David Yat Sin cc3bd31591 Add gfx1102 support
Change-Id: I39cbda81a7a999aa2ecfad7a3e720000f7ca3408
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2022-08-03 10:56:54 -04:00
Graham Sider 446c5e9672 Add gfx1100 support
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ic5d5559e43df5c73409ba900a42c6901aabae661
2022-08-03 10:56:49 -04:00
Jay Cornwall 710adcc252 Add gfx11 blit/trap shaders
David Yat Sin:
   Rebased to amd-staging branch
   Changed MSG_GET_DOORBELL to MSG_RTN_GET_DOORBELL

Change-Id: I6015e54c4d8897f4c796f58c7fbc298758c6d76d
2022-08-03 10:56:41 -04:00
Jonathan Kim 9d2fe1ac2a Fix GPU destruction when user disabled
GPUs excluded by RVD are not expected to have scratch, memory, trap
handling, or memory regions set up.  Now that these GPUs are added to
a new list, return early on agent destruction to prevent bad function
calls on destroy.

Also fix up broken memory releases between the gpu lists and ugly braces.

Change-Id: I52fc6e86ceba0a0383cedc63310eb409515eaf9f
2022-08-02 14:18:43 -04:00
skhatri 364715cbc6 Enabled allocation of pseudo fine grain memory where memory ordering is per point to point connection
Atomic memory operations on these memory buffers are not guaranteed
to be visible at system scope.

Change-Id: I4cccde114632071a000384502a83bc191e77e85b
2022-07-29 15:15:56 -04:00
Konstantin Zhuravlyov d962fc39bb Add support for the following kernel symbol query:
- HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK

Change-Id: Idff5c1a2ce2a3e2d65bcc9cf1f66a68d37cd41ef
2022-07-29 15:15:24 -04:00
Konstantin Zhuravlyov 5a49b4d17f Bring AMDHSAKernelDescriptor.h in sync with llvm
Change-Id: Icd35100ad4d7eb8638786d306ecfbbb1c8842db1
2022-07-29 15:14:39 -04:00
Ashutosh Mishra a229f5c320 Removing package dependency to thunk
hsa-rocr currently does NOT require the thunk lib as a dependency.
Installing rocr unnecessarily pulls in the thunk package.
This patch corrects that.

Change-Id: Id98ede8b66ffd9aaf4a47da96ba2f981f4c3da73
2022-07-22 09:42:38 -04:00
Sean Keely c2b9abaa1d Add missing query on CPU agents.
Adds HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS.

Change-Id: I317d7b451ed2910cdf2290b196fd89e3bf0be435
2022-07-22 09:42:38 -04:00
Ashutosh Mishra 23f908708a Adding Maintainer DL
The maintainer distribution list (DL) field had wrong information.
Adding the DL newly formed by the component team.

Change-Id: I61651e429375cdc512d0fe4b0768f917506b5392
2022-07-22 09:42:28 -04:00
Jonathan Kim f600687537 Only allow pairwise CU enable for devices with WGPs
A work group processor (WGP) requires both of its CUs to be enabled
in order to be enabled.

The KFD distributes round-robin by even-indexed pairs, so
enforce this requirement for runtime set-mask calls.

Change-Id: Ic46661b01f398aa1fe24d96b5c9c31f122f967a3
2022-07-07 12:50:24 -04:00
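The pairwise rule described in the commit above can be sketched as a check on a CU bitmask. This is only an illustration under assumptions: the mask is modelled as a single Python integer with bit i standing for CU i, the CU count is assumed to be even, and the function name is hypothetical rather than part of the runtime.

```python
def is_pairwise_cu_mask(mask: int, num_cus: int) -> bool:
    """Return True if each even-indexed CU pair (0,1), (2,3), ... is fully on or fully off."""
    if num_cus % 2 != 0:
        raise ValueError("this sketch assumes an even number of CUs")
    for low in range(0, num_cus, 2):
        pair = (mask >> low) & 0b11  # the two CUs that form one WGP
        if pair not in (0b00, 0b11):
            return False
    return True


print(is_pairwise_cu_mask(0b1111, 4))  # True: both WGPs fully enabled
print(is_pairwise_cu_mask(0b1100, 4))  # True: one WGP enabled, one disabled
print(is_pairwise_cu_mask(0b0110, 4))  # False: each WGP has only one CU enabled
```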
Sean Keely a8603b9397 Fix IPC copy agent lookup.
Discovered agent handles should only apply to copy routing, not to
copy device selection.  The user may not have mapped all allocations
to all GPUs so we must ensure that the copying device is one passed
by the user.

Change-Id: I2532e66d30e6842624e594f235dd144a186220d4
2022-07-05 22:51:26 -05:00
Sean Keely dec37625ed Report nominal GPU wallclock frequency.
Adds agent info query HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY.

Change-Id: Ib9108d51f9df89f8566291258aab3d1b87243441
2022-06-28 11:25:18 -04:00
Sean Keely 965df6eef7 Basic SVM profiler.
Mostly a demo at this point.  Logs SVM (aka HMM) info to
HSA_SVM_PROFILE if set.

Example: HSA_SVM_PROFILE=log.txt SomeApp

Change-Id: Ib6fd688f661a21b2c695f586b833be93662a15f4
2022-06-23 19:30:06 -05:00
skhatri e7fc301aa7 Adding support for rocrtracer tools loading without environment variable
During the HSA initialization stage, ROCr now searches all loaded
libraries for a symbol "HSA_AMD_TOOL_PRIORITY" and adds all those
libraries to the tools library init list.  Tools libraries listed in
the HSA_TOOLS_LIB env variable are also loaded in the given order and
take priority over HSA_AMD_TOOL_PRIORITY.

Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82
2022-06-23 20:08:30 -04:00
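One possible reading of the ordering described in the commit above, sketched as a list merge: libraries named in HSA_TOOLS_LIB come first in their given order, followed by libraries discovered through the HSA_AMD_TOOL_PRIORITY symbol. The function name, the de-duplication step, and the treatment of discovered libraries as an already ordered list are assumptions made for illustration; the commit message does not spell them out.

```python
def tool_init_order(env_libs, discovered_libs):
    """Merge two library lists: env-listed libraries first, then discovered ones.

    A library that appears in both lists is kept only at its first position
    (an assumption of this sketch, not something the commit message states).
    """
    ordered = []
    seen = set()
    for lib in list(env_libs) + list(discovered_libs):
        if lib not in seen:
            seen.add(lib)
            ordered.append(lib)
    return ordered


# Hypothetical library names.
print(tool_init_order(["libA.so", "libB.so"], ["libC.so", "libA.so"]))
# ['libA.so', 'libB.so', 'libC.so']
```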
Sean Keely e7152c8b16 Add format script.
Adds a script to run clang-format on the latest patch so we don't
need to remember the command line.

Also applies missing formatting to the prior commit,
"Add API for available GPU memory".

Change-Id: Ida51aedc38af229f6a26e275072654860748fa93
2022-06-23 20:08:30 -04:00
Ranjith Ramakrishnan 52bea549e3 : Use GNUInstallDirs
Use GNUInstallDirs variables in post install scripts

Change-Id: Id0e3e37d412a30521d9846082d025a9e19a43942
2022-06-22 16:28:06 -04:00
David Yat Sin 4ac840269c Add API for available GPU memory
Add support for AMD Agent to return amount of memory available

Change-Id: I5c32e2cebbaa2993b044250aefe434e4cc02d8c2
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-06-07 10:33:18 -04:00
Sean Keely dd671b49e5 Lookup copy agent when blit is selected.
Disallow passing agent 0 to avoid any API change.

Change-Id: I704fb2e04cec50500fac41a405c8a7e83a3c9fb5
2022-05-14 18:08:57 -05:00
Sean Keely 3ebe99f96d Add experimental option to force discovery of all copy agents.
Discards all user provided async copy agent info and relies on
pointer info discovery.

Change-Id: Ife3e708a49ffccbede4983ab47d5ed0032970857
2022-05-14 18:08:57 -05:00
Sean Keely 13a0cdfa77 Use block pointer info in async copy.
Only block info can return an agent which is disabled in the
process.

Change-Id: I34cb1f9eea9217e10a484726c90d930e3414e769
2022-05-14 18:08:57 -05:00
Sean Keely 247606c455 Report owning agent with pointer info block information.
Physical owning agent may not be visible to the current process
due to RVD.

Change-Id: Ib463336a5ed73a479f3aa74eb140932b9e0435fb
2022-05-14 18:08:57 -05:00
Sean Keely c289a43e88 Allow zero agent handle in AsyncCopy APIs.
IPC use cases with RVD set can't convey proper agent handles.
Runtime discovery is required to properly route the copy in this
case.

Change-Id: I4c97e132fb4b6ac1040de1cb17fe5a3e36d6be48
2022-05-14 18:08:49 -05:00
Sean Keely ace0599c69 Report pointer info queries to released fragments as type UNKNOWN.
We should not leak suballocation info to users.

Change-Id: I13b2a22bf5517b523ba04ddc039b49da8378b55f
2022-05-09 13:46:16 -05:00
Sean Keely 0ba9b162db Ensure IPC imports always create an allocation map entry.
Simplifies behavior.  A memory type now either always generates an
entry or never does.

Change-Id: Ie98cddea01e801308ac0ba650795fdef92b7e47d
2022-05-09 13:46:16 -05:00
Sean Keely 752cfd5ffd Adjust include paths for new header locations.
Thunk and rocm_smi_lib paths have been updated.

Change-Id: If2948172f8064dd992cbccbc2a80f9161ad4d457
2022-05-09 14:44:32 -04:00
Ranjith Ramakrishnan bb4da8545a File Reorganization changes with backward compatibility
Wrapper header files and library soft links for backward compatibility
Install interface updated with /opt/rocm/include

Change-Id: If772b24320f9d1de90f9be0930b1f2aa1d073777
2022-05-06 19:12:14 -04:00
Sean Keely 7f370dd84c Drop build dependency on DeviceLibs.
DeviceLibs is still needed but is found and included by clang now.

Change-Id: I03ff7dc91c028d2ee6747aa1779d223a9ba13915
2022-05-06 01:01:05 -04:00
Sean Keely 0ee82742a7 Switch to CLOCK_BOOTTIME for HSA system clock.
This is consistent with KFD and has significantly better latency.
KFD is taking this as the definition of the SystemClockCounter.

Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5
2022-05-05 15:27:29 -04:00
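For readers unfamiliar with the clock named in the commit above, here is a minimal sketch of reading CLOCK_BOOTTIME from Python on Linux. Unlike CLOCK_MONOTONIC, it keeps counting while the system is suspended. The helper name is hypothetical, and the fallback to CLOCK_MONOTONIC on platforms without CLOCK_BOOTTIME is an addition of this sketch, not runtime behavior.

```python
import time

# CLOCK_BOOTTIME is Linux-specific; fall back to CLOCK_MONOTONIC where it is missing.
CLOCK_ID = getattr(time, "CLOCK_BOOTTIME", time.CLOCK_MONOTONIC)


def system_clock_ns() -> int:
    """Return the current reading of the selected clock in nanoseconds."""
    return time.clock_gettime_ns(CLOCK_ID)


first = system_clock_ns()
second = system_clock_ns()
print(second - first)  # elapsed nanoseconds between the two reads
```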
David Yat Sin cd0788938c Remove unused variable
Change-Id: Ie29eb1cabef38c259280237c32d83aaa126e3b7a
2022-05-04 13:32:06 -04:00
Yifan Zhang 54c8b7900d add gfx1036 support
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: Ifc1b3cf2e46cf753f57470ebc6b034c1a349d3d2
2022-04-29 17:52:22 -04:00
Shweta Khatri 1b0440e7b3 Assemble trap handler at build time.
Eliminates the need for manually assembling the source of the
second level trap handler to produce the shader binary.  Also
separated blit shaders' binary source and version one second
level trap handler binary sources into different header files.

Change-Id: If29a18ee06dc083ec880ea962f234c6b5cac806a
2022-04-28 20:14:14 -04:00
Jonathan Kim 658b053943 Bypass HDP flush during SDMA copies on A+A GPU-CPU xGMI connections
Host to device SDMA copies do not require an HDP cache flush when
connected by xGMI since data copies over the data fabric and not HDP.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: I78d73a47edcc1a9c0ba59f33cf91485f13f1c45b
2022-04-27 21:45:26 -04:00
Sean Keely 64dae113b1 Minor typo fixes.
Declare the type of HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT
and add a missing break statement.

Change-Id: I86ce8a2e620438e046b60cee991ce1fbe07a3e88
2022-04-26 15:51:22 -04:00
Sean Keely 2eedf953f3 Handle scratch interleave per SE for gfx10+
On gfx10+ we need to issue a minimum count of active lanes or
groups before ADC moves on.  Ensure that scratch allocations
attempt to reach this limit.

Occupancy throttling due to OOM condition may still drop below this
limit.

Change-Id: I0edf2e40fbe1a95e9a262564cebd2b6a82501a0b
2022-04-26 15:32:03 -04:00
Jeremy Newton 178a7a5cfa Drop some unnecessary definitions
__x86_64__ and __AMD64__ should be already defined by the compiler to
specify the compilation target and shouldn't be defined manually.

I fixed two x86_64 checks to include VS variables, as removing this
might cause it to fail to compile on that compiler.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917
2022-04-19 12:22:42 -04:00
Jeremy Newton ddf4edcafc Use CMAKE_INSTALL_*
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.

The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c
2022-04-19 12:22:42 -04:00
Jeremy Newton a0931f4a3c Only default IMAGE_SUPPORT=ON for x86
Image support does not compile on other architectures, since it relies on
the x86-only header "x86intrin.h".

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b
2022-04-12 09:24:45 -04:00
Konstantin Zhuravlyov 9265409f08 Add code object v5 support
Change-Id: I03522765056e99ed49e6c5e213ee3753852de27b
2022-04-12 08:53:27 -04:00
Sean Keely b3caf6782b Revert "Release host buffers after segment freeze."
This reverts commit 03a52655a8.

Change-Id: Idc7e568b2b54a226dbe4d189b25a78be3bd16eea
2022-04-11 20:43:07 -05:00
Sean Keely 4e9849034d Correct inf loop defect in fast clock init.
Each time delay is grown we need to reset elapsed.  We want to take
the most accurate sample from the set at fixed delay.

Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.

Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb
2022-04-01 16:15:37 -04:00
Sean Keely 03a52655a8 Release host buffers after segment freeze.
Release staging buffers after loading has completed.  The debugger
no longer uses this copy.

Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d
2022-03-23 23:53:02 -05:00
Sean Keely 048700f2e7 Correct loader memory interfaces.
The loader must use internal interfaces to access page allocation
flags.  Code pages should also ensure use of cached memory.

Also relocate i-cache flush after code page copy.

Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab
2022-03-23 23:52:56 -05:00
Sean Keely fbc48521dc Correct queue error reporting.
VM faults should not report via the queue error handler.
The system event contains much more useful information.

Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3
2022-03-23 23:37:53 -05:00
Sean Keely af0f90800d Ignore hive id for CPUs when selecting copy paths.
Hive ID is used during copy path selection to locate an optimal
pool of SDMA engines.  However, for CPU-GPU connections we always
want to use the host port facing engines, known generally as the
PCIe optimized engines.  We want this selection even when the
connection is XGMI hence dropping the hive id for CPUs.

Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9
2022-03-18 18:48:44 -05:00