Commit graph

839 Commits

Author SHA1 Message Date
Sean Keely b757b209ad Report owning agent with pointer info block information.
Physical owning agent may not be visible to the current process
due to RVD.

Change-Id: Ib463336a5ed73a479f3aa74eb140932b9e0435fb


[ROCm/ROCR-Runtime commit: 247606c455]
2022-05-14 18:08:57 -05:00
Sean Keely 289a86785b Allow zero agent handle in AsyncCopy APIs.
IPC use cases with RVD set can't convey proper agent handles.
Runtime discovery is required to properly route the copy in this
case.

Change-Id: I4c97e132fb4b6ac1040de1cb17fe5a3e36d6be48


[ROCm/ROCR-Runtime commit: c289a43e88]
2022-05-14 18:08:49 -05:00
Sean Keely 14c6bd37fd Report pointer info queries to released fragments as type UNKNOWN.
We should not leak suballocation info to users.

Change-Id: I13b2a22bf5517b523ba04ddc039b49da8378b55f


[ROCm/ROCR-Runtime commit: ace0599c69]
2022-05-09 13:46:16 -05:00
Sean Keely 588e124c4e Ensure IPC imports always create an allocation map entry.
Simplifies behavior.  A memory type now either always generates an
entry or never does.

Change-Id: Ie98cddea01e801308ac0ba650795fdef92b7e47d


[ROCm/ROCR-Runtime commit: 0ba9b162db]
2022-05-09 13:46:16 -05:00
Sean Keely c96272841b Adjust include paths for new header locations.
Thunk and rocm_smi_lib paths have been updated.

Change-Id: If2948172f8064dd992cbccbc2a80f9161ad4d457


[ROCm/ROCR-Runtime commit: 752cfd5ffd]
2022-05-09 14:44:32 -04:00
Ranjith Ramakrishnan 416074aaac File Reorganization changes with backward compatibility
Wrapper header files and library soft links for backward compatibility
Install interface updated with /opt/rocm/include

Change-Id: If772b24320f9d1de90f9be0930b1f2aa1d073777


[ROCm/ROCR-Runtime commit: bb4da8545a]
2022-05-06 19:12:14 -04:00
Sean Keely 35ae610c0c Drop build dependency on DeviceLibs.
DeviceLibs is still needed but is found and included by clang now.

Change-Id: I03ff7dc91c028d2ee6747aa1779d223a9ba13915


[ROCm/ROCR-Runtime commit: 7f370dd84c]
2022-05-06 01:01:05 -04:00
Sean Keely 2b8c129efb Switch to CLOCK_BOOTTIME for HSA system clock.
This is consistent with KFD and has significantly better latency.
KFD is taking this as the definition of the SystemClockCounter.

Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5


[ROCm/ROCR-Runtime commit: 0ee82742a7]
2022-05-05 15:27:29 -04:00
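
As a rough illustration of the clock switch described in the commit above, the following sketch reads CLOCK_BOOTTIME through the Linux clock_gettime call. The helper name BootTimeNs is hypothetical and is not taken from the runtime; how the runtime actually wraps the clock is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical helper (not a ROCR function): returns nanoseconds since boot.
// CLOCK_BOOTTIME is Linux-specific and, unlike CLOCK_MONOTONIC, keeps
// counting while the system is suspended.
inline uint64_t BootTimeNs() {
  timespec ts{};
  int rc = clock_gettime(CLOCK_BOOTTIME, &ts);
  assert(rc == 0 && "clock_gettime(CLOCK_BOOTTIME) failed");
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<uint64_t>(ts.tv_nsec);
}
```

Two successive calls should never go backwards, which is the property a system clock counter needs.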
David Yat Sin be1d3bef2d Remove unused variable
Change-Id: Ie29eb1cabef38c259280237c32d83aaa126e3b7a


[ROCm/ROCR-Runtime commit: cd0788938c]
2022-05-04 13:32:06 -04:00
Yifan Zhang a57d706974 add gfx1036 support
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: Ifc1b3cf2e46cf753f57470ebc6b034c1a349d3d2


[ROCm/ROCR-Runtime commit: 54c8b7900d]
2022-04-29 17:52:22 -04:00
Shweta Khatri 2a635aa54d Assemble trap handler at build time.
Eliminates the need for manually assembling the source of the
second level trap handler to produce the shader binary.  Also
separated blit shaders' binary source and version one second
level trap handler binary sources into different header files.

Change-Id: If29a18ee06dc083ec880ea962f234c6b5cac806a


[ROCm/ROCR-Runtime commit: 1b0440e7b3]
2022-04-28 20:14:14 -04:00
Jonathan Kim 495a3f233f Bypass HDP flush during SDMA copies on A+A GPU-CPU xGMI connections
Host to device SDMA copies do not require an HDP cache flush when
connected by xGMI since data copies over the data fabric and not HDP.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: I78d73a47edcc1a9c0ba59f33cf91485f13f1c45b


[ROCm/ROCR-Runtime commit: 658b053943]
2022-04-27 21:45:26 -04:00
Sean Keely cdf734c771 Minor typo fixes.
Declare the type of HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT
and add a missing break statement.

Change-Id: I86ce8a2e620438e046b60cee991ce1fbe07a3e88


[ROCm/ROCR-Runtime commit: 64dae113b1]
2022-04-26 15:51:22 -04:00
Sean Keely 761653fa00 Handle scratch interleave per SE for gfx10+
On gfx10+ we need to issue a minimum count of active lanes or
groups before ADC moves on.  Ensure that scratch allocations
attempt to reach this limit.

Occupancy throttling due to OOM condition may still drop below this
limit.

Change-Id: I0edf2e40fbe1a95e9a262564cebd2b6a82501a0b


[ROCm/ROCR-Runtime commit: 2eedf953f3]
2022-04-26 15:32:03 -04:00
Shweta Khatri 4effeb8f9f Fix heap-buffer-overflow error in Memory access test. Also reverted most of the "first array element from 0 to 1" changes.
Change-Id: I62dee9bab379210a322848132e2846dc153724d9


[ROCm/ROCR-Runtime commit: 539ec6a87d]
2022-04-21 12:09:58 -04:00
Jeremy Newton 9e346a1c58 Drop some unnecessary definitions
__x86_64__ and __AMD64__ should be already defined by the compiler to
specify the compilation target and shouldn't be defined manually.

I updated two x86_64 checks to also include the VS variables, as removing
the manual definitions might otherwise break compilation with that compiler.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917


[ROCm/ROCR-Runtime commit: 178a7a5cfa]
2022-04-19 12:22:42 -04:00
Jeremy Newton feba682013 Use CMAKE_INSTALL_*
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.

The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c


[ROCm/ROCR-Runtime commit: ddf4edcafc]
2022-04-19 12:22:42 -04:00
Jeremy Newton 3d0b0fd774 Only default IMAGE_SUPPORT=ON for x86
Image support does not compile on other architectures, since it relies on
the x86 only header "x86intrin.h".

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b


[ROCm/ROCR-Runtime commit: a0931f4a3c]
2022-04-12 09:24:45 -04:00
Konstantin Zhuravlyov 625b1c99b3 Add code object v5 support
Change-Id: I03522765056e99ed49e6c5e213ee3753852de27b


[ROCm/ROCR-Runtime commit: 9265409f08]
2022-04-12 08:53:27 -04:00
Sean Keely 6622fe0163 Revert "Release host buffers after segment freeze."
This reverts commit cf3f441625.

Change-Id: Idc7e568b2b54a226dbe4d189b25a78be3bd16eea


[ROCm/ROCR-Runtime commit: b3caf6782b]
2022-04-11 20:43:07 -05:00
Sean Keely 16efad0cdc Correct inf loop defect in fast clock init.
Each time delay is grown we need to reset elapsed.  We want to take
the most accurate sample from the set at fixed delay.

Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.

Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb


[ROCm/ROCR-Runtime commit: 4e9849034d]
2022-04-01 16:15:37 -04:00
Sean Keely cf3f441625 Release host buffers after segment freeze.
Release staging buffers after loading has completed.  The debugger
no longer uses this copy.

Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d


[ROCm/ROCR-Runtime commit: 03a52655a8]
2022-03-23 23:53:02 -05:00
Sean Keely b7afebc27f Correct loader memory interfaces.
The loader must use internal interfaces to access page allocation
flags.  Code pages should also ensure use of cached memory.

Also relocate i-cache flush after code page copy.

Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab


[ROCm/ROCR-Runtime commit: 048700f2e7]
2022-03-23 23:52:56 -05:00
Sean Keely f875298836 Correct queue error reporting.
VM faults should not report via the queue error handler.
The system event contains much more useful information.

Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3


[ROCm/ROCR-Runtime commit: fbc48521dc]
2022-03-23 23:37:53 -05:00
Sean Keely 60191a659b Ignore hive id for CPUs when selecting copy paths.
Hive ID is used during copy path selection to locate an optimal
pool of SDMA engines.  However, for CPU-GPU connections we always
want to use the host port facing engines, known generally as the
PCIe optimized engines.  We want this selection even when the
connection is XGMI hence dropping the hive id for CPUs.

Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9


[ROCm/ROCR-Runtime commit: af0f90800d]
2022-03-18 18:48:44 -05:00
Sean Keely 2be7abd7e1 Revert "add gfx1036 support"
Compiler is not promoted to mainline yet.

This reverts commit 7dcccdf452.

Change-Id: I7256aeb3698ee3ae640a9f457a929abe24d5ef17


[ROCm/ROCR-Runtime commit: 7e73760cd0]
2022-03-18 02:35:01 -05:00
Sean Keely 8c1fad3f12 Disable warnings as errors for rocrtst.
Change-Id: Ibe76c4c7f20fc0273dd02038477e7f9fc7800a3d


[ROCm/ROCR-Runtime commit: 7ab0d786c2]
2022-03-11 17:55:55 -05:00
Yifan Zhang 7dcccdf452 add gfx1036 support
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I075779b1369fde759c29572fa2027a3748d6ed4c


[ROCm/ROCR-Runtime commit: 2f97f17df9]
2022-03-05 13:16:19 +08:00
Sean Keely 47e1632188 Do not allow occupancy restriction on cooperative groups.
Excessive scratch allocations can normally trigger occupancy
reduction.  This breaks cooperative groups so if occupancy
reduction is required on a cooperative dispatch fail with OOM.

Change-Id: I64612a2e38bf1286f3b74c1c2a68ab0c85452771


[ROCm/ROCR-Runtime commit: 8a6954c63c]
2022-03-02 19:59:30 -06:00
Sean Keely c58913a8c8 Correct scratch allocation logic to account for asymmetric harvest.
With asymmetric harvest, hw does not issue groups equally to each SE;
occasionally hw will skip an SE so that the distribution reflects
each SE's CU count.  Scratch resources must be allocated to reflect
this asymmetric distribution of groups.

Change-Id: I65e26206500483ea18e6e8796e65ecba5354b029


[ROCm/ROCR-Runtime commit: 552dcead93]
2022-03-02 19:59:30 -06:00
Sean Keely c196acd677 Do not bump up total scratch size for large cached allocations.
HW does not ignore low bits of the scratch wave count and will
stride beyond the end of the allocation if the wave count is
ever indivisible by SE count.  Rather than returning the allocation
size for cached large scratch allocations, use the requested
scratch size in scratch setup.  Scratch cache will retain the
cached allocation's size.

Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d


[ROCm/ROCR-Runtime commit: cedc3e80a8]
2022-03-02 20:48:19 -05:00
Saravanan Solaiyappan 66a81cc965 Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be


[ROCm/ROCR-Runtime commit: a496adafaa]
2022-02-24 10:26:04 -05:00
Sean Keely 523e6e883a Do not discard fragment allocator blocks multiple times.
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.

Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62


[ROCm/ROCR-Runtime commit: b9a0c1d313]
2022-02-10 18:39:46 -06:00
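
The defect described in the commit above is a double-decrement of an in-use counter. A minimal sketch of an idempotent discard follows; the Block and FragmentAccounting types are hypothetical simplifications, and the real fragment allocator's bookkeeping is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified block record; the real allocator's types differ.
struct Block {
  size_t size = 0;
  bool discarded = false;
};

class FragmentAccounting {
 public:
  // Starts counting the block's bytes as in use.
  void Track(Block& b) {
    b.discarded = false;
    in_use_ += b.size;
  }

  // Safe to call repeatedly on the same block: only the first call
  // adjusts the in-use byte count.
  void Discard(Block& b) {
    if (b.discarded) return;
    b.discarded = true;
    in_use_ -= b.size;
  }

  size_t in_use() const { return in_use_; }

 private:
  size_t in_use_ = 0;
};
```

Without the early return, a second Discard on the same block would subtract its size again and the counter would no longer match the memory actually in use.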
Sean Keely 305b7394b3 Add fallback case for cache line size.
KFD sometimes returns 0 for cache line sizes.

Change-Id: If82de0068318bbc138f0d1d4692ff908359174ad


[ROCm/ROCR-Runtime commit: 266cd68524]
2022-02-10 18:39:46 -06:00
Sean Keely ab97440eba Retrieve cache line size from KFD topology.
Change-Id: I16ddd9d9888bb973eccf3c562619894c88c7df15


[ROCm/ROCR-Runtime commit: 21291b48c6]
2022-01-16 08:44:44 -06:00
Sean Keely 0e96cb895f Correct queue minimum size enforcement.
Minimum queue size was not enforced at the Agent level.  Minimum
size should be one page to give uniformity across all asics.

Change-Id: I26394f79458d09fbceb79fc8aaf495e2c26a8ff3


[ROCm/ROCR-Runtime commit: a6742209f7]
2022-01-16 08:28:34 -06:00
Sean Keely 92f675889c Improve scratch error detection in debug mode.
Adds asserts for invalid dispatch dims and scratch requests that
don't actually use scratch.

Change-Id: I6e6eef3f17dc38adaf96550fa55bd8625868efa3


[ROCm/ROCR-Runtime commit: a65f3f5b71]
2022-01-31 20:53:24 -05:00
Sean Keely e2e10173d2 Add HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT.
On gfx90a only a reduced number of CUs must be used for cooperative
dispatches due to CWSR and launcher interactions with asymmetric
harvest.  We must use one fewer CUs per SE than the lowest count of
CUs on any SE.

Also adds env var HSA_COOP_CU_COUNT which enables the cooperative
CU count computation.  Set to 1 to enable the new computation.
This is an opt-in feature that will become enabled by default (opt-out)
in a future release.

Change-Id: Ifbb75ced3bbc15876eef44922c6a4f6fde8c4c28


[ROCm/ROCR-Runtime commit: 37942c982a]
2022-01-31 15:22:07 -05:00
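
The rule stated in the commit above (one fewer CU per SE than the smallest per-SE count) can be sketched as below. The function name is hypothetical, and multiplying the per-SE figure by the number of SEs to get a device-wide total is an assumption; the commit message does not spell out how the final value is aggregated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: cus_per_se holds the active CU count of each shader
// engine. Assumes the device-wide total is (min - 1) CUs on every SE.
inline uint32_t CooperativeCuCount(const std::vector<uint32_t>& cus_per_se) {
  if (cus_per_se.empty()) return 0;
  const uint32_t min_cus =
      *std::min_element(cus_per_se.begin(), cus_per_se.end());
  if (min_cus == 0) return 0;  // avoid unsigned wrap-around
  return (min_cus - 1) * static_cast<uint32_t>(cus_per_se.size());
}
```

For example, with four SEs holding 14, 13, 14 and 14 CUs, this sketch yields (13 - 1) * 4 = 48.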
Chen Gong df788f1e49 Correct the gfx version of gfx90c to 90c
Corrections have been made in libhsakmt, and corresponding changes are required here as well.

Signed-off-by: Chen Gong <curry.gong@amd.com>
Change-Id: Ib697ce25278c2c5ac6ef0206930ec285f46c60d1


[ROCm/ROCR-Runtime commit: dec63b4f15]
2022-01-25 19:05:46 +08:00
Jeremy Newton f654b7d852 Install license file
See 

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I80e9664b5ade520d9bf9b9a20ac36d67cfe85107


[ROCm/ROCR-Runtime commit: bd1a4adf35]
2022-01-17 10:54:54 -05:00
David Yat Sin 4fb019555b Fix for segfault after removing PrefetchRange from map
The start iterator becomes invalid after it is removed from
std::map prefetch_map_. This was causing a segfault when the iterator is
incremented afterwards.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I4b0b763d2cb4ee99c0b8571c2c526b834e74077a


[ROCm/ROCR-Runtime commit: 86164fbfec]
2022-01-10 17:47:02 -05:00
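
The iterator invalidation described in the commit above is a general std::map hazard. The sketch below shows the usual safe pattern, which continues from the iterator that erase returns; PrefetchMap and EraseRanges are hypothetical stand-ins, not the runtime's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the prefetch map: start address -> size.
using PrefetchMap = std::map<uintptr_t, size_t>;

// Erases every range whose start lies in [lo, hi) and returns the number
// of entries removed.
inline size_t EraseRanges(PrefetchMap& m, uintptr_t lo, uintptr_t hi) {
  size_t removed = 0;
  auto it = m.lower_bound(lo);
  while (it != m.end() && it->first < hi) {
    // erase() invalidates `it` and returns the next valid iterator.
    // Writing `m.erase(it); ++it;` instead would be undefined behavior.
    it = m.erase(it);
    ++removed;
  }
  return removed;
}
```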
Sean Keely ef1f4724c3 Correct documentation typo.
ROCM_VISIBLE_DEVICES was used where ROCR_VISIBLE_DEVICES was
intended.

Change-Id: I644a546f3c9dd0b50898ef8a21dbb8f5c3a36926


[ROCm/ROCR-Runtime commit: fce6ba052e]
2021-12-10 16:19:30 -06:00
Sean Keely 3227859ff2 Rework memory locks to allow device parallelism in alloc/free.
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel.  Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.

The fragment allocator now requires separate protection and is protected with a
mutex at the device level.  Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
within Allocate.  This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim.  Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache.  So some device
level serialization is required in at least some paths.

Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00


[ROCm/ROCR-Runtime commit: df55cb0450]
2021-11-24 19:22:05 -06:00
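
A loose sketch of the locking discipline described in the commit above, using std::shared_mutex (C++17). MemoryTracker and the slow_alloc callback are hypothetical; the callback stands in for a long-running driver call, and the runtime's actual classes and lock types are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>

class MemoryTracker {
 public:
  // slow_alloc is a hypothetical stand-in for a long driver call.
  template <typename SlowAlloc>
  uintptr_t Allocate(size_t size, SlowAlloc slow_alloc) {
    uintptr_t addr;
    {
      // Shared mode: many threads may be inside the slow call at once.
      std::shared_lock<std::shared_mutex> rd(lock_);
      addr = slow_alloc(size);
    }
    // Exclusive mode: short critical section, no driver calls inside.
    std::unique_lock<std::shared_mutex> wr(lock_);
    map_[addr] = size;
    return addr;
  }

  // Read-only lookup; returns 0 for an untracked address.
  size_t SizeOf(uintptr_t addr) const {
    std::shared_lock<std::shared_mutex> rd(lock_);
    auto it = map_.find(addr);
    return it == map_.end() ? 0 : it->second;
  }

 private:
  mutable std::shared_mutex lock_;
  std::map<uintptr_t, size_t> map_;
};
```

The shared lock is released before the exclusive one is taken because std::shared_mutex has no upgrade operation; holding both on one thread would deadlock.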
Sean Keely e462118b6e Add comments to GetPcieBlit.
Comments call out the specific operation being selected since the
ternary nest is a bit hard to read.

Change-Id: If033dbaa6cba132e96196ad3fc6d5572042041f4


[ROCm/ROCR-Runtime commit: fc75731034]
2021-11-15 19:34:03 -06:00
Sean Keely 01c7c9856c Fix leak in hsa_amd_interop_map_buffer.
Agent temp array could have leaked if one of the given agent
handles was invalid.

Change-Id: I9e638b3a4f6bb917a4e3209ad81a1253bb603365


[ROCm/ROCR-Runtime commit: b198016949]
2021-11-15 19:22:20 -06:00
Sean Keely 289cc7b6b4 Correct order of argument check and default assignment in lock APIs.
Argument must be checked for nullptr before being dereferenced and
filled with the default return value.

Change-Id: I9ff366f066a5e18c78129bf59cc3ba00fca3ef18


[ROCm/ROCR-Runtime commit: f48a786662]
2021-11-15 19:22:02 -06:00
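
The ordering fix in the commit above (validate the out-parameter, then write its default) can be sketched as follows. LockMemory, Status and the placeholder mapping step are all hypothetical and only illustrate the order of operations, not the runtime's lock API.

```cpp
#include <cassert>
#include <cstddef>

enum Status { kSuccess = 0, kInvalidArgument = 1 };

// Hypothetical lock-style API: the out-parameter is checked for nullptr
// before it is dereferenced to store the default return value.
inline Status LockMemory(void* host_ptr, size_t size, void** agent_ptr) {
  if (agent_ptr == nullptr) return kInvalidArgument;  // check first
  *agent_ptr = nullptr;                               // then fill the default
  if (host_ptr == nullptr || size == 0) return kInvalidArgument;
  *agent_ptr = host_ptr;  // placeholder for the real mapping work
  return kSuccess;
}
```

With the two first statements swapped, a caller passing a null out-parameter would crash instead of receiving an error code.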
Sean Keely a7dc6d7802 Add missing return in ScopeGuard::operator=.
This omission did not cause problems earlier because the operator had
not been instantiated.

Change-Id: I7a54f82e06c299902f3bf6b4d3737cc5e30961ad


[ROCm/ROCR-Runtime commit: 322588a60e]
2021-11-15 18:50:46 -06:00
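
Member functions of a class template are only instantiated when they are used, which is why a defect like the one in the commit above can stay latent. The sketch below is a hypothetical scope guard written for illustration, not the runtime's ScopeGuard; it shows a move assignment operator with the required return statement.

```cpp
#include <cassert>
#include <functional>
#include <utility>

template <typename F>
class ScopeGuard {
 public:
  explicit ScopeGuard(F fn) : fn_(std::move(fn)), armed_(true) {}

  ScopeGuard(ScopeGuard&& other)
      : fn_(std::move(other.fn_)), armed_(other.armed_) {
    other.armed_ = false;
  }

  // Runs this guard's pending action, then takes over the other guard's.
  ScopeGuard& operator=(ScopeGuard&& other) {
    if (this != &other) {
      if (armed_) fn_();
      fn_ = std::move(other.fn_);
      armed_ = other.armed_;
      other.armed_ = false;
    }
    return *this;  // flowing off the end without this is undefined behavior
  }

  ~ScopeGuard() {
    if (armed_) fn_();
  }

  void Dismiss() { armed_ = false; }

 private:
  F fn_;
  bool armed_;
};
```

The assignment operator requires F to be move-assignable, so it works with std::function<void()> or a function pointer but not with a capturing lambda type used directly as F.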
Sean Keely c8bb2905d3 Correct node id assertion in pointer info.
Size of the node map was used as the max node id previously.  This
is wrong when RVD is used.

Change-Id: Ic632ec96891b92186e5b68cd53f81414db34f59f


[ROCm/ROCR-Runtime commit: 19454fcf26]
2021-11-10 22:09:24 -06:00
Sean Keely 0ed7eac560 Correct size of SVM node array.
Was size of the map.  Needs to be size of the node id range.

Change-Id: I92501ea7adca5c30dbb0fdabd2c421dea58f8d6f


[ROCm/ROCR-Runtime commit: c9eb85e205]
2021-11-10 21:23:42 -06:00
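
The sizing issue in the commit above, and in the related node id assertion fix, comes down to map size versus id range. A minimal sketch follows, assuming that visible node ids may be non-contiguous when some devices are hidden; NodeMap and BuildNodeArray are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Keys are node ids; values are placeholders for per-node data.
using NodeMap = std::map<uint32_t, int>;

// Builds an array indexable by node id. Its length is max id + 1, not
// nodes.size(), because visible ids need not be contiguous.
inline std::vector<int> BuildNodeArray(const NodeMap& nodes, int missing = -1) {
  if (nodes.empty()) return {};
  const size_t length = static_cast<size_t>(nodes.rbegin()->first) + 1;
  std::vector<int> out(length, missing);
  for (const auto& kv : nodes) out[kv.first] = kv.second;
  return out;
}
```

With visible ids 0 and 3, the map holds two entries but the array needs four slots; sizing it by the map would make index 3 out of bounds.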
Sean Keely 847df17afe Include event_id in SDMA interrupt payload.
The event id assists KFD in locating the proper event associated
with the interrupt.

Change-Id: I75d58b6be74dd5b1edb0c5fe2b9d01538a649ba1


[ROCm/ROCR-Runtime commit: d65e00bcc5]
2021-11-10 20:57:11 -06:00