Commit Graph

2930 Commits

Author SHA1 Message Date
Sean Keely 16efad0cdc Correct inf loop defect in fast clock init.
Each time delay is grown we need to reset elapsed.  We want to take
the most accurate sample from the set at fixed delay.

Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.

Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb


[ROCm/ROCR-Runtime commit: 4e9849034d]
2022-04-01 16:15:37 -04:00
Sean Keely cf3f441625 Release host buffers after segment freeze.
Release staging buffers after loading has completed.  The debugger
no longer uses this copy.

Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d


[ROCm/ROCR-Runtime commit: 03a52655a8]
2022-03-23 23:53:02 -05:00
Sean Keely b7afebc27f Correct loader memory interfaces.
The loader must use internal interfaces to access page allocation
flags.  Code pages should also ensure use of cached memory.

Also relocate i-cache flush after code page copy.

Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab


[ROCm/ROCR-Runtime commit: 048700f2e7]
2022-03-23 23:52:56 -05:00
Sean Keely f875298836 Correct queue error reporting.
VM faults should not report via the queue error handler.
The system event contains much more useful information.

Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3


[ROCm/ROCR-Runtime commit: fbc48521dc]
2022-03-23 23:37:53 -05:00
Felix Kuehling 8fc6558236 libhsakmt: Update kfd_ioctl.h
Import the latest version from the kernel tree.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If5f998ad55085ebd5020adaa382181204d834e3e


[ROCm/ROCR-Runtime commit: f88aaa933b]
2022-03-21 14:41:18 -04:00
Sean Keely 60191a659b Ignore hive id for CPUs when selecting copy paths.
Hive ID is used during copy path selection to locate an optimal
pool of SDMA engines.  However, for CPU-GPU connections we always
want to use the host port facing engines, known generally as the
PCIe optimized engines.  We want this selection even when the
connection is XGMI, hence dropping the hive id for CPUs.

Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9


[ROCm/ROCR-Runtime commit: af0f90800d]
2022-03-18 18:48:44 -05:00
Sean Keely 2be7abd7e1 Revert "add gfx1036 support"
Compiler is not promoted to mainline yet.

This reverts commit 7dcccdf452.

Change-Id: I7256aeb3698ee3ae640a9f457a929abe24d5ef17


[ROCm/ROCR-Runtime commit: 7e73760cd0]
2022-03-18 02:35:01 -05:00
Sean Keely 8c1fad3f12 Disable warnings as errors for rocrtst.
Change-Id: Ibe76c4c7f20fc0273dd02038477e7f9fc7800a3d


[ROCm/ROCR-Runtime commit: 7ab0d786c2]
2022-03-11 17:55:55 -05:00
Alex Sierra d2864edc69 kfdtest: remove log message at hsaKmtSVMSetAttr failure
This error message should be handled by the caller.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I68d879d6d41835f47b8ac138c2218eaa6b86a512


[ROCm/ROCR-Runtime commit: dc33a092c0]
2022-03-08 12:15:59 -06:00
Yifan Zhang 7dcccdf452 add gfx1036 support
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I075779b1369fde759c29572fa2027a3748d6ed4c


[ROCm/ROCR-Runtime commit: 2f97f17df9]
2022-03-05 13:16:19 +08:00
Sean Keely 47e1632188 Do not allow occupancy restriction on cooperative groups.
Excessive scratch allocations can normally trigger occupancy
reduction.  This breaks cooperative groups, so if occupancy
reduction is required on a cooperative dispatch, fail with OOM.

Change-Id: I64612a2e38bf1286f3b74c1c2a68ab0c85452771


[ROCm/ROCR-Runtime commit: 8a6954c63c]
2022-03-02 19:59:30 -06:00
Sean Keely c58913a8c8 Correct scratch allocation logic to account for asymmetric harvest.
With asymmetric harvest, hw does not issue groups equally to each SE;
occasionally hw will skip an SE so that the distribution reflects
each SE's CU count.  Scratch resources must be allocated to reflect
this asymmetric distribution of groups.

Change-Id: I65e26206500483ea18e6e8796e65ecba5354b029


[ROCm/ROCR-Runtime commit: 552dcead93]
2022-03-02 19:59:30 -06:00
Sean Keely c196acd677 Do not bump up total scratch size for large cached allocations.
HW does not ignore low bits of the scratch wave count and will
stride beyond the end of the allocation if the wave count is
ever indivisible by SE count.  Rather than returning the allocation
size for cached large scratch allocations, use the requested
scratch size in scratch setup.  Scratch cache will retain the
cached allocation's size.

Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d


[ROCm/ROCR-Runtime commit: cedc3e80a8]
2022-03-02 20:48:19 -05:00
Mukul Joshi a01f9f6a61 libhsakmt: Update context save area size calculations
Currently, context save area size passed to KFD includes the
size of the debug area. Change this to report the actual size
of the context save area to KFD.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I5d440ae802255a97ade046775f6a000bae79d5d5


[ROCm/ROCR-Runtime commit: b8dc875b3c]
2022-03-02 15:28:38 -05:00
Saravanan Solaiyappan 2325ccba30 Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ib95ea72f15bfbf4141b69b0a8ca4d3a71fe1c093


[ROCm/ROCR-Runtime commit: 046f2e9116]
2022-02-24 12:01:39 -05:00
Saravanan Solaiyappan 66a81cc965 Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be


[ROCm/ROCR-Runtime commit: a496adafaa]
2022-02-24 10:26:04 -05:00
Sean Keely 523e6e883a Do not discard fragment allocator blocks multiple times.
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.

Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62


[ROCm/ROCR-Runtime commit: b9a0c1d313]
2022-02-10 18:39:46 -06:00
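The idempotent discard described in 523e6e883a can be sketched as follows. This is a minimal illustration, assuming a per-block flag and a single byte counter; `Block`, `FragmentAccounting`, and the `discarded` flag are hypothetical names, and only `discardBlock` is taken from the commit message. The real fragment allocator may track this differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block record; the real allocator's bookkeeping is not shown in the log.
struct Block {
  explicit Block(std::size_t bytes) : size(bytes), discarded(false) {}
  std::size_t size;
  bool discarded;
};

// Hypothetical in-use accounting for a fragment allocator.
class FragmentAccounting {
 public:
  void allocate(Block& b) {
    b.discarded = false;
    in_use_ += b.size;
  }

  // Safe to call repeatedly: only the first call adjusts the counter.
  void discardBlock(Block& b) {
    if (b.discarded) return;
    b.discarded = true;
    in_use_ -= b.size;
  }

  std::size_t in_use() const { return in_use_; }

 private:
  std::size_t in_use_ = 0;
};
```

Without the early return, a second `discardBlock` on the same block would subtract its size twice and the counter would no longer match the memory actually in use.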
Sean Keely 305b7394b3 Add fallback case for cache line size.
KFD sometimes returns 0 for cache line sizes.

Change-Id: If82de0068318bbc138f0d1d4692ff908359174ad


[ROCm/ROCR-Runtime commit: 266cd68524]
2022-02-10 18:39:46 -06:00
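A fallback of the kind described in 305b7394b3 might look like the sketch below. The default of 64 bytes and both identifiers are assumptions for illustration; the commit message does not state which value the runtime falls back to.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default; the commit does not say which value the runtime uses.
constexpr std::uint32_t kAssumedCacheLineBytes = 64;

// Returns the reported size, or the assumed default when the driver reports 0.
constexpr std::uint32_t CacheLineSizeOrDefault(std::uint32_t reported) {
  return reported != 0 ? reported : kAssumedCacheLineBytes;
}
```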
Lang Yu ed964ceadf libhsakmt: Add another pci device id for cyan skillfish
Add PCI DID for cyan skillfish.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I1d06936cccdf99af76fe5ca3ff323538fac76c9c


[ROCm/ROCR-Runtime commit: 052b7957ea]
2022-01-27 01:41:00 -05:00
Aaron Liu 90f60da2c8 libhsakmt: correct the gfx version for gfx90c
The gfx version of gfx90c is 90C instead of 902.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: Id009c9357f816b8ccab605090df47626f1a579ef


[ROCm/ROCR-Runtime commit: 7cdf38f6c0]
2022-01-26 01:25:58 -05:00
Sean Keely ab97440eba Retrieve cache line size from KFD topology.
Change-Id: I16ddd9d9888bb973eccf3c562619894c88c7df15


[ROCm/ROCR-Runtime commit: 21291b48c6]
2022-01-16 08:44:44 -06:00
Sean Keely 0e96cb895f Correct queue minimum size enforcement.
Minimum queue size was not enforced at the Agent level.  Minimum
size should be one page to give uniformity across all ASICs.

Change-Id: I26394f79458d09fbceb79fc8aaf495e2c26a8ff3


[ROCm/ROCR-Runtime commit: a6742209f7]
2022-01-16 08:28:34 -06:00
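The one-page minimum from 0e96cb895f can be illustrated with a short sketch. It assumes a 4096-byte page and rounds small requests up; the commit does not say whether the runtime rounds up or rejects undersized requests, and the function names are hypothetical. The 64-byte AQL packet size comes from the HSA specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kAqlPacketBytes = 64;      // AQL packets are 64 bytes.
constexpr std::uint32_t kAssumedPageBytes = 4096;  // Assumption: 4 KiB pages.

// Smallest queue, in packets, that still fills one page.
constexpr std::uint32_t MinQueuePackets() {
  return kAssumedPageBytes / kAqlPacketBytes;
}

// Hypothetical policy: raise undersized requests to the one-page minimum.
inline std::uint32_t EnforceMinQueueSize(std::uint32_t requested_packets) {
  return std::max(requested_packets, MinQueuePackets());
}
```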
Sean Keely 92f675889c Improve scratch error detection in debug mode.
Adds asserts for invalid dispatch dims and scratch requests that
don't actually use scratch.

Change-Id: I6e6eef3f17dc38adaf96550fa55bd8625868efa3


[ROCm/ROCR-Runtime commit: a65f3f5b71]
2022-01-31 20:53:24 -05:00
Sean Keely e2e10173d2 Add HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT.
On gfx90a only a reduced number of CUs must be used for cooperative
dispatches due to CWSR and launcher interactions with asymmetric
harvest.  We must use one fewer CUs per SE than the lowest count of
CUs on any SE.

Also adds env var HSA_COOP_CU_COUNT which enables the cooperative
CU count computation.  Set to 1 to enable the new computation.
This is an opt-in feature that will become enabled by default (opt-out)
in a future release.

Change-Id: Ifbb75ced3bbc15876eef44922c6a4f6fde8c4c28


[ROCm/ROCR-Runtime commit: 37942c982a]
2022-01-31 15:22:07 -05:00
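The rule stated in e2e10173d2 (one fewer CU per SE than the lowest CU count on any SE) can be worked through in a small sketch. Multiplying by the number of SEs to get a device-wide total is an inference from the message, not something it states, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// cus_per_se holds the active CU count of each shader engine (SE).
// Returns (lowest count - 1) * number of SEs, or 0 if that is not possible.
inline std::uint32_t CooperativeCuCount(const std::vector<std::uint32_t>& cus_per_se) {
  if (cus_per_se.empty()) return 0;
  const std::uint32_t lowest =
      *std::min_element(cus_per_se.begin(), cus_per_se.end());
  if (lowest == 0) return 0;
  return (lowest - 1) * static_cast<std::uint32_t>(cus_per_se.size());
}
```

For example, with eight SEs where the most heavily harvested SE has 13 CUs, the sketch yields (13 - 1) * 8 = 96 CUs for cooperative dispatches.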
Chen Gong df788f1e49 Correct the gfx version of gfx90c to 90c
Corrections have been made in libhsakmt, and corresponding changes are required here as well.

Signed-off-by: Chen Gong <curry.gong@amd.com>
Change-Id: Ib697ce25278c2c5ac6ef0206930ec285f46c60d1


[ROCm/ROCR-Runtime commit: dec63b4f15]
2022-01-25 19:05:46 +08:00
Jeremy Newton f654b7d852 Install license file
See 

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I80e9664b5ade520d9bf9b9a20ac36d67cfe85107


[ROCm/ROCR-Runtime commit: bd1a4adf35]
2022-01-17 10:54:54 -05:00
Eric Huang e007b37f6e kfdtest: dynamically increase timeout for P2PBandWidthTest
Increase the timeout according to the number of peers so that the
test passes on some PCIe link platforms.

Change-Id: Ifcb8c7297d6960c96fc18d29bc0a48733ca50165
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>


[ROCm/ROCR-Runtime commit: 7c62a12918]
2022-01-11 11:01:11 -05:00
David Yat Sin 4fb019555b Fix for segfault after removing PrefetchRange from map
The start iterator becomes invalid after it is removed from
std::map prefetch_map_. This was causing a segfault when the iterator is
incremented afterwards.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I4b0b763d2cb4ee99c0b8571c2c526b834e74077a


[ROCm/ROCR-Runtime commit: 86164fbfec]
2022-01-10 17:47:02 -05:00
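The iterator invalidation behind 4fb019555b is a general `std::map` pitfall: erasing an element invalidates iterators to it, so incrementing the erased iterator afterwards is undefined behaviour. A common remedy, shown here as a sketch and not necessarily the fix the commit applied, is to continue from the iterator that `erase` returns. `PrefetchMap` and `EraseRanges` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for prefetch_map_: base address -> range size.
using PrefetchMap = std::map<std::uintptr_t, std::size_t>;

// Removes every entry whose base address lies in [lo, hi).
inline std::size_t EraseRanges(PrefetchMap& m, std::uintptr_t lo, std::uintptr_t hi) {
  std::size_t removed = 0;
  for (auto it = m.lower_bound(lo); it != m.end() && it->first < hi;) {
    // erase() returns the iterator following the removed element (C++11),
    // so the loop never touches the invalidated iterator.
    it = m.erase(it);
    ++removed;
  }
  return removed;
}
```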
David Yat Sin 0f5d0a9c3f libhsakmt: Add MADV_DONTFORK to device mappings
Mapped memory areas become invalid after fork, and the child process is
required to remap the memory areas after a fork. So we mark these device
memory mappings with MADV_DONTFORK so that they are removed from the
child process after fork.

This was causing some issues when doing CRIU checkpoint/restore because
CRIU and amdgpu_plugin were not able to handle these mappings.

Change-Id: I50eb334aecea6dab7522d94da0273adcf4fb1ce0
Signed-off-by: David Yat Sin <david.yatsin@amd.com>


[ROCm/ROCR-Runtime commit: 4986f4a5c2]
2022-01-10 16:25:16 -05:00
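Marking a mapping as in 0f5d0a9c3f uses the Linux `madvise` call with `MADV_DONTFORK`. The sketch below applies it to an anonymous mapping as a stand-in for a device mapping, since mapping a real KFD device node is outside what the log shows; `MapNoFork` is a hypothetical helper name.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Maps anonymous memory (a stand-in for a device mapping) and marks it so
// the range is not inherited by a child process after fork().
// Returns nullptr on failure.
inline void* MapNoFork(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return nullptr;
  if (madvise(p, bytes, MADV_DONTFORK) != 0) {
    munmap(p, bytes);
    return nullptr;
  }
  return p;
}
```

After a `fork`, the child simply has no mapping at that address, which matches the commit's statement that the child is expected to remap the memory itself.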
Ruili Ji 4abf6241ae kfdtest : adjust memory size for KFDMemoryTest.
Total VRAM size on an APU is usually 512M, and the
framebuffer is also allocated from VRAM, so
there is not enough memory for this case.

/home/ruiliji2/p5/libhsakmt/tests/kfdtest/src/KFDMemoryTest.cpp:1285: Failure
Value of: (hsaKmtMapMemoryToGPUNodes(bufs[i], bufSize, &altVa, mapFlags, 1, &defaultGPUNode))
[  FAILED  ] KFDMemoryTest.MMBench (1034 ms)

Change-Id: Ib4201291122d85f6512a85859aea9a4713fb4f5c
(cherry picked from commit a9f924484e7022a2d53ee02811b080f0833eba55)


[ROCm/ROCR-Runtime commit: 0340c68031]
2022-01-09 20:52:11 -05:00
Yang Wang c26bbaa521 kfdtest: skip hdp flush test in sriov mode
Skip the HDP flush test when the remap feature is not supported.

Background:
The HDP register remap is skipped in SR-IOV mode,
which leaves the mmio base as a null pointer.

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ib9aea1900931e30571656397a485ee4db051ec0a


[ROCm/ROCR-Runtime commit: 033b52c4e4]
2021-12-20 20:00:43 +08:00
Sean Keely ef1f4724c3 Correct documentation typo.
ROCM_VISIBLE_DEVICES was used where ROCR_VISIBLE_DEVICES was
intended.

Change-Id: I644a546f3c9dd0b50898ef8a21dbb8f5c3a36926


[ROCm/ROCR-Runtime commit: fce6ba052e]
2021-12-10 16:19:30 -06:00
Alex Sierra 2ce2ce8229 kfdtest: free user ptr buffer at SetGetAttributesTest
Explicitly free the user buffer ptr before the test's teardown. Otherwise
the svm_bo object will never be released, causing a BUG error due to
a late callback to svm_migrate_page_free when the prange no longer exists.

Also did cosmetic adjustments.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I989c62de8a9634faa84e42def956cecb3f84e329


[ROCm/ROCR-Runtime commit: 2dbee30232]
2021-12-09 18:22:20 -06:00
Joseph Greathouse c60cb043e6 Correct gfx90c gfx arch number in HSA topology
The AMD compiler team has confirmed that they expect gfx90c
to be gfx90c, with a major/minor/stepping of 9, 0, and 12
respectively. It appears that there is a typo in the libhsakmt
topology information that lists this part as gfx902. This patch
fixes the issue.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Change-Id: I6f907a7aa6f190b12aba8bb4210c7b341b3c720b


[ROCm/ROCR-Runtime commit: a06d1a3884]
2021-12-03 13:11:26 -05:00
Jeremy Newton 22a9a73290 Just install license into /opt/rocm*/share/doc
This is causing issues with side by side, sorry for the noise.

This license location isn't ideal but it's good enough for now.

Change-Id: Iba2a84cedf22466fdaaf3c63b6ea49c9fc277967
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 3f90750304]
2021-12-02 10:04:51 -05:00
Jeremy Newton ae48b90895 Add Makefile to gitignore
Calling cmake replaces this file, so no need to commit it.

Change-Id: Ic4747cc9eebd9cbfc61d524a31d2025c04eda12e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 3b64517787]
2021-11-30 17:29:37 -05:00
Jeremy Newton fe6f3d8487 Fix side-by-side copyright file
The copyright file will conflict if multiple thunks are installed. This
should resolve the issue by adding the version to the install path.

Change-Id: Ieac5a3eba979b3e934fb9100f890b92fc7c35d71
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 348a3613d6]
2021-11-25 15:59:18 -05:00
Sean Keely 3227859ff2 Rework memory locks to allow device parallelism in alloc/free.
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel.  Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.

The fragment allocator now requires separate protection and is protected with a
mutex at the device level.  Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
within Allocate.  This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim.  Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache.  So some device
level serialization is required in at least some paths.

Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00


[ROCm/ROCR-Runtime commit: df55cb0450]
2021-11-24 19:22:05 -06:00
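The locking scheme described in 3227859ff2 can be sketched with `std::shared_mutex` (C++17). This is an illustration of the pattern only: `MemoryTracker`, `Allocate`, `Contains`, and the `driver_alloc` callback are hypothetical, and the per-device fragment allocator mutex is omitted. The slow driver call runs under the shared (r) lock so several callers can proceed in parallel, while the map update takes the exclusive (w) lock and makes no driver call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical tracker: base address -> allocation size.
class MemoryTracker {
 public:
  // driver_alloc stands in for a long-running call into the driver.
  template <typename DriverAlloc>
  std::uintptr_t Allocate(std::size_t size, DriverAlloc driver_alloc) {
    std::uintptr_t base = 0;
    {
      // Shared mode: other allocations and frees may run concurrently.
      std::shared_lock<std::shared_mutex> read_lock(mutex_);
      base = driver_alloc(size);
    }
    if (base != 0) {
      // Exclusive mode: modify the map, with no driver call while held.
      std::unique_lock<std::shared_mutex> write_lock(mutex_);
      regions_[base] = size;
    }
    return base;
  }

  // Returns true if addr falls inside a tracked allocation.
  bool Contains(std::uintptr_t addr) const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    auto it = regions_.upper_bound(addr);
    if (it == regions_.begin()) return false;
    --it;
    return addr - it->first < it->second;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::uintptr_t, std::size_t> regions_;
};
```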
Jeremy Newton b93ee2fe7a Fix packaging of license file
CPACK doesn't have proper logic for installing the license as described
by CPACK_RESOURCE_FILE_LICENSE.

For Debian packaging, the license is expected to be installed as:

/usr/share/doc/PACKAGENAME/copyright

To do this, I've added a bit of logic for CPACK to copy this into the
package using CPACK_INSTALL_COMMANDS to prep the directory, and
CPACK_INSTALLED_DIRECTORIES to add it to the package. This applies to
both RPM and DEB, so I've added some logic to the spec file to exclude
this file (note that CPACK_RPM_EXCLUDE_FROM_AUTO_FILELIST_ADDITION does
not work for files installed with CPACK_INSTALLED_DIRECTORIES).

For RPM install, I've just added a small bit of logic to the spec file
to handle it. The file needs to be copied into the spec working
directory, then a macro is used to handle the rest. Note the license
macro does not work on EL6, but I don't think we want to support this.

Change-Id: I06ce63d300419893cb8274bc504a15633e304d91
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 7649cd862e]
2021-11-18 16:41:48 -05:00
Jeremy Newton c0397d4a44 Fix to previous commit
I used the binary directory instead of the source directory to specify
the spec.in path, which passed local testing since these directories
are in the same location. This is not guaranteed to be true.

Change-Id: I1b49ca8453b9c074a947104c26fb39667d728a8e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 529c96c08b]
2021-11-17 17:47:53 -05:00
Jeremy Newton 04144c6d0d Implement RPM recommends for libdrm-amdgpu
CPack does not support recommends for RPM generation, so I've generated
a template RPM spec file in order to make modifications to allow for
support of recommends.

The spec.in file was generated using the cpack option
"CPACK_RPM_GENERATE_USER_BINARY_SPECFILE_TEMPLATE" and was modified very
sparingly to avoid any maintenance burden, e.g. it can be easily
regenerated. The CPACK_RPM_USER_BINARY_SPECFILE is then used to specify
the customized template file, instead of using CMake's template.

From what I understand, the point of these two options is to allow
developers to tailor the specfile to their desire, since rpm spec files
are much more advanced than the equivalent Debian file.

Change-Id: I80c69be58a3c57729ed997fd2ce01f5d16b9e9b9
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 3c0e4fee0f]
2021-11-17 11:57:55 -05:00
Jeremy Newton 5d8bcd03db Use recommends for libdrm-amdgpu-amdgpu1
For the use of libdrm-amdgpu-amdgpu1 and libdrm-amdgpu, we should use
recommends, as we want these packages installed with a strong dependency
but avoid a strict dependency, since this is an enhancement feature.

Using the newer libdrm, which is built for amdgpu-dkms, is ideal since
it will produce more correct marketing names, but should not be mandated
due to two reasons:
- A user may not want to install both libdrms on their system
- The system might not have the newer libdrm available

This patch only fixes the Ubuntu/Debian package since recommends is not
properly implemented for the RPM generator for CPACK. For now,
"suggests" will have to do, since it's the closest option we have. I
will investigate if we can get around this issue.

Change-Id: I33a90c3ead235bbbe265238c026933688ea63fe3
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 86c27a7af8]
2021-11-16 11:08:40 -05:00
Sean Keely e462118b6e Add comments to GetPcieBlit.
Comments call out the specific operation being selected since the
ternary nest is a bit hard to read.

Change-Id: If033dbaa6cba132e96196ad3fc6d5572042041f4


[ROCm/ROCR-Runtime commit: fc75731034]
2021-11-15 19:34:03 -06:00
Sean Keely 01c7c9856c Fix leak in hsa_amd_interop_map_buffer.
Agent temp array could have leaked if one of the given agent
handles was invalid.

Change-Id: I9e638b3a4f6bb917a4e3209ad81a1253bb603365


[ROCm/ROCR-Runtime commit: b198016949]
2021-11-15 19:22:20 -06:00
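One way to make a temporary array immune to the early-return leak described in 01c7c9856c is to let a container own it. The sketch below is not the actual fix; `ResolveAgents` is a hypothetical function, and treating a handle value of 0 as invalid is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies handles into *resolved only if every handle is valid.
// Assumption for this sketch: a handle value of 0 is invalid.
inline bool ResolveAgents(const std::uint64_t* handles, std::size_t count,
                          std::vector<std::uint64_t>* resolved) {
  std::vector<std::uint64_t> temp;  // Owned storage, freed on every exit path.
  temp.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    if (handles[i] == 0) return false;  // Early return cannot leak temp.
    temp.push_back(handles[i]);
  }
  resolved->swap(temp);
  return true;
}
```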
Sean Keely 289cc7b6b4 Correct order of argument check and default assignment in lock APIs.
Argument must be checked for nullptr before being dereferenced and
filled with the default return value.

Change-Id: I9ff366f066a5e18c78129bf59cc3ba00fca3ef18


[ROCm/ROCR-Runtime commit: f48a786662]
2021-11-15 19:22:02 -06:00
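The ordering fixed in 289cc7b6b4 is shown below in a stripped-down form. `LockSketch`, `Status`, and its parameters are hypothetical and do not reproduce any real lock API signature; the point is only that the output pointer is checked before it is written.

```cpp
#include <cassert>
#include <cstddef>

enum Status { kSuccess = 0, kInvalidArgument = 1 };

// Hypothetical lock-style entry point with an output parameter.
inline Status LockSketch(void* host_ptr, std::size_t size, void** agent_ptr) {
  // 1. Validate the output pointer before touching it.
  if (agent_ptr == nullptr) return kInvalidArgument;
  // 2. Only then fill in the default return value.
  *agent_ptr = nullptr;
  if (host_ptr == nullptr || size == 0) return kInvalidArgument;
  // 3. Stand-in for the real work: hand back the same address.
  *agent_ptr = host_ptr;
  return kSuccess;
}
```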
Sean Keely a7dc6d7802 Add missing return in ScopeGuard::operator=.
This omission did not cause problems earlier due to having not been
instanced.

Change-Id: I7a54f82e06c299902f3bf6b4d3737cc5e30961ad


[ROCm/ROCR-Runtime commit: 322588a60e]
2021-11-15 18:50:46 -06:00
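A missing `return` in an assignment operator can go unnoticed in a class template because member functions of a class template are only instantiated when they are used. The sketch below is a generic scope guard written for illustration, not the runtime's `ScopeGuard`; its members and the `Dismiss` method are assumptions. It shows the move assignment with the `return *this;` in place.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs the stored callable on destruction unless dismissed or moved from.
template <typename F>
class ScopeGuard {
 public:
  explicit ScopeGuard(F f) : f_(std::move(f)), active_(true) {}
  ScopeGuard(ScopeGuard&& other)
      : f_(std::move(other.f_)), active_(other.active_) {
    other.active_ = false;
  }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

  // Only instantiated if some code actually assigns a guard, which is why
  // an omitted return here can stay hidden for a long time.
  ScopeGuard& operator=(ScopeGuard&& other) {
    if (this != &other) {
      if (active_) f_();  // Run the action this guard was holding.
      f_ = std::move(other.f_);
      active_ = other.active_;
      other.active_ = false;
    }
    return *this;
  }

  ~ScopeGuard() {
    if (active_) f_();
  }

  void Dismiss() { active_ = false; }

 private:
  F f_;
  bool active_;
};
```

Assignment requires an assignable callable such as `std::function<void()>`; a guard holding a lambda still compiles as long as `operator=` is never used, which is the same lazy instantiation at work.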
Kent Russell 17e97b8757 Revert "CMakeLists: Fix libdrm-amdgpu dependencies"
This reverts commit af55f02fab.

Reason for revert: Infra still not ready for it yet

Change-Id: I03e043c1ca7924264e3e70e3e82c73b4efc2ae75


[ROCm/ROCR-Runtime commit: e842d7f480]
2021-11-12 14:30:04 -05:00
Sean Keely c8bb2905d3 Correct node id assertion in pointer info.
Size of the node map was used as the max node id previously.  This
is wrong when RVD is used.

Change-Id: Ic632ec96891b92186e5b68cd53f81414db34f59f


[ROCm/ROCR-Runtime commit: 19454fcf26]
2021-11-10 22:09:24 -06:00
Sean Keely 0ed7eac560 Correct size of SVM node array.
Was size of the map.  Needs to be size of the node id range.

Change-Id: I92501ea7adca5c30dbb0fdabd2c421dea58f8d6f


[ROCm/ROCR-Runtime commit: c9eb85e205]
2021-11-10 21:23:42 -06:00
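Commits c8bb2905d3 and 0ed7eac560 both turn on the difference between the number of nodes in a map and the range of node ids. A sketch of that distinction, assuming node ids can be sparse when some devices are filtered out: `MakeNodeTable` and the use of `-1` as an empty marker are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Builds a table that can be indexed directly by node id.
// The table is sized by (largest id + 1), not by the number of map entries,
// because ids may have gaps.
inline std::vector<int> MakeNodeTable(const std::map<std::uint32_t, int>& by_id,
                                      int empty) {
  if (by_id.empty()) return std::vector<int>();
  std::vector<int> table(by_id.rbegin()->first + 1, empty);
  for (const auto& entry : by_id) table[entry.first] = entry.second;
  return table;
}
```

With nodes 0 and 3 present, the map has two entries but the table needs four slots; sizing it by `by_id.size()` would make index 3 out of range.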
Sean Keely 847df17afe Include event_id in SDMA interrupt payload.
The event id assists KFD in locating the proper event associated
with the interrupt.

Change-Id: I75d58b6be74dd5b1edb0c5fe2b9d01538a649ba1


[ROCm/ROCR-Runtime commit: d65e00bcc5]
2021-11-10 20:57:11 -06:00