__x86_64__ and __AMD64__ should already be defined by the compiler to
specify the compilation target and shouldn't be defined manually.
Two x86_64 checks now also test the Visual Studio macros, since removing
the manual definitions could otherwise break compilation with that compiler.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917
[ROCm/ROCR-Runtime commit: 178a7a5cfa]
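The target check described in the commit above could look roughly like the
sketch below. It is an illustration only, not the code from the change:
EXAMPLE_TARGET_X86_64 and IsX86_64Target are hypothetical names, and
_M_X64 / _M_AMD64 are the macros MSVC predefines for 64-bit x86 targets.

```cpp
#include <cassert>

// Hypothetical wrapper macro: rely on what the compiler predefines for the
// target instead of defining __x86_64__ and friends by hand.
#if defined(__x86_64__) || defined(__amd64__) || defined(_M_X64) || defined(_M_AMD64)
#define EXAMPLE_TARGET_X86_64 1
#else
#define EXAMPLE_TARGET_X86_64 0
#endif

// Returns true when this translation unit is compiled for 64-bit x86.
constexpr bool IsX86_64Target() { return EXAMPLE_TARGET_X86_64 == 1; }
```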
Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and
CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired.
The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but
using GNUInstallDirs would use lib64 on RHEL. By setting a default value
prior to including GNUInstallDirs, we can always use "lib" unless the
builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is
typical in most distro scripts.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c
[ROCm/ROCR-Runtime commit: ddf4edcafc]
Image support does not compile on other architectures, since it relies on
the x86-only header "x86intrin.h".
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b
[ROCm/ROCR-Runtime commit: a0931f4a3c]
Each time the delay is grown, elapsed must be reset. We want to take
the most accurate sample from the set taken at a fixed delay.
Without this we will hang if there is ever an insufficiently accurate,
high unit clock read.
Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb
[ROCm/ROCR-Runtime commit: 4e9849034d]
Release staging buffers after loading has completed. The debugger
no longer uses this copy.
Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d
[ROCm/ROCR-Runtime commit: 03a52655a8]
The loader must use internal interfaces to access page allocation
flags. Code pages should also ensure use of cached memory.
Also relocate i-cache flush after code page copy.
Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab
[ROCm/ROCR-Runtime commit: 048700f2e7]
VM faults should not report via the queue error handler.
The system event contains much more useful information.
Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3
[ROCm/ROCR-Runtime commit: fbc48521dc]
Hive ID is used during copy path selection to locate an optimal
pool of SDMA engines. However, for CPU-GPU connections we always
want to use the host-port-facing engines, known generally as the
PCIe optimized engines. We want this selection even when the
connection is XGMI, so drop the hive ID for CPUs.
Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9
[ROCm/ROCR-Runtime commit: af0f90800d]
Compiler is not promoted to mainline yet.
This reverts commit 7dcccdf452.
Change-Id: I7256aeb3698ee3ae640a9f457a929abe24d5ef17
[ROCm/ROCR-Runtime commit: 7e73760cd0]
Excessive scratch allocations can normally trigger occupancy
reduction. This breaks cooperative groups, so if occupancy
reduction is required on a cooperative dispatch, fail with OOM.
Change-Id: I64612a2e38bf1286f3b74c1c2a68ab0c85452771
[ROCm/ROCR-Runtime commit: 8a6954c63c]
With asymmetric harvest, hw does not issue groups equally to each SE.
Occasionally hw will skip an SE so that the distribution reflects
each SE's CU count. Scratch resources must be allocated to reflect
this asymmetric distribution of groups.
Change-Id: I65e26206500483ea18e6e8796e65ecba5354b029
[ROCm/ROCR-Runtime commit: 552dcead93]
HW does not ignore low bits of the scratch wave count and will
stride beyond the end of the allocation if the wave count is
ever indivisible by SE count. Rather than returning the allocation
size for cached large scratch allocations, use the requested
scratch size in scratch setup. Scratch cache will retain the
cached allocation's size.
Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d
[ROCm/ROCR-Runtime commit: cedc3e80a8]
Include the upgrade operation check in the package's prerm and postun
scripts.
Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be
[ROCm/ROCR-Runtime commit: a496adafaa]
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.
Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62
[ROCm/ROCR-Runtime commit: b9a0c1d313]
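The repeated-discard guard from the commit above can be sketched as
follows. Block, Accounting and the discarded flag are hypothetical names
for illustration; the actual allocator types are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block descriptor with a flag recording whether it was discarded.
struct Block {
  std::size_t size = 0;
  bool discarded = false;
};

// Tracks bytes in use; Discard is safe to call more than once per block.
class Accounting {
 public:
  void Allocate(Block& b, std::size_t size) {
    b.size = size;
    b.discarded = false;
    in_use_ += size;
  }

  void Discard(Block& b) {
    if (b.discarded) return;  // Already discarded: do not subtract twice.
    b.discarded = true;
    in_use_ -= b.size;
  }

  std::size_t in_use() const { return in_use_; }

 private:
  std::size_t in_use_ = 0;
};
```

Without the early return, a second Discard on the same block would
subtract its size again and the unsigned counter would wrap.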
Minimum queue size was not enforced at the Agent level. Minimum
size should be one page to give uniformity across all ASICs.
Change-Id: I26394f79458d09fbceb79fc8aaf495e2c26a8ff3
[ROCm/ROCR-Runtime commit: a6742209f7]
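A minimal sketch of the one-page minimum described above, assuming the
queue size is expressed in 64-byte AQL packets and the page is 4 KiB.
Both constants and the ClampQueueSize name are assumptions for
illustration, not values taken from the change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed values for illustration only.
constexpr std::uint32_t kPageSizeBytes = 4096;
constexpr std::uint32_t kAqlPacketBytes = 64;

// Returns the queue size in packets, raised to at least one page worth.
constexpr std::uint32_t ClampQueueSize(std::uint32_t requested_packets) {
  return std::max(requested_packets, kPageSizeBytes / kAqlPacketBytes);
}
```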
Adds asserts for invalid dispatch dims and scratch requests that
don't actually use scratch.
Change-Id: I6e6eef3f17dc38adaf96550fa55bd8625868efa3
[ROCm/ROCR-Runtime commit: a65f3f5b71]
On gfx90a only a reduced number of CUs may be used for cooperative
dispatches due to CWSR and launcher interactions with asymmetric
harvest. We must use one fewer CU per SE than the lowest count of
CUs on any SE.
Also adds env var HSA_COOP_CU_COUNT which enables the cooperative
CU count computation. Set to 1 to enable the new computation.
This is an opt-in feature that will become enabled by default (opt-out)
in a future release.
Change-Id: Ifbb75ced3bbc15876eef44922c6a4f6fde8c4c28
[ROCm/ROCR-Runtime commit: 37942c982a]
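The CU count rule above can be sketched as below, reading it as
"(lowest per-SE count - 1) CUs on every SE". CooperativeCuCount and its
input vector are hypothetical; the real computation and its handling of
HSA_COOP_CU_COUNT are not shown in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// cu_per_se holds the active CU count of each shader engine (hypothetical input).
// Returns the total CU count usable for cooperative dispatches under the
// assumed rule: one fewer than the smallest per-SE count, on every SE.
inline std::uint32_t CooperativeCuCount(const std::vector<std::uint32_t>& cu_per_se) {
  if (cu_per_se.empty()) return 0;
  const std::uint32_t lowest = *std::min_element(cu_per_se.begin(), cu_per_se.end());
  if (lowest == 0) return 0;  // Avoid unsigned underflow on a fully harvested SE.
  return (lowest - 1) * static_cast<std::uint32_t>(cu_per_se.size());
}
```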
Corrections have been made in libhsakmt, and corresponding changes are required here as well.
Signed-off-by: Chen Gong <curry.gong@amd.com>
Change-Id: Ib697ce25278c2c5ac6ef0206930ec285f46c60d1
[ROCm/ROCR-Runtime commit: dec63b4f15]
The start iterator becomes invalid after it is removed from
std::map prefetch_map_. This was causing a segfault when the iterator was
incremented afterwards.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I4b0b763d2cb4ee99c0b8571c2c526b834e74077a
[ROCm/ROCR-Runtime commit: 86164fbfec]
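The usual way to avoid the invalidated iterator described above is to
continue from the iterator returned by std::map::erase. The sketch below
shows that pattern on a stand-in map; EraseRange and the key/value types
are hypothetical and do not reproduce prefetch_map_ itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Erases every entry with a key in [begin_key, end_key) and returns how many
// entries were removed.
inline std::size_t EraseRange(std::map<std::uint64_t, int>& m,
                              std::uint64_t begin_key, std::uint64_t end_key) {
  std::size_t removed = 0;
  auto it = m.lower_bound(begin_key);
  while (it != m.end() && it->first < end_key) {
    // erase() invalidates 'it' and returns the next valid iterator, so never
    // increment the erased iterator.
    it = m.erase(it);
    ++removed;
  }
  return removed;
}
```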
ROCM_VISIBLE_DEVICES was used where ROCR_VISIBLE_DEVICES was
intended.
Change-Id: I644a546f3c9dd0b50898ef8a21dbb8f5c3a36926
[ROCm/ROCR-Runtime commit: fce6ba052e]
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel. Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.
The fragment allocator now requires separate protection and is protected with a
mutex at the device level. Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
within Allocate. This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim. Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache. So some device
level serialization is required in at least some paths.
Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
[ROCm/ROCR-Runtime commit: df55cb0450]
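The locking scheme above can be sketched with std::shared_mutex (C++17).
This is an illustration under stated assumptions, not the runtime's code:
MemoryTracker, SlowDriverAllocate and the fake address counter are
hypothetical stand-ins for the memory map and the call into kfd.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>

class MemoryTracker {
 public:
  std::uint64_t Allocate(std::size_t size) {
    std::uint64_t addr = 0;
    {
      // Shared mode: many threads may sit in the long driver call at once.
      std::shared_lock<std::shared_mutex> shared(lock_);
      addr = SlowDriverAllocate(size);
    }
    // Exclusive mode: only the map update, and no driver call while held.
    std::unique_lock<std::shared_mutex> exclusive(lock_);
    regions_[addr] = size;
    return addr;
  }

  std::size_t Count() const {
    std::shared_lock<std::shared_mutex> shared(lock_);
    return regions_.size();
  }

 private:
  // Stand-in for the driver call; hands out fake, non-overlapping addresses.
  // It uses an atomic because it runs concurrently under the shared lock.
  std::uint64_t SlowDriverAllocate(std::size_t size) { return next_.fetch_add(size); }

  mutable std::shared_mutex lock_;
  std::map<std::uint64_t, std::size_t> regions_;
  std::atomic<std::uint64_t> next_{0x1000};
};
```

The shared lock is released before the exclusive lock is taken because
std::shared_mutex has no upgrade operation.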
Comments call out the specific operation being selected since the
ternary nest is a bit hard to read.
Change-Id: If033dbaa6cba132e96196ad3fc6d5572042041f4
[ROCm/ROCR-Runtime commit: fc75731034]
Agent temp array could have leaked if one of the given agent
handles was invalid.
Change-Id: I9e638b3a4f6bb917a4e3209ad81a1253bb603365
[ROCm/ROCR-Runtime commit: b198016949]
Argument must be checked for nullptr before being dereferenced and
filled with the default return value.
Change-Id: I9ff366f066a5e18c78129bf59cc3ba00fca3ef18
[ROCm/ROCR-Runtime commit: f48a786662]
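The check order described above, sketched on a hypothetical query
function. Status and QueryValue are illustrative names, not the API that
was fixed.

```cpp
#include <cassert>
#include <cstdint>

enum class Status { kSuccess, kInvalidArgument };

// Validates the output pointer first, then fills it with the default value.
inline Status QueryValue(std::uint32_t* out) {
  if (out == nullptr) return Status::kInvalidArgument;  // Check before any dereference.
  *out = 0;                                             // Default return value.
  return Status::kSuccess;
}
```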
This omission did not cause problems earlier due to having not been
instanced.
Change-Id: I7a54f82e06c299902f3bf6b4d3737cc5e30961ad
[ROCm/ROCR-Runtime commit: 322588a60e]
Size of the node map was used as the max node id previously. This
is wrong when RVD is used.
Change-Id: Ic632ec96891b92186e5b68cd53f81414db34f59f
[ROCm/ROCR-Runtime commit: 19454fcf26]
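A small sketch of why the map size is the wrong bound once node ids have
gaps, assuming the nodes are held in a std::map keyed by node id.
NodeIdRange and the value type are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Returns the number of slots needed to index by node id, i.e. max id + 1.
// With gaps in the ids this is larger than nodes.size().
inline std::size_t NodeIdRange(const std::map<std::uint32_t, int>& nodes) {
  return nodes.empty() ? 0 : static_cast<std::size_t>(nodes.rbegin()->first) + 1;
}
```

For a map holding only nodes 0 and 3, size() is 2 while the id range is 4,
so an array sized by size() would be indexed out of bounds by node 3.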
Was size of the map. Needs to be size of the node id range.
Change-Id: I92501ea7adca5c30dbb0fdabd2c421dea58f8d6f
[ROCm/ROCR-Runtime commit: c9eb85e205]
The event id assists KFD in locating the proper event associated
with the interrupt.
Change-Id: I75d58b6be74dd5b1edb0c5fe2b9d01538a649ba1
[ROCm/ROCR-Runtime commit: d65e00bcc5]
This really should be set to conform to distro standards.
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I8c3bdcc7eb103cec9db6aa9f9cfec25754784be8
[ROCm/ROCR-Runtime commit: 48e4e2c5ff]
With gcc 10.3.0, the hsa-runtime build fails with the following log:
compute/hsa/runtime/rocrtst/suites/negative/queue_validation.cc:470:18: error: conversion from ‘unsigned int’ to ‘uint16_t’ {aka ‘short unsigned int’} changes value from ‘4294967295’ to ‘65535’ [-Werror=overflow]
470 | aql().header |= 0xFFFFFFFF << HSA_PACKET_HEADER_TYPE;
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/rocrtst64.dir/build.make:339: CMakeFiles/rocrtst64.dir/home/aaliu/work/compute/hsa/runtime/rocrtst/suites/negative/queue_validation.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I95fe72030368abc211b4b97b5a7ba00b5e094730
[ROCm/ROCR-Runtime commit: f2a50c34f9]
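One way to avoid the narrowing reported in the log above is to mask the
shifted value to 16 bits before it reaches the uint16_t header. This is a
sketch only and may differ from the actual fix; kHeaderTypeShift is a
hypothetical stand-in for HSA_PACKET_HEADER_TYPE and is assumed to be 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the HSA_PACKET_HEADER_TYPE bit offset (assumed 0).
constexpr unsigned kHeaderTypeShift = 0;

// ORs an all-ones pattern into a 16-bit header without an implicit
// unsigned int -> uint16_t conversion that changes the value.
inline std::uint16_t SetAllTypeBits(std::uint16_t header) {
  const std::uint32_t wide = 0xFFFFFFFFu << kHeaderTypeShift;
  const std::uint16_t narrow = static_cast<std::uint16_t>(wide & 0xFFFFu);
  return static_cast<std::uint16_t>(header | narrow);
}
```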
GetGlobalMemoryPool had improper return codes for an iterator callback
and did not properly order the APU pool selection path.
Change-Id: I01ab9d23e2352be98d9718bc25889ad4f779d3ca
[ROCm/ROCR-Runtime commit: 534dc3f60c]
Clang warns about bitwise operators on bools. Casting to int silences
the warning without introducing short-circuit logic.
Change-Id: I6e25138e1acf4a5562d3925ea5b2fcef3addb783
[ROCm/ROCR-Runtime commit: 4b0c94cfe8]
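The cast described above, sketched on a hypothetical helper. BothSucceed
is an illustrative name; the point is that both operands are still
evaluated, unlike with &&.

```cpp
#include <cassert>

// Calls both functions unconditionally and reports whether both returned true.
inline bool BothSucceed(bool (*first)(), bool (*second)()) {
  // '&' on int operands evaluates both sides; '&&' would skip second()
  // whenever first() returned false.
  return (static_cast<int>(first()) & static_cast<int>(second())) != 0;
}
```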
It would be nice to get warning count changes highlighted in CI though.
Clang's increasingly suspect diagnostics have caused multiple build
breaks without highlighting any actual issues.
Also: https://embeddedartistry.com/blog/2017/05/22/werror-is-not-your-friend/
Change-Id: I7dc82da58cd86f7b4f1a9fb511c4c039419271d4
[ROCm/ROCR-Runtime commit: efeee734db]
Limits CU masking application to cases where it is explicitly requested.
Change-Id: Ib65ad0ac98f86d840c0328fa15ce40c05cd4bfae
[ROCm/ROCR-Runtime commit: 5e8d261352]
Due to a CPACK bug the package needs to remove header file
symlinks. Cleanup is required for uninstall and upgrade
since each release installs to a different folder.
Change-Id: I5ec378b21e69235404781c7bce3c0203eb38eed1
[ROCm/ROCR-Runtime commit: ca899ea429]
KFD topology has been corrected and the defaults used by this
workaround are no longer true for all chips.
Change-Id: I0242d8077e9666ed1cf0dc3985244258ae5c0924
[ROCm/ROCR-Runtime commit: 19c1e92b4c]
On APU ASICs the default video memory size is relatively small. Once
the reserved region is subtracted, the ratio of max alloc size to pool
size may fall below the expected value, so adjust it.
Change-Id: I798b44d9532aa6a381a1cc19faa5a46110bf0ad6
[ROCm/ROCR-Runtime commit: df59bfd57b]
Early exit if the range is found to be fine grain. Indeterminate
should only apply if the range is neither coarse nor fine.
Change-Id: I54133e14f4e8cfa53e2d612f6112cdcdb5a47dfa
[ROCm/ROCR-Runtime commit: a2fb1cbfbc]