Environment variable HSA_HIGH_PRECISION_MODE can be used to control MFMA
precision
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12
[ROCm/ROCR-Runtime commit: 2a64fa5e06]
For debian use cases, package conflict is required to remove the
deprecated package during package upgrade Also removed the duplicate
setting of package obseletes in RPM usecase.
[ROCm/ROCR-Runtime commit: 3be9c49b63]
- When waiting on non-interrupt signals, do not uSleep. This causes
regressions compared to interrupt signal usage.
- Cleanup code.
Change-Id: I706bda0b13e64ffec0b607c1915d8380a2ce0dea
[ROCm/ROCR-Runtime commit: 890399a7cf]
Set underlying type of hsa_region_info_t, hsa_amd_region_info_t
to int.
Change-Id: Ibf97a025eec6176d8e28af8009e9bd6795ca061f
[ROCm/ROCR-Runtime commit: 166b08346b]
Update rocm_ci_caller.yml to use amd-master , until amd-mainline is aligned
Signed-off-by: Choudhary, Rahul <Rahul.Choudhary@amd.com>
[ROCm/ROCR-Runtime commit: 16cd712685]
BUILD_SHARED_LIBS is a global flag so we don't need to set a default
option for it in both libhsakmt and hsa-runtime, only the top level
CMakeLists file. Also updated README to reflect that libhsakmt is
always built statically and gets linked to libhsa-runtime.
Change-Id: I1511f68a268032bec9758bc731d8074f33ec980f
[ROCm/ROCR-Runtime commit: ff01f62777]
Convert test to use multi-GPU framework.
Add mutex to fix intermixed log issue and annotate logging with
gpu node number.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ic2beeadb1eb4b5a9a0710ac1dbd60b9bf1d84c33
[ROCm/ROCR-Runtime commit: f24d789dee]
"s_waitcnt 0" (deprecated in gfx12) is redundant here.
s_endpgm will wait for all outstanding instructions
to complete before executing.
Change-Id: Ia8b4dd0fd8dd713e7ba2cba9db85b7b12cee1dd4
Signed-off-by: Lang Yu <lang.yu@amd.com>
[ROCm/ROCR-Runtime commit: d159b29dc6]
Since GFX950 can support page table fragment up to 18 without
performance loss. So set GFX950 default svm.alignment_order to 18.
Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 9509af4b98]
This patch creates the blacklist for gfx950 by copying gfx942 but adding
KFDGWSTest.Semaphore as GWS support is completely removed from gfx950.
Change-Id: I5d7c17e57b8cfd9fae63780ecc9dd55662cfdade
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 0b6e457201]
Make sure to use allocate the same amount of size for VGPR data in
gfx950 as it is done for gfx940.
Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
[ROCm/ROCR-Runtime commit: 76052ba028]
Added HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving. Note that all features of 3D images are supported
with 2D swizzles,it's just that the access patterns are different
and therefore cache hit-rates may be better or worse, depending
on how it's used. Volumetric algorithms do better with 3D and apps
that tend to access a single slice at a time do better with 2D.
Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198
[ROCm/ROCR-Runtime commit: 6361466baa]
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.
WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.
Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75
[ROCm/ROCR-Runtime commit: 8a38f121ea]
The CWSR area size needs to take into account the size of LDS each
active workgroup can have. The current implementation uses a constant
for that. This patch refactors this to use the HsaNodeProperties of the
device's the CWSR area is for to figure out the size of LDS.
Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a
Signed-off-by: Lancelot Six <Lancelot.Six@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: c51aa0d155]
Set priority to maximum for signal event handler and minimum for
exceptions event handler.
Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
[ROCm/ROCR-Runtime commit: 7ea25ebb85]
Generalize the driver discovery and move driver-specific
functionality to the concrete driver implementations.
Currently, this process is tightly coupled to the hsakmt
which is GPU and OS specific.
Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184409
[ROCm/ROCR-Runtime commit: 15107afb11]
This reverts commit 5a8092bccf.
Reason for revert: This will put back the change ID - Id1154f08f6ba21c633905fd46b06053994d6f3cc to ROCR repo, which will prevent memory allocations from being automatically granted the 'executable' flag, addressing previously - incorrect and unsafe behavior in ROCm driver.
Change-Id: I3d45c45859929a80f7791681b411251e099a1901
[ROCm/ROCR-Runtime commit: 2d4a578020]
local variable 'counter_id' exceeded the max single
use of stack, thus move to heap to prevent overflow
also, use of a contiguous memory block for 2D array
to reduce space complexity, add error messages for
NO_MEMORY exits and check MAX_COUNTER limit for IDs
Change-Id: Id0249ca767a336b31c759c693a82d3f5c950a2fa
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: ecf57310ca]
HW_REG_HW_ID1 is only available from gfx12 onwards
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ibf4bd62e01ada3dee6dd88762ccb853bab63ff87
[ROCm/ROCR-Runtime commit: 1d71975fcc]
Add gfx12 so that it gets tested when KFDASMTest.AssembleShaders is run.
GWS support has been removed for gfx12. Modify shaders to take that into
account.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I70e87febb6388852ea54d69cf9201339a7910581
[ROCm/ROCR-Runtime commit: f8ae5c47ba]
The recent static initialization changes cause this clean up to
happen when it previously never did. The result of ~RuntimeCleanup()
being executed is that the static global "loaded_" is set to false,
which in turn prevents hsa_init() from executing again. Clean up
already happens when hsa_shut_down() occurs.
Change-Id: Ib5cefb80d82880c1945e04eb6ec246bc2c7d2324
[ROCm/ROCR-Runtime commit: b1d6cacf79]
1, Initialize the registers before using them is the best practice.
Though the use case here doesn't care whether the registers are
initialized or not, some emulators complain the "read_before_write"
behavior. Initialize the registers used to silence these complaints.
2, Update s_wait stuff for gfx12.
Change-Id: I462b2b0b5017dd2876a5954169d3b6b2f1c2a75b
Signed-off-by: Lang Yu <lang.yu@amd.com>
[ROCm/ROCR-Runtime commit: fe5f12342d]
Do a memset, since we can't initialize variable-sized objects
Change-Id: I57faf4a0581a29f9d30391aa387812c2b7bb5011
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: cc7ff73e7f]
New implementation of CU mask testing that focuses on correctness of
masking. Unlike previous implementations, this new implementation does not
rely on performance measurements to decide on the results of the test.
Instead, this implementation checks if waves were executed on all the CUs
enabled and only the CUs enabled.
Test case initially supported on GFX12.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I5af8b890179bc9a415fc7f47e736f4971fc40c4a
[ROCm/ROCR-Runtime commit: 9667af97d9]
We can inherit it from gtest, but not in ASAN builds. And we should be
including what we use, instead of hoping to inherit it through other
headers
Change-Id: Id47ab06a57e1c71c88f72da5f21a71f37db8a2f3
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: eda54222ea]
to send data back to user.
Change-Id: If11fb4147e32c0eed319ccf76bcde9d76815ff67
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: b07a80e505]