Resources allocated in SetUp/HsaNodeInfo::Init
need to be deleted in TearDown/HsaNodeInfo::Delete.
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: f8d8b8011f]
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.
Implement the GetMemoryProperties() and GetCacheProperties() methods
for the KFD driver.
[ROCm/ROCR-Runtime commit: a9f6bc8d0e]
Add support for two new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent.
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent.
[ROCm/ROCR-Runtime commit: 107b48fb15]
Update ROCr code to match the new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase the previous limits when the scratch reclaim feature is
available.
[ROCm/ROCR-Runtime commit: aa2f98e6f9]
Allow IPC signals to be registered with hsa_amd_signal_async_handler.
This forces AsyncEventsLoop to switch to polling instead of interrupts.
[ROCm/ROCR-Runtime commit: fa8be44df9]
The HSA_HIGH_PRECISION_MODE environment variable can be used to control
MFMA precision.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12
[ROCm/ROCR-Runtime commit: 2a64fa5e06]
For Debian use cases, a package conflict is required to remove the
deprecated package during a package upgrade. Also remove the duplicate
setting of package obsoletes in the RPM use case.
[ROCm/ROCR-Runtime commit: 3be9c49b63]
- When waiting on non-interrupt signals, do not uSleep. Sleeping causes
  regressions compared to interrupt signal usage.
- Clean up code.
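
The no-sleep wait described above can be illustrated with a minimal
sketch. This is not the ROCr implementation: the helper name SpinWaitEq,
the use of std::atomic as the signal value, and the timeout handling are
all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not a ROCr API): spin until `signal` equals
// `expected` or `timeout` elapses. No sleep is issued between polls, so
// a change to the value is observed as soon as the next load happens.
inline bool SpinWaitEq(const std::atomic<int64_t>& signal, int64_t expected,
                       std::chrono::microseconds timeout) {
  assert(timeout.count() >= 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (signal.load(std::memory_order_acquire) != expected) {
    // Give up once the deadline has passed; the caller decides what to do.
    if (std::chrono::steady_clock::now() >= deadline) return false;
  }
  return true;
}
```

The trade-off is that the waiting thread keeps a CPU core busy for the
whole wait, which is the price paid for the lower wake-up latency.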
Change-Id: I706bda0b13e64ffec0b607c1915d8380a2ce0dea
[ROCm/ROCR-Runtime commit: 890399a7cf]
Set underlying type of hsa_region_info_t, hsa_amd_region_info_t
to int.
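
For readers unfamiliar with fixed underlying types, a minimal C++ sketch
follows. The enum and its enumerators are hypothetical and only stand in
for the real HSA types; the actual header change may look different.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical enum for illustration; not the real hsa_region_info_t.
enum example_region_info_t : int {
  EXAMPLE_REGION_INFO_SEGMENT = 0,
  EXAMPLE_REGION_INFO_SIZE = 2,
};

static_assert(
    std::is_same<std::underlying_type<example_region_info_t>::type,
                 int>::value,
    "underlying type is fixed to int");

// With a fixed underlying type, every int value is a valid value of the
// enum, so converting a value that is not a named enumerator is well
// defined.
inline example_region_info_t FromRaw(int raw) {
  return static_cast<example_region_info_t>(raw);
}
```

A possible benefit (an assumption here, not stated in the commit) is
that values defined by a second, extension enum can be passed through
the first enum's type without relying on unspecified value ranges.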
Change-Id: Ibf97a025eec6176d8e28af8009e9bd6795ca061f
[ROCm/ROCR-Runtime commit: 166b08346b]
Update rocm_ci_caller.yml to use amd-master until amd-mainline is aligned.
Signed-off-by: Choudhary, Rahul <Rahul.Choudhary@amd.com>
[ROCm/ROCR-Runtime commit: 16cd712685]
BUILD_SHARED_LIBS is a global flag, so we don't need to set a default
option for it in both libhsakmt and hsa-runtime, only in the top-level
CMakeLists file. Also updated the README to reflect that libhsakmt is
always built statically and gets linked to libhsa-runtime.
Change-Id: I1511f68a268032bec9758bc731d8074f33ec980f
[ROCm/ROCR-Runtime commit: ff01f62777]
Convert test to use multi-GPU framework.
Add mutex to fix intermixed log issue and annotate logging with
gpu node number.
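
The logging fix can be pictured with the following sketch. The class
name NodeLogger and the "[GPU node N]" prefix format are assumptions for
illustration and are not taken from the test framework.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical logger: serializes writes from per-GPU test threads and
// prefixes each line with the GPU node number.
class NodeLogger {
 public:
  explicit NodeLogger(std::ostream& out) : out_(out) {}

  void Log(unsigned gpu_node, const std::string& msg) {
    // Build the full line first so the critical section is one write.
    std::ostringstream line;
    line << "[GPU node " << gpu_node << "] " << msg << '\n';
    std::lock_guard<std::mutex> lock(mutex_);
    out_ << line.str();
  }

 private:
  std::ostream& out_;
  std::mutex mutex_;
};
```

Because each line is emitted under the lock as a single write, output
from different GPU threads cannot interleave within a line.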
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ic2beeadb1eb4b5a9a0710ac1dbd60b9bf1d84c33
[ROCm/ROCR-Runtime commit: f24d789dee]
"s_waitcnt 0" (deprecated in gfx12) is redundant here.
s_endpgm will wait for all outstanding instructions
to complete before executing.
Change-Id: Ia8b4dd0fd8dd713e7ba2cba9db85b7b12cee1dd4
Signed-off-by: Lang Yu <lang.yu@amd.com>
[ROCm/ROCR-Runtime commit: d159b29dc6]
GFX950 can support a page table fragment of up to 18 without
performance loss, so set the GFX950 default svm.alignment_order to 18.
Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 9509af4b98]
This patch creates the blacklist for gfx950 by copying gfx942 but adding
KFDGWSTest.Semaphore as GWS support is completely removed from gfx950.
Change-Id: I5d7c17e57b8cfd9fae63780ecc9dd55662cfdade
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 0b6e457201]
Make sure to allocate the same amount of space for VGPR data on
gfx950 as is done for gfx940.
Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
[ROCm/ROCR-Runtime commit: 76052ba028]
Added the HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to
enable/disable support for swizzles that do 3D interleaving. The
default value is false (view3dAs2dArray = 1).
Note that all features of 3D images are supported with 2D swizzles;
it's just that the access patterns are different and therefore cache
hit rates may be better or worse, depending on how the image is used.
Volumetric algorithms do better with 3D, and apps that tend to access
a single slice at a time do better with 2D.
Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198
[ROCm/ROCR-Runtime commit: 6361466baa]
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.
WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.
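
The AND/OR semantics selected by wait_on_all can be sketched as below.
This is only an illustration: EventCheck and ConditionMet are
hypothetical names, the events are modeled as callables rather than
driver event objects, and treating an empty list as "not met" is an
assumption.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for "is this event signaled?".
using EventCheck = std::function<bool()>;

// One evaluation pass of the wait condition:
//   wait_on_all == true  -> every event must be signaled (AND)
//   wait_on_all == false -> any one signaled event suffices (OR)
inline bool ConditionMet(const std::vector<EventCheck>& events,
                         bool wait_on_all) {
  if (events.empty()) return false;  // assumption: nothing to wait on
  for (const auto& signaled : events) {
    const bool is_set = signaled();
    if (wait_on_all && !is_set) return false;  // AND fails early
    if (!wait_on_all && is_set) return true;   // OR succeeds early
  }
  // Loop finished: AND saw only signaled events, OR saw none.
  return wait_on_all;
}
```

A real wait would repeat this check, or block in the driver, until the
condition holds or a timeout expires.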
Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75
[ROCm/ROCR-Runtime commit: 8a38f121ea]
The CWSR area size needs to take into account the size of LDS each
active workgroup can have. The current implementation uses a constant
for that. This patch refactors the computation to use the
HsaNodeProperties of the device the CWSR area is for to determine the
size of LDS.
Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a
Signed-off-by: Lancelot Six <Lancelot.Six@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: c51aa0d155]
Set priority to maximum for signal event handler and minimum for
exceptions event handler.
Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
[ROCm/ROCR-Runtime commit: 7ea25ebb85]
Generalize the driver discovery and move driver-specific
functionality to the concrete driver implementations.
Currently, this process is tightly coupled to hsakmt,
which is GPU and OS specific.
Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184409
[ROCm/ROCR-Runtime commit: 15107afb11]
This reverts commit 5a8092bccf.
Reason for revert: This restores change ID
Id1154f08f6ba21c633905fd46b06053994d6f3cc in the ROCR repo, which
prevents memory allocations from being automatically granted the
'executable' flag, addressing previously incorrect and unsafe behavior
in the ROCm driver.
Change-Id: I3d45c45859929a80f7791681b411251e099a1901
[ROCm/ROCR-Runtime commit: 2d4a578020]