This reverts commit ef95ccf81e59b8608861e8f2f256d981eee19df7.
Reason for revert: Causing performance regressions on some systems
Change-Id: I82951350cafbd57c495852d6f90023a3373f04f6
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 1cee8656df]
Add target gfx1200 to several files.
Add cases for GFX12 in a few switch statements.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ib90032f5b9d5a3306060f13a43d970108a1399df
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 2f14acd9c1]
Generate static package by combining binary and dev components.
Binary and dev component dependencies are added to the static package dependencies.
There is no dependency on rocprofiler-register.
The package name has the suffix static-dev/devel.
Change-Id: I2f9680f13dbffc9eb7ced9fa9b28e360c47ebcca
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 696d8fae9e]
Add a new driver interface as a core ROCr component.
The driver component provides an interface for ROCr to interact with
agent kernel-mode drivers in a generic way. This interface will be used
to interact with the XDNA NPU driver. Eventually, the ROCt library's
functionality should be implemented behind this interface.
For now the interface provides basic queue and memory allocation
for supporting HSA queues and signals and matches the thunk API
closely.
Change-Id: I37ac9f2dcbadc86ce45999f76b0e9ce753fd0c06
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 69ba32fa95]
Create a new top-level CMakeLists.txt file to control building thunk
and ROCr. kfdtest and rocrtst are built separately.
Most of the cmake code that existed for thunk, ROCr, rocrtst and kfdtest
still resides in their respective CMakeLists.txt files, except for the
CPack packaging directives, which have been moved to the top-level
CMakeLists.txt.
Change-Id: I1a537359029504af8b1abb324bc6f0d75d98471e
[ROCm/ROCR-Runtime commit: 662f6817d7]
Minor instructions changes for GFX12.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iab2c430bb5d7d8fa2b166d07fd33ea15aca3a5cd
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 8917561625]
PC sampling is still experimental, so we can't bump
KFD_IOCTL_MINOR_VERSION to enable it.
KFD_IOCTL_MINOR_VERSION 16 already includes all of the PC sampling
code, so use version 16 to enable PC sampling implicitly and let
customers try out this new feature.
The version must be updated accordingly once PC sampling is upstream.
Change-Id: I65840128f94e8f347c0617971c0aa4b7e478691a
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 338721c24a]
Minor instructions changes for GFX12.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I57cca6393d4b4aae869a2bc9862d75eef1f29ed7
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 9665499f83]
Minor instructions changes for GFX12.
Change-Id: I78a37fa37950b378cdd2a1618c71c97c6ba66aac
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 39f4fbee10]
Upstream KFD ioctl version 1.16 is for contiguous memory support.
Remove the pc_sampling version; it should be added after pc_sampling is upstream.
Change-Id: I6e6c3340bc8e371d68dd7741b02578be2fdef801
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 6e6f445f75]
Update amdp2ptest.h to sync with the same file from the rdma test driver
folder.
Add ContiguousVRAMAllocation to verify that rdma get pages returns
contiguous VRAM pages. The RDMA get pages step is skipped if
amdp2ptest.ko is not loaded.
Change the rdma buffer mmap to use the MAP_SHARED flag, because
MAP_PRIVATE goes down the COW path, which requires mmapping the entire
vma and cannot support multiple sg nents.
Change-Id: I5fbb1902251f1454616d4404a4b048a88996d4f7
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: e076a4ee82]
For the mmap system call, vma->vm_start and vma->vm_end are the start and
end of the mmap virtual address range, and vma->vm_pgoff is the rdma
buffer GPU address, which is used to find the sg_table dma_address.
Handle the case of multiple sg table nents, because sg->length is limited
to a maximum of 2GB.
Change-Id: I677dd6662ee58f0b5c93f8eef32b7009e1e890d8
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 9d9fbceefb]
The application may use the parent process's KFD handle or an invalid KFD
handle. Add CHECK_KFD_OPEN to all APIs to catch this application bug
earlier, without calling into KFD.
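A minimal sketch of this kind of guard, assuming the library records the
PID of the process that opened the handle. All names here (SketchOpen,
CheckOpen, g_owner_pid, SketchStatus) are hypothetical and only
illustrate the idea; the actual CHECK_KFD_OPEN macro may detect the
condition differently.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical status codes for this sketch only.
enum SketchStatus { kOk = 0, kNotOpened = 1 };

// Hypothetical library state: -1 means "never opened".
static pid_t g_owner_pid = -1;

// Guard: the handle is usable only in the process that opened it.
static SketchStatus CheckOpen() {
  return (g_owner_pid == getpid()) ? kOk : kNotOpened;
}

// Records the process that opened the (imaginary) device handle.
static void SketchOpen() {
  g_owner_pid = getpid();
  assert(CheckOpen() == kOk);
}

// Forks and returns the guard result seen by the child, or -1 on failure.
static int CheckOpenInForkedChild() {
  pid_t child = fork();
  if (child < 0) return -1;
  if (child == 0) _exit(CheckOpen());  // child inherits the parent's state
  int wstatus = 0;
  if (waitpid(child, &wstatus, 0) != child || !WIFEXITED(wstatus)) return -1;
  return WEXITSTATUS(wstatus);
}
```

With this sketch, CheckOpen() fails before SketchOpen() is called,
succeeds afterwards in the same process, and fails again in a forked
child that only inherited the parent's state.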
Change-Id: I0391e91eeca8e6752fc9c23f0742445b823ea9b0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: c98a8dc179]
New API to support optional alignment parameter for memory allocations.
The alignment should be larger than or equal to page size and a power
of 2.
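For illustration, the alignment rule above can be sketched as follows.
IsValidAlignment and AlignUp are hypothetical helper names for this
sketch, not the new API itself, and the page size is passed in rather
than queried from the system.

```cpp
#include <cassert>
#include <cstddef>

// True when `alignment` is a power of two and at least one page.
constexpr bool IsValidAlignment(std::size_t alignment, std::size_t page_size) {
  return alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         alignment >= page_size;
}

// Rounds `size` up to the next multiple of a valid alignment.
inline std::size_t AlignUp(std::size_t size, std::size_t alignment,
                           std::size_t page_size) {
  assert(IsValidAlignment(alignment, page_size));
  return (size + alignment - 1) & ~(alignment - 1);
}
```

With a 4 KiB page size, 4 KiB and 2 MiB are accepted, while 2 KiB (below
the page size) and 12 KiB (not a power of 2) are rejected.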
Change-Id: Ic3fec43b3c4281f74dd33a57ab4143dcf76e1186
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: a31e84eaef]
Since amdgpu driver commit 1f4ac94b59aebebf
(https://lore.kernel.org/all/a121a72c-b441-4f42-94a3-4597b7f19e7d@amd.com/T/),
both GTT and VRAM are available for compute.
As a result, the vramSize obtained by function GetSysMemSize is actually about 50% of system memory.
Small APUs don't have much system memory, and the kernel memory limit is smaller for them.
Registering the SVM range for SysBuffer and SysBuffer2 therefore fails.
Example:
System memory size: 3373M, kernel memory limit: 1791M
VRAM memory size: 256M, GTT memory size: 1686M
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Change-Id: Ib3826933100ab7b432cb476caaf2d91cc9cdb948
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 1abd02af32]
hsaKmtRegisterMemory* can only register OS-allocated userptr.
v2: Apply changes to all hsaKmtRegisterMemory* functions. (Philip)
v3: Unlock aperture->fmm_mutex to avoid deadlock.
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I1045af7edb4da8206cb878f64c0176ba4fc59f60
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 4844a70d94]
Update CMakeLists.txt to use Thunk pkgconfig.
Add an rdma contiguous memory allocation test to verify that KFD rdma get
pages pins the buffer on contiguous VRAM pages.
Change-Id: I7cc617fc083ce1998c214c327c130f033ce41d6f
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 3f00c88910]
Update the Makefile for newer kernel versions and support building with
the dkms amdgpu driver. Use symbol_request to get the KFD peerdirect
interface. Sync up with the KFD peerdirect interface changes and remove
the free callback, which is no longer used.
Change-Id: I01d8906d9ffa427a058a26e88e36f6b80e9e22c2
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 88dabfcc04]
Small APUs now use the same memory allocation approach as APP APUs, so
skip these tests for them as well.
Change-Id: I13c953cc53da071f6f36af0d4a0153a48ea066fe
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 5eb2a2660c]
Differentiate discrete and integrated GPUs more flexibly at runtime.
This will aid in querying HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU
and hipDeviceAttributeIntegrated.
Change-Id: Ic8a6c9aea3b4bd19c4d5f6729af7e64c328fc61d
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: ae3ede062f]
Add test cases excluded from GFX11 to GFX12 list if they are also not
stable on GFX12.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ifeab24f8ea94085250ea86128a3e401479bdb53d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 24578e10c1]
Minor changes to instructions for GFX12.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iac5be900e3755099d83010fb1a2066b4dbb52dda
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: bde8e7a212]
Updated ShaderStore shader (used by CWSR test) for GFX12.
Workgroup ID is now passed in a different register.
Minor changes for new scope syntax.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I6fdabc8b62cba201d7777a736d3d43cfae28ca4c
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: e086c383fe]
New watchpoint exception status bits have been assigned to the first 4
least significant bits, so change the test verification mask to check
against the first watchpoint ID accordingly.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: If83950207ea9f66cd230c23e7386a97b3893c2eb
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 3b842c39f1]
Fix traphandler for KFD debugger testing.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ib8f5aac3d1b99e4463ac56b5f6d5dee2c367c447
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: a2e9226784]
Set max size needed for VGPR when doing a CWSR for GFX12 and GFX12.1.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iddefc62f1ad419c6f5ab6a872048457a1dc24037
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 259a724e21]
PC sampling is not upstream yet, so use 1.16 for
contiguous VRAM allocation and 1.17 for PC sampling.
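As a rough sketch of the version gating described above: the minimum
versions (1.16 and 1.17) come from this message, while KfdVersion,
AtLeast and the Supports* helpers are hypothetical names. How the runtime
actually reads the KFD version is not shown here.

```cpp
#include <cstdint>

// KFD ioctl interface version as a (major, minor) pair.
struct KfdVersion {
  std::uint32_t major_version;
  std::uint32_t minor_version;
};

// True when `have` is the same as or newer than `need`.
constexpr bool AtLeast(KfdVersion have, KfdVersion need) {
  return have.major_version > need.major_version ||
         (have.major_version == need.major_version &&
          have.minor_version >= need.minor_version);
}

// Minimum versions taken from the commit message.
constexpr KfdVersion kContiguousVramVersion{1, 16};
constexpr KfdVersion kPcSamplingVersion{1, 17};

constexpr bool SupportsContiguousVram(KfdVersion v) {
  return AtLeast(v, kContiguousVramVersion);
}

constexpr bool SupportsPcSampling(KfdVersion v) {
  return AtLeast(v, kPcSamplingVersion);
}
```

Under these assumptions, a 1.16 driver reports contiguous VRAM support
but not PC sampling, and a 1.17 driver reports both.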
Change-Id: Ib5d22e8f386ce7fe3f7111485b9632b61227e539
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 5786dbbb76]
Skip test when PC Sampling is not supported by ASIC.
Change-Id: I6f9be0bdaed66e51052723b6df6908079470cefb
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 1087dea925]
C error values are positive in user space, so check against errno
instead.
Fix the declaration of the return value to type HSAKMT_STATUS.
The KFD IOCTL should handle the size return when querying capabilities, so
return the size to the caller unconditionally.
Clean up error translations per function so that they are stylistically
clear.
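A minimal sketch of the errno handling described above. SketchStatus,
TranslateErrno and IoctlStatus are hypothetical names for illustration
and do not mirror the actual HSAKMT_STATUS translation. The point shown
is that a failed call returns -1 in user space and leaves a positive
value such as EBADF in errno.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/ioctl.h>

// Hypothetical status codes for this sketch only.
enum class SketchStatus { kSuccess, kInvalidHandle, kNoMemory, kError };

// Translates a positive errno value into a sketch status.
inline SketchStatus TranslateErrno(int err) {
  assert(err >= 0);  // user-space errno values are never negative
  switch (err) {
    case 0:      return SketchStatus::kSuccess;
    case EBADF:  return SketchStatus::kInvalidHandle;
    case ENOMEM: return SketchStatus::kNoMemory;
    default:     return SketchStatus::kError;
  }
}

// Issues an ioctl and reports the outcome as a sketch status.
inline SketchStatus IoctlStatus(int fd, unsigned long request, void* arg) {
  if (ioctl(fd, request, arg) != -1) return SketchStatus::kSuccess;
  return TranslateErrno(errno);  // compare against EBADF, not -EBADF
}
```

For example, calling IoctlStatus on an invalid descriptor such as -1
yields kInvalidHandle, because the failed ioctl sets errno to EBADF.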
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ic37390425f370c7ad88f9ed014444decf19383a3
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 206db80a56]
We need a ':' to end each subtest, except for the last entry.
Change-Id: I9515d90703c9679e06a4acd124883540c1d5b832
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 371d078226]
This test may fail when run on non-upstream versions of KFD as this
feature will not be upstreamed.
Change-Id: I7131e1f50984739c0df12e4c9afe790bd7e4cdfa
[ROCm/ROCR-Runtime commit: d2d95a8948]
It seems the Release() function is not reliable and can cause segfaults.
This is a temporary work-around until the Release() function is fixed.
Change-Id: I95470a800c6153673e4b8f4fe46a646903325074
[ROCm/ROCR-Runtime commit: ac5fb8be9e]
If the pthread_attr_setaffinity_np function exists, use it instead of
pthread_setaffinity_np, as pthread_setaffinity_np seems to fail to set
the affinity settings on some systems.
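A minimal sketch of setting affinity through thread attributes before the
thread is created, assuming Linux with glibc. The helper names
(FirstAllowedCpu, StartPinnedThread, PinCheck) are hypothetical, and the
check for whether pthread_attr_setaffinity_np exists is assumed to happen
elsewhere and is not shown.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes the *_np affinity calls in glibc
#endif
#include <cassert>
#include <pthread.h>
#include <sched.h>

struct PinCheck {
  int cpu;      // CPU the thread is expected to be pinned to
  bool pinned;  // set by the thread: true if its mask is exactly {cpu}
};

// Thread body: verifies that the thread started with the requested mask.
static void* VerifyPinned(void* arg) {
  PinCheck* check = static_cast<PinCheck*>(arg);
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) == 0) {
    check->pinned = CPU_COUNT(&mask) == 1 && CPU_ISSET(check->cpu, &mask);
  }
  return nullptr;
}

// Returns the first CPU the calling thread may run on, or -1 on failure.
static int FirstAllowedCpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &allowed)) return cpu;
  }
  return -1;
}

// Creates a thread whose affinity is set via its attributes, so the mask
// is in effect from the moment the thread starts running.
static bool StartPinnedThread(int cpu) {
  assert(cpu >= 0 && cpu < CPU_SETSIZE);
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);

  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return false;
  PinCheck check = {cpu, false};
  pthread_t tid;
  int err = pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask);
  if (err == 0) err = pthread_create(&tid, &attr, VerifyPinned, &check);
  pthread_attr_destroy(&attr);
  if (err != 0) return false;
  pthread_join(tid, nullptr);
  return check.pinned;
}
```

Calling StartPinnedThread(FirstAllowedCpu()) is expected to return true
when the attribute-based affinity took effect.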
Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
[ROCm/ROCR-Runtime commit: 57b93e02a4]
Bump HSA_AMD_INTERFACE_VERSION_MINOR to 5 to account for the
previously added GPU agent query HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES.
Change-Id: Ic8cfdcfb7bad6f3d1e0b3d68f505a62074fc26b9
[ROCm/ROCR-Runtime commit: b6829f7a72]
Support contiguous physical memory allocation flag. Allocations with
this flag will have contiguous physical memory. This is dependent on KFD
support for this flag and the AllocateKfdMemory(..) function call will
fail when it is not supported.
Change-Id: I6c51c8b061f7b026fdcc2aa2c37c74ecc13d95b6
[ROCm/ROCR-Runtime commit: 9af225e1b1]