New trap handler ABI: Record in ttmp11[8:7] the event that caused the
trap handler to be entered. We currently record 2 events, trap_raised
if an s_trap instruction was executed, or excp_raised if an exception
(MEM_VIOL or ILLEGAL_INST) was raised.
Change-Id: Ie278c8277437b3b67c2737dcd1a12fe6511df428
Remove the hard-coding of "SHARED" as the lib type, and move any
SO-specific linking to only happen if the .so exists in the first place
Change-Id: I3f0bfd5c03f19b2425423b4dc8eed8fd87acc1d6
Make the hsakmt library take the value from CMake regarding
static/shared
KFDTest automatically grabs the right one due to it checking the normal
shared folders. Tested locally (and via automation by the time that this
is merged)
Also set the default to building SOs if BUILD_SHARED_LIBS is not defined
Change-Id: I7f8b76a7e60f3b41e5981f472b388301ae09e2c6
Signed-off-by: Kent Russell <kent.russell@amd.com>
Changes in the compiler are being made to add controls for XNACK and SRAM ECC
for all targets which can support these features. By default the conservatively
correct settings of XNACK on and SRAM ECC on will be used. This change is to
facilitate these backend updates.
Change-Id: I2fd6b6bc1d32937737e7f56d8e08c70fe781c745
Using the building OS isn't guaranteed, as we can theoretically build
RPMs in Ubuntu or DEBs in CentOS. Use CPack's DEB/RPM-specific variables
to get around this issue
Change-Id: I404246c070eac2c74b45ae4b763c612891d66de1
Signed-off-by: Kent Russell <kent.russell@amd.com>
IPC create must only be used on whole ROCr allocations.
Fragments were allowing handle creation with offsets.
Change-Id: I1faa96d36bc7a6199bdc2e3ff1b8871d1a36a2fa
This has been the default mode for a while now since we don't
distribute or build the finalizer. Removing the attempt cleans
up debug mode messages that are causing confusion.
Change-Id: I8162c95abd5bbedaa22b90191f7a384a34c388ae
Pool size was being used where alloc_max_size should be.
Changes are necessary on NUMA systems where not all nodes have
installed memory.
Change-Id: If8f507cae50a8dfeae8572d4e39df757abe28599
Do not request host accessible memory otherwise small-bar XGMI fails.
Change-Id: I6b1e750839ae66a34c85405fa8d0a4aa455399ef
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Lock API suceeds but the GPU still faults on the address.
This should be fixed in Thunk and/or KFD as well.
Change-Id: I8b2fbcae61ab181e4fe7f0b64e43a5f0772efb24
When running run_kfdtest.sh through a wrapper script that sources
run_kfdtest.sh, kfdtest.exclude isn't found because $0 points to the
location of the wrapper script. User $BASH_SOURCE instead of $0 to
find the location of the correct run_kfdtest.sh script.
Change-Id: I0ae7899e527e6d98bb8651197484e5ee03a5fd7b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Some systems don't support coarse-grained DPM, so performance level will
fail. Remove the compute_utils.sh references, and just use the SMI if we
request clocks be high, without throwing errors if it fails.
Change-Id: Ic5beda9921128be36ac2d58cae3f0608618a8e21
Iterate the loaded shared objects to see if the given elf image binary
is part of a loaded segment.
Change-Id: I074cacd99eb5b59f883f4ce2bd901e0e35a660b8
- Update the documentation comment in hsa_ext_amd.h, which contained
contradictory and incorrect information about an argument to the
hsa_amd_agents_allow_access function.
Change-Id: I60b0dbbdc761078cd81906bc2c63a27d7e6b53e1
Don't set kfd_open_count=1 unless hsaKmtOpenKFD actually succeeds.
This prevents returning HSAKMT_STATUS_KERNEL_ALREADY_OPENED in
subsequent calls when KFD is actually closed.
Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Change-Id: Ia870b5faa8626826a6c8795aa10784d376cf2e80
- Symlink creation is corrected only for deb packages
- It is follow up package of http://git.amd.com:8080/c/hsa/ec/hsa-runtime/+/334403
- configure_file() is called to update the scripts with proper cmake variable values
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I0e833ead265166411e83593fd57265a9ab356904
CPack now incorrectly adds two copies of directory symlinks when
building Debian packages. This causes dpkg to see a file conflict
and fail installing.
The correct long term solution is to remove the symlink and use a
flat directory structure. This patch adds the symlink in the post
install script as a workaround until we can switch to flat layout.
Change-Id: I879b6cbc2661c19df3db639cb42fba0972fddb93
libpci was only used to find a marketing string for a device.
This patch looks for a pci.ids on disk and parses it to extract the
same string, using 'Device xxxx' as the fallback on file i/o error
or missing data from the text file. Tested by checking every vendor/
device pair against the values returned from libpci.
Change-Id: I21af3157472c1824d57fcee31393c6ee8ce07330
Signed-off-by: Jon Chesterfield <Jonathan.Chesterfield@amd.com>
Read device unique id from sysfs and expose it in HsaNodeProperties.
For devices not supported the value will be 0
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I97b8689dfa090971c6876de6feaa97652e28c03d
HDP will now be used for coarse grain kernarg so needs to be
reported without consideration of fine grain vram over pcie.
Change-Id: I648167299faa583876a3d8685c3b3c4d8d31ebf9
Setting to 1 prevents the scratch handler from reducing peak occupancy.
Scratch allocations that would normally reduce peak occupancy will
instead fail.
Diagnostic for TF and PyTorch.
Change-Id: I2d7ea47077eb5cf708251c8aa3fd183ad4261be0
scratch_used_large_ was uninitialized leading to the observed hang.
DynamicScratchHandler would wait for a large scratch release despite no
large scratch having yet been allocated. Fixes .
The patch also removes a potential race between AddScratchNotifier and
ReleaseQueueScratch. The race condition does not exist today since both
scratch alloc and release run on the same thread. The changes will
prevent this potential race from manifesting if the async event handler
is ever updated to use multiple threads.
Also enhances scratch occupancy reduction reporting. Reporting now
prints the initial request size as well as the allocated size and the
effect on occupancy this has. Occupancy is computed in terms of the
requesting dispatch grid size so may be >100%.
Change-Id: I0fc5ee01467ff4c29bdd25d545177c97862c3bd9
Ensures that all CPU agents will have a pool handle to allocate
system memory. These pools will have no numa binding since the
node their owning Agent represents has no installed memory.
Change-Id: I9f72b455d633646839753c6719ff7f6a4c41f7c4
- This new path is required when libhsaruntime.so is referred
from the top level ROCm lib directory.
- Once ROCm stack lib/lib64 structure is flatten, RUNPATH in all
the libraries needs to be updated.
Change-Id: I369131ce93e14958ec57a54701671f2bfd8d522a
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Move the shader code before the test case code so that all test case
code is consecutive.
Rectify the print messages and avoid calling GetSysMemSize() repeatedly.
Change-Id: I1c4aa5552de4d74163717fe66ad9759fb09e1316
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
The original test takes forever to run on emulators because emulators
are much slower than Asic. So intelligently detect the emulator scenarios
and reduce the run time by slashing the iteration times.
Change-Id: I087f43c04c2b23b5ab2ecaad07533b767c337e94
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Use an "if 0" to remove the code, as it is not working as expected. The
test is supposed to test mapping from a foreign device (like a NIC), so
it uses a separate GPU to map, but this mapping can be evicted and thus
the test can fail unexpectedly. Remove the code until the test can be
reworked
Change-Id: Ie4a15c2a018bbd8e931b06b6700d10b3be86e410
This will allow it to be installed with the ROCm suite,
and centralize things a little bit more
Also update run_kfdtest.sh to reflect the changes
Lastly, remove "die" reference as compute_utils.sh
may not be packaged with KFDTest
Change-Id: I4c30cd29979192496419e71e3685937d7417f739
If child process explicitly calls hsaKmtReleaseSystemProperties(), it
fails. Allow child process to release and acquire system properties.
Change-Id: I649a4600212711b2ad4474f605f3ca39a4003d03
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Tests run in the order in which they are linked. Currently that order
is non-deterministic. Listing source files in the Makefile explicitly
makes the order deterministic.
The order chosen runs basic tests before more advanced tests before
stress tests.
Change-Id: I5bc032bcd589f92a51db36acb518bb4d8ef778d3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>