Fall back to older apertures API and old events page size if the new APIs
fail. This allows running on current upstream kernels (with only minor
fixes) on gfx801 and enables testing of further changes during upstreaming.
Change-Id: I9d86d4f576e52fcbb5bc158d80f1bf41261e4e87
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 78e683acf4]
Removed Werror CFLAGS for lower version of gcc. there
will be some warning message on lower gcc version but build
is ok.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Change-Id: Icf556625cb870c4ad73e1d89f3d4ade3a96e821f
[ROCm/ROCR-Runtime commit: 8176830577]
Non paged system memory is allocated with node id 0. However, since a
gpu node is required for allocating system memory via KFD, the first
dgpu is used. In hsaKmtShareMemory() if system memory use the same
(first) dgpu.
Change-Id: I85789a89a4e4f7888e3826826401ea89ce4d1718
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: 186527d0b7]
Topology uses cpuid to get CPU cache information. However when running
under Valgrind, data returned from cpuid are not from the processor we set
affinity to. Instead they are all from one specific processor. For a quick
workaround so other teams can continue their work, this patch will report
CPU cache from that specific processor and ignore others.
Change-Id: I5cfac2329dac277f3dbde1be92fa26e085465401
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: e46743b1dd]
Needed for some tiling formats.
Change-Id: Icd460edaa77ccbeb3c98bc74b574ca5517db22af
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: d563e2cb1d]
This should allow us to take advantage of BigK fragments and huge pages
and improve TLB efficiency for VRAM allocations. Huge pages only work
with 4-level page tables (gfx900 and up). BigK fragments work on older
GPUs.
Change-Id: I02e1fbf74de554e16fdaf44e44d03b47df45c3b0
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: c7bd7733e5]
Imported graphics buffers are most likely images. Align them for
tiled image access. 64KB seems to do the trick.
This fixes VM faults with OpenCL graphics interop.
Change-Id: I7f60e205d93fff9407e0d00d3dbb02cc4990b863
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: dc2c52be78]
Tools/CodeXL will retain older versions of structs if them need them.
Change-Id: I568d7b445778dd575ef71000b4b839300572288e
[ROCm/ROCR-Runtime commit: a0a3587345]
1. Correct amd::AqlQueue::ExecutePM4 to support interception.
2. Minor fixes to AqlPacket and SoftCP.
3. Minimal change to disable interception of runtime internal queues.
Change-Id: I103fece2ebf9a188d27f01e61221c737405d7253
[ROCm/ROCR-Runtime commit: bc0bd00746]
Performance counters have limited slots for concurrent profiling. We
need a mechanism to synchronize the slots access across different
processes. Lock file file was first used for this access control. It
reveals a RedHat bug that /var/lock, symbolic linked to /run/lock, is
not writable by others. To avoid this bug and to simplify the code,
POSIX shared memory is created to replace the lock file usage. Access
of the shared memory is controlled by semaphores.
Change-Id: I1e13c17f0e042fdfe6657afe8b3c88db7e84d292
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ac468f676c]
Hardware block testing is done with the workgroup state offset
initialized to the control stack size on all ASICs. MEC microcode
assumes this space is available when the workgroup state offset is
reset after a context restore event.
Fixes context save area overrun when the full save area is used.
Change-Id: I8eeb62f97140c6fe409fe78b4497d833584feea8
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
[ROCm/ROCR-Runtime commit: 4fbffcdd9c]
If the application forks, close the fd inherited from the parent.
Change-Id: I48e4157d5f0d6f04d07ecb23b719a23934687cdb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: dc6ece67fd]
Libraries normally don't print messages. We use pr_err, pr_warn,
pr_info, and pr_debug to print messages to stderr when prints are
enabled for debugging.
Change-Id: I9caf719343aa618c88e7b500f9737a46702e424a
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 897c4e2fff]
Existing Thunk has printf/fprintf in the code while normally libraries
don't print any message. This patch introduces a print machenism similar to
how the Linux kernel prints to console based on the log level. The default
is not to print any message, but setting HSAKMT_DEBUG_LEVEL will enable the
prints.
Change-Id: Ic071e122d35a82260218e9914cde4815e69df742
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ccfe739929]
For experimental purpose, we need an option to change compute capability
by forcing the GfxIp version. This patch allows to use environment
variable HSA_OVERRIDE_GFX_VERSION=major.minor.stepping to replace the
default version. For example:
export HSA_OVERRIDE_GFX_VERSION=9.0.1
Change-Id: I90cfbd43619d9d3aebf53321d4e058f01bcd7088
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 13aadde56e]
ctl_stack_copy is allocated from malloc. It should be freed by free.
Change-Id: Ib924da20200d91f52f106fe173464d47862759a8
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 6e113e2634]
When a fatal memory fault occurs the scheduler context-saves all queues
in the process and notifies the runtime through the memory event. The
saved state contains all GPR/LDS data at the moment of the fault.
Retrieve this state and present it to the user if HSA_DEBUG_FAULT is set
to "analyze" and the wavefront caused the fault. If amdgcn-capable objdump
is in the PATH invoke this to disassemble code around the PC.
Queue lifetime is now managed by the runtime to allow querying the
context save state for all active queues.
Change-Id: I6fee662fad1c4f9aa125bf5c53d7d0ea1ab32f95
[ROCm/ROCR-Runtime commit: 75c9506f9d]