Use the mapped_device_id_array size when allocating temp_node_id_array
for unmapping queues in fmm_map_to_gpu_nodes. The
registered_device_id_array size may be 0. Also, this temporary array is
small enough to allocate on the stack; malloc and free are overkill here.
Fix potential memory leak when registering the same device ID array
multiple times.
Change-Id: I83f09fd0925d9de7cf11bf72ba0ebb77273f587d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 395ecaa985]
The IOMMU path in sysfs was amd_iommu. After multiple-device support was
implemented, the path was replaced with amd_iommu_<index>. The current
Thunk spec is not clear about how to support multiple instances in one
block. There are no products with multiple IOMMUs at this point. This
patch changes the path to support both amd_iommu and amd_iommu_0 for
Carrizo.
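A minimal sketch of the "try both names" lookup, not the actual Thunk
code. The helper name pick_first_existing is hypothetical, and the sysfs
directory that contains amd_iommu / amd_iommu_0 is not given in this
message, so the caller is assumed to supply full candidate paths.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy the first candidate path that exists into
 * out. Returns 0 on success, -1 if no candidate exists or the path does
 * not fit into out. */
static int pick_first_existing(const char *const *candidates, size_t n,
                               char *out, size_t out_len)
{
    for (size_t i = 0; i < n; i++) {
        if (access(candidates[i], F_OK) == 0) {
            int len = snprintf(out, out_len, "%s", candidates[i]);

            if (len < 0 || (size_t)len >= out_len)
                return -1; /* path was truncated */
            return 0;
        }
    }
    return -1;
}
```

A caller would pass the amd_iommu path first and the amd_iommu_0 path
second, so older kernels keep working and newer ones are picked up.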
Change-Id: I3beea2fc78d96296232226191501a02ccf20d6b1
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 369902bf5b]
Add pr_debug to all memory APIs and pr_err to some failure cases.
Change-Id: I8b519a1228cc19e6c04118fd87432e7f48f3cbf9
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 73707766ef]
Simplify fmm_map_to_gpu_nodes code. Also fix a memory leak in this change.
Change-Id: I3487338b78c915de44588d0206bac4c53e728c60
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: f1a5248cf2]
Fall back to older apertures API and old events page size if the new APIs
fail. This allows running on current upstream kernels (with only minor
fixes) on gfx801 and enables testing of further changes during upstreaming.
Change-Id: I9d86d4f576e52fcbb5bc158d80f1bf41261e4e87
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 78e683acf4]
Remove -Werror from CFLAGS for older versions of gcc. Older gcc versions
emit some warnings, but the build still succeeds.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Change-Id: Icf556625cb870c4ad73e1d89f3d4ade3a96e821f
[ROCm/ROCR-Runtime commit: 8176830577]
Non-paged system memory is allocated with node ID 0. However, since a
GPU node is required for allocating system memory via KFD, the first
dGPU is used. In hsaKmtShareMemory(), use the same (first) dGPU if the
memory is system memory.
Change-Id: I85789a89a4e4f7888e3826826401ea89ce4d1718
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: 186527d0b7]
Topology uses cpuid to get CPU cache information. However, when running
under Valgrind, the data returned by cpuid does not come from the
processor we set affinity to; it all comes from one specific processor.
As a quick workaround so other teams can continue their work, this patch
reports the CPU cache from that specific processor and ignores the others.
Change-Id: I5cfac2329dac277f3dbde1be92fa26e085465401
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: e46743b1dd]
Needed for some tiling formats.
Change-Id: Icd460edaa77ccbeb3c98bc74b574ca5517db22af
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: d563e2cb1d]
This should allow us to take advantage of BigK fragments and huge pages
and improve TLB efficiency for VRAM allocations. Huge pages only work
with 4-level page tables (gfx900 and up). BigK fragments work on older
GPUs.
Change-Id: I02e1fbf74de554e16fdaf44e44d03b47df45c3b0
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: c7bd7733e5]
Imported graphics buffers are most likely images. Align them for
tiled image access. 64KB seems to do the trick.
This fixes VM faults with OpenCL graphics interop.
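The alignment itself is plain power-of-two rounding. A small sketch,
with align_up and IMAGE_ALIGN as illustrative names that are not taken
from the Thunk sources:

```c
#include <stdint.h>

#define IMAGE_ALIGN (64u * 1024u) /* 64KB, the value named above */

/* Round value up to the next multiple of align.
 * align must be a power of two. */
static uint64_t align_up(uint64_t value, uint64_t align)
{
    return (value + align - 1) & ~(align - 1);
}
```

With this helper, align_up(1, IMAGE_ALIGN) is 65536 and an already
aligned value is returned unchanged.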
Change-Id: I7f60e205d93fff9407e0d00d3dbb02cc4990b863
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: dc2c52be78]
Performance counters have a limited number of slots for concurrent
profiling. We need a mechanism to synchronize slot access across
different processes. A lock file was first used for this access control.
It revealed a Red Hat bug: /var/lock, a symbolic link to /run/lock, is
not writable by others. To avoid this bug and to simplify the code, the
lock file is replaced with POSIX shared memory. Access to the shared
memory is controlled by semaphores.
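A rough sketch of the scheme, assuming a named semaphore guarding a
small slot table in POSIX shared memory. All names here (slot_table,
slot_ctx, slots_open, SLOT_COUNT) and the object names passed by the
caller are hypothetical; the real layout in the Thunk may differ. Older
glibc versions may need -lrt -pthread to link this.

```c
#include <fcntl.h>
#include <semaphore.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SLOT_COUNT 4 /* hypothetical number of counter slots */

struct slot_table {
    uint32_t in_use[SLOT_COUNT]; /* 1 if the slot is taken */
};

struct slot_ctx {
    struct slot_table *table; /* mapping shared by all processes */
    sem_t *sem;               /* named semaphore guarding the table */
};

/* Open (or create) the shared table and its semaphore. 0 on success. */
static int slots_open(struct slot_ctx *ctx, const char *shm_name,
                      const char *sem_name)
{
    int fd = shm_open(shm_name, O_CREAT | O_RDWR, 0666);

    if (fd < 0)
        return -1;
    /* A newly created object is zero-filled, so all slots start free */
    if (ftruncate(fd, sizeof(struct slot_table)) != 0) {
        close(fd);
        return -1;
    }
    ctx->table = mmap(NULL, sizeof(struct slot_table),
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (ctx->table == MAP_FAILED)
        return -1;
    ctx->sem = sem_open(sem_name, O_CREAT, 0666, 1);
    if (ctx->sem == SEM_FAILED) {
        munmap(ctx->table, sizeof(struct slot_table));
        return -1;
    }
    return 0;
}

/* Claim the first free slot. Returns its index, or -1 if none is free. */
static int slots_acquire(struct slot_ctx *ctx)
{
    int found = -1;

    if (sem_wait(ctx->sem) != 0)
        return -1;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!ctx->table->in_use[i]) {
            ctx->table->in_use[i] = 1;
            found = i;
            break;
        }
    }
    sem_post(ctx->sem);
    return found;
}

/* Give a slot back so another process can claim it. */
static void slots_release(struct slot_ctx *ctx, int slot)
{
    if (slot < 0 || slot >= SLOT_COUNT)
        return;
    if (sem_wait(ctx->sem) != 0)
        return;
    ctx->table->in_use[slot] = 0;
    sem_post(ctx->sem);
}
```

Unlike a lock file, neither object depends on /var/lock being writable.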
Change-Id: I1e13c17f0e042fdfe6657afe8b3c88db7e84d292
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ac468f676c]
Hardware block testing is done with the workgroup state offset
initialized to the control stack size on all ASICs. MEC microcode
assumes this space is available when the workgroup state offset is
reset after a context restore event.
Fixes context save area overrun when the full save area is used.
Change-Id: I8eeb62f97140c6fe409fe78b4497d833584feea8
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
[ROCm/ROCR-Runtime commit: 4fbffcdd9c]
If the application forks, close the fd inherited from the parent.
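One possible way to do this is a pthread_atfork() child handler. This
is only a sketch and not necessarily how the Thunk detects the fork;
dev_fd, open_tracked and child_fd_is_closed are hypothetical names.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

static int dev_fd = -1; /* hypothetical: fd of the opened device node */

/* Runs in the child right after fork(): drop the inherited descriptor. */
static void close_fd_in_child(void)
{
    if (dev_fd >= 0) {
        close(dev_fd);
        dev_fd = -1;
    }
}

/* Open path and arrange for the fd to be closed in any forked child. */
static int open_tracked(const char *path)
{
    static int handler_installed;

    dev_fd = open(path, O_RDONLY);
    if (dev_fd < 0)
        return -1;
    if (!handler_installed) {
        if (pthread_atfork(NULL, NULL, close_fd_in_child) != 0) {
            close(dev_fd);
            dev_fd = -1;
            return -1;
        }
        handler_installed = 1;
    }
    return dev_fd;
}

/* Fork once and report what the child saw: 1 if fd was already closed
 * there, 0 if it was still open, -1 on error. */
static int child_fd_is_closed(int fd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fcntl(fd, F_GETFD) == -1 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The parent keeps its descriptor; only the child's inherited copy is
closed.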
Change-Id: I48e4157d5f0d6f04d07ecb23b719a23934687cdb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: dc6ece67fd]
Libraries normally don't print messages. We use pr_err, pr_warn,
pr_info, and pr_debug to print messages to stderr when prints are
enabled for debugging.
Change-Id: I9caf719343aa618c88e7b500f9737a46702e424a
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 897c4e2fff]
The existing Thunk has printf/fprintf calls in the code, while libraries
normally don't print any messages. This patch introduces a print
mechanism similar to how the Linux kernel prints to the console based on
the log level. The default is not to print any messages, but setting
HSAKMT_DEBUG_LEVEL enables the prints.
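A sketch of what such a level-gated print mechanism could look like.
The numeric level values, the DBG_LEVEL_* names and init_debug_level are
assumptions for illustration; only HSAKMT_DEBUG_LEVEL and the pr_*
macro names come from these messages.

```c
#include <stdio.h>
#include <stdlib.h>

/* Levels modeled on the kernel's; the exact numbers are an assumption */
enum {
    DBG_LEVEL_OFF = -1,
    DBG_LEVEL_ERR = 3,
    DBG_LEVEL_WARN = 4,
    DBG_LEVEL_INFO = 6,
    DBG_LEVEL_DEBUG = 7
};

static int debug_level = DBG_LEVEL_OFF; /* default: print nothing */

/* Read HSAKMT_DEBUG_LEVEL once at library init. Unset, malformed or
 * out-of-range values leave printing disabled. */
static void init_debug_level(void)
{
    const char *env = getenv("HSAKMT_DEBUG_LEVEL");

    debug_level = DBG_LEVEL_OFF;
    if (env) {
        char *end;
        long v = strtol(env, &end, 10);

        if (end != env && *end == '\0' &&
            v >= DBG_LEVEL_ERR && v <= DBG_LEVEL_DEBUG)
            debug_level = (int)v;
    }
}

/* Print to stderr only if the message level is enabled */
#define pr_level(level, ...) \
    do { \
        if ((level) <= debug_level) \
            fprintf(stderr, __VA_ARGS__); \
    } while (0)

#define pr_err(...)   pr_level(DBG_LEVEL_ERR, __VA_ARGS__)
#define pr_warn(...)  pr_level(DBG_LEVEL_WARN, __VA_ARGS__)
#define pr_info(...)  pr_level(DBG_LEVEL_INFO, __VA_ARGS__)
#define pr_debug(...) pr_level(DBG_LEVEL_DEBUG, __VA_ARGS__)
```

With this sketch, running with HSAKMT_DEBUG_LEVEL=7 enables everything
up to pr_debug, while an unset variable keeps the library silent.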
Change-Id: Ic071e122d35a82260218e9914cde4815e69df742
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ccfe739929]
For experimental purposes, we need an option to change the compute
capability by forcing the GfxIp version. This patch allows the
environment variable HSA_OVERRIDE_GFX_VERSION=major.minor.stepping to
replace the default version. For example:
export HSA_OVERRIDE_GFX_VERSION=9.0.1
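Parsing the major.minor.stepping string could look roughly like the
sketch below. struct gfx_version and parse_gfx_version are hypothetical
names, and the real code may validate the input differently.

```c
#include <stdio.h>

struct gfx_version {
    unsigned int maj;
    unsigned int min;
    unsigned int step;
};

/* Parse "major.minor.stepping". Returns 0 on success, -1 if the string
 * is missing, has fewer than three fields, or has trailing characters. */
static int parse_gfx_version(const char *s, struct gfx_version *out)
{
    unsigned int maj, min, step;
    char trailing;

    if (!s)
        return -1;
    /* Exactly three conversions must succeed; a fourth means junk
     * follows the stepping number. */
    if (sscanf(s, "%u.%u.%u%c", &maj, &min, &step, &trailing) != 3)
        return -1;
    out->maj = maj;
    out->min = min;
    out->step = step;
    return 0;
}
```

A caller would pass getenv("HSA_OVERRIDE_GFX_VERSION") and keep the
default version whenever the function returns -1.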
Change-Id: I90cfbd43619d9d3aebf53321d4e058f01bcd7088
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 13aadde56e]
ctl_stack_copy is allocated from malloc. It should be freed by free.
Change-Id: Ib924da20200d91f52f106fe173464d47862759a8
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 6e113e2634]
Use the build machine's architecture to build the Debian package. Useful
for building on Power8 and ARM64 machines.
Change-Id: I97fc80a6723b139e753019a355f11ced0bba0dd4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: 5e26827d05]
DB block was missing in the UUID look-up.
Change-Id: Ife5c25859bab6ec7fd99d0cd4d098ab044a08142
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ceaaa1a57c]
The KFD implementation has been removed and will not be upstreamed.
This API has been superseded by hsaKmtRegisterGraphicsHandleToNodes.
Change-Id: I5f2d8da3260974618cdb6ea3fdcd77d37b82c9cb
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 374bd89d8c]
For items in HsaQueueInfo, control stack information comes from KFD, CU
mask information is maintained in Thunk, and others (queue detail error
and queue type extended) are ignored (value = 0) at this point.
Change-Id: Ib21370b0f52b2bb4ebe6a9b4b6ec6139cccb25ca
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 683fc96325]
Use checkpatch.pl to fix the majority of errors. Some that remain and
will be excluded:
Use of typedefs/externs/volatile/sscanf
Lines over 80 characters
Remaining errors are due to checkpatch.pl misinterpreting the * symbol
with typedefs.
Also use this opportunity to spell "manageable" properly.
Change-Id: I0b335e9cb3e1eea38bee27eaa1f582b2c9b09b38
[ROCm/ROCR-Runtime commit: b78e0e152a]
Use calloc to allocate event data. Otherwise random data may be filled
in for events that haven't actually signalled. This could trigger the
VM fault handler in the Runtime when no VM fault actually happened and
lead to intermittent HSA conformance test failures.
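The difference in a nutshell, using a made-up event_data layout (the
real event structures are not shown in this message):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-event data filled in by the kernel */
struct event_data {
    uint32_t signaled;      /* non-zero once the event has fired */
    uint64_t fault_address; /* only meaningful for signalled VM faults */
};

/* calloc zero-fills the array, so entries the kernel never touches read
 * as "not signalled" instead of whatever malloc left in the memory. */
static struct event_data *alloc_event_data(size_t count)
{
    return calloc(count, sizeof(struct event_data));
}
```

The caller releases the array with free() as usual.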
Change-Id: Icf702970e73a485b50633703c1b164f87fbb8606
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: ea58703ece]
This change breaks the ABI, and aligns it with the upstream ABI.
It also fixes some ioctl structures that are not 64-bit safe and
consolidates ioctl numbers.
Change-Id: Ib79944721534bd55a5299c5baf7bb5b3246cccd2
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 15764e2897]
This patch adds more non-privileged PMC blocks to GFX9/gfx900 to cover
blocks added in HSA Thunk Spec.
Change-Id: Ia3d953213a32536b2275231149f11ba060791442
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ca06b0966b]
This patch adds more non-privileged PMC blocks to GFX8 products: gfx801,
gfx803, and gfx803. Most of them have the same counter IDs on the same
block. For certain blocks when the product doesn't have the same counter
IDs, gfx8_xx_ is used to represent the product.
Change-Id: I059913c974bf2eb875fd1cf6f8b0d8c9c9bd7c14
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: ed4a22e0d3]
HSA Thunk Spec was updated to include more non-privileged blocks for
profiling. This patch adds those newly added non-privileged blocks for
gfx70x.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: Id745ac236c871e8e61a128a2460784f9c9c354b6
[ROCm/ROCR-Runtime commit: 9f19acbdb7]
export HSA_CHECK_USERPTR=1 to check user pointers on registration. If
the pointer doesn't point to a valid mapping, there will be a segfault.
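Such a check can be as simple as reading one byte from every page of
the range. The sketch below assumes that approach; the helper names and
the rule that "0" disables the check are illustrative assumptions, not
taken from the Thunk sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption: the check is on when the variable is set and not "0" */
static int check_userptr_enabled(void)
{
    const char *env = getenv("HSA_CHECK_USERPTR");

    return env != NULL && strcmp(env, "0") != 0;
}

/* Read one byte from every page overlapping [addr, addr + size).
 * An unmapped page makes the read fault, which is the intended
 * diagnostic. Returns the number of pages touched. */
static size_t touch_user_range(const void *addr, size_t size)
{
    const volatile unsigned char *p = addr;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t first, off, touched = 0;

    if (size == 0)
        return 0;
    /* Offset of the first byte that lies on the next page */
    first = page - ((uintptr_t)addr & (page - 1));
    (void)p[0];
    touched++;
    for (off = first; off < size; off += page) {
        (void)p[off];
        touched++;
    }
    return touched;
}
```

Faulting at registration time points at the offending caller directly,
instead of failing later when the GPU accesses the memory.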
Change-Id: I459c0902cbc90338517fbf79678871ebfbe5183b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 34ddde0c50]
KFD now adds all direct IO links to sysfs, so this patch removes all
code related to direct links and modifies the indirect links function to
reflect the change.
Change-Id: Iaec7b5f6c59f9034f8f960ca1fe1145d51dab367
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: c119653add]
Guard pages help catch out-of-bounds memory accesses by applications
by generating VM faults (GPU) and segfaults (CPU).
Remove address space reservation from scratch aperture. That address
space is managed by the Thunk client. Guard pages would cause Thunk's
address space management to get out of sync with the client's.
Change-Id: I2e5aee2923a90186358cc7b0e131baf547996df6
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 11862b9f61]
- Typo fix: *_link_tye to *_link_type and a missing word in comments
- Replace printf with fprintf(stderr, ...)
- Shorten lines to fit in 80 characters
Change-Id: Ibeb0b98d5c59d617ae06d9854a9dde16251ded52
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 3738a1b5f2]