Commit Graph

423 Commits

Author SHA1 Message Date
Amber Lin e46743b1dd Workaround cpuid issue under Valgrind
Topology uses cpuid to get CPU cache information. However when running
under Valgrind, data returned from cpuid are not from the processor we set
affinity to. Instead they are all from one specific processor. For a quick
workaround so other teams can continue their work, this patch will report
CPU cache from that specific processor and ignore others.

Change-Id: I5cfac2329dac277f3dbde1be92fa26e085465401
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-07-24 12:04:17 -04:00
Felix Kuehling d563e2cb1d Update image alignment to 256KB
Needed for some tiling formats.

Change-Id: Icd460edaa77ccbeb3c98bc74b574ca5517db22af
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-07-20 21:03:31 -04:00
James Edwards ee22d80760 Update README.md to include new build instructions.
Change-Id: I72ca67d3016c99682cfe745bfd74c722ea181a61
2017-07-20 09:17:54 -05:00
James Edwards e93d3de0a1 Final changes to roct CMakeLists.txt file for devel package.
Change-Id: Ie0ce0c5cd8e7811f67e92439d1df1612eabefdfa
2017-07-19 17:16:17 -05:00
James Edwards a1353acd85 Update the ROCt CMakeList.txt files to build both runtime and devel packages.
Change-Id: I01b6e4e5db91dd5f56ffea54c548e10f1f4aae5d
2017-07-19 01:16:13 -04:00
Felix Kuehling c48ff6b482 Make HSA_DISABLE_CACHE work on gfx900
Change-Id: I624390bfa70b2ff4cefd1bbdf8d960b7121f22bb
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-07-17 15:13:52 -04:00
Felix Kuehling c7bd7733e5 Align large buffers to BigK or huge-page boundary
This should allow us to take advantage of BigK fragments and huge pages
and improve TLB efficiency for VRAM allocations. Huge pages only work
with 4-level page tables (gfx900 and up). BigK fragments work on older
GPUs.

Change-Id: I02e1fbf74de554e16fdaf44e44d03b47df45c3b0
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-07-17 12:04:05 -04:00
Felix Kuehling dc2c52be78 Align imported graphics buffers
Imported graphics buffers are most likely images. Align them for
tiled image access. 64KB seems to do the trick.

This fixes VM faults with OpenCL graphics interop.

Change-Id: I7f60e205d93fff9407e0d00d3dbb02cc4990b863
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-07-17 11:19:08 -04:00
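The alignment described in this commit can be illustrated with a small round-up helper. This is a minimal sketch, not the Thunk code: the names `IMPORT_ALIGN` and `align_up` are hypothetical, and only the 64KB figure comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the 64KB alignment mentioned in the commit message. */
#define IMPORT_ALIGN (64u * 1024u)

/* Round value up to the next multiple of align.
 * align must be a non-zero power of two. */
static inline uint64_t align_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}
```

Applying `align_up` to both the start address and the size of a buffer keeps the whole mapping on 64KB boundaries, which is what tiled image access appears to need here.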
Amber Lin ac468f676c Replace lock file with shared memory
Performance counters have limited slots for concurrent profiling. We
need a mechanism to synchronize slot access across different
processes. A lock file was first used for this access control, but it
exposed a Red Hat bug: /var/lock, symbolically linked to /run/lock, is
not writable by others. To avoid this bug and to simplify the code,
POSIX shared memory replaces the lock file. Access
to the shared memory is controlled by semaphores.

Change-Id: I1e13c17f0e042fdfe6657afe8b3c88db7e84d292
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-07-06 13:48:34 -04:00
Jay Cornwall 4fbffcdd9c Always allocate space for control stack at beginning of save area
Hardware block testing is done with the workgroup state offset
initialized to the control stack size on all ASICs. MEC microcode
assumes this space is available when the workgroup state offset is
reset after a context restore event.

Fixes context save area overrun when the full save area is used.

Change-Id: I8eeb62f97140c6fe409fe78b4497d833584feea8
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
2017-07-04 10:12:02 -04:00
Harish Kasiviswanathan dc6ece67fd Fix fd leak if application forks
If the application forks, close the fd inherited from the parent.

Change-Id: I48e4157d5f0d6f04d07ecb23b719a23934687cdb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-06-30 15:59:41 -04:00
Harish Kasiviswanathan 4d0697bf65 Honor ReadOnly bit in HsaMemFlags
Change-Id: I456cde81384bf0f4bf055711d94b731179706d28
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-06-28 15:02:21 -04:00
Amber Lin 897c4e2fff Replace printf/fprintf with pr_xxx
Libraries normally don't print messages. We use pr_err, pr_warn,
pr_info, and pr_debug to print messages to stderr when prints are
enabled for debugging.

Change-Id: I9caf719343aa618c88e7b500f9737a46702e424a
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-28 10:47:35 -04:00
Amber Lin ccfe739929 Introduce debug level to Thunk
The existing Thunk has printf/fprintf in the code, while libraries normally
don't print any messages. This patch introduces a print mechanism similar to
how the Linux kernel prints to the console based on the log level. The default
is not to print any message, but setting HSAKMT_DEBUG_LEVEL will enable the
prints.

Change-Id: Ic071e122d35a82260218e9914cde4815e69df742
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-26 16:33:17 -04:00
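A level-gated print mechanism of this kind might look roughly like the sketch below. Only the `HSAKMT_DEBUG_LEVEL` variable and the `pr_err`/`pr_warn`/`pr_info`/`pr_debug` names come from the commit messages. The numeric levels (borrowed from the kernel's log levels), `parse_debug_level`, `init_debug_level`, and `debug_enabled` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed level numbering, modelled on the Linux kernel's log levels. */
enum { DBG_NONE = -1, DBG_ERR = 3, DBG_WARN = 4, DBG_INFO = 6, DBG_DEBUG = 7 };

static int debug_level = DBG_NONE;   /* default: print nothing */

/* Convert the environment string to a level; anything unset, malformed,
 * or out of range leaves printing disabled. */
int parse_debug_level(const char *env)
{
    char *end;
    long v;

    if (!env)
        return DBG_NONE;
    v = strtol(env, &end, 10);
    if (end == env || *end != '\0' || v < DBG_ERR || v > DBG_DEBUG)
        return DBG_NONE;
    return (int)v;
}

/* Call once at library start-up. */
void init_debug_level(void)
{
    debug_level = parse_debug_level(getenv("HSAKMT_DEBUG_LEVEL"));
}

/* A message is printed when its level is at or below the configured level. */
int debug_enabled(int level)
{
    assert(level >= DBG_ERR && level <= DBG_DEBUG);
    return level <= debug_level;
}

#define pr_err(...)   do { if (debug_enabled(DBG_ERR))   fprintf(stderr, __VA_ARGS__); } while (0)
#define pr_warn(...)  do { if (debug_enabled(DBG_WARN))  fprintf(stderr, __VA_ARGS__); } while (0)
#define pr_info(...)  do { if (debug_enabled(DBG_INFO))  fprintf(stderr, __VA_ARGS__); } while (0)
#define pr_debug(...) do { if (debug_enabled(DBG_DEBUG)) fprintf(stderr, __VA_ARGS__); } while (0)
```

Under these assumptions, running with `HSAKMT_DEBUG_LEVEL=4` would let `pr_err` and `pr_warn` through while keeping `pr_info` and `pr_debug` silent.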
Amber Lin 13aadde56e Use environment variable to force gfx version
For experimental purposes, we need an option to change compute capability
by forcing the GfxIp version. This patch allows the environment
variable HSA_OVERRIDE_GFX_VERSION=major.minor.stepping to replace the
default version. For example:
export HSA_OVERRIDE_GFX_VERSION=9.0.1

Change-Id: I90cfbd43619d9d3aebf53321d4e058f01bcd7088
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-22 18:15:17 -04:00
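Parsing the `major.minor.stepping` string could be sketched as follows. The environment variable name and its format come from the commit message. `struct gfx_version`, `parse_gfx_version`, and `get_gfx_version_override` are hypothetical names, and the strictness about trailing characters is a choice made for this example rather than documented behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for the three version fields. */
struct gfx_version {
    unsigned major, minor, stepping;
};

/* Parse "major.minor.stepping". Returns 0 on success, -1 if the text is
 * missing or malformed; *out is only written on success. */
int parse_gfx_version(const char *text, struct gfx_version *out)
{
    unsigned major, minor, stepping;
    char trailing;

    assert(out != NULL);
    if (!text)
        return -1;
    /* The extra %c detects trailing garbage: exactly three fields must match. */
    if (sscanf(text, "%u.%u.%u%c", &major, &minor, &stepping, &trailing) != 3)
        return -1;
    out->major = major;
    out->minor = minor;
    out->stepping = stepping;
    return 0;
}

/* Returns 0 and fills *out when a valid override is present in the
 * environment; returns -1 so the caller keeps the default version. */
int get_gfx_version_override(struct gfx_version *out)
{
    return parse_gfx_version(getenv("HSA_OVERRIDE_GFX_VERSION"), out);
}
```

With `export HSA_OVERRIDE_GFX_VERSION=9.0.1`, `get_gfx_version_override` would fill in major 9, minor 0, stepping 1.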
Evan Quan 5b3c9f0b31 Revert "Change gfx900 compute capability to 9.0.1"
This reverts commit 5114a9368b.


Change-Id: Id9c4f43462820bf09f25674fa30e6eb04098808e
Signed-off-by: Evan Quan <evan.quan@amd.com>
2017-06-20 15:36:09 +08:00
Amber Lin 6e113e2634 Free control stack correctly
ctl_stack_copy is allocated from malloc. It should be freed by free.

Change-Id: Ib924da20200d91f52f106fe173464d47862759a8
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-19 09:01:35 -04:00
Amber Lin 5114a9368b Change gfx900 compute capability to 9.0.1
9.0.1 is the XNACK-enabled gfx900 compile target. The compiler must generate
ISA that is XNACK-enabled.



Change-Id: Ic4987132ef9f8d06d9e2bcdb8f7eeb875cdd2b44
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-13 17:04:42 -04:00
Harish Kasiviswanathan 5e26827d05 Support deb package build for other architectures
Use the build machine's architecture to build the Debian package. Useful for
building on Power8 and ARM64 machines.

Change-Id: I97fc80a6723b139e753019a355f11ced0bba0dd4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-06-13 12:12:37 -04:00
Amber Lin ceaaa1a57c A missing block in PMC
DB block was missing in the UUID look-up.

Change-Id: Ife5c25859bab6ec7fd99d0cd4d098ab044a08142
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-05 12:21:56 -04:00
Felix Kuehling 374bd89d8c Remove deprecated implementation of hsaKmtMapGraphicHandle
The KFD implementation has been removed and will not be upstreamed.
This API has been superseded by hsaKmtRegisterGraphicsHandleToNodes.

Change-Id: I5f2d8da3260974618cdb6ea3fdcd77d37b82c9cb
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-02 13:52:19 -04:00
Amber Lin 683fc96325 Implement hsaKmtGetQueueInfo interface
For items in HsaQueueInfo, control stack information comes from KFD, CU
mask information is maintained in Thunk, and others (queue detail error
and queue type extended) are ignored (value = 0) at this point.

Change-Id: Ib21370b0f52b2bb4ebe6a9b4b6ec6139cccb25ca
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-06-01 14:15:54 -04:00
Kent Russell b78e0e152a Clean up thunk code
Use checkpatch.pl to fix the majority of errors. Some that remain and
will be excluded:
Use of typedefs/externs/volatile/sscanf
Lines over 80 characters

Remaining errors are due to misunderstanding the * symbol with typedefs

Also use this opportunity to spell manageable properly

Change-Id: I0b335e9cb3e1eea38bee27eaa1f582b2c9b09b38
2017-05-31 14:38:59 -04:00
Sean Keely 59cc20d3cb Check mmap return address for allocation, not requested address.
Change-Id: Ifeb7b17976fc791e3256c70d57cb4d1324a8b960
2017-05-30 21:26:55 -05:00
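One reading of this commit is that success of an allocation should be judged by what `mmap` returns, not by the address that was passed in as a hint. The sketch below illustrates that reading only. `reserve_region` is a hypothetical helper, not a Thunk function.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve len bytes of address space, optionally near hint.
 * Returns the address actually mapped, or NULL on failure. */
void *reserve_region(void *hint, size_t len)
{
    void *addr;

    assert(len > 0);
    addr = mmap(hint, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    /* Test the returned address, not the hint: without MAP_FIXED the
     * kernel may place the mapping elsewhere, and failure is reported
     * as MAP_FAILED rather than NULL. */
    if (addr == MAP_FAILED)
        return NULL;
    return addr;
}
```

Callers that require the mapping at exactly `hint` would additionally compare the returned address with `hint` and unmap on mismatch.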
Felix Kuehling 8aeb933426 Add some additional gfx900 PCI IDs
Change-Id: I5f00f3b30a27285d75c606c1308abfe032ce1d02
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-05-11 16:39:19 -04:00
Felix Kuehling ea58703ece Fix uninitialized memory bug in hsaKmtWaitOnMultipleEvents
Use calloc to allocate event data. Otherwise random data may be filled
in for events that haven't actually signalled. This could trigger the
VM fault handler in the Runtime when no VM fault actually happened and
lead to intermittent HSA conformance test failures.

Change-Id: Icf702970e73a485b50633703c1b164f87fbb8606
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-05-10 18:16:31 -04:00
Felix Kuehling 15764e2897 Fix KFD ioctl ABI
This change breaks the ABI, and aligns it with the upstream ABI.
It also fixes some ioctl structures that are not 64-bit safe and
consolidates ioctl numbers.

Change-Id: Ib79944721534bd55a5299c5baf7bb5b3246cccd2
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-05-09 14:59:13 -04:00
Felix Kuehling 5eb31b2ebe Switch to cleaned up memory management ioctls
Change-Id: Ib8971ef91138f2a051272b9b57f0ebd480e8e738
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-05-04 16:29:37 -04:00
Harish Kasiviswanathan 3b2f064cbc Get PAGE_SIZE from system configuration
Change-Id: I87f383c443b873e13d36e80bfa034665bf493520
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-05-02 16:54:32 -04:00
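Querying the page size at run time instead of hard-coding 4096 can be done with the POSIX `sysconf` call, which matters on architectures with larger pages. This is a minimal sketch: `system_page_size` and `round_up_to_page` are hypothetical helper names, and the actual change in the library may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Ask the OS for the page size. */
size_t system_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    assert(sz > 0);   /* sysconf returns -1 on error */
    return (size_t)sz;
}

/* Round a length up to a whole number of pages.
 * Relies on the page size being a power of two. */
size_t round_up_to_page(size_t len)
{
    size_t page = system_page_size();

    return (len + page - 1) & ~(page - 1);
}
```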
Amber Lin ca06b0966b Add more non-priv PMC blocks to GFX9
This patch adds more non-privileged PMC blocks to GFX9/gfx900 to cover
blocks added in HSA Thunk Spec.

Change-Id: Ia3d953213a32536b2275231149f11ba060791442
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-05-01 09:14:03 -04:00
Amber Lin ed4a22e0d3 Add more non-priv PMC blocks to GFX8
This patch adds more non-privileged PMC blocks to GFX8 products: gfx801,
gfx803, and gfx803. Most of them have the same counter IDs on the same
block. For certain blocks when the product doesn't have the same counter
IDs, gfx8_xx_ is used to represent the product.

Change-Id: I059913c974bf2eb875fd1cf6f8b0d8c9c9bd7c14
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-04-27 11:22:12 -04:00
Amber Lin 9f19acbdb7 Add more non-priv PMC blocks to gfx70x/GFX7
HSA Thunk Spec was updated to include more non-privileged blocks for
profiling. This patch adds those newly added non-privileged blocks for
gfx70x.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>

Change-Id: Id745ac236c871e8e61a128a2460784f9c9c354b6
2017-04-25 13:08:10 -04:00
Felix Kuehling 3f7e7933e3 Add debug option to control the number of guard pages
Change-Id: I18b10bcbb4d74a92f17330e44b2dbb4cea61da00
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-04-10 11:40:39 -04:00
Felix Kuehling 34ddde0c50 Add debug option to check userptrs on registration
export HSA_CHECK_USERPTR=1 to check user pointers on registration. If
the pointer doesn't point to a valid mapping, there will be a segfault.

Change-Id: I459c0902cbc90338517fbf79678871ebfbe5183b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-04-10 11:40:39 -04:00
Amber Lin c119653add Create indirect IO links
KFD added all direct IO links to sysfs, so this patch removes all code
related to direct links and modifies the indirect links function to reflect
the change.

Change-Id: Iaec7b5f6c59f9034f8f960ca1fe1145d51dab367
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-04-07 07:18:13 -04:00
Felix Kuehling 11862b9f61 Add guard page after each address space reservation
Guard pages help catch out-of-bounds memory accesses by applications
by generating VM faults (GPU) and segfaults (CPU).

Remove address space reservation from scratch aperture. That address
space is managed by the Thunk client. Guard pages would cause Thunk's
address space management to get out of sync with the client's.

Change-Id: I2e5aee2923a90186358cc7b0e131baf547996df6
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-03-30 11:31:48 -04:00
Felix Kuehling 53838c9818 Add missing gfx900 device IDs
Change-Id: Ica5deb000279a508106125461af64a3851294b0a
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-03-24 16:03:46 -04:00
Amber Lin 73eff30d7d Add TCA block to PMC support
Add TCA to PMC tables.

Change-Id: Ia4164ab4581ea3f539706f534f672e5c24f5362f
2017-03-20 10:22:21 -04:00
Amber Lin 3738a1b5f2 Re-formatting IO link code
- Typo fix: *_link_tye to *_link_type and a missing word in comments
- Replace printf with fprintf(stderr
- Shorten lines to fit in 80 characters

Change-Id: Ibeb0b98d5c59d617ae06d9854a9dde16251ded52
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-03-09 11:08:22 -05:00
Amber Lin 2c2b1e0db2 Support profiling on gfx900
Add gfx900 to PMC support. This patch lists SQ counters.

Change-Id: Ia1e60e76ff71ab2e38d9d5de12ac9d527b3e8c6a
2017-03-07 14:30:40 -05:00
Amber Lin 9e32cdb113 Don't duplicate PMC tables
Many devices have the same counter IDs for the hardware block. Devices
in the same GFX generation usually have the same block counters. No need
to list each device individually. Instead, have a table to share with all
devices that have the same counter IDs, and have separated tables for
devices that don't have the same counter IDs.

Change-Id: I857056edc6f491f61af6e9598580e5dc7d372f94
2017-03-07 11:31:23 -05:00
Amber Lin b3b6367cb8 Add gfx803 DID
Add 0x67D0 to gfx803 support list.

Change-Id: Ifdb1fad4a3c42bea54856f6d5248c00ed546ad85
2017-03-07 07:25:49 -05:00
Amber Lin 4827b09119 Unify the device ID list
Integrate the supported device ID list distributed in topology, queue, and
pmc into one place: topology.

Change-Id: If035cf8e4a6fc6caff6c94ec627647cfb11c3d79
2017-03-06 16:26:51 -05:00
Amber Lin 1a8a9cb57b Make the lock file writable by others
Although the S_IWOTH flag is set in the open() call, the lock file is not
created writable by others, so other users fail to open it with O_RDWR.
This is because the default umask masks off S_IWOTH. This patch
temporarily sets the umask to S_IXOTH, a permission others don't need,
which leaves S_IWOTH unmasked. The original umask is restored after the
file is opened.

Change-Id: I8a239e1566ce0b0b18821913385f239db7c3588e
2017-03-03 11:05:13 -05:00
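The umask handling described in this commit can be sketched as below. The use of `S_IXOTH` as the temporary mask and `S_IWOTH` in the `open()` mode follows the commit message. The function name `open_shared_lock_file` and the remaining permission bits are assumptions for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>   /* close(), for callers releasing the descriptor */

/* Create (or open) a lock file that other users can open with O_RDWR.
 * Returns a file descriptor, or -1 with errno set by open(). */
int open_shared_lock_file(const char *path)
{
    mode_t old_mask;
    int fd;

    assert(path != NULL);
    /* Mask only S_IXOTH so that S_IWOTH in the mode below survives. */
    old_mask = umask(S_IXOTH);
    fd = open(path, O_CREAT | O_RDWR,
              S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    umask(old_mask);   /* restore the caller's umask */
    return fd;
}
```

Note that `umask` is process-wide, so another thread creating files during the short window would also see the relaxed mask.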
Amber Lin e17c67f049 Implement Start/Stop/Query Trace
StartTrace and StopTrace send ioctl requests to enable/disable performance
counters. QueryTrace reads the counter from the perf_event fd.

Change-Id: Ibf79675bc23fcf129371bfd100f8e262121bc684
2017-03-02 14:00:25 -05:00
James Edwards ec84fbe264 Fix permissions on hsakmt include files.
Change-Id: I1d428e60268e6d2de6776ff5f16d03503d00ddcc
2017-03-02 12:00:09 -06:00
Kent Russell c991951288 fmm.c: Disable userptr for paged mem by default
Unless HSA_USERPTR_FOR_PAGED_MEM is explicitly set, don't use userptr
for all paged memory. This will also allow us to work around some 4.9
issues, and then we can explicitly set HSA_USERPTR_FOR_PAGED_MEM for
all usage once those issues are resolved.

Change-Id: I25ce22b73ae6e93f1567f2318d9d2b47d4a44e69
2017-02-28 16:09:27 -05:00
shaoyun.liu 116e5c5e8b Thunk: Don't allocate extra control stack memory for gfx900
The control stack memory for CWSR is allocated in the kernel together with
the MQD allocation.

Change-Id: Ib1c0ab9402df3431e9555649394320380d6c6dd8
Signed-off-by: shaoyun.liu <shaoyun.liu@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling 7de66d149b gfx900: Allow doorbell allocation independent of queue ID
On SOC15 chips, the ABI for the create_queue ioctl is changed to
allow doorbell allocation independent of the queue ID. This is
necessary to accommodate doorbell routing to specific engines in
the BIF.

Change-Id: Ie98d0a758758149dd5fc09ae088afccc29904124
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling d7063dd102 Allocate 64-bit for doorbells and write pointers
On gfx900 we need 64-bit for all doorbells and SDMA WPTRs.

Change-Id: I9b922e16442e967599ae3c928308451d5cc470b3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-27 10:39:05 -05:00