Граф коммитов

225 Коммитов

Автор SHA1 Сообщение Дата
Felix Kuehling 7de66d149b gfx900: Allow doorbell allocation independent of queue ID
On SOC15 chips, the ABI for the create_queue ioctl is changed to
allow doorbell allocation independent of the queue ID. This is
necessary to accommodate doorbell routing to specific engines in
the BIF.

Change-Id: Ie98d0a758758149dd5fc09ae088afccc29904124
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling d7063dd102 Allocate 64-bit for doorbells and write pointers
On gfx900 we need 64-bit for all doorbells and SDMA WPTRs.

Change-Id: I9b922e16442e967599ae3c928308451d5cc470b3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling 8cb89b6926 Use KFD_IOC_ALLOC_MEM_FLAGS_COHERENT for fine-grained memory
Use KFD_IOC_ALLOC_MEM_FLAGS_COHERENT when allocating fine-grained
memory and doorbell BOs so that they will be mapped with MTYPE_UC
on GFX9 hardware.

Change-Id: I51adf45b13105f479e6bcdaf54955b467920ee9a
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling e5dd2f88c6 Update kfd_ioctl.h
Copied from kernel repository.

Change-Id: I9ed021cfb5b297d9a91dce93ed6355c95fb1127b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling 48207af92a Make doorbell-size ASIC specific
This is in preparation for gfx900, which uses 64-bit doorbells. We
maintain the same number of doorbells per process by making the
doorbell page size bigger.

KFD will need to implement the same rule.

Change-Id: I3c4110869b191b83943b5a390a48edfc94d941d8
2017-02-27 10:39:05 -05:00
Amber Lin 9ba2b68fdb Add gfx900 support
Add gfx900 device to the support

Change-Id: I71f30ef43e5e0ef0e7b5f18205b6cc4767d9d861
2017-02-27 10:39:05 -05:00
Amber Lin 1025579c0b Implement PMC AcquireTrace
Existing code uses lockf to ensure exclusive PMC access of one process and
one TraceId. However Thunk spec allows hsaKmtPmcAcquireTraceAccess to get
exclusive access to the defined set of counters, not exclusive to one
process or one TraceId. Multiple counter sets of multiple TraceIds is
allowed if they meet the concurrent access limit evaluated by the hardware
/driver.

Change-Id: I59cacb855a707fe326a4070452fcbbd3c95ac223
2017-02-27 09:33:58 -05:00
Felix Kuehling 64104fc8d9 Avoid COW after fork for API-allocated system memory
Change-Id: I5c7175114c4e6411d3beb5557e16cb71ddb01189
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-23 10:28:45 -05:00
Amber Lin cb60c5f18a Support multiple blocks in RegisterTrace
Existing code assumes all counters sent to hsaKmtPmcRegisterTrace belong
to one PMC block and this block is SQ. This patch considers cases when
counters are in different blocks, and removes the hard-coded SQ. As a
matter of fact, SQ is non-privileged so the user even shouldn't use SQ
counters to register/release trace. This patch also ignores
non-privileged blocks as what HSA Thunk spec describes.

This patch also records counters information in trace structure so
AcquireTrace can get counters information using that TraceId.

Change-Id: Ifa5741050553d4615baab01f7485a9e09435b019
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-02-21 15:32:18 -05:00
Harish Kasiviswanathan e79521b556 Add API entrypoints for Cross Memory Attach
Implement two new API for cross memory read and write operation.
 - hsaKmtProcessVMRead
 - hsaKmtProcessVMWrite

Add new ioclts necessary for the above APIs.

Change-Id: I0c153e3b4e1f32b7a8b102ad5c774d9ae9bfc2fa
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-02-17 16:59:51 -05:00
ozeng b3c3f7bae1 libkmt: Change files mode back to 644
events.c and queues.c were accidently changed to 755 by change
fc70f0c30976f4021f7d763bfc10d76a76029553. Change them back.

Change-Id: If51c0b91139afc23e9051cf94c83d61fc20297e6
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-02-16 15:09:26 -05:00
James Edwards acb1f0b689 Add --dirty tag to utils.cmake
Change-Id: Ib2487eade8d88530df34dfb8e9b442e547e9f52d
2017-02-14 11:19:33 -06:00
James Edwards 9e81f0f5e2 Add libpci to CMakeLists.txt for thunk.
Change-Id: I0228035ce769feaf54cbca75f076f73614cbb9cc
2017-02-11 16:24:54 -06:00
James Edwards a5a30e8199 Fix file stripping for release builds.
Change-Id: I538c366f0992980ffff1ef337807035b9378845c
2017-02-09 14:42:14 -06:00
James Edwards bc44715be2 Update libhsakmt CMakeList.txt file for tag based versioning and proper packaging
Change-Id: I63e82deefa8377ced810d99b5b2f0457299048a6
2017-02-08 14:42:29 -06:00
Felix Kuehling 74ebfca9f0 Free BOs before munmapping
This avoids unnecessary evictions and failed restores due to the
munmap of userptr BOs that are just about to be freed.

Change-Id: Icf2f0b73991455556a201c54c05ea7e20af80f47
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-07 10:45:37 -05:00
ozeng cb0f851560 libkmt: Misc fixes in thunk
1. Translate thunk queue priority to kfd priority
2. Initialize event SyncVar
3. Added HSAint32 data type


Change-Id: I7decc1be7cbe9c84cb670d9a7c99050b62ba98f3
2017-02-06 17:19:40 -05:00
Amber Lin 9dadac6dc9 Add IOMMU to performance counter table
Add IOMMUv2 to blocks returned by hsaKmtPmcGetCounterProperties(). IOMMU
information is read from sysfs.

Change-Id: I3a1c6f902f947913570a78700fc0ffc444e1dd72
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-02-03 14:35:27 -05:00
Amber Lin d4dbf562a9 Replace spaces with tabs
Thunk follows Linux kernel coding convention to use tabs instead of
spaces.

Change-Id: I4eddcfa9a0513f16c869d9cc63f9f1dae0c39f83
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-02-03 10:13:24 -05:00
Felix Kuehling a90abcb317 Allocate paged system memory as userptr
Change-Id: I0864e678681788df37eccd9d7ebc70086e1f93bf
2017-02-02 10:32:53 -05:00
Amber Lin 72b842a6dc Sync up gfx803 DIDs with KFD
Add gfx803 10/11 device IDs that were recently added to KFD.

Change-Id: Id40b117ae47bacedefa6e333fdfdf58dea92cd2d
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-02-01 12:05:24 -05:00
Harish Kasiviswanathan f1f62d863c Add fork support
If fork() is called, clear all duplicated data that is invalid in the
child process.

Change-Id: I4e27198060db593c630c6337b7071dfbd0d80b83
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-01-12 14:38:30 -05:00
Felix Kuehling 4181b408fc Allocate CWSR buffer in system memory
CWSR buffers can be large on dGPUs (~21MB on gfx803). Allocating them
in VRAM limits the number of queues that can be created unnecessarily.

Also make freeing of per-queue buffers symmetric with allocation. All
buffers are now allocated with allocate_exec_aligned_memory on dGPUs
and APUs, so use free_exec_aligned_memory to free them.

Change-Id: I45e8cb1801857d0268750202cdd422426611e457
2017-01-04 16:07:56 -05:00
Harish Kasiviswanathan 559e31d6ff Add API entrypoints for IPC functionality
Implement three new APIs for IPC buffer sharing:
	-hsaKmtShareMemory()
	-hsaKmtRegisterSharedHandle()
	-hsaKmtRegisterSharedHandleToNodes()

Add new ioclts necessary for the above APIs.

Change-Id: Ia2b4d0dc91ec64bff959395d11c0536467404792
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2016-11-28 16:19:22 -05:00
Harish Kasiviswanathan 4838f6e740 kfd ioctl. Remove unused definitions
Sync the file with Kernel header file

Change-Id: I52d52a38fb38bd4b37d8210ce79775b88c8a985d
2016-11-22 09:44:22 -05:00
Amber Lin 2a50ebba98 Allow a memory to be registered multiple times
A memory region is allowed to be registered multiple times when the memory
is specified by a user pointer. If it's registered with the same user
pointer but with different sizes, it's treated as different instances and
multiple VM objects are created with different GPU address.


Change-Id: I49627111bb5db36d18f1133b252fb62a611f06a4
2016-11-18 17:46:12 -05:00
Yong Zhao a1f417715b Making the code more robust by checking the NULL pointer
Change-Id: I36b9f73eadd7547c71fe3641ac131c7408b14816
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2016-11-16 11:35:26 -05:00
Sean Keely c54c63fd56 Set the default value of userdata in pointer info to NULL.
Change-Id: Ie0d94b921bbce880d9548d5a014a2d7c33062f7e
2016-11-09 21:15:07 -06:00
Andres Rodriguez 0de39b6724 Fix hsaKmtOpen incorrectly doing nothing in some fork scenarios
Currently, if a process' parent called hsaKmtOpen, the child will be
unable to open a connection to KFD, since kfd_open_count will be > 0.

When forking, the refcount should be reset, in order to allow the child
to re-open /dev/kfd.

Change-Id: Ia4b78f6bacc4f82e8ac724e5f488a3eff5084007
2016-11-01 15:54:17 -04:00
Jay Cornwall 5493ae420b Disable GPUVM-mapped doorbell on gfx802
gfx802 requires a workaround for a VM TLB bug in which lookups use
the ACTIVE bit of the 8th PTE within any aligned group of 8 PTEs.
Until this is fixed in amdgpu the GPUVM doorbell logic will fail.

Change-Id: I5ec7b1fcd8b7677011a141d27cfc486c45d9a415
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
2016-10-10 18:39:31 -05:00
James Edwards b170e0ad8c Fix CMakeList.txt file to use correct compile options. Fix compilation errors.
Change-Id: I6229a83d0823ee7a123cdaa9efd782108aa3a03c
2016-09-30 16:36:01 -05:00
James Edwards 7511631f08 Add libhsakmt cmake build and packaging files.
Change-Id: Ic7fa22d5b266480aa0c62628022f39da4e043d23
2016-09-20 17:48:36 -04:00
Felix Kuehling 2e0a6eb371 Allocate and map doorbells in SVM for discrete GPUs
Allocate doorbells for dGPUs in the SVM aperture and map them for
GPU access. This is necessary to allow GPU-initiated submissions to
user mode queues.

Depends on new doorbell BO allocation flag in KFD.

Change-Id: I0737bef4a4764bb4a66c43846707ead2108f6601
2016-09-16 16:04:27 -04:00
Amber Lin 660a6ebbd4 Disable CPU cache info in non-x86
CPU cache information reported by Thunk topology is obtained from cpuid
instruction. This instruction only applies to X86 systems. It can cause
compile errors on non-X86 platforms. This patch temporarily disables CPU
cache functions in topology for non-X86 platforms in order to compile.

Change-Id: If86671817b0d036cb324eebf3f354682bfb75856
2016-09-14 17:30:50 -04:00
Amber Lin 4911c91389 Search VM object by range
Add vm_find_object_by_userptr_range so QueryPointerInfo can find the
object as well when the pointer is not the starting address but it's
inside the memory range. Also rename vm_find_object_xxx functions to
_by_address and _by_address_range to be consistent.

Change-Id: I5c2b3a05b41493e32b7fd9154665bf078b043606
2016-09-13 12:44:29 -04:00
Amber Lin 19f2676ea7 Pointer attributes on APU
Add CPUVM aperture to keep track of memory allocation that is not known
to GPU driver. Together with GPUVM, this patch adds the pointer attributes
support to APU.

Change-Id: If13f9cf01ff8b9f709b99b66661e7505246adf4c
2016-09-12 11:32:26 -04:00
Amber Lin 51e4d27c37 Add pointer attributes API
Add two pointer attributes APIs:
hsaKmtQueryPointerInfo - allow the user to query the memory information
    using a pointer. This pointer can point to any address inside the
    range known to HSA.
hsaKmtSetMemoryUserData - allow the user to attach data to a pointer to
    add memory tracking information. This pointer must match the start
    address of a memory allocation or registration.
TODO: This patch implements support on dGPU. Needs to add APU.

Change-Id: I4711809274248434901f0794f50ebfa13a7371a8
2016-09-07 17:24:46 -04:00
Yong Zhao 8351b3d2e8 Implement hsaKmtGetTileConfig in thunk
Change-Id: Iba8d8efa46e3c268a03442d3db568e1b19230e94
2016-09-06 16:24:29 -04:00
Lan Xiao df593aa076 libhsakmt: Marketing Name and AMDName support for APUs
For APUs, use /proc/cpuinfo to get Marketing name.



Change-Id: I4a17516d26a092683f36631032be00ad44f7e7fe
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
2016-09-02 15:16:18 -04:00
Andres Rodriguez b1d2867b60 LICENSE: add X11/MIT license file
Change-Id: I2e95af843046896708bb7a116f7b03a0fa30a255
2016-08-25 16:27:46 -04:00
Andres Rodriguez e0c77a38cb Makefile: remove 32bit thunk compilation by default
Compiling in 32bit mode is broken, and we don't have an intention on
restarting compatibility with 32bit apps.

Change-Id: I5524b5b63fe62e6026aa04d84c4510e290a86106
2016-08-25 16:27:19 -04:00
Lan Xiao 9cbbf30be7 libhsakmt: Add MarketingName and AMDName for all nodes - CPU & GPU
HSA thunk API is currently reporting engineering name to MarketingName
and returning NULL when querying for AMDName.

-Change current name reporting from MarketingName to AMDName.
-Use libpci to get MarketingName



Change-Id: I819a6de7b067a2e724a6695e7d800274b83a71f8
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
2016-08-23 10:49:27 -04:00
Kent Russell 70b1b5b17e queues.c: Enforce CUMaskCount being a multiple of 32
The thunk spec requires that CUMaskCount be divisible by 32. Check this
and return INVALID_PARAMETER if it is not.

Change-Id: I4e0c8502d996d3da31224b817a5d4ff2c6054e13
2016-08-23 06:16:39 -04:00
Yong Zhao 9c9bfa30c0 Fix a bug when mmap fails
EventId is needed in calling hsaKmtDestroyEvent() when mmap failed,
so we should move it ahead of mmap call.

Change-Id: I5f4288b953611799a02b0e988d6b2e48104466a0
2016-08-18 14:30:03 -04:00
Amber Lin 0b5c65a903 Add performance counters for gfx803
Counter IDs in SQ_PERFCOUNTER0_SELECT are identical on gfx803 10 and
gfx803 11.

Change-Id: I5cfefd44b52989efd1d89311cf8c70c84ea2b230
2016-08-08 18:10:51 -04:00
Amber Lin 876384305b Add gfx803 support
Add gfx803 and gfx80311 device IDs to the support

Change-Id: I16220fd811db102c02e5e0c5b82e40ec299877af
2016-08-08 11:30:57 -04:00
Amber Lin 6c4d19a9d2 Shorten the device list in PerfCounter
get_block_properties uses the complete DID to identify the GPU. This list
is getting too long when more devices are added. Reading the 12 most
significant digits is good enough to identify the GPU.


Change-Id: Ieebb05402bbe08af12eb7289dfeb5bbf1f515b0f
2016-07-27 17:21:31 -04:00
shaoyunl bf16caa75f libhsakmt: Compute context save area size depends on CU num
Change-Id: Iaf35ddeee9fe5a6367097483f67c4adaa0796d7d
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
2016-06-10 10:19:40 -04:00
Amber Lin 6d21c4e753 Add performance counters for gfx70x
Add performance counters for gfx70x. The reference is the gfx7 register spec.
The register being looked at is SQ_PERFCOUNTER0_SELECT.

Change-Id: I344bfb7452f6148f4dc268163d12c553c6be8424
2016-05-20 16:24:36 -04:00
shaoyunl 16d5aa0d83 libhsakmt: Add new device id for virtualized function of gfx803
Signed-off-by: Shaoyun Liu <Shaoyun.liu@amd.com>
Change-Id: I90b0bdaeaed8e9e80375e5a7a142205f2a542289
2016-05-12 13:25:01 -04:00