Commit Graph

102 Commits

Author SHA1 Message Date
Kent Russell 67d98aa280 Add option to create release build for Thunk
By adding REL=1 to the make command line (e.g. make REL=1 deb), we can
create a release build of the Thunk. This will not affect existing
functionality, and will only have an effect if REL=1 is specified on the
command line, or in the build_thunk.sh script.

Change-Id: Iedc3b6094e70a4ebd726499eda56013cc254b83d


[ROCm/ROCR-Runtime commit: cb3a664065]
2015-10-30 14:05:40 -04:00
Kent Russell 39d2152a3f Cleanup RPM build of thunk
Change-Id: Ib437a3ec7be9f5aa7d3ef9e53c13e3c5e7b7382e


[ROCm/ROCR-Runtime commit: cabbcbabff]
2015-10-30 08:42:16 -04:00
Felix Kuehling b900df9215 Use correct aperture for _fmm_unmap_from_gpu_scratch
Passing in the wrong aperture resulted in failure to unmap scratch.


Change-Id: Icd7423abfb1bcc773b33becffcbefc233f4ff340


[ROCm/ROCR-Runtime commit: bd93eecc64]
2015-10-29 18:26:15 -04:00
Philip Cox 782cea350c Add SDMA IOCTL type to Create Queue function.
Change-Id: I7e31507b761ca388b2cac93f994f6106de962f17


[ROCm/ROCR-Runtime commit: 0c234c7ef3]
2015-10-29 10:25:41 -04:00
Kent Russell f4889d439d libhsakmt - Add make option to package thunk as RPM
Add an option to libhsakmt to allow the thunk to be packaged as an RPM.
The default will remain being built as-is, but this can now be packaged
as an RPM by using "make src rpm". build_thunk.sh will be modified to
reflect this new option.

Change-Id: I38e03d10cfb5035bdf0a87635a784c47a709a5b6


[ROCm/ROCR-Runtime commit: 6ceed7def3]
2015-10-29 07:49:13 -04:00
Harish Kasiviswanathan 595f51899f Remove erroneous and redundant memory banks reported
hsaKmtGetNodeMemoryProperties -
	- Return only HSA_HEAPTYPE_SYSTEM memory for CPU only node.
	- For dGPU remove redundant HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE
	  entry.

Change-Id: I0349be39b8409a0fd64a038b8b2956191356d937


[ROCm/ROCR-Runtime commit: f885e551aa]
2015-10-23 18:43:46 -04:00
Harish Kasiviswanathan 71dc59b245 Correct parameter name for topology_is_dgpu()
The function expects device_id and not gpu_id.

Change-Id: I79794fd4e58e6e6adb26659da30f3e4d8e108434


[ROCm/ROCR-Runtime commit: 69662da3dc]
2015-10-23 18:43:45 -04:00
Harish Kasiviswanathan d7589c62e1 Unify fmm_get_aperture_xxx functions
Unify fmm_get_aperture_base and fmm_get_aperture_limit into one
function. Change the return type to HSAKMT_STATUS.

Change-Id: I0b3f563ffb268947ab891f4935f61788d0af0e01


[ROCm/ROCR-Runtime commit: cb53548c89]
2015-10-23 18:43:34 -04:00
Felix Kuehling 29561cc13e Implement flat scratch support for dGPU
hsaKmtAllocMemory only allocates aligned address space and sets up
the scratch_physical aperture to match the allocated address space.

Actual allocation of backing memory happens in hsaKmtMapMemoryToGPU.

Change-Id: Ie709815ab9bedb3d682e096b4005fdfb5e94d3a7


[ROCm/ROCR-Runtime commit: 5131ab4e64]
2015-10-22 20:40:22 -04:00
Felix Kuehling 17a31f1cce Allow address space allocations with specific alignment
Change-Id: I4bf7f7ac53c3921dd330b9dc7a40582611f88b69


[ROCm/ROCR-Runtime commit: 149261ba09]
2015-10-22 20:27:49 -04:00
Ben Goz 4ec82c7edd Casting local memory size to uint64_t
Change-Id: I5c2010056b84ac01bb65361210d2a693e437050a
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: 55b1a5dc43]
2015-10-22 09:05:34 -04:00
Ben Goz a511b7c4f7 Adding support for new AQL Queue Memory allocation
Change-Id: If84fc4b961627dbdd0b77b1c509a3c9a4c709b9f
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: e61500c46e]
2015-10-22 13:13:54 +03:00
Felix Kuehling 55bd82cd89 Fix node 0 system memory allocation for dGPU
This is a hack to allow the Runtime to allocate system memory with
PreferredNode=0 on a dGPU system. We allocate it from Node 1
instead so that the node 1 GPU can map the memory. A proper fix
will be implemented together with multi-GPU support.

Change-Id: Ieb52599e5275781c04ee34405ea850bf782c523a


[ROCm/ROCR-Runtime commit: 590c8e522c]
2015-10-21 20:00:01 -04:00
Felix Kuehling ebf6ec1806 Reserve more SVM process address space
Try to reserve as much SVM address space as GPUVM can address.
Implement a fallback scheme to smaller sizes if larger allocations
fail or are not addressable by the GPU, down to an (arbitrary)
minimum of 4GB.
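
The fallback scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name reserve_with_fallback, its parameters, and the halving strategy are assumptions. It only reserves address space (PROT_NONE, no backing memory) and retries with a smaller size when the reservation fails or ends above the GPU-addressable limit.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve up to max_size bytes of address space,
 * halving the request until it succeeds and the whole range lies at or
 * below gpu_limit. Gives up once the size drops below min_size. */
static void *reserve_with_fallback(uint64_t max_size, uint64_t min_size,
                                   uint64_t gpu_limit, uint64_t *out_size)
{
    assert(out_size != NULL);

    for (uint64_t size = max_size; size > 0 && size >= min_size; size >>= 1) {
        void *addr = mmap(NULL, (size_t)size, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (addr == MAP_FAILED)
            continue; /* reservation failed: try a smaller size */

        uint64_t last = (uint64_t)(uintptr_t)addr + size - 1;
        if (last <= gpu_limit) {
            *out_size = size;
            return addr;
        }
        munmap(addr, (size_t)size); /* range not addressable by the GPU */
    }
    return NULL;
}
```

A caller following this commit's description would pass the GPUVM-addressable size as max_size and 4GB as min_size; both are ordinary parameters here so the sketch can be tried with small values.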

Change-Id: I770177834cc9e6ddd6ef4f20d789eab63c8055cb


[ROCm/ROCR-Runtime commit: 39bde26c9b]
2015-10-19 17:44:23 -04:00
Andres Rodriguez f6eba4d367 make: add 'deb' target for creating deb packages
When 'make deb' is run, create a libhsakmt.deb archive that installs
libhsakmt into the appropriate folder on the target where the dynamic
linker can find it.

Change-Id: I32de7198975f7831e509a67371e78456982b5c42


[ROCm/ROCR-Runtime commit: 0df346aaf9]
2015-10-16 19:13:51 -04:00
Harish Kasiviswanathan d38a3f1438 Fix init process apertures
Kernel ioctl AMDKFD_IOC_GET_PROCESS_APERTURES returns process apertures
only for GPU nodes. The current implementation assumed that this list of
GPU nodes returned by the ioctl has a one-to-one correspondence to sysfs
topology nodes. This fails when non-GPU nodes exist in the topology, as in
the case of Intel + gfx802.

Fix this by using gpu_id (./sys/.../kfd/topology/nodes/1/gpu_id) to map
information obtained from kernel ioctl call.
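
A minimal sketch of the gpu_id-based mapping, under stated assumptions: the function name and the flat array of per-node gpu_id values are hypothetical, and a gpu_id of 0 is assumed to mark a CPU-only node. It shows why matching by gpu_id stays correct when the topology contains non-GPU nodes, whereas matching by list position does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: return the index of the topology node whose gpu_id
 * matches the gpu_id reported for an aperture, or -1 if there is none.
 * Assumption: CPU-only nodes carry gpu_id 0 and never match. */
static int node_index_for_gpu_id(const uint32_t *node_gpu_ids,
                                 size_t num_nodes, uint32_t gpu_id)
{
    assert(node_gpu_ids != NULL || num_nodes == 0);

    if (gpu_id == 0)
        return -1;

    for (size_t i = 0; i < num_nodes; i++) {
        if (node_gpu_ids[i] == gpu_id)
            return (int)i;
    }
    return -1;
}
```

With a CPU node at index 0 and a GPU at index 1, the first aperture returned by the ioctl maps to node 1, not node 0.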

Change-Id: I4ab8ae5354f12cf0b6609fc4b24182b82eb3677f


[ROCm/ROCR-Runtime commit: 5cc56a2647]
2015-10-15 15:38:14 -04:00
Harish Kasiviswanathan 462a775ec3 Fix hard-coded usage of Node 0
Use appropriate NodeId instead

Change-Id: I46af93b76978fea7bedb34457fcc0864ed4fe2d4


[ROCm/ROCR-Runtime commit: b6c6f79143]
2015-10-14 17:27:38 -04:00
Felix Kuehling 574fcdd340 Fix various dgpu memory management issues
Fix TONGA_PAGE_SIZE value and move it to libhsakmt.h so it is used
consistently in all places that require the same alignment for the
same reason. Create a generic alignment helper macro to replace some
incorrect hand-coded size alignments.
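
For illustration, a generic alignment helper of the kind mentioned above might look like the sketch below. The names ALIGN_UP and align_size are assumptions, not the macro added by this commit, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macro: round x up to the next multiple of align.
 * Only valid when align is a power of two. */
#define ALIGN_UP(x, align) \
    (((uint64_t)(x) + ((uint64_t)(align) - 1)) & ~((uint64_t)(align) - 1))

/* Checked wrapper that rejects alignments that are not a power of two. */
static uint64_t align_size(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ALIGN_UP(size, align);
}
```

Using one helper everywhere avoids the hand-coded variants that round incorrectly when the size is already aligned.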

Move virtual address and size alignments down into aperture management
functions. Alignment is a per-aperture property that is set during
fmm_init_process_apertures. Doing the alignment there ensures that
all allocations in the same aperture are aligned the same way. Finding
objects by size and address can take the alignment into account.

Also align the size of physical allocations to back aligned virtual
address allocations. CPU mappings do not need to be aligned.

Map anonymous pages over released memory mappings to allow the
backing pages to be released, while keeping the address space
reserved.
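
One common way to do this on Linux is to map fresh anonymous, inaccessible pages over the range with MAP_FIXED. The sketch below assumes that approach; the function name is hypothetical and the flags are not taken from this commit.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: drop the pages backing [addr, addr + len) while
 * keeping the range reserved, by replacing the existing mapping with an
 * anonymous PROT_NONE mapping at the same address.
 * Returns 0 on success, -1 on failure. */
static int release_backing_keep_reserved(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                   -1, 0);
    return p == addr ? 0 : -1;
}
```

After the call, the old contents are gone and the range cannot be handed out by a later mmap, so the address space stays available for reuse by the same allocator.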

Add alignment parameter to free_exec_aligned_memory_gpu to match the
interface of allocate_exec_aligned_memory_cpu. It doesn't make sense
to allow an alignment parameter in one but assume a specific
alignment in the other.

Change-Id: I74226ca6938f4948f643e5aee1d474720cd89e78


[ROCm/ROCR-Runtime commit: 6a5ca4bc5a]
2015-10-13 19:14:56 -04:00
Felix Kuehling a4c4170906 Add support for gfx803
Create new device_info and add device ID. Add helper macros to
identify chip families (VI, discrete). For now gfx803 behaves like
gfx802. But if necessary we can have gfx802 or gfx803-specific
code paths or workarounds in the future.

Change-Id: I61b4ffef7dd7796bb34cb01fbff0089bd49507bb


[ROCm/ROCR-Runtime commit: 0fc0a5b526]
2015-10-09 17:40:54 -04:00
Harish Kasiviswanathan 72cc7c2234 Fix assert failure for CPU only node
hsa_gfxip_table lists only (supported) GPUs. So fail the assert only when
an unsupported GPU is detected.

Change-Id: I6207dc7cd55860c8b3348b6a4ca6102131975722


[ROCm/ROCR-Runtime commit: 758824db17]
2015-10-08 11:52:59 -04:00
Harish Kasiviswanathan ee891bed05 Refactor hsa_gfxip_table lookup
Also fix some formatting

Change-Id: Ia04d7a9cd3972cc4d283c576161de639027aac6d


[ROCm/ROCR-Runtime commit: f2a46101d3]
2015-10-08 11:52:59 -04:00
Felix Kuehling 8f0b7e6a76 Update HsaMemFlags.ui32.CoarseGrain comment
As advised by Paul Blinzer

Change-Id: Icabf4acd94866ddbbe53faf48a71e1113f0c76b6


[ROCm/ROCR-Runtime commit: b94ae66c62]
2015-10-05 16:48:50 -04:00
Felix Kuehling f09c6b84af Setup APE1 on dGPU for coherent access
The default is non-coherent access for better performance on dGPU.
Disabled hsaKmtSetMemoryPolicy function on dGPU to prevent app from
overriding the APE1 settings at runtime.
Fixed dGPU VM aperture limit to be inclusive.

Change-Id: I378ff74a654f533572775c0c97c19779a56bc6d9


[ROCm/ROCR-Runtime commit: 8e836f8183]
2015-10-02 17:20:33 -04:00
Felix Kuehling c3a1263604 Add all gfx802 device IDs to supported_devices
Without this, queue creation segfaults on unknown devices.

Change-Id: Ieea0bc4783e7313b3dcdabf03ab1269e3670b217


[ROCm/ROCR-Runtime commit: 7505893cc7]
2015-10-02 15:33:37 -04:00
Felix Kuehling 9dd2664db2 Fix returning of base and limit on dgpu_mem_init reinitialization
Change-Id: I1d1500ee57c3b85fc39c224d233a62097f981719


[ROCm/ROCR-Runtime commit: f3aaba0621]
2015-09-30 18:07:04 -04:00
Felix Kuehling 048207626a Add CoarseGrain memory flag
Change-Id: If8ac0339ae8c809c6e6a4f56592a4061d110ea94


[ROCm/ROCR-Runtime commit: f2f45cc0e4]
2015-09-30 18:07:04 -04:00
shaoyunl d30daaba5d Initial support for CWSR on thunk
1. Add IOCTL defines to set trap handler
2. Add control stack size information on create queue argument.
3. Increase the total save&restore area size for carrizo to include the control stack size.

Signed-off-by: Shaoyun Liu <Shaoyun.liu@amd.com>

Change-Id: Iccf15e073b7db2519e96e7f7b46a89d57ab9a4df


[ROCm/ROCR-Runtime commit: 2d63ee7b8f]
2015-09-25 15:12:25 -04:00
Harish Kasiviswanathan 2d8dd3a483 Merge "Sync up HSA_ENGINE_ID type with Windows/Perforce" into amd-staging
[ROCm/ROCR-Runtime commit: 1897acd78e]
2015-09-24 11:03:23 -04:00
Amber Lin 2470a3e67b Sync up HSA_ENGINE_ID type with Windows/Perforce
HSA_ENGINE_ID in Perforce added ui32 to the typedef, while in Git it did not.
This causes conflicts in RT applications. The decision is to change Git
to match Perforce.

Change-Id: I7e9c6437b023bb23ec9578737f8534e9453589b9


[ROCm/ROCR-Runtime commit: 082f8314c4]
2015-09-24 00:10:52 -04:00
Harish Kasiviswanathan 416d21296a Fix VM range for dGPU local memory
Currently, Kernel imposes a limit on VM. Thunk should be aware of it.
This fix is required till Kernel VM limit is sorted.

For now both "Host Access" memory and "Local Memory" share the same VM range.

Change-Id: I5a9220face20df9ede2b78bd6201a01dd2ea70e0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 1438f15fd0]
2015-09-23 16:18:50 -04:00
Harish Kasiviswanathan 2e6101b73f Fix mem size variable type
Memory size is 64-bit. So use HSAuint64 instead of uint32.
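
A small sketch of why the narrower type is a problem. The helper names are hypothetical, and plain uint32_t/uint64_t stand in for the HSA typedefs: any size of 4 GiB or more loses its upper bits when stored in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit variable ends up holding: the upper 32 bits are dropped. */
static uint32_t truncating_store(uint64_t size_in_bytes)
{
    return (uint32_t)size_in_bytes;
}

/* Narrow to 32 bits only when no information is lost. */
static uint32_t narrow_checked(uint64_t value)
{
    assert(value <= UINT32_MAX);
    return (uint32_t)value;
}
```

For example, a 4 GiB size stored through truncating_store reads back as 0, which is the kind of value that can send later size-based logic astray.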

Change-Id: Iaa607dec9c1a1c5ac46ea442fd482210ea550b45
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 4b768872c0]
2015-09-23 15:33:54 -04:00
Amber Lin c1fbcd5c63 Enable GFXIP version info for dGPU
Add GFXIP version 8.0.2(major.minor.stepping) for gfx802 and 8.0.3 for gfx803.


Change-Id: Icc7cac6b2e8a78d9cff4105aeb2bfcd2c7759027


[ROCm/ROCR-Runtime commit: f7fffdc2be]
2015-09-22 15:04:43 -04:00
Ben Goz eaed099317 Adding support for local memory on dGPU
Change-Id: I1a926b11730ba295605eeb37c9b1fc438bed8a64
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: 6170080cf6]
2015-09-21 14:13:15 -04:00
Ben Goz 1b2fd315ac Adding new memory allocation IOCTL
Change-Id: I0eb1924811a2e1e436296ebe632d8f112a61637d
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: 692e004047]
2015-09-21 13:58:32 -04:00
Harish Kasiviswanathan 9cf5049f77 Revert "Topology, memory allocation, cleanup issue for gGPU"
This reverts commit 0dc4437390.

Change-Id: I92a4ed91bf566259916d1a96207e1fe9a6099c31
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 3e9773ff2c]
2015-09-21 10:47:30 -04:00
Harish Kasiviswanathan 0dc4437390 Topology, memory allocation, cleanup issue for gGPU
Patch submitted by Besar Wicaksono

1. Bug on detecting local memory size interpreted as 32 bit value
instead of 64. The bug causes thunk to go into an infinite loop trying
to reserve virtual address range for dgpu system memory.
2. SIMD count in the node property is 0. Runtime uses this attribute to
find a gpu device.
	Regarding other attributes of intel+tonga topology, Harish started a
	discussion on August iirc, could you please share an update ?
	This would help me progress with more tests such as scratch memory,
	which require the scratch aperture information in order to construct a
	buffer srd in gpuvm space.
3. Bug on releasing memory via fmm_release, where no actual release is
being done. The vm_object can't be found because the memory size does
not match, since the allocation padded the size by 32KB.
4. Pointer arithmetic on vm_area allocation/release. The value of
vm_area_t::end seems to be interpreted inconsistently whether it is
(start + size - 1) or (start + size).
	One example of potential issue I see is the logic could report
	larger size of the hole in the vm area list.
5. Resource cleanup on multiple library load/unload within a single
process.
	- Any memory allocation on subsequent library load will result
	an error "va above limit". To my understanding this is due to
	the reserved memory for the system memory not being released on unload.
	- The static variable events_page needs to be invalidated
	appropriately on library unload so the next load could
	reinitialize it.
6. Could you please update if AQL queue is ready to test with the stg
kfd/kmt ?
7. The system memory allocation with size larger than 32KB seems to be
padded by an extra 32KB. I was wondering if we could remove this
overhead.
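
To illustrate item 4, the sketch below fixes one convention (an inclusive end) and derives sizes and holes from it consistently. The struct and function names are hypothetical and are not the vm_area_t code from the thunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical area descriptor; 'end' is the last valid byte (inclusive),
 * i.e. end == start + size - 1. */
struct area {
    uint64_t start;
    uint64_t end;
};

static struct area area_make(uint64_t start, uint64_t size)
{
    assert(size > 0);
    struct area a = { start, start + size - 1 };
    return a;
}

/* Size of an area under the inclusive-end convention. */
static uint64_t area_size(const struct area *a)
{
    return a->end - a->start + 1;
}

/* Free bytes strictly between two areas, where prev lies below next. */
static uint64_t hole_size(const struct area *prev, const struct area *next)
{
    return next->start - prev->end - 1;
}
```

If hole_size instead treated prev->end as exclusive (next->start - prev->end), it would report one byte more than is actually free, which matches the larger-hole concern raised in item 4.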

Change-Id: I039988d36637525089c7569dc3b77e58750e2121


[ROCm/ROCR-Runtime commit: ee08f537a7]
2015-09-15 13:15:04 -04:00
David Ogbeide 7ea9567094 libhsakmt: specify build output via variable
Makefile currently sends build output to a default location.
Allow choice of build output location, if so desired,
using a variable.



Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>


[ROCm/ROCR-Runtime commit: 8a01cd1212]
2015-09-01 14:30:53 -04:00
Ben Goz 9db147f2d4 Support gfx802 dGPU
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: fb8378a18b]
2015-08-30 14:13:53 +03:00
shaoyunl 779c76a4e7 Minor fix in libhsathunk for KFDMemory test
Signed-off-by: shaoyun liu(shaoyun.liu@amd.com)
Reviewed-by: Ben Goz(Ben.Goz@amd.com)


[ROCm/ROCR-Runtime commit: 2dff5cabfa]
2015-08-05 17:32:00 -04:00
Ben Goz 918ad5eac6 Revert "Enable creating SDMA queue."
This reverts commit fcf6e22216.


[ROCm/ROCR-Runtime commit: bb4a5cddd9]
2015-08-05 13:33:42 +03:00
Amber Lin 92e34fcda6 Enable version info via thunk interface
- Replace HSAuint32 with HSA_ENGINE_ID for EngineId type so it explicitly
  presents version information for ucode and GfxIP
- Created a GfxIP lookup table to pass the version information. This lookup
  searches for matching device ID.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: John Bridgman <John.Bridgman@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: a3925a3a19]
2015-07-31 14:56:33 -04:00
Flora Cui 0f0e28cbdb Add interface to set CU mask
Signed-off-by: Flora Cui <flora.cui@amd.com>
Acked-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: fc4e07daa3]
2015-07-23 15:44:01 +08:00
Moses Reuben 9faf7b957c adding support for scratch memory
Signed-off-by: Moses Reuben <moses.reuben@amd.com>


[ROCm/ROCR-Runtime commit: 29c083f695]
2015-07-21 16:43:23 +03:00
Oded Gabbay b7d9879a36 increase event limit to provide 4K events
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>


[ROCm/ROCR-Runtime commit: 2e76017278]
2015-05-18 11:01:42 +03:00
Oded Gabbay b4d4c4b83d Don't report local mem aperture if local mem size is 0
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>


[ROCm/ROCR-Runtime commit: 8aa0791ddb]
2015-05-05 10:51:50 +03:00
Oded Gabbay 703ffebb96 Increase limit of signal events to 4096
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Ben Goz<ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: a70a98b30b]
2015-05-03 13:58:10 +03:00
Oded Gabbay 99b25b95c7 Add missing DoorbellType field to HSA_CAPABILITY
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>


[ROCm/ROCR-Runtime commit: eb2d3cfcdf]
2015-05-02 12:10:04 +03:00
Oded Gabbay b6c7551747 Revert "Add execution property in register memory for gfx801."
This reverts commit abf7770d46.


[ROCm/ROCR-Runtime commit: 4c4df38035]
2015-04-28 17:50:00 +03:00
Xihan Zhang fcf6e22216 Enable creating SDMA queue.
Signed-off-by: Xihan Zhang <xihan.zhang@amd.com>
Reviewed-by: Ben Goz<ben.goz@amd.comt>


[ROCm/ROCR-Runtime commit: 112f7e751a]
2015-04-28 23:42:49 +08:00
Xihan Zhang abf7770d46 Add execution property in register memory for gfx801.
Signed-off-by: Xihan Zhang <xihan.zhang@amd.com>


[ROCm/ROCR-Runtime commit: 5ed05c99b3]
2015-04-10 22:26:44 +08:00