Commit Graph

135 Commits

Author SHA1 Message Date
Kent Russell 1b6994a2dc Fix build location for thunk RPM
Change-Id: I4f5c7688a3e9b4dd31d8d72cae3adf9a796e38f9


[ROCm/ROCR-Runtime commit: cd6d75880f]
2016-02-12 08:29:52 -05:00
Felix Kuehling 03720306b9 Make hsaKmtAllocMemory more compliant with the Thunk spec
Allocations from GPU nodes will return VRAM, not system memory.
Only non-paged allocation from GPU nodes is supported. System
memory can only be allocated from CPU nodes (usually node 0).

The HostAccess flag is no longer used to distinguish the memory
type. It only indicates whether the memory is mapped for CPU
access.

Maintain compatibility with broken KfdTests by returning system
memory for paged-memory requested from GPU nodes.

Change-Id: I514defede735f55e6de436f41944125b6f2c4ccf


[ROCm/ROCR-Runtime commit: 887b32fe86]
2016-02-10 10:29:54 -05:00
Yair Shachar 8359dc3119 Disable scratch Host allocation - via debug registration flags.
Change-Id: Ia6e5f86ec3979c4a49800f7af4509442a4e5be27
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>


[ROCm/ROCR-Runtime commit: a815a4337f]
2016-02-10 07:52:32 -05:00
Ben Goz 18aab410cc Adding support to hsaKmtMapMemoryToGPUNodes
Change-Id: Iab6222402a43c3cd31b0efc5a316a6482986258e
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: 7070f7ec5e]
2016-02-09 17:34:29 +02:00
shaoyunl 60bbf00fb1 libhsaKmt: Add CWSR support on dGPU
This is the thunk part of the CWSR support.
1. SDMA queues don't support CWSR, so there is no need to allocate context save/restore memory for them.
2. Allocate the context save/restore memory in the local frame buffer for dGPU.

Change-Id: Ie83506f0cced2a5a537c49d68125796d831c2764


[ROCm/ROCR-Runtime commit: 4e6c25e55b]
2016-02-04 15:00:58 -05:00
shaoyunl 4c5a3ca774 libhsakmt: Use GPU ID instead of Node ID in set_process_dgpu_aperture
Change-Id: I0e66ca4a018c15c009a3516d250f0044a4407878


[ROCm/ROCR-Runtime commit: 7e40877e81]
2016-02-04 10:32:23 -05:00
Andres Rodriguez cd849bc3e9 Bump version for bugfix release 1.8.1
Change-Id: I06701905592594221d26c075a8fe370b4cc92aff


[ROCm/ROCR-Runtime commit: 3797b56ec9]
2016-02-02 01:29:51 -05:00
Ben Goz 07a0c70dd5 Adding HsaMemMapFlags struct
Change-Id: Ib0ee6dede1169582fd58bfca648347c3f8aa0b54
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: e37863d7f2]
2016-01-31 05:16:53 -05:00
Felix Kuehling 61039bcd36 Remove gfx802 page size workaround on gfx803
All tonga page size alignment is done in the memory management
functions in fmm.c. All other code only specifies the minimum
alignment it needs and lets fmm.c handle the HW-specific
alignment.

Clean up aligned-exec memory allocation in queue.c to remove
hard-coded TONGA_PAGE_SIZE alignments and remove code duplication.
Make sure alignments are consistent between allocate and free.

Change-Id: Ia8923448173d1cef315af24cebff12adef385cb0


[ROCm/ROCR-Runtime commit: cc9fc386bd]
2016-01-28 16:05:18 -05:00
David Ogbeide 8fce9f7026 libhsakmt: Add marketing names for GPU nodes
HSA thunk API returns null when querying for GPU node marketing
names due to empty system topology file.

- Add marketing names to device GFX IP data structs.
- Modify name retrieval to pull from data structs instead of file.



Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>

Change-Id: I30ea04111be7e0df2e93894f801fbeb414ffa790


[ROCm/ROCR-Runtime commit: 4e4a881940]
2016-01-25 11:03:54 -05:00
Felix Kuehling 8ea4e037c8 Add simple test for unloading and reloading Thunk
Change-Id: I4ca95dee8a180023d1de5f69161607dd368164de


[ROCm/ROCR-Runtime commit: 641bfd2cd5]
2016-01-22 18:41:53 -05:00
Felix Kuehling db5b6fd35a Link libhsakmt with -z nodelete
This prevents the library from being unloaded at runtime, even when
dlclose is called. This preserves global variables, such as state
about the SVM address space and avoids catastrophic leaks on dlclose.

Change-Id: I34f1d19a450835200e9d4815458e8d1b3045053c


[ROCm/ROCR-Runtime commit: cc7491ec71]
2016-01-22 18:08:19 -05:00
Amber Lin 07500db1df Revert "Free resources when dlclose is called"
This reverts commit 4dd9dbb128.

Conflicts:
	src/fmm.c
	src/perfctr.c

Change-Id: Ib6113c2dd3962c72100c7f74cdef6897e1df40b3


[ROCm/ROCR-Runtime commit: 7416805a44]
2016-01-22 17:58:33 -05:00
Serguei Sagalovitch f5bebcf875 Fixed logic to return data back to user
Change-Id: I324d07c38e8d7eb202d4dccfed6e62006cf9cd29
Signed-off-by: Serguei Sagalovitch <Serguei.Sagalovitch@amd.com>


[ROCm/ROCR-Runtime commit: f44982a7ca]
2016-01-22 14:49:18 -05:00
Serguei Sagalovitch b10380d783 Skeleton for RDMA unit test v4
Added application and driver to serve as the starting point for the RDMA
unit test utility.

v2: Added initial mmap support
v3: Fixed logic to find correct ioctl handler
v4: Fixed logic in mmap to find correct pages table

Change-Id: Iaf97c0eb2acef2160d542c71afed58cf400414f7
Signed-off-by: Serguei Sagalovitch <Serguei.Sagalovitch@amd.com>


[ROCm/ROCR-Runtime commit: 47cef87a34]
2016-01-21 15:20:24 -05:00
Harish Kasiviswanathan b687eaf2c2 Don't limit number of supported HSA Nodes
Remove #define MAX_NODES 8

Change-Id: I756cadc652543dd17ea48a1c956adc08c3d2631a


[ROCm/ROCR-Runtime commit: 5e53205b9e]
2016-01-15 17:27:43 -05:00
Harish Kasiviswanathan 14358ee07f Don't limit number of supported GPUs
Stop using NUM_OF_SUPPORTED_GPUS. For now the definition itself cannot
be removed, as the ioctl code is in the upstream kernel.

Change-Id: If846625a8ad5062d5483e762850c793d3c00b9d0


[ROCm/ROCR-Runtime commit: ce83dc623f]
2016-01-15 11:44:42 -05:00
Harish Kasiviswanathan add443f1ef Use new ioctl for getting process apertures
Change-Id: I73678744ad73942edec442ad9c6d38637f7e1235


[ROCm/ROCR-Runtime commit: e7e1361c3d]
2016-01-12 12:09:25 -05:00
Felix Kuehling c89d3124d9 Implement hsaKmtRegisterMemoryToNodes
Fix hsaKmtRegisterMemory to be a no-op for now and move the multi-GPU
implementation to hsaKmtRegisterMemoryToNodes. Make GPU memory mappings
of host memory visible to all GPUs by default. Device memory is still
visible to the allocating GPU only by default (but can be overridden
with hsaKmtRegisterMemoryToNodes for experimenting with P2P).

Change-Id: I73408afbe3b10c8dad2ab3a780f58413249692e6


[ROCm/ROCR-Runtime commit: 063ad3ad9e]
2016-01-08 16:00:23 -05:00
Ben Goz 2fa7eef572 Adding support for mGPU
Change-Id: I5ed184e6a58b38d9dde48867f14513d161cf41a9
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: ea0f9d2a0b]
2016-01-04 15:35:15 +02:00
Ben Goz d874bcd8b3 Fix AQL Double buffer allocation mode
Change-Id: I5162ffd89416d317fd0ca0fc51da523298488922
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: 53b208adf2]
2016-01-04 15:34:53 +02:00
Yair Shachar 63f646d050 Add support for scratch GPUVM on host memory
This is required when we have a debug session



Change-Id: If9d6d2d23a9016b6ca9562e02a91fc16e0354ee4
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>


[ROCm/ROCR-Runtime commit: 681f4dcecc]
2015-12-20 15:50:50 +02:00
Harish Kasiviswanathan 8bf76bdf67 Fix node_id in gpu_mem[] array
Change-Id: I4897623612e1749e275fb97ce1603dc5130fc9ce


[ROCm/ROCR-Runtime commit: 39bf9c6611]
2015-12-14 16:25:18 -05:00
Amber Lin 4dd9dbb128 Free resources when dlclose is called
When the Thunk is initialized multiple times in the lifetime of a single
process, some global resources are leaked. This can happen when dlopen and
dlclose are used to load the library at runtime, rather than linking the
runtime against the Thunk. This patch adds a destructor to release global
resources when dlclose is called.


Change-Id: Ia00da0d41f095d0b2706f98c0e75effedd596f49


[ROCm/ROCR-Runtime commit: 582b70f9c3]
2015-12-11 16:32:41 -05:00
Yair Shachar f01386b61c Add support for per device debug register state tracking
Change-Id: I8d51670f5de8d379ead898d484f668a8034f9878
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>


[ROCm/ROCR-Runtime commit: 8f529e3c72]
2015-12-07 21:11:21 +02:00
Harish Kasiviswanathan 419117eff9 Remove unused parameter gpu_id from few functions
This also fixes an out-of-bounds access in the functions
fmm_get_aperture_base_and_limit and fmm_release.



Change-Id: Icf064c46647e69a069126171dbacdf3d5b27f972
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: a4cf02d797]
2015-11-30 11:51:44 -05:00
Harish Kasiviswanathan f34b407728 Use same VM range for all dGPUs
dgpu_aperture and dgpu_alt_aperture will be shared by all dGPUs.



Change-Id: I814495e43b51acabdc6266cfa8d83db5a062e20d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 2903a610e1]
2015-11-26 15:07:29 -05:00
Harish Kasiviswanathan 87ddd7732e Fix dgpu_vm_limit
Break from the for-loop once dgpu VM range is found, otherwise the
length is reduced by half

Change-Id: Ie602054c16ea69ea1cbb75e804ead551bc3615c0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 5a55383baf]
2015-11-23 11:51:39 -05:00
Amber Lin 4c84c85252 Fix sibling map in CPU cache properties
The previous code only works for systems where shared_cpu_map lists 32 or
fewer bits. Some systems list more than 32 bits and express them as
XXXXXXXX,XXXXXXXX,.... This patch adds that calculation. Also increase
MAX_CPU_CORES and MAX_CACHES to accommodate larger systems.

Change-Id: Ia5c7041866456a6aa3b66f8f0f951022d7c51028


[ROCm/ROCR-Runtime commit: a5bc8360e8]
2015-11-12 08:31:51 -05:00
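The sibling-map fix above concerns shared_cpu_map strings that span several comma-separated 32-bit hex groups. As a rough illustration only (this is not the code from the commit, and `count_shared_cpus` is a hypothetical helper name), the set bits in such a string could be counted like this:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of libhsakmt.
 * Counts the CPUs set in a shared_cpu_map string such as
 * "00000000,0000000f". Commas only separate 32-bit groups, so they are
 * skipped. Parsing stops at the end of the string or at a newline.
 * Returns -1 if an unexpected character is found. */
int count_shared_cpus(const char *map)
{
    int count = 0;

    for (const char *p = map; *p != '\0' && *p != '\n'; p++) {
        if (*p == ',')
            continue;                       /* group separator */
        if (!isxdigit((unsigned char)*p))
            return -1;                      /* malformed input */

        int nibble = isdigit((unsigned char)*p)
                         ? *p - '0'
                         : tolower((unsigned char)*p) - 'a' + 10;

        /* Add the number of set bits in this hex digit. */
        for (; nibble != 0; nibble >>= 1)
            count += nibble & 1;
    }
    return count;
}
```

Because the digits are processed one at a time, the sketch does not care how many groups the kernel prints, which is the property the commit message says the earlier code lacked.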
Felix Kuehling 1ab2c3341a Reserve address space with PROT_NONE
Access to reserved address space that has not been allocated should
result in a segfault. Use PROT_NONE to ensure that.


Change-Id: Ic5da9392fabbe78c9ec14f98e8b7b47e5267a98a


[ROCm/ROCR-Runtime commit: 62337b6c0a]
2015-11-10 18:19:56 -05:00
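The PROT_NONE commit above relies on a common Linux pattern: reserve a range of virtual addresses with no access rights, then grant access only to the parts that are actually allocated. A minimal sketch of that pattern, assuming Linux `mmap`/`mprotect`; the function names `reserve_address_space` and `commit_range` are made up for illustration and do not come from libhsakmt:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `size` bytes of address space without
 * making them accessible. Any access to the range faults until it is
 * committed. Returns NULL on failure. */
void *reserve_address_space(size_t size)
{
    void *addr = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Hypothetical helper: make a page-aligned part of a reservation
 * readable and writable. Returns 0 on success, -1 on failure. */
int commit_range(void *addr, size_t size)
{
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}
```

Touching any part of the reservation that has not been passed to `commit_range` raises SIGSEGV, which is the behavior the commit message asks for.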
Kent Russell 650232b83b Use OUT_DIR for thunkroot variable
Pick up the thunk from the correct location. It is no longer inside
THUNK_ROOT, but instead part of the OUT folder.

Change-Id: I41dd7dae243e66270d0ea7182f1ba119b18a1cfb

[ROCm/ROCR-Runtime commit: 3786e18d99]
2015-11-09 16:21:49 -05:00
Kent Russell 63c43d3404 Fix variable for RPM build
Certain versions of rpmbuild need the variable to be outside of curly
braces. This addresses that issue in that situation.

Change-Id: Iff7200b332b9d8e41a4d7676ca14c5a32c075beb


[ROCm/ROCR-Runtime commit: 4e4d4a81e1]
2015-11-09 11:05:32 -05:00
Amber Lin 403eb13050 Add CPU cache information
Fill in the cache properties of the CPU node by reading data from /proc/cpuinfo
and /sys/devices/system/cpu/cpuX/cache/indexY



Change-Id: I0a96760575e504e38962554f192c3fe66bea3c15


[ROCm/ROCR-Runtime commit: b6f65f9849]
2015-11-09 07:16:24 -05:00
Kent Russell 67d98aa280 Add option to create release build for Thunk
By adding REL=1 to the make command line (e.g. make REL=1 deb), we can
create a release build of the Thunk. This will not affect existing
functionality, and will only have an effect if REL=1 is specified on the
command line, or in the build_thunk.sh script.

Change-Id: Iedc3b6094e70a4ebd726499eda56013cc254b83d


[ROCm/ROCR-Runtime commit: cb3a664065]
2015-10-30 14:05:40 -04:00
Kent Russell 39d2152a3f Cleanup RPM build of thunk
Change-Id: Ib437a3ec7be9f5aa7d3ef9e53c13e3c5e7b7382e


[ROCm/ROCR-Runtime commit: cabbcbabff]
2015-10-30 08:42:16 -04:00
Felix Kuehling b900df9215 Use correct aperture for _fmm_unmap_from_gpu_scratch
Passing in the wrong aperture resulted in failure to unmap scratch.


Change-Id: Icd7423abfb1bcc773b33becffcbefc233f4ff340


[ROCm/ROCR-Runtime commit: bd93eecc64]
2015-10-29 18:26:15 -04:00
Philip Cox 782cea350c Add SDMA IOCTL type to Create Queue function.
Change-Id: I7e31507b761ca388b2cac93f994f6106de962f17


[ROCm/ROCR-Runtime commit: 0c234c7ef3]
2015-10-29 10:25:41 -04:00
Kent Russell f4889d439d libhsakmt - Add make option to package thunk as RPM
Add an option to libhsakmt to allow the thunk to be packaged as an RPM.
The default build remains unchanged, but the thunk can now be packaged
as an RPM by using "make src rpm". build_thunk.sh will be modified to
reflect this new option.

Change-Id: I38e03d10cfb5035bdf0a87635a784c47a709a5b6


[ROCm/ROCR-Runtime commit: 6ceed7def3]
2015-10-29 07:49:13 -04:00
Harish Kasiviswanathan 595f51899f Remove erroneous and redundant memory banks reported
hsaKmtGetNodeMemoryProperties -
	- Return only HSA_HEAPTYPE_SYSTEM memory for CPU only node.
	- For dGPU remove redundant HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE
	  entry.

Change-Id: I0349be39b8409a0fd64a038b8b2956191356d937


[ROCm/ROCR-Runtime commit: f885e551aa]
2015-10-23 18:43:46 -04:00
Harish Kasiviswanathan 71dc59b245 Correct parameter name for topology_is_dgpu()
The function expects device_id and not gpu_id.

Change-Id: I79794fd4e58e6e6adb26659da30f3e4d8e108434


[ROCm/ROCR-Runtime commit: 69662da3dc]
2015-10-23 18:43:45 -04:00
Harish Kasiviswanathan d7589c62e1 Unify fmm_get_aperture_xxx functions
Unify fmm_get_aperture_base and fmm_get_aperture_limit into one
function. Change the return type to HSAKMT_STATUS.

Change-Id: I0b3f563ffb268947ab891f4935f61788d0af0e01


[ROCm/ROCR-Runtime commit: cb53548c89]
2015-10-23 18:43:34 -04:00
Felix Kuehling 29561cc13e Implement flat scratch support for dGPU
hsaKmtAllocMemory only allocates aligned address space and sets up
the scratch_physical aperture to match the allocated address space.

Actual allocation of backing memory happens in hsaKmtMapMemoryToGPU.

Change-Id: Ie709815ab9bedb3d682e096b4005fdfb5e94d3a7


[ROCm/ROCR-Runtime commit: 5131ab4e64]
2015-10-22 20:40:22 -04:00
Felix Kuehling 17a31f1cce Allow address space allocations with specific alignment
Change-Id: I4bf7f7ac53c3921dd330b9dc7a40582611f88b69


[ROCm/ROCR-Runtime commit: 149261ba09]
2015-10-22 20:27:49 -04:00
Ben Goz 4ec82c7edd Casting local memory size to uint64_t
Change-Id: I5c2010056b84ac01bb65361210d2a693e437050a
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: 55b1a5dc43]
2015-10-22 09:05:34 -04:00
Ben Goz a511b7c4f7 Adding support for new AQL Queue Memory allocation
Change-Id: If84fc4b961627dbdd0b77b1c509a3c9a4c709b9f
Signed-off-by: Ben Goz <ben.goz@amd.com>


[ROCm/ROCR-Runtime commit: e61500c46e]
2015-10-22 13:13:54 +03:00
Felix Kuehling 55bd82cd89 Fix node 0 system memory allocation for dGPU
This is a hack to allow the Runtime to allocate system memory with
PreferredNode=0 on a dGPU system. We allocate it from node 1
instead so that the node 1 GPU can map the memory. A proper fix
will be implemented together with multi-GPU support.

Change-Id: Ieb52599e5275781c04ee34405ea850bf782c523a


[ROCm/ROCR-Runtime commit: 590c8e522c]
2015-10-21 20:00:01 -04:00
Felix Kuehling ebf6ec1806 Reserve more SVM process address space
Try to reserve as much SVM address space as GPUVM can address.
Implement a fallback scheme to smaller sizes if larger allocations
fail or are not addressable by the GPU, down to an (arbitrary)
minimum of 4GB.

Change-Id: I770177834cc9e6ddd6ef4f20d789eab63c8055cb


[ROCm/ROCR-Runtime commit: 39bde26c9b]
2015-10-19 17:44:23 -04:00
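The fallback scheme described in the SVM commit above can be sketched roughly as follows. This is an illustration under assumptions, not the libhsakmt implementation: `reserve_with_fallback` is a hypothetical name, the halving step is one possible way to shrink the request, and the check for whether the GPU can address the range is left out entirely.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: try to reserve max_size bytes of address space.
 * If that fails, halve the request and retry, giving up once the size
 * would fall below min_size. On success, returns the base address and
 * stores the reserved size in *out_size. On failure, returns NULL and
 * stores 0. The GPU addressability check from the commit is omitted. */
void *reserve_with_fallback(uint64_t max_size, uint64_t min_size,
                            uint64_t *out_size)
{
    for (uint64_t size = max_size; size >= min_size && size > 0; size /= 2) {
        if (size > SIZE_MAX)
            continue;   /* request does not fit in size_t on this platform */

        void *addr = mmap(NULL, (size_t)size, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (addr != MAP_FAILED) {
            *out_size = size;
            return addr;
        }
    }
    *out_size = 0;
    return NULL;
}
```

With a 4 GB floor as in the commit message, a caller would pass `4ULL << 30` as `min_size` and the largest size the GPU can address as `max_size`.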
Andres Rodriguez f6eba4d367 make: add 'deb' target for creating deb packages
When 'make deb' is run, create a libhsakmt.deb archive that installs
libhsakmt into the appropriate folder on the target where the dynamic
linker can find it.

Change-Id: I32de7198975f7831e509a67371e78456982b5c42


[ROCm/ROCR-Runtime commit: 0df346aaf9]
2015-10-16 19:13:51 -04:00
Harish Kasiviswanathan d38a3f1438 Fix init process apertures
Kernel ioctl AMDKFD_IOC_GET_PROCESS_APERTURES returns process apertures
only for GPU nodes. The current implementation assumed that the list of
GPU nodes returned by the ioctl has a one-to-one correspondence to sysfs
topology nodes. This fails when non-GPU nodes exist in the topology, as in
the case of Intel + gfx802.

Fix this by using gpu_id (./sys/.../kfd/topology/nodes/1/gpu_id) to map
information obtained from kernel ioctl call.

Change-Id: I4ab8ae5354f12cf0b6609fc4b24182b82eb3677f


[ROCm/ROCR-Runtime commit: 5cc56a2647]
2015-10-15 15:38:14 -04:00
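The idea behind the aperture fix above is to match ioctl results to topology nodes by gpu_id rather than by list position. A simplified sketch of that matching follows; the struct layouts and the name `assign_apertures` are invented for illustration and are not the real libhsakmt or KFD definitions, and the sketch assumes CPU-only nodes report a gpu_id of 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a sysfs topology node. */
struct topo_node {
    uint32_t gpu_id;          /* assumed to be 0 for CPU-only nodes */
    uint64_t aperture_base;
    uint64_t aperture_limit;
};

/* Hypothetical, simplified view of one aperture entry from the ioctl. */
struct gpu_aperture {
    uint32_t gpu_id;
    uint64_t base;
    uint64_t limit;
};

/* Copy each aperture to the topology node with the same gpu_id instead
 * of assuming that entry i belongs to node i. Returns the number of
 * apertures that found a matching node. */
size_t assign_apertures(struct topo_node *nodes, size_t num_nodes,
                        const struct gpu_aperture *aps, size_t num_aps)
{
    size_t matched = 0;

    for (size_t i = 0; i < num_aps; i++) {
        for (size_t n = 0; n < num_nodes; n++) {
            if (nodes[n].gpu_id != 0 && nodes[n].gpu_id == aps[i].gpu_id) {
                nodes[n].aperture_base = aps[i].base;
                nodes[n].aperture_limit = aps[i].limit;
                matched++;
                break;
            }
        }
    }
    return matched;
}
```

On a system where node 0 is a CPU-only node and node 1 is the GPU, the single aperture entry lands on node 1 and node 0 is left untouched.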
Harish Kasiviswanathan 462a775ec3 Fix hard-coded usage of Node 0
Use appropriate NodeId instead

Change-Id: I46af93b76978fea7bedb34457fcc0864ed4fe2d4


[ROCm/ROCR-Runtime commit: b6c6f79143]
2015-10-14 17:27:38 -04:00