Commit Graph

2959 Commits

Author SHA1 Message Date
shaoyunl fea5ab9114 Export libKmtSetTrapHandler symbol as global
Change-Id: I065dbecd05e992bc528128d893edaf636c1beff7
2016-03-01 10:30:02 -05:00
Harish Kasiviswanathan bf03058112 Fix io_links sysfs directory name typo
Change-Id: I4f6fb43c4a038b94c0f94f66ee383e83ad0ffa62
2016-02-29 11:15:29 -05:00
Jay Cornwall 3a662ac712 Fix race in dGPU event page setup
events_page is unprotected from multiple allocation. The first event
creation ioctl is unprotected from a race with args.event_page_offset
being set (for page setup) and null (all subsequent invocations).

Change-Id: I40ba712a17e9eff257785f90c553a74ad09c661d
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
2016-02-28 07:14:23 -05:00
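The race described above is the usual unprotected one-time allocation. A minimal sketch of guarding it with a mutex follows. Only the name `events_page` comes from the commit message; `events_lock`, `get_events_page` and the use of `calloc` as a stand-in for the real page setup are assumptions for illustration, not the Thunk's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names: only "events_page" appears in the commit message. */
static pthread_mutex_t events_lock = PTHREAD_MUTEX_INITIALIZER;
static void *events_page;

/* Return the shared events page, allocating it exactly once even when
 * several threads create their first event at the same time. */
void *get_events_page(size_t size)
{
    assert(size > 0);

    pthread_mutex_lock(&events_lock);
    if (events_page == NULL) {
        /* calloc stands in for the real page setup done through the ioctl. */
        events_page = calloc(1, size);
    }
    void *page = events_page;
    pthread_mutex_unlock(&events_lock);

    return page;
}
```

Every caller takes the lock before testing the pointer, so the check and the allocation cannot interleave between threads.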
Felix Kuehling 006f3ee41b Fix address space leak in __fmm_release
Use the object size when freeing address space, instead of the
parameter passed in by the caller. The parameter may be incorrect
due to app or runtime bugs, or when the buffer is an AQL ring
buffer with double mapping workaround.


Change-Id: I00bb31d4520ef969a49d6d5ea723e8a33418acc3
2016-02-26 09:19:21 -05:00
Felix Kuehling 8a0161d6bb Use aligned size for looking up userptr object after allocation
The alignment performed in vm_find_object_by_address isn't sufficient
because it doesn't take into account the offset from the start of the
page.

This fixes a bug where certain unaligned userpointers and sizes fail
to register correctly.

Change-Id: I17872e264467a619f5e1bedb7e1ed3d994a856bf
2016-02-25 19:47:05 -05:00
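The arithmetic behind this fix can be sketched as follows. The helper name `userptr_aligned_size` and the 4 KiB page size used in the example are assumptions for illustration and do not come from the Thunk source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes of whole pages needed to cover
 * the range [addr, addr + size). */
uint64_t userptr_aligned_size(uint64_t addr, uint64_t size,
                              uint64_t page_size)
{
    /* The mask arithmetic below requires a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Offset of the user pointer from the start of its page. */
    uint64_t page_offset = addr & (page_size - 1);

    /* Round (offset + size) up to the next page boundary. */
    return (page_offset + size + page_size - 1) & ~(page_size - 1);
}
```

For example, a 0x20-byte buffer at address 0x1ff0 straddles a page boundary, so it needs two 4 KiB pages (8192 bytes). Rounding up the size alone would give only 4096.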
Ramesh Errabolu (xN/A) TX f7693cf777 Configure AQL packet header with System Scope for flush
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1240170]
2016-02-24 14:08:35 -05:00
Ben Goz 3f02a3cf0b Mapping public VRAM BO to cpu
Change-Id: I2ff62ff0784f8ce556ad80739a177b90d866f1b4
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-02-24 17:30:15 +02:00
Felix Kuehling 7a383f9d88 Fix memory leaks due to stale CPU mappings
Use the aligned size of the buffer objects for CPU unmapping in
__fmm_release instead of relying on the unaligned size passed in by
the caller.

Change-Id: If986ec24e9a05d32981549fddbf143221fc40bac
2016-02-16 18:12:05 -05:00
Felix Kuehling 85f9efb1a0 Add support for register/deregister memory for dGPU
Allocate SVM address space for the registered memory and use new
userptr support in KFD to create a system memory BO associated with
the given user pointer. Map this BO at the SVM address for CPU
access.

MapMemoryToGPU can be used with the registered user pointer and
will return the SVM address as alternate GPUVA.

Change-Id: I4886e193c51fb6870a567878870c36bf8b5c3748
2016-02-16 18:12:05 -05:00
Ben Goz 00386734b1 Align gpu-id-array size to multiple of sizeof(uint32_t)
Change-Id: I9f46b6a331a8d928ef570b420fb60b99b2edfdd1
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-02-16 11:27:06 -05:00
Besar Wicaksono (xN/A) TX [TEXT] bbe0be05d4 Modify MatrixMultiplication sample to use memory pool API
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1237420]
2016-02-16 11:12:25 -05:00
Besar Wicaksono (xN/A) TX [TEXT] c494af9d49 Add sample application to use the new memory pool API.
Details:
- add HsaGetInfo program that prints out all available CPU, GPU and their respective memory pools.

[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1237219]
2016-02-15 18:11:44 -05:00
Harish Kasiviswanathan 04b92b8e05 gfx803: Add performance counter information
Change-Id: Id81b43e90029306f03c84752cef06dc336e3a4a9
2016-02-12 16:39:39 -05:00
Harish Kasiviswanathan 1a0f915957 Adding missing performance counters for gfx801
A few more counters are now available in the GFX8 register specs, so add
them. Also report the correct number of SQ perf counter slots for gfx700 and gfx801.

Change-Id: I9e6b4b10238230aabeccbfaa5e491a28b5e54f2d
2016-02-12 16:37:21 -05:00
Ramesh Errabolu (xN/A) TX 2280190f70 Populate Cpu and Gpu nodes into different agent lists
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1236865]
2016-02-12 16:14:39 -05:00
Ben Goz b37f99a01e Fix double free issue and pointer alignment
Change-Id: Id5bab454d53d404883a92282168b3f6cbc468cbb
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-02-12 11:21:32 -05:00
Kent Russell cd6d75880f Fix build location for thunk RPM
Change-Id: I4f5c7688a3e9b4dd31d8d72cae3adf9a796e38f9
2016-02-12 08:29:52 -05:00
Felix Kuehling 887b32fe86 Make hsaKmtAllocMemory more compliant with the Thunk spec
Allocations from GPU nodes will return VRAM, not system memory.
Only non-paged allocation from GPU nodes is supported. System
memory can only be allocated from CPU nodes (usually node 0).

The HostAccess flag is no longer used to distinguish the memory
type. It only indicates whether the memory is mapped for CPU
access.

Maintain compatibility with broken KfdTests by returning system
memory for paged-memory requested from GPU nodes.

Change-Id: I514defede735f55e6de436f41944125b6f2c4ccf
2016-02-10 10:29:54 -05:00
Yair Shachar a815a4337f Disable scratch Host allocation - via debug registration flags.
Change-Id: Ia6e5f86ec3979c4a49800f7af4509442a4e5be27
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
2016-02-10 07:52:32 -05:00
Ben Goz 7070f7ec5e Adding support to hsaKmtMapMemoryToGPUNodes
Change-Id: Iab6222402a43c3cd31b0efc5a316a6482986258e
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-02-09 17:34:29 +02:00
Ding, Wei (xN/A) TX df99562905 Changes 5 hsail apps for supporting gfx803.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1235366]
2016-02-08 15:39:18 -05:00
shaoyunl 4e6c25e55b libhsaKmt: Add CWSR support on dGPU
This is the Thunk part of the CWSR support.
1. SDMA queues don't support CWSR, so there is no need to allocate context save/restore memory for them
2. Allocate the context save/restore memory in the local frame buffer for dGPU

Change-Id: Ie83506f0cced2a5a537c49d68125796d831c2764
2016-02-04 15:00:58 -05:00
shaoyunl 7e40877e81 libhsakmt: Use GPU ID instead of Node ID in set_process_dgpu_aperture
Change-Id: I0e66ca4a018c15c009a3516d250f0044a4407878
2016-02-04 10:32:23 -05:00
Andres Rodriguez 3797b56ec9 Bump version for bugfix release 1.8.1
Change-Id: I06701905592594221d26c075a8fe370b4cc92aff
2016-02-02 01:29:51 -05:00
Ben Goz e37863d7f2 Adding HsaMemMapFlags struct
Change-Id: Ib0ee6dede1169582fd58bfca648347c3f8aa0b54
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-01-31 05:16:53 -05:00
Felix Kuehling cc9fc386bd Remove gfx802 page size workaround on gfx803
All tonga page size alignment is done in the memory management
functions in fmm.c. All other code only specifies the minimum
alignment it needs and lets fmm.c handle the HW-specific
alignment.

Clean up aligned-exec memory allocation in queue.c to remove
hard-coded TONGA_PAGE_SIZE alignments and remove code duplication.
Make sure alignments are consistent between allocate and free.

Change-Id: Ia8923448173d1cef315af24cebff12adef385cb0
2016-01-28 16:05:18 -05:00
David Ogbeide 4e4a881940 libhsakmt: Add marketing names for GPU nodes
HSA thunk API returns null when querying for GPU node marketing
names due to empty system topology file.

- Add marketing names to device GFX IP data structs.
- Modify name retrieval to pull from data structs instead of file.



Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>

Change-Id: I30ea04111be7e0df2e93894f801fbeb414ffa790
2016-01-25 11:03:54 -05:00
Felix Kuehling 641bfd2cd5 Add simple test for unloading and reloading Thunk
Change-Id: I4ca95dee8a180023d1de5f69161607dd368164de
2016-01-22 18:41:53 -05:00
Felix Kuehling cc7491ec71 Link libhsakmt with -z nodelete
This prevents the library from being unloaded at runtime, even when
dlclose is called. This preserves global variables, such as state
about the SVM address space and avoids catastrophic leaks on dlclose.

Change-Id: I34f1d19a450835200e9d4815458e8d1b3045053c
2016-01-22 18:08:19 -05:00
Amber Lin 7416805a44 Revert "Free resources when dlclose is called"
This reverts commit 582b70f9c3.

Conflicts:
	src/fmm.c
	src/perfctr.c

Change-Id: Ib6113c2dd3962c72100c7f74cdef6897e1df40b3
2016-01-22 17:58:33 -05:00
Serguei Sagalovitch f44982a7ca Fixed logic to return data back to user
Change-Id: I324d07c38e8d7eb202d4dccfed6e62006cf9cd29
Signed-off-by: Serguei Sagalovitch <Serguei.Sagalovitch@amd.com>
2016-01-22 14:49:18 -05:00
Serguei Sagalovitch 47cef87a34 Skeleton for RDMA unit test v4
Added application and driver to serve as the starting point for RDMA
unit test utility.

v2: Added initial mmap support
v3: Fixed logic to find correct ioctl handler
v4: Fixed logic in mmap to find correct pages table

Change-Id: Iaf97c0eb2acef2160d542c71afed58cf400414f7
Signed-off-by: Serguei Sagalovitch <Serguei.Sagalovitch@amd.com>
2016-01-21 15:20:24 -05:00
Harish Kasiviswanathan 5e53205b9e Don't limit number of supported HSA Nodes
Remove #define MAX_NODES 8

Change-Id: I756cadc652543dd17ea48a1c956adc08c3d2631a
2016-01-15 17:27:43 -05:00
Harish Kasiviswanathan ce83dc623f Don't limit number of supported GPUs
Stop using NUM_OF_SUPPORTED_GPUS. For now the definition itself cannot
be removed because the ioctl code is in the upstream kernel.

Change-Id: If846625a8ad5062d5483e762850c793d3c00b9d0
2016-01-15 11:44:42 -05:00
Harish Kasiviswanathan e7e1361c3d Use new ioctl for getting process apertures
Change-Id: I73678744ad73942edec442ad9c6d38637f7e1235
2016-01-12 12:09:25 -05:00
Felix Kuehling 063ad3ad9e Implement hsaKmtRegisterMemoryToNodes
Fix hsaKmtRegisterMemory to be a no-op for now and move the multi-GPU
implementation to hsaKmtRegisterMemoryToNodes. Make GPU memory mappings
of host memory visible to all GPUs by default. Device memory is still
visible to the allocating GPU only by default (but can be overridden
with hsaKmtRegisterMemoryToNodes for experimenting with P2P).

Change-Id: I73408afbe3b10c8dad2ab3a780f58413249692e6
2016-01-08 16:00:23 -05:00
Ben Goz ea0f9d2a0b Adding support for mGPU
Change-Id: I5ed184e6a58b38d9dde48867f14513d161cf41a9
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-01-04 15:35:15 +02:00
Ben Goz 53b208adf2 Fix AQL Double buffer allocation mode
Change-Id: I5162ffd89416d317fd0ca0fc51da523298488922
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-01-04 15:34:53 +02:00
Nikolay Haustov [TEXT] d8e67d962b Split libHSAIL and libHSAIL-AMD (HSA Changes)
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1223723]
2015-12-28 10:00:43 -05:00
Yair Shachar 681f4dcecc Add support for scratch GPUVM on host memory
This is required when we have a debug session



Change-Id: If9d6d2d23a9016b6ca9562e02a91fc16e0354ee4
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
2015-12-20 15:50:50 +02:00
Harish Kasiviswanathan 39bf9c6611 Fix node_id in gpu_mem[] array
Change-Id: I4897623612e1749e275fb97ce1603dc5130fc9ce
2015-12-14 16:25:18 -05:00
Amber Lin 582b70f9c3 Free resources when dlclose is called
When the Thunk is initialized multiple times in the lifetime of a single
process, some global resources are leaked. This can happen when dlopen and
dlclose are used to load the library at runtime, rather than linking the
runtime against the Thunk. This patch adds the destructor to release global
resources when dlclose is called.


Change-Id: Ia00da0d41f095d0b2706f98c0e75effedd596f49
2015-12-11 16:32:41 -05:00
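One common way to release global resources when a shared library is unloaded is a GCC/Clang destructor function. The sketch below shows that general technique only. All names in it (`global_table`, `thunk_like_init`, `thunk_like_fini`, `state_is_initialized`) are hypothetical, and the commit's actual implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global state standing in for the library's real resources. */
static int *global_table;

/* Runs when the library is loaded (or before main in an executable). */
__attribute__((constructor))
static void thunk_like_init(void)
{
    global_table = calloc(16, sizeof(*global_table));
    /* Sketch only: real code would report the failure instead of aborting. */
    assert(global_table != NULL);
}

/* Runs when the library is unloaded, for example by dlclose, or at
 * normal process exit. */
__attribute__((destructor))
static void thunk_like_fini(void)
{
    free(global_table);
    global_table = NULL;
}

/* Small accessor so callers can observe the state. */
int state_is_initialized(void)
{
    return global_table != NULL;
}
```

The `constructor` and `destructor` attributes are compiler extensions, so this approach assumes a GCC-compatible toolchain.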
Yair Shachar 8f529e3c72 Add support for per device debug register state tracking
Change-Id: I8d51670f5de8d379ead898d484f668a8034f9878
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
2015-12-07 21:11:21 +02:00
Harish Kasiviswanathan a4cf02d797 Remove unused parameter gpu_id from few functions
This also fixes out-of-bounds accesses in the functions
fmm_get_aperture_base_and_limit and fmm_release



Change-Id: Icf064c46647e69a069126171dbacdf3d5b27f972
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-11-30 11:51:44 -05:00
Harish Kasiviswanathan 2903a610e1 Use same VM range for all dGPUs
dgpu_aperture and dgpu_alt_aperture will be shared by all dGPUs.



Change-Id: I814495e43b51acabdc6266cfa8d83db5a062e20d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-11-26 15:07:29 -05:00
Harish Kasiviswanathan 5a55383baf Fix dgpu_vm_limit
Break from the for-loop once dgpu VM range is found, otherwise the
length is reduced by half

Change-Id: Ie602054c16ea69ea1cbb75e804ead551bc3615c0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-11-23 11:51:39 -05:00
Amber Lin a5bc8360e8 Fix sibling map in CPU cache properties
The previous code only works on systems where shared_cpu_map lists 32 or fewer
bits. Some systems list more than 32 bits and express them as
XXXXXXXX,XXXXXXXX,.... This patch adds that calculation. Also increase
MAX_CPU_CORES and MAX_CACHES to accommodate more advanced systems.

Change-Id: Ia5c7041866456a6aa3b66f8f0f951022d7c51028
2015-11-12 08:31:51 -05:00
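The comma-separated mask format mentioned above can be parsed one 32-bit hex group at a time. The sketch below only counts the set bits, which is simpler than building the sibling map the commit describes. The function name `count_cpus_in_map` is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: count the CPUs set in a sysfs-style mask such as
 * "00000003,ffffffff", where each comma-separated group is a hex word. */
int count_cpus_in_map(const char *map)
{
    assert(map != NULL);

    int count = 0;
    const char *p = map;

    for (;;) {
        char *end;
        unsigned long word = strtoul(p, &end, 16);

        if (end == p)
            break;                      /* no hex digits left to parse */

        for (; word != 0; word >>= 1)   /* count the set bits in this group */
            count += (int)(word & 1UL);

        if (*end != ',')
            break;                      /* end of string or trailing newline */
        p = end + 1;                    /* skip the comma, parse next group */
    }

    return count;
}
```

A mask such as `00000003,ffffffff` describes 34 CPUs, which a parser limited to a single 32-bit word cannot represent.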
Felix Kuehling 62337b6c0a Reserve address space with PROT_NONE
Access to reserved address space that has not been allocated should
result in a segfault. Use PROT_NONE to ensure that.


Change-Id: Ic5da9392fabbe78c9ec14f98e8b7b47e5267a98a
2015-11-10 18:19:56 -05:00
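Reserving address space with PROT_NONE is typically done with an anonymous `mmap`, and sub-ranges are later made accessible with `mprotect`. The following is a minimal sketch for Linux. The helper names `reserve_address_space` and `commit_range` are assumptions and do not come from the Thunk source.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `size` bytes of address space with no access permissions.
 * Any access to the range faults until it is explicitly committed.
 * Returns NULL on failure. */
void *reserve_address_space(size_t size)
{
    assert(size > 0);

    void *addr = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Make a page-aligned sub-range of a reservation readable and writable.
 * Returns 0 on success and -1 on failure, like mprotect. */
int commit_range(void *addr, size_t size)
{
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}
```

With this layout, a stray pointer into a reserved but unallocated region raises a segfault immediately instead of silently reading or writing memory.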
Kent Russell 3786e18d99 Use OUT_DIR for thunkroot variable
Pick up the thunk from the correct location. It is no longer inside
THUNK_ROOT, but instead part of the OUT folder.

Change-Id: I41dd7dae243e66270d0ea7182f1ba119b18a1cfb
2015-11-09 16:21:49 -05:00
Kent Russell 4e4d4a81e1 Fix variable for RPM build
Certain versions of rpmbuild need the variable to be outside of curly
braces. This addresses that issue in that situation.

Change-Id: Iff7200b332b9d8e41a4d7676ca14c5a32c075beb
2015-11-09 11:05:32 -05:00