Commit graph

2959 commits

Author SHA1 Message Date
Felix Kuehling 15764e2897 Fix KFD ioctl ABI
This change breaks the ABI, and aligns it with the upstream ABI.
It also fixes some ioctl structures that are not 64-bit safe and
consolidates ioctl numbers.

Change-Id: Ib79944721534bd55a5299c5baf7bb5b3246cccd2
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-05-09 14:59:13 -04:00
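The "Fix KFD ioctl ABI" commit above (15764e2897) mentions ioctl structures that are not 64-bit safe. The log does not say which structures were affected, so the following is only a minimal sketch of the general problem, using hypothetical struct names: members such as pointers and `unsigned long` change size between 32-bit and 64-bit user space, while fixed-width members with explicit padding keep the same layout everywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, layout-fragile argument struct: the size of a pointer and
 * of unsigned long differs between 32-bit and 64-bit user space. */
struct example_fragile_args {
    void *address;
    unsigned long size;
    uint32_t gpu_id;
};

/* Hypothetical 64-bit-safe variant: only fixed-width members, with explicit
 * padding so the struct has the same size and offsets on every ABI. */
struct example_safe_args {
    uint64_t address;   /* user pointer carried as a 64-bit integer */
    uint64_t size;
    uint32_t gpu_id;
    uint32_t pad;       /* keeps the total size a multiple of 8 bytes */
};

_Static_assert(sizeof(struct example_safe_args) == 24,
               "layout must not depend on the pointer width");

/* Convert a user pointer to the 64-bit field representation. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

A caller would fill `address` with `ptr_to_u64(buffer)` before issuing the ioctl, so a 32-bit process and a 64-bit kernel agree on the layout.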
Konstantin Zhuravlyov a777413400 Purge warning in amd_hsa_code.cpp
Change-Id: Iaa5d7af183af5e8c069365a1f0410365b46d53d5
2017-05-08 19:39:49 -04:00
James Edwards 001d43ce56 Change rpm preinstall script to post install
Change-Id: Iccc04902699bf0ba8b5269e1129b72cf69ef7f00
2017-05-07 14:02:54 -05:00
Felix Kuehling 5eb31b2ebe Switch to cleaned up memory management ioctls
Change-Id: Ib8971ef91138f2a051272b9b57f0ebd480e8e738
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-05-04 16:29:37 -04:00
Harish Kasiviswanathan 3b2f064cbc Get PAGE_SIZE from system configuration
Change-Id: I87f383c443b873e13d36e80bfa034665bf493520
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2017-05-02 16:54:32 -04:00
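For the "Get PAGE_SIZE from system configuration" commit above (3b2f064cbc), a minimal sketch of querying the page size at run time with POSIX `sysconf` rather than assuming a compile-time constant. The helper names and the 4 KiB fallback are assumptions for illustration, not taken from the patch.

```c
#include <unistd.h>

/* Hypothetical helper: ask the system for its page size at run time. */
static long query_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? sz : 4096;  /* assumed fallback if the query fails */
}

/* Round n up to a multiple of page (page must be a power of two). */
static unsigned long page_align_up(unsigned long n, unsigned long page)
{
    return (n + page - 1) & ~(page - 1);
}
```

For example, `page_align_up(len, (unsigned long)query_page_size())` gives an allocation size that is valid on systems with 4 KiB pages as well as on systems with larger pages.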
Amber Lin ca06b0966b Add more non-priv PMC blocks to GFX9
This patch adds more non-privileged PMC blocks to GFX9/gfx900 to cover
blocks added in HSA Thunk Spec.

Change-Id: Ia3d953213a32536b2275231149f11ba060791442
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-05-01 09:14:03 -04:00
Amber Lin ed4a22e0d3 Add more non-priv PMC blocks to GFX8
This patch adds more non-privileged PMC blocks to GFX8 products: gfx801,
gfx803, and gfx803. Most of them have the same counter IDs on the same
block. For certain blocks when the product doesn't have the same counter
IDs, gfx8_xx_ is used to represent the product.

Change-Id: I059913c974bf2eb875fd1cf6f8b0d8c9c9bd7c14
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-04-27 11:22:12 -04:00
Amber Lin 9f19acbdb7 Add more non-priv PMC blocks to gfx70x/GFX7
HSA Thunk Spec was updated to include more non-privileged blocks for
profiling. This patch adds those newly added non-privileged blocks for
gfx70x.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>

Change-Id: Id745ac236c871e8e61a128a2460784f9c9c354b6
2017-04-25 13:08:10 -04:00
hthangir 8aa19388a9 On GFX9+ amd_queue_t.scratch_backing_memory_location must store the queue's scratch backing store VA, not the offset.
Also fix permissions in a couple of files.

Change-Id: I4203f8e5a36406b20562d8943ea5c341847f039a
2017-04-18 22:37:56 -05:00
Felix Kuehling 3f7e7933e3 Add debug option to control the number of guard pages
Change-Id: I18b10bcbb4d74a92f17330e44b2dbb4cea61da00
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-04-10 11:40:39 -04:00
Felix Kuehling 34ddde0c50 Add debug option to check userptrs on registration
export HSA_CHECK_USERPTR=1 to check user pointers on registration. If
the pointer doesn't point to a valid mapping, there will be a segfault.

Change-Id: I459c0902cbc90338517fbf79678871ebfbe5183b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-04-10 11:40:39 -04:00
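The "Add debug option to check userptrs on registration" commit above (34ddde0c50) describes a check that segfaults when a registered pointer has no valid mapping. One way such a check could work is to read one byte from every page of the range. The sketch below assumes that approach; the function names and the exact parsing of `HSA_CHECK_USERPTR` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 when the debug option is enabled (assumed to mean the value "1"). */
static int check_userptr_enabled(void)
{
    const char *v = getenv("HSA_CHECK_USERPTR");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Read one byte at every page-size stride plus the last byte, so that each
 * page overlapping the range is touched. An unmapped page raises SIGSEGV
 * here, at registration time. Returns the number of reads performed. */
static size_t touch_user_range(const void *addr, size_t size)
{
    const volatile unsigned char *p = addr;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t reads = 0;

    if (size == 0)
        return 0;
    for (size_t off = 0; off < size; off += page) {
        (void)p[off];
        reads++;
    }
    (void)p[size - 1];
    reads++;
    return reads;
}
```

A registration path would call `touch_user_range(ptr, size)` only when `check_userptr_enabled()` returns 1, which keeps the cost out of normal runs.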
Amber Lin c119653add Create indirect IO links
KFD added all direct IO links to sysfs, so this patch removes all code
related to direct links and modifies the indirect links function to
reflect the change.

Change-Id: Iaec7b5f6c59f9034f8f960ca1fe1145d51dab367
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-04-07 07:18:13 -04:00
Christophe Paquot 617b6fa987 Separate gfx700 and AI architectures
The registers are different, so keeping the two separate is cleaner.

Change-Id: I36eee4c9c74deb43ca4666baa87894765a5f27b8
2017-04-07 00:14:22 -04:00
Jay Cornwall f0a1c7c4c6 Fix gfx9 trap handler to retrieve correct return address
The trap protocol changed between gfx8 and gfx9. The return address
is in trap temporaries [0,1] on gfx9 rather than [4,5] on gfx8.
Unfortunately SP3 changes the meaning of the ttmp register aliases
in gfx9, further confusing the issue.

Clean up later when LLVM assembly build is introduced to the runtime.

Change-Id: I84ea9bf3736f060dd95d0361f9d5a0f9a3576178
2017-04-05 17:33:49 -05:00
Felix Kuehling 11862b9f61 Add guard page after each address space reservation
Guard pages help catch out-of-bounds memory accesses by applications
by generating VM faults (GPU) and segfaults (CPU).

Remove address space reservation from scratch aperture. That address
space is managed by the Thunk client. Guard pages would cause Thunk's
address space management to get out of sync with the client's.

Change-Id: I2e5aee2923a90186358cc7b0e131baf547996df6
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-03-30 11:31:48 -04:00
Sean Keely 8a5ff78be6 Remove comments, no functional change.
Change-Id: I923c037803a847352c2c50d9d47460cb0f01f22c
2017-03-28 18:22:49 -05:00
Sean Keely 7dfeee5074 Support async. queue errors and dynamic scratch without KFD events.
Change-Id: I4e9e7a37aa7b9c96b28ce79f562760283e02b1e0
2017-03-28 19:18:18 -04:00
Felix Kuehling 53838c9818 Add missing gfx900 device IDs
Change-Id: Ica5deb000279a508106125461af64a3851294b0a
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-03-24 16:03:46 -04:00
Sean Keely c4544906b9 Refactor signal_wait timing code and respect small timeouts.
Optimized for Gromacs and SHOC.

Change-Id: Ib674710268b41003259711a0e42d3e770a82018d
2017-03-23 23:55:48 -05:00
hthangir ba3f1cb476 We should be using the "used" gcc attribute.
Change-Id: I1589273740ae66e8d7d8186a88e2c411a2e0425c
See: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes
2017-03-20 11:57:39 -04:00
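For the commit above (ba3f1cb476), a minimal sketch of the GCC `used` variable attribute. It tells the compiler to emit the variable even when nothing in the translation unit references it. The variable name and contents are hypothetical; the log does not say which variable the patch changed.

```c
/* Hypothetical tag string that nothing in this file reads. Without the
 * attribute, the compiler may drop an unreferenced static variable and
 * warn about it; with "used", it is kept in the object file. */
static const char example_build_tag[] __attribute__((used)) =
    "example-build-tag";
```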
hthangir 6c750f479d Fix the comment to specify the right type of allocation required.
Change-Id: I8bda8d64010d466d6ca5e779d2042cca3f494ecf
2017-03-20 11:56:54 -04:00
hthangir 7c6cde1871 Disable SDMA only on gfx900 until it is validated.
Change-Id: Ib960be3ca6d3fc4b664ba047243964b8c7a33f24
2017-03-20 11:55:22 -04:00
Amber Lin 73eff30d7d Add TCA block to PMC support
Add TCA to PMC tables.

Change-Id: Ia4164ab4581ea3f539706f534f672e5c24f5362f
2017-03-20 10:22:21 -04:00
Konstantin Zhuravlyov a08d760c70 [Loader] Fix memory allocations for code objects that
are larger than swap space available

Change-Id: I321487f96fe0a18998301a9058430c19427e5a94
2017-03-11 00:57:25 -05:00
Sean Keely 5f50e97d18 Support async error code 256, invalid vendor specific packet.
Change-Id: I491f34def4c3d54403864fa42670f7847a6141cc
2017-03-10 16:20:27 -05:00
Sean Keely 2824786b3b Relax signal assertion.
Informs, in debug mode only, that a signal wait violated the HSA
spec with regard to the consuming agents list.  This list is used
for optimized signal type selection.

Change-Id: I5879f8f822d01af504ab913482b2532feb00be98
2017-03-10 16:05:34 -05:00
Christophe Paquot 05d587ef79 Add inc/ to some include
Change-Id: Id027b015c8785a132835a422d97a23b0bbce208a
2017-03-09 19:45:01 -08:00
Sean Keely 426d41e27c Adjust signal sleep to reflect null kernel latency. Performance tested on Gromacs.
Change-Id: I3851148ee8544b15d840f2c26ca73a83f8d0df2e
2017-03-09 15:20:53 -05:00
Amber Lin 3738a1b5f2 Re-formatting IO link code
- Typo fix: *_link_tye to *_link_type, and add a missing word in comments
- Replace printf with fprintf(stderr, ...)
- Shorten lines to fit in 80 characters

Change-Id: Ibeb0b98d5c59d617ae06d9854a9dde16251ded52
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-03-09 11:08:22 -05:00
Christophe Paquot 29894df0b5 Update addrlib for gfx900.
Change-Id: I2b7b6093406c5498e9a551327701ad8973f1cf3a
2017-03-07 14:41:16 -08:00
Amber Lin 2c2b1e0db2 Support profiling on gfx900
Add gfx900 to PMC support. This patch lists SQ counters.

Change-Id: Ia1e60e76ff71ab2e38d9d5de12ac9d527b3e8c6a
2017-03-07 14:30:40 -05:00
Amber Lin 9e32cdb113 Don't duplicate PMC tables
Many devices have the same counter IDs for a given hardware block. Devices
in the same GFX generation usually have the same block counters, so there
is no need to list each device individually. Instead, share one table among
all devices that have the same counter IDs, and keep separate tables for
devices that do not.

Change-Id: I857056edc6f491f61af6e9598580e5dc7d372f94
2017-03-07 11:31:23 -05:00
Amber Lin b3b6367cb8 Add gfx803 DID
Add 0x67D0 to gfx803 support list.

Change-Id: Ifdb1fad4a3c42bea54856f6d5248c00ed546ad85
2017-03-07 07:25:49 -05:00
Amber Lin 4827b09119 Unify the device ID list
Integrate the supported device ID list distributed in topology, queue, and
pmc into one place: topology.

Change-Id: If035cf8e4a6fc6caff6c94ec627647cfb11c3d79
2017-03-06 16:26:51 -05:00
Amber Lin 1a8a9cb57b Make the lock file writable by others
Although the S_IWOTH flag is set in the open() call, the lock file is not
created as writable by others, so other users cannot open it with O_RDWR.
This is because the default umask masks off S_IWOTH. This patch temporarily
changes the umask to S_IXOTH, a permission others do not need, so that
S_IWOTH is no longer masked off. The original umask is restored after the
file is opened.

Change-Id: I8a239e1566ce0b0b18821913385f239db7c3588e
2017-03-03 11:05:13 -05:00
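The umask handling described in the commit above (1a8a9cb57b) can be sketched as follows. The function names and the exact mode bits are assumptions for illustration; only the sequence (set umask to S_IXOTH, open, restore) comes from the commit message.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create or open a lock file that other users can open with O_RDWR.
 * The umask is switched to S_IXOTH for the duration of open() so that
 * S_IWOTH in the mode argument is not masked off, then restored. */
static int open_shared_lock_file(const char *path)
{
    mode_t old_mask = umask(S_IXOTH);
    int fd = open(path, O_CREAT | O_RDWR,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    umask(old_mask);  /* restore the caller's umask */
    return fd;
}

/* Returns 1 if the file at path is writable by others, 0 if not, -1 on error. */
static int others_can_write(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & S_IWOTH) ? 1 : 0;
}
```

Note that the umask is a process-wide setting, so in a multithreaded program another thread creating a file during this window would also see the relaxed mask.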
Amber Lin e17c67f049 Implement Start/Stop/Query Trace
StartTrace and StopTrace send ioctl requests to enable/disable performance
counters. QueryTrace reads the counter from the perf_event fd.

Change-Id: Ibf79675bc23fcf129371bfd100f8e262121bc684
2017-03-02 14:00:25 -05:00
James Edwards ec84fbe264 Fix permissions on hsakmt include files.
Change-Id: I1d428e60268e6d2de6776ff5f16d03503d00ddcc
2017-03-02 12:00:09 -06:00
Ramesh Errabolu 315ae6439b Extend Rocr Samples to allow collection of Perf Cntrs
Change-Id: I9c7e75128fca28b23ec54efab00bf5d32c95a877
2017-02-28 20:29:24 -06:00
Kent Russell c991951288 fmm.c: Disable userptr for paged mem by default
Unless HSA_USERPTR_FOR_PAGED_MEM is explicitly set, don't use userptr
for all paged memory. This will also allow us to work around some 4.9
issues, and then we can explicitly set HSA_USERPTR_FOR_PAGED_MEM for
all usage once those issues are resolved.

Change-Id: I25ce22b73ae6e93f1567f2318d9d2b47d4a44e69
2017-02-28 16:09:27 -05:00
shaoyun.liu 116e5c5e8b Thunk: Don't allocate extra control stack memory for gfx900
The control stack memory for CWSR is allocated in the kernel together with
the MQD allocation.

Change-Id: Ib1c0ab9402df3431e9555649394320380d6c6dd8
Signed-off-by: shaoyun.liu <shaoyun.liu@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling 7de66d149b gfx900: Allow doorbell allocation independent of queue ID
On SOC15 chips, the ABI for the create_queue ioctl is changed to
allow doorbell allocation independent of the queue ID. This is
necessary to accommodate doorbell routing to specific engines in
the BIF.

Change-Id: Ie98d0a758758149dd5fc09ae088afccc29904124
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling d7063dd102 Allocate 64-bit for doorbells and write pointers
On gfx900 we need 64 bits for all doorbells and SDMA WPTRs.

Change-Id: I9b922e16442e967599ae3c928308451d5cc470b3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling 8cb89b6926 Use KFD_IOC_ALLOC_MEM_FLAGS_COHERENT for fine-grained memory
Use KFD_IOC_ALLOC_MEM_FLAGS_COHERENT when allocating fine-grained
memory and doorbell BOs so that they will be mapped with MTYPE_UC
on GFX9 hardware.

Change-Id: I51adf45b13105f479e6bcdaf54955b467920ee9a
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling e5dd2f88c6 Update kfd_ioctl.h
Copied from kernel repository.

Change-Id: I9ed021cfb5b297d9a91dce93ed6355c95fb1127b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2017-02-27 10:39:05 -05:00
Felix Kuehling 48207af92a Make doorbell-size ASIC specific
This is in preparation for gfx900, which uses 64-bit doorbells. We
maintain the same number of doorbells per process by making the
doorbell page size bigger.

KFD will need to implement the same rule.

Change-Id: I3c4110869b191b83943b5a390a48edfc94d941d8
2017-02-27 10:39:05 -05:00
Amber Lin 9ba2b68fdb Add gfx900 support
Add the gfx900 device to the supported devices.

Change-Id: I71f30ef43e5e0ef0e7b5f18205b6cc4767d9d861
2017-02-27 10:39:05 -05:00
Amber Lin 1025579c0b Implement PMC AcquireTrace
Existing code uses lockf to ensure exclusive PMC access for one process and
one TraceId. However, the Thunk spec allows hsaKmtPmcAcquireTraceAccess to
get exclusive access to the defined set of counters, not exclusive to one
process or one TraceId. Multiple counter sets for multiple TraceIds are
allowed if they meet the concurrent access limit evaluated by the
hardware/driver.

Change-Id: I59cacb855a707fe326a4070452fcbbd3c95ac223
2017-02-27 09:33:58 -05:00
Felix Kuehling 64104fc8d9 Avoid COW after fork for API-allocated system memory
Change-Id: I5c7175114c4e6411d3beb5557e16cb71ddb01189
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-02-23 10:28:45 -05:00
Amber Lin cb60c5f18a Support multiple blocks in RegisterTrace
Existing code assumes all counters sent to hsaKmtPmcRegisterTrace belong
to one PMC block and that this block is SQ. This patch handles cases where
counters are in different blocks, and removes the hard-coded SQ. In fact,
SQ is non-privileged, so the user should not even use SQ counters to
register/release a trace. This patch also ignores non-privileged blocks,
as the HSA Thunk spec describes.

This patch also records counter information in the trace structure so
AcquireTrace can get counter information using that TraceId.

Change-Id: Ifa5741050553d4615baab01f7485a9e09435b019
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-02-21 15:32:18 -05:00
James Edwards 470750cc3c Update readme regarding CMAKE_PREFIX_PATH.
Change-Id: I322789f38b1984b2527554c10cb0f3be886d3e91
2017-02-20 14:33:53 -06:00