Commit graph

347 commits

Author SHA1 Message Date
Konstantin Zhuravlyov 2275c74695 Loader: add basic logging abilities
- Enabled with env var LOADER_ENABLE_LOGGING=1

Change-Id: Ibdbb1b55ffddb7dc9c63e52fc9db3013409376a4
2019-08-21 13:29:15 -04:00
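A minimal sketch of how an environment-variable logging switch like this can be wired up. Only the variable name `LOADER_ENABLE_LOGGING` comes from the commit message; the helper names and the rule that exactly `"1"` enables logging are assumptions for illustration, not the loader's actual code.

```cpp
#include <cstdlib>
#include <cstring>
#include <iostream>

// Hypothetical helper: treats exactly "1" as "enabled".
// The real loader may accept other values.
bool IsLoggingValue(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the switch from the environment on every call.
bool LoaderLoggingEnabled() {
  return IsLoggingValue(std::getenv("LOADER_ENABLE_LOGGING"));
}

// Writes a message to stderr only when logging is enabled.
void LoaderLog(const char* message) {
  if (LoaderLoggingEnabled()) {
    std::cerr << "[loader] " << message << '\n';
  }
}
```

Running a program with `LOADER_ENABLE_LOGGING=1` set in its environment would then make `LoaderLog` calls visible, and leaving the variable unset keeps them silent.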
Jay Cornwall ad717d2e98 Support KFD interrupt protocol in second-level trap handler
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.

Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.

Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606
2019-08-08 15:16:15 -05:00
Ramesh Errabolu a043c6acbb Add override qualifier to CPU and GPU agent api
Change-Id: I930e29d671b5dc81dece6f910d611056a54d2c85
2019-08-06 18:13:26 -05:00
Konstantin Zhuravlyov 7d8205548b Allow ccache enabled builds if -DROCM_CCACHE_BUILD=ON
Change-Id: Ie3ebb5d95af5fa55f11c9c88378ab29736538e25
2019-08-01 14:33:38 -04:00
Chris Freehill 6588165de1 gfx908 loader/isa related changes
Change-Id: I638d4b2b300ac5a99d4d31d4fadcfe9e1e3c7748
2019-07-23 03:41:27 -04:00
Chris Freehill 2c15bcac9d Add ISAREG entry for gfx908 for ECC not supported
* Also, re-enable rocrtst

Change-Id: I70106c5a1788818387e46f240d577cbe59bc89f4
2019-07-22 21:50:09 -04:00
Chris Freehill 447a30e985 Initial gfx908 updates
Change-Id: I3d6307d6613a38861a95561b9ac68abaa5964b48
2019-07-22 17:25:06 -04:00
Sean Keely 0721dfd2e7 Update README build instructions.
Change-Id: I595e629117adfb44afb2e829d1f975782238277e
2019-07-19 14:17:47 -04:00
Sean Keely 6e07bc8dc4 Adjust agentOwner in pointer info queries for locked memory.
agentOwner from thunk reflects the GPU which holds the device alias.
We need to return a CPU to better reflect that the memory is system memory.

Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18
2019-07-19 14:17:13 -04:00
Sean Keely 465a8eb40b PR from github user DiamondLovesYou.
Allow user specified profiles if the HSAIL note is not found.

Konstantin reviewed and approved.  HSAIL note is not generated by LLVM.

Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1
2019-07-16 13:55:38 -05:00
Ramesh Errabolu 4daee0c8a1 Allocate fine-grained regions for Gpu devices that are members of Hives
Change-Id: Ibbed393aeac691793845d16d2f3fe2c3e5a7ec40
2019-07-13 01:12:53 -04:00
Jay Cornwall ff8f439112 Handle traps, illegal instruction, memory violations through queue signal
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported and required the compiler to pass in the signal pointer
in s[0:1].

The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.

Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b
2019-07-01 22:59:41 -04:00
Jay Cornwall 6ed686ee29 Replace gfx9 SP3 trap handler with LLVM, fix IB_STS restore
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.

Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.

Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f
2019-07-01 22:59:27 -04:00
Sean Keely 299874f17d Initial support for deallocation callbacks.
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.

Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
2019-06-26 04:12:17 -05:00
Evgeny 6c0aaa2773 aqlprofile api fix
Change-Id: I2a710040422c7853ece5472ea776442b25d69dcb
2019-06-19 23:14:27 -04:00
Sean Keely 0c0e634458 PTHREAD_STACK_MIN may differ from system parameters.
Restrict stack adjustment to non-default stack requests and allow
stack growth within reason (20MB cutoff).

Change-Id: I320280c711402ac29683e94c7246b7c32c797611
2019-06-17 21:04:17 -05:00
Sean Keely 4b22d24346 Revert to SystemClockCounter for HSA system time.
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW), so it should be
better for measurements.  However, it is implemented with syscall while
CLOCK_MONOTONIC is implemented via vDSO.  The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors.  NTP may adjust frequency by 500ppm so limits us
to ~3 decimals in elapsed time.

Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76
2019-06-17 21:07:26 -04:00
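The trade-off described above can be put in numbers. The sketch below reads either POSIX clock through `clock_gettime` and computes the worst-case elapsed-time error for a given frequency skew. The function names are illustrative and not taken from ROCr, and `CLOCK_MONOTONIC_RAW` is Linux-specific.

```cpp
#include <time.h>

// Reads a POSIX clock and returns its value in seconds.
double ReadClockSeconds(clockid_t id) {
  timespec ts{};
  clock_gettime(id, &ts);
  return static_cast<double>(ts.tv_sec) +
         static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Worst-case error in an elapsed-time measurement when the clock
// frequency may be off by skew_ppm parts per million.
double WorstCaseSkewSeconds(double elapsed_seconds, double skew_ppm) {
  return elapsed_seconds * skew_ppm * 1e-6;
}
```

With a 500 ppm skew, a one-second interval can be off by up to 0.0005 s. That is a relative error of 5e-4, which is why only about three significant digits of an elapsed time can be trusted. Passing `CLOCK_MONOTONIC` or `CLOCK_MONOTONIC_RAW` to `ReadClockSeconds` selects the NTP-adjusted or the raw clock respectively.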
Sean Keely bbb90bdfc9 Fix description of HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL.
Description was inconsistent with itself and code.  Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only and system memory pools do require hsa_amd_agents_allow_access.

Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5
2019-06-11 01:45:22 -04:00
Evgeny a06d96cef8 aqlprofile API: sdma blocks
Change-Id: I619af8adc17706f808644180cdd5a5c785e052ec
2019-06-05 18:54:08 -05:00
Evgeny 1be9298f72 adding new trace API
Change-Id: I6c83b5789f5a6cdbb574d041c40d5a47229c7f1a
2019-06-01 14:33:59 -04:00
Matt Arsenault 0016c6ce5b Don't check VERSION_BUILD is defined
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.

Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e
2019-05-29 11:09:31 -04:00
Sean Keely 22de0e7fb9 Allow hsa_status_string when HSA is closed.
API is a stateless lookup of RO data and needed to interpret
hsa_init error codes.

Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7
2019-05-24 22:40:03 -04:00
Sean Keely 9f81bdfbe1 Add exception and error safety for CreateThread.
Change-Id: I82aaf64e039ca9614b4948deec1f87147f56279a
2019-05-24 22:39:55 -04:00
Matt Arsenault 22d29b55a4 Change include flag order
Search the local src directories first. If using a system
installed hsakmt, this would pick the installed hsa headers.

Change-Id: I9746d6e9db1749a130e4d93e024556754a537083
2019-05-22 16:43:18 -07:00
Sean Keely a913549190 Correct pthread join/detach handling.
Joined threads can not be joined more than once nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.

Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65
2019-05-16 12:14:06 -05:00
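One way to get "multiple waits, separate close" semantics on top of pthreads is to record whether the thread has already been joined. The following is a sketch under that assumption. `ThreadHandle`, `StartThread`, `WaitThread`, `CloseThread`, and `ExampleEntry` are hypothetical names, not the ROCr implementation.

```cpp
#include <pthread.h>
#include <mutex>

// Hypothetical wrapper that remembers whether the thread was joined.
struct ThreadHandle {
  pthread_t tid{};
  std::mutex lock;
  bool joined = false;
};

// Example thread body used for illustration: stores 7 through arg.
void* ExampleEntry(void* arg) {
  *static_cast<int*>(arg) = 7;
  return nullptr;
}

// Starts a joinable thread; returns nullptr if creation fails.
ThreadHandle* StartThread(void* (*entry)(void*), void* arg) {
  ThreadHandle* handle = new ThreadHandle();
  if (pthread_create(&handle->tid, nullptr, entry, arg) != 0) {
    delete handle;
    return nullptr;
  }
  return handle;
}

// May be called any number of times; only the first call joins.
bool WaitThread(ThreadHandle* handle) {
  std::lock_guard<std::mutex> guard(handle->lock);
  if (handle->joined) return true;
  if (pthread_join(handle->tid, nullptr) != 0) return false;
  handle->joined = true;
  return true;
}

// Releases the handle; detaches only if the thread was never joined,
// because a joined thread must not be detached.
void CloseThread(ThreadHandle* handle) {
  {
    std::lock_guard<std::mutex> guard(handle->lock);
    if (!handle->joined) pthread_detach(handle->tid);
  }
  delete handle;
}
```

The `joined` flag is what keeps a second `WaitThread` from calling `pthread_join` twice and keeps `CloseThread` from detaching an already joined thread.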
Sean Keely 6e2a056e1b Correlate errors for time stamps which predate process start.
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time.  Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime.  Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.

This patch fixes the relative clock ratio used for times which predate
the call to hsa_init.  This correlates errors in such times allowing
the elapsed time to be correctly computed.

The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months.  GPU event timestamps are good for process uptime
of ~3.5 months.  These are limited by double's mantissa precision.

Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445
2019-05-14 17:35:06 -04:00
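The ~3.5 month figure is consistent with a double's 53-bit significand if timestamps count nanoseconds. The commit message does not state the tick unit, so nanoseconds are an assumption here. A short worked calculation:

```cpp
#include <cstdint>

// A double represents every integer exactly only up to 2^53.
constexpr std::uint64_t kExactIntegerLimit = 1ull << 53;

// Number of days of ticks that fit below that limit at a given tick rate.
constexpr double ExactRangeDays(double ticks_per_second) {
  return static_cast<double>(kExactIntegerLimit) / ticks_per_second / 86400.0;
}
```

At an assumed 1e9 ticks per second, `ExactRangeDays(1e9)` comes out a little over 104 days, or roughly three and a half months.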
Sean Keely 06376e726b Expose HDP flush registers.
Exposed via agent info query.  Only valid if fine grain PCIe memory is enabled.

Change-Id: Ib4770901592ec047276458926a947737f9b93bb5
2019-05-11 00:04:47 -04:00
Sean Keely e89f9807f1 Patch from github.
At the moment it is not possible to build ROCr with Clang. This is
a spurious limitation. The present PR addresses it by guarding GCC
only flags and by fixing some additional warnings that Clang triggers;
one of said warnings did outline a rather interesting issue with math
being done on void*s. - AlexVlx

Void ptr arithmetic had already been fixed in amd-master branch.

Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462
2019-04-29 16:17:24 -04:00
Felix Kuehling 0c6b9532d4 Use non-paged memory for IPC signals
Non-paged memory can be IPC-shared even when HSA_USERPTR_FOR_PAGED_MEM
is enabled.

Change-Id: I8b1fa6d7a4a9327c78a77b3679697fbf55397093
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-04-29 09:20:11 -04:00
Sean Keely 1251842900 Don't create blits when copy profiling is enabled.
Change-Id: I879827133957ee610c3381ea30c536ec7d10ffab
2019-04-18 20:00:02 -05:00
Jay Cornwall 56f280c8a7 Detect memory event through Flags field instead of Failure
KFD no longer reports MemoryAccessFault.Failure with retry fault
implementation. ROCr ignores the memory event when Failure = 0.

Use the Flags field instead, which will be non-zero when the
event is triggered.

Change-Id: Ie90799a303b0b2f1b476b20ffafdde79ae137182
2019-04-15 19:16:07 -05:00
Ramesh Errabolu ba029ebe21 Remove instantiation of MemoryRegion for heap type SVM surfaced by ROCt
Change-Id: Ib4ff7e7cabe9aacb811888aeb74f652dcb57f9e0
2019-04-10 18:33:07 -05:00
Konstantin Zhuravlyov 7001134757 Process symbols with 0 address
Change-Id: I9ed943a8ccd3b103edd6aba8264c009d8cda29fa
2019-03-30 02:14:43 -04:00
Sean Keely a535e18cc1 Add hsa_amd_memory_lock_to_pool.
Makes malloc memory accessible to GPUs so that the memory has the
capabilities of the pool it is locked to.
This admits fine grained locked memory and reserves API space for any future
special CPU pools.

Change-Id: If8c3dd8582a43f19d3d36b3763c1a688cc419ef0
2019-03-29 01:09:21 -05:00
Sean Keely 9f7df6d6fe Remove legacy memory fault event name.
Change-Id: I3ad240482523409e1152548009aecf127e63bbfa
2019-03-28 15:25:25 -05:00
Sean Keely e5de33dd9a Fix void* arithmetic.
GCC allows arithmetic on void*, treating void as char.  Clang and
the language spec do not.

Change-Id: I939f2432f276979bb81881406e10528597ac6001
2019-03-28 12:49:19 -05:00
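The portable pattern is to cast to a byte-sized pointer type before doing the arithmetic. This is a minimal sketch with illustrative helper names, not the code from the patch:

```cpp
#include <cstddef>

// Advances a pointer by a byte count without arithmetic on void*.
void* OffsetBytes(void* base, std::size_t bytes) {
  return static_cast<char*>(base) + bytes;
}

// Distance in bytes between two pointers into the same allocation.
std::ptrdiff_t ByteDistance(const void* begin, const void* end) {
  return static_cast<const char*>(end) - static_cast<const char*>(begin);
}
```

Both helpers compile with GCC and Clang alike, because the arithmetic happens on `char*`, whose element size is defined to be one byte.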
Sean Keely 7ea0cd688f Disable sram-ecc reporting via ISA until HCC is fixed.
Change-Id: I0382825884b727173385f04da9f2088650c3ba1d
2019-03-21 17:46:56 -04:00
Sean Keely fd9fb77e28 Do not strip release builds.
Customer request.

Change-Id: Id77dcdc0b6908c7a5e460edfd7d9468a1691e351
2019-03-07 14:04:14 -06:00
Sean Keely 67376e06ab Report SRAM ECC errors through the system event handler.
Modify the system event handler to support multiple users.
Name memory fault reason codes.

Change-Id: I1b5979b36ab15637eb2be59a61e2d57e76d0a70e
2019-02-27 18:08:07 -05:00
Sean Keely 3c3db0243e Loader support for SRAM ECC.
Change-Id: I0c6791c356d9186cc8dabae9fd698b1d4de19b09
2019-02-25 18:30:05 -05:00
Sean Keely c56d86100b Add fine grain vram pool.
Part 1 of 2.
Enables fine grain vram over PCIe based on env flag.
Part 2 will extend to XGMI.

Change-Id: I8ad506e004b398d56d462b0200274eae2293a461
2019-02-21 13:08:11 -05:00
Sean Keely 344d964f9f Suppress exception reporting for well defined invalid signal handles.
hsa_exceptions with empty what() strings will not report in debug builds.

Change-Id: I0d424d3b1d3044808ece1720a460a57d68bf878e
2019-02-15 19:35:57 -05:00
Sean Keely 400304aa10 Stop using ROCm release tags for library version numbers.
Version is now a fixed string that matches previous internal builds.
This also matches released DEB/RPM builds (but not github versions).

Change-Id: Id4819b9de8c855250aadf1a1cebb187b5c031721
2019-02-06 19:22:53 -05:00
Ramesh Errabolu 3fbf03af76 Allows users, via env ROCR_VISIBLE_DEVICES, to surface a subset of Gpu devices
Change-Id: I5662639d5d70f054831969669f9d30dec356dd5a

Update per review comments

Change-Id: I18c7d7cb00b261493b61c2cf5454d486166f40d8
2019-02-06 02:02:29 -06:00
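A sketch of how such a variable might be parsed, assuming its value is a comma-separated list of device indices such as `0,2`. The commit message does not spell out the format, so both the format and the helper name `ParseVisibleDevices` are assumptions.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a comma-separated list into device indices.
// Empty tokens are skipped; parsing stops at the first token that
// does not start with a number.
std::vector<int> ParseVisibleDevices(const std::string& value) {
  std::vector<int> indices;
  std::stringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (token.empty()) continue;
    try {
      indices.push_back(std::stoi(token));
    } catch (const std::exception&) {
      break;  // Neither a number nor in range: ignore the rest.
    }
  }
  return indices;
}
```

A caller would typically pass the result of `std::getenv("ROCR_VISIBLE_DEVICES")` after checking it for null, then expose only the devices whose indices appear in the returned list.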
Sean Keely 65d39cc476 Unify APU and dGPU initial queue scratch allocation.
Both support dynamic scratch allocation so there is no reason
to preemptively allocate on APUs.

Change-Id: I22eaec01a83a091ee9dc1f594a1a9106e8dd81fc
2019-01-25 02:11:39 -05:00
Jay Cornwall 079eadd71b Remove legacy microcode version check in GpuAgent::InvalidateCodeCaches
Fixes instruction cache invalidation when using microcode branches.

Change-Id: I932676e683983145f5c807204e592fb5e530c8af
2019-01-22 16:39:52 -06:00
Konstantin Zhuravlyov 8bee6e4976 Loader: update symbol processing for v2+
- Skip symbols that are STB_LOCAL and not STT_AMDGPU_HSA_KERNEL

Change-Id: I68567f58de9bf3f07dbd8020ef63f47667c86367
2019-01-18 15:42:28 -05:00
Konstantin Zhuravlyov c1ad82a6b7 Loader updates for code object v3
- Fix loading in some cases
- Fix symbol kind

Change-Id: I721b4a35972b6d2a6d0ac733ab770b096cc74e17
2019-01-18 15:41:01 -05:00
Ramesh Errabolu 28c3f9a269 Initialize queue buffer with Invalid Pkt Headers
Change-Id: I4166f1359746ee6829b730bac2db358af72ab16e
2018-11-21 19:09:10 -05:00
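The idea can be sketched as follows. A freshly allocated ring buffer that is merely zeroed would hold headers of packet type 0, so each slot is stamped with the invalid type instead. The constants below are local stand-ins that assume the HSA runtime header values (`HSA_PACKET_TYPE_INVALID` equal to 1, type field starting at bit 0). The struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins; values assumed to match the HSA runtime headers.
constexpr std::uint16_t kPacketTypeInvalid = 1;  // HSA_PACKET_TYPE_INVALID
constexpr unsigned kHeaderTypeShift = 0;         // HSA_PACKET_HEADER_TYPE

// Simplified 64-byte packet slot: 16-bit header plus opaque payload.
struct AqlPacket {
  std::uint16_t header;
  std::uint8_t body[62];
};
static_assert(sizeof(AqlPacket) == 64, "packet slots are 64 bytes");

// Marks every slot as invalid so that no slot looks like a
// submitted packet before a producer writes a real header.
void InvalidateRing(std::vector<AqlPacket>& ring) {
  for (AqlPacket& packet : ring) {
    packet.header =
        static_cast<std::uint16_t>(kPacketTypeInvalid << kHeaderTypeShift);
  }
}
```

A producer later overwrites the header last, after filling in the rest of the packet, which is what makes the invalid marker a safe initial state.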
Sean Keely 8e4177382a Check max wave scratch limits.
HW has limited bits for wave scratch base address stride.  Enforcement
prevents programs with larger than supported scratch allocations from
running and clobbering neighboring scratch space.

Change-Id: I574da888e9d1d5e290a9c0025ba13b5ef9f1e5c0
2018-11-16 20:59:20 -05:00