Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.
Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
CPUClockCounter uses CLOCK_MONOTONIC_RAW, which is not NTP adjusted and
so should be better for measurements. However, it is implemented with a
syscall while CLOCK_MONOTONIC is implemented via vDSO. The latency
increase becomes significant when language layers make corresponding
clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors. NTP may adjust frequency by 500ppm, which limits
us to ~3 decimals in elapsed time.
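The "~3 decimals" figure follows from the 500ppm bound. The sketch below
works through that arithmetic and adds a rough per-read cost probe for
comparing the two clocks. All function names are hypothetical and are
not part of the runtime; the probe includes Python call overhead, so it
only indicates relative cost.

```python
import math
import time

NTP_MAX_SKEW_PPM = 500  # bound quoted in the message above


def worst_case_skew_error(elapsed_seconds, skew_ppm=NTP_MAX_SKEW_PPM):
    """Largest error (seconds) that frequency skew can add to an interval."""
    return elapsed_seconds * skew_ppm * 1e-6


def trustworthy_digits(skew_ppm=NTP_MAX_SKEW_PPM):
    """Significant decimal digits left once the skew error is accounted for."""
    return -math.log10(skew_ppm * 1e-6)


def mean_read_cost_ns(clock_id, reads=100_000):
    """Average wall cost of one clock read, in ns (Unix only)."""
    start = time.perf_counter_ns()
    for _ in range(reads):
        time.clock_gettime_ns(clock_id)
    return (time.perf_counter_ns() - start) / reads
```

On Linux, comparing mean_read_cost_ns(time.CLOCK_MONOTONIC) with
mean_read_cost_ns(time.CLOCK_MONOTONIC_RAW) gives a feel for the latency
gap; the result depends on the kernel and architecture.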
Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time. Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime. Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.
This patch fixes the relative clock ratio used for times which predate
the call to hsa_init. This correlates errors in such times allowing
the elapsed time to be correctly computed.
The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months. GPU event timestamps are good for process uptime
of ~3.5 months. These are limited by double's mantissa precision.
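A small numeric sketch of both effects, assuming nanosecond timestamps
and a made-up GPU tick of 10 ns. The helper names are hypothetical and
the conversion is reduced to a bare multiplication; the runtime's actual
conversion is not shown here.

```python
def to_system_ns(gpu_ticks, ratio):
    """Convert an absolute GPU tick count to system ns (toy model)."""
    return gpu_ticks * ratio


def elapsed_ns(start_ticks, end_ticks, ratio_start, ratio_end):
    """Elapsed time when each endpoint is converted with its own ratio."""
    return to_system_ns(end_ticks, ratio_end) - to_system_ns(start_ticks, ratio_start)


def exact_range_days():
    """Days of nanosecond timestamps a double's 53-bit mantissa holds exactly."""
    return 2**53 / 1e9 / 86400


# Absolute times of order uptime (~28 hours here), interval of 100 ticks.
start, end = 10**13, 10**13 + 100

# Same ratio for both endpoints: the large offsets cancel.
fixed = elapsed_ns(start, end, 10.0, 10.0)

# Ratio estimate moved by one part per billion between the conversions:
# the error scales with the absolute time, not with the interval.
drifted = elapsed_ns(start, end, 10.0, 10.0 * (1 + 1e-9))
```

With a fixed ratio the 100-tick interval converts to 1000 ns. With the
ratio re-estimated between conversions the result is off by roughly
100 microseconds. exact_range_days() evaluates to about 104 days, which
matches the ~3.5 month limit above.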
Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445
Also rename blit_agent to region_gpu and add comments to clarify
its role in deprecated region API support rather than to do blits.
Change-Id: I80b1043db2e1c5d40a58fc801eef70a688ea9169
During registration we must not call any function that depends on registered
data as the lists are not yet complete. This includes signal allocation since
allocating shared GPU mapped memory depends on the list of GPUs.
Change-Id: I94d59e847802c546c2a5a0d9f55fe5ac3fd1d878
The debug agent requires handles to internal queues for single-step
debugging. Added tools-only API hsa_amd_runtime_queue_create_register
for reporting.
hsa_amd_runtime_queue_create_register sets a callback which is invoked
when internal queues are created.
Change-Id: Ia5190ae724fadba686c15f25b2cd085350eeff0e
The debug agent requires the copy API and trap handler to be initialized
prior to loading. Existing tools do not make use of internal queue or scratch
memory intercept which is what PostToolsInit allows.
PostToolsInit() will be removed in a following cleanup change.
Change-Id: If43377843808e3eff0defd9204910a67a852902f
Apertures now overlap with the change to 48bit addressing which
precludes using aperture checks to discover buffer ownership.
Switches to ptrinfo to decide which device owns a buffer.
This corrects faults in the legacy hsa_memory_copy API.
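A toy model of why range checks stop working once apertures overlap,
and how a per-allocation lookup resolves ownership. The class, function
names, and address ranges are invented for illustration; this is not the
runtime's ptrinfo implementation.

```python
import bisect


def owners_by_aperture(address, apertures):
    """Return every device whose aperture range contains the address."""
    return [dev for dev, (lo, hi) in apertures.items() if lo <= address < hi]


class AllocationMap:
    """Toy per-allocation record keyed by base address."""

    def __init__(self):
        self._bases = []  # sorted base addresses
        self._info = {}   # base -> (size, owner)

    def register(self, base, size, owner):
        bisect.insort(self._bases, base)
        self._info[base] = (size, owner)

    def owner_of(self, address):
        # Find the last allocation starting at or before the address.
        index = bisect.bisect_right(self._bases, address) - 1
        if index < 0:
            return None
        base = self._bases[index]
        size, owner = self._info[base]
        return owner if address < base + size else None


# Two devices whose (made-up) apertures cover the same 48-bit range.
apertures = {"gpu0": (0, 2**48), "gpu1": (0, 2**48)}
allocations = AllocationMap()
allocations.register(0x1000, 0x1000, "gpu1")
```

Here owners_by_aperture(0x1800, apertures) returns both devices, so the
range check cannot pick one, while allocations.owner_of(0x1800) returns
"gpu1".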
Change-Id: I5c7ce0216e1cdc96f836fc6fec9c3defdf4b9d90
Adds HSA_AMD_SYSTEM_INFO_BUILD_VERSION=0x200 to hsa_system_info_t.
This returns a const char* pointing at the build string (git describe).
Change-Id: I73e6612482bf6ffc4037fd365808eb9211a650ad
Adds env flag HSA_REV_COPY_DIR. If set to 1 async copy will
copy from dst device to src device rather than from src to dst.
Change-Id: I3095642066fa026dc112c2eac06db9393341cd7e
1/ Revised debug event handler to handle different events.
2/ Added queue error handler using the callback in queue create, which will print out wave info when the queue is in an error state.
3/ Preempt the queue instead of destroying it when the queue is in an error state.
Change-Id: Ib727d208de9caf1c72c76d42268483b24aaebde8
1. Add HSA extension API hsa_amd_register_vmfault_handler so a debugger can register a callback for VM faults.
2. Extend hsa_ven_amd_loader API to:
(1) iterate loaded code objects in executable:
hsa_ven_amd_loader_executable_iterate_loaded_code_objects
(2) get loaded code object info:
hsa_ven_amd_loader_loaded_code_object_get_info
3. Make the id of hsa_queue the same as the one used in communication with thunk (for amd_aql_queue)
Change-Id: I68910809e59e24297350d262606f00e96c14bcbd
Since access may only be manipulated on whole pages, suballocator fragments must cooperate to set the page's access.
Since the KFD does not migrate memory on access changes, this implementation makes agent access sticky across the requests in a fragmented page.
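One way to picture sticky access is a per-page agent set that only ever
grows: each fragment's request is merged into the set for the whole
page. The sketch below assumes a 4 KiB page and uses hypothetical names;
it is not the suballocator's actual bookkeeping.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class PageAccessTable:
    """Toy model: access is tracked per page and never revoked."""

    def __init__(self):
        self._page_agents = {}  # page index -> set of agents

    def allow_access(self, address, size, agents):
        """Merge the requested agents into every page the range touches."""
        first = address // PAGE_SIZE
        last = (address + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self._page_agents.setdefault(page, set()).update(agents)

    def agents_for(self, address):
        """Agents that can access the page containing this address."""
        return frozenset(self._page_agents.get(address // PAGE_SIZE, ()))


table = PageAccessTable()
table.allow_access(0, 1024, {"gpu0"})     # first fragment in page 0
table.allow_access(1024, 1024, {"gpu1"})  # second fragment, same page
```

After both requests, table.agents_for(0) contains both gpu0 and gpu1,
because the two fragments share a page.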
Change-Id: I88479ed45fb40e9782b704526a7b8ffb22e7bd76
Track pointer info for sub 2MB fragment allocations in allocation_map_.
Add fragment support to IPC.
Change-Id: I00cfc2e2fa289aac90a4718c392f9bb056a61a87
Added an API for creating signals with attributes.
Added two APIs for IPC operations on signals.
Initial use of exceptions for error handling.
Add ref counting to signals.
Removed spin loops from signal destructors.
Signals must no longer be destroyed with delete; use DeleteSignal instead.
Added delete safety to doorbells.
Added secondary hsa_signal_t -> Signal* translation path for IPC enabled signals.
Change-Id: Id59065d002f0c2566b0a9425694da2ed27cb7d7f
When a fatal memory fault occurs the scheduler context-saves all queues
in the process and notifies the runtime through the memory event. The
saved state contains all GPR/LDS data at the moment of the fault.
Retrieve this state and present it to the user if HSA_DEBUG_FAULT is set
to "analyze" and the wavefront caused the fault. If an amdgcn-capable
objdump is in the PATH, invoke it to disassemble code around the PC.
Queue lifetime is now managed by the runtime to allow querying the
context save state for all active queues.
Change-Id: I6fee662fad1c4f9aa125bf5c53d7d0ea1ab32f95
Uncommented HSA IPC code.
Changed hsa_amd_ipc_memory_t to be 8 uint32_t's instead of 9 to
match the spec.
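The size change can be checked with a ctypes mirror of the layout. The
class name and the field name "handle" are assumptions made for this
sketch; only the element type and counts come from the message above.

```python
import ctypes


class IpcMemoryHandle(ctypes.Structure):
    """Mirror of the new layout: eight 32-bit words."""
    _fields_ = [("handle", ctypes.c_uint32 * 8)]


class OldIpcMemoryHandle(ctypes.Structure):
    """Mirror of the previous layout: nine 32-bit words."""
    _fields_ = [("handle", ctypes.c_uint32 * 9)]


NEW_BYTES = ctypes.sizeof(IpcMemoryHandle)     # 8 * 4
OLD_BYTES = ctypes.sizeof(OldIpcMemoryHandle)  # 9 * 4
```

The handle shrinks from 36 to 32 bytes, so any handle serialized with
the old size would no longer match.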
Change-Id: Id1523125e9b876a23c3743df1be29c98b47f6725