Граф коммитов

10 Коммитов

Автор SHA1 Сообщение Дата
David Yat Sin 7ea25ebb85 rocr: Add thread priority for AsyncEventHandler
Set priority to maximum for signal event handler and minimum for
exceptions event handler.

Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
2025-01-24 10:08:12 -05:00
David Yat Sin 8d3fee5095 Use HybridMutex for signal mutexes
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.

Change-Id: If53377033e749b0050727964c9303f09b02527cc
2024-01-16 21:29:39 +00:00
David Yat Sin 0ed1568afc Add function for parse CPUID information
Used to detect whether mwaitx instruction is supported

Change-Id: I66fe906325aa523c8815133cf782df3a17a7edab
2023-02-22 16:55:42 +00:00
skhatri e7fc301aa7 Adding support for rocrtracer tools loading without environment variable
During hsa initializing stage, ROCr now searches all the loaded libraries
for a  symbol "HSA_AMD_TOOL_PRIORITY" and adds all those libraries to
the tools library init list.  Tools libraries listed in HSA_TOOLS_LIB
env variable are also loaded in the given order and take priority
over HSA_AMD_TOOL_PRIORITY.

Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82
2022-06-23 20:08:30 -04:00
Sean Keely 0ee82742a7 Switch to CLOCK_BOOTTIME for HSA system clock.
This is consistent with KFD and has significantly better latency.
KFD is taking this as the definition of the SystemClockCounter.

Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5
2022-05-05 15:27:29 -04:00
Sean Keely df55cb0450 Rework memory locks to allow device parallelism in alloc/free.
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel.  Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.

The fragment allocator now requires separate protection and is protected with a
mutex at the device level.  Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
withing Allocate.  This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim.  Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache.  So some device
level serialization is required in at least some paths.

Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
2021-11-24 19:22:05 -06:00
Sean Keely e9a4eff8a1 Update licensing and remove duplicate licenses.
Change-Id: I0aab6f310d96bf6c5a918e7a9c03713a00dc5c4a
2020-06-22 14:19:30 -04:00
Ramesh Errabolu fa13208698 Add rocr namespace to core header and impl files
Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360
2020-06-19 22:34:21 -04:00
Sean Keely 426d41e27c Adjust signal sleep to reflect null kernel latency. Performance tested on Gromacs.
Change-Id: I3851148ee8544b15d840f2c26ca73a83f8d0df2e
2017-03-09 15:20:53 -05:00
James Edwards (xN/A) TX 7d2bc9d113 Separate open source core runtime code from DK makefiles.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1250152]
2016-03-22 18:10:13 -05:00