rocm-systems

Author	SHA1	Message	Date
David Yat Sin	7ea25ebb85	rocr: Add thread priority for AsyncEventHandler Set priority to maximum for signal event handler and minimum for exceptions event handler. Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc	2025-01-24 10:08:12 -05:00
David Yat Sin	8d3fee5095	Use HybridMutex for signal mutexes Implement HybridMutex to improve latencies compared to KernelMutex when there is contention between several threads calling hsa_signal_create and hsa_amd_signal_async_handler. Change-Id: If53377033e749b0050727964c9303f09b02527cc	2024-01-16 21:29:39 +00:00
David Yat Sin	0ed1568afc	Add function for parse CPUID information Used to detect whether mwaitx instruction is supported Change-Id: I66fe906325aa523c8815133cf782df3a17a7edab	2023-02-22 16:55:42 +00:00
skhatri	e7fc301aa7	Adding support for rocrtracer tools loading without environment variable During hsa initializing stage, ROCr now searches all the loaded libraries for a symbol "HSA_AMD_TOOL_PRIORITY" and adds all those libraries to the tools library init list. Tools libraries listed in HSA_TOOLS_LIB env variable are also loaded in the given order and take priority over HSA_AMD_TOOL_PRIORITY. Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82	2022-06-23 20:08:30 -04:00
Sean Keely	0ee82742a7	Switch to CLOCK_BOOTTIME for HSA system clock. This is consistent with KFD and has significantly better latency. KFD is taking this as the definition of the SystemClockCounter. Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5	2022-05-05 15:27:29 -04:00
Sean Keely	df55cb0450	Rework memory locks to allow device parallelism in alloc/free. Prior solution used a single global lock to protect the memory tracking structures. This change protects the memory tracking structure with a shared mutex (rw lock) in shared (r) mode for memory allocations and frees so that long duration processes, calling to kfd, can be done in parallel. Operations which must modify the memory map take the mutex in exclusive mode (w) and must not call to the thunk while holding the mutex. The fragment allocator now requires separate protection and is protected with a mutex at the device level. Protecting at the device level, rather than pool, allows retention of the current recursive design and allows calling Trim from within Allocate. This could be made finer (pool level locks) but would require backing out of Allocate entirely to call Trim. Trim and any retried Allocation must be done in isolation (per device) or we may report OOM when memory is actually available in some pool's fragment cache. So some device level serialization is required in at least some paths. Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00	2021-11-24 19:22:05 -06:00
Sean Keely	e9a4eff8a1	Update licensing and remove duplicate licenses. Change-Id: I0aab6f310d96bf6c5a918e7a9c03713a00dc5c4a	2020-06-22 14:19:30 -04:00
Ramesh Errabolu	fa13208698	Add rocr namespace to core header and impl files Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360	2020-06-19 22:34:21 -04:00
Sean Keely	426d41e27c	Adjust signal sleep to reflect null kernel latency. Performance tested on Gromacs. Change-Id: I3851148ee8544b15d840f2c26ca73a83f8d0df2e	2017-03-09 15:20:53 -05:00
James Edwards (xN/A) TX	7d2bc9d113	Separate open source core runtime code from DK makefiles. [git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1250152]	2016-03-22 18:10:13 -05:00

10 Commits