Commit Graph

43 Commits

Author SHA1 Message Date
Sunday Clement 1635746a9c rocr: Fix Potential Deadlock
Moved the call to pthread_mutex_lock into an else branch for better
code readability.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-04 10:18:09 -04:00
Sunday Clement a97b7df4b9 rocr: Fix Potential Deadlock
Because eventDescrp->mutex is a non-recursive lock, attempting to
acquire it with pthread_mutex_lock can cause the system to hang
indefinitely if the lock was already acquired by the preceding call
to pthread_mutex_trylock.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-04 10:18:09 -04:00
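
The hazard described in this commit can be sketched as follows. This is a minimal illustration, not the ROCr code: `acquire_once` is a hypothetical helper, and the mutex passed in stands in for `eventDescrp->mutex` (a default, non-recursive pthread mutex). The blocking lock is attempted only in the else branch, so a successful trylock is never followed by a second acquisition on the same thread.

```cpp
#include <pthread.h>
#include <cerrno>

// Hypothetical helper: acquires `m` exactly once.
// Returns 0 on success or a pthread error code.
// *contended reports whether the non-blocking fast path failed.
int acquire_once(pthread_mutex_t* m, bool* contended) {
  *contended = false;
  int err = pthread_mutex_trylock(m);
  if (err == 0) {
    // Fast path: the lock is already held by this thread now.
    // Calling pthread_mutex_lock here would self-deadlock.
    return 0;
  } else if (err == EBUSY) {
    // Slow path: another holder exists, so block until it releases.
    *contended = true;
    return pthread_mutex_lock(m);
  }
  return err;  // any other trylock failure is reported to the caller
}
```

A caller pairs each successful `acquire_once` with a single `pthread_mutex_unlock`.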
Sunday Clement 293092f32f rocr: Fix Resource Leak
Allocated memory was previously not freed when rwlock initialization
failed.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-05-30 09:16:26 -04:00
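
A minimal sketch of the error path this commit describes, assuming a heap-allocated wrapper around a `pthread_rwlock_t`. The names `SharedLock`, `create_shared_lock`, and `destroy_shared_lock` are hypothetical and are not taken from the ROCr sources.

```cpp
#include <pthread.h>
#include <cstdlib>

// Hypothetical wrapper type used only for this illustration.
struct SharedLock {
  pthread_rwlock_t lock;
};

// Returns an initialized lock, or nullptr if allocation or init fails.
SharedLock* create_shared_lock() {
  SharedLock* sl = static_cast<SharedLock*>(std::malloc(sizeof(SharedLock)));
  if (sl == nullptr) return nullptr;
  if (pthread_rwlock_init(&sl->lock, nullptr) != 0) {
    std::free(sl);  // without this, the allocation leaks on the error path
    return nullptr;
  }
  return sl;
}

// Destroys the rwlock and releases the memory. Accepts nullptr.
void destroy_shared_lock(SharedLock* sl) {
  if (sl == nullptr) return;
  pthread_rwlock_destroy(&sl->lock);
  std::free(sl);
}
```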
Sv. Lockal 5d04bd42f3 Fix build issues for musl libc (#267)
Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9
2025-01-29 14:31:05 +00:00
David Yat Sin 7ea25ebb85 rocr: Add thread priority for AsyncEventHandler
Set priority to maximum for signal event handler and minimum for
exceptions event handler.

Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
2025-01-24 10:08:12 -05:00
Apurv Mishra 89115369cc rocr: declare 'args' as class member in 'os_thread'
Removed 'args' as a unique pointer and deletion in
'ThreadTrampoline', then declared as a class member.

Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-27 10:27:40 -05:00
David Yat Sin f58aff630c rocr: Fix sem_post overflow errors
WaitSemaphore and PostSemaphore are used in the HybridMutex
implementation. If HybridMutex did not have to call WaitSemaphore when
acquired, then calling PostSemaphore would cause the internal count
inside sem_t to slowly grow to large values and eventually cause
overflow.

Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc
2024-11-13 21:57:26 -05:00
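
The overflow described above comes from posting a semaphore that nobody waited on. One well-known way to avoid it is a "benaphore": an atomic counter decides whether the semaphore is touched at all. The sketch below shows that general technique only. It is not the HybridMutex implementation, and the class and method names are hypothetical.

```cpp
#include <semaphore.h>
#include <atomic>
#include <cerrno>

// Benaphore: the semaphore is used only when there is contention.
class Benaphore {
 public:
  Benaphore() { sem_init(&sem_, 0, 0); }
  ~Benaphore() { sem_destroy(&sem_); }
  Benaphore(const Benaphore&) = delete;
  Benaphore& operator=(const Benaphore&) = delete;

  void Acquire() {
    // A previous value > 0 means the lock is held, so we must wait.
    if (count_.fetch_add(1) > 0) {
      while (sem_wait(&sem_) != 0 && errno == EINTR) {
        // Interrupted by a signal: retry the wait.
      }
    }
  }

  void Release() {
    // Post only when another thread is (or will be) waiting. An
    // uncontended Release never increments the semaphore count.
    if (count_.fetch_sub(1) > 1) {
      sem_post(&sem_);
    }
  }

  // Current semaphore count, exposed for inspection.
  int SemaphoreValue() {
    int value = 0;
    sem_getvalue(&sem_, &value);
    return value;
  }

 private:
  std::atomic<int> count_{0};
  sem_t sem_;
};
```

With this scheme every `sem_post` is matched by exactly one `sem_wait`, so the count inside `sem_t` cannot drift upward.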
David Yat Sin c8dd4d2b3b rocr: Handle pthread_create returning errors
Rewriting logic to fix issue where pthread_create would return errors
other than EINVAL, and these errors would be ignored.

Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06
2024-08-22 12:15:10 -04:00
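
As an illustration of the point above, a thread-start helper can treat every non-zero return from `pthread_create` as a failure instead of testing for one specific code. `start_thread` is a hypothetical name and this is only a sketch, not the rewritten ROCr logic.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstring>

// Starts entry(arg) on a new thread. Returns 0 on success, otherwise
// the error number reported by the failing pthread call.
int start_thread(pthread_t* out, void* (*entry)(void*), void* arg) {
  pthread_attr_t attr;
  int err = pthread_attr_init(&attr);
  if (err != 0) return err;

  err = pthread_create(out, &attr, entry, arg);
  pthread_attr_destroy(&attr);
  if (err != 0) {
    // Covers EAGAIN, EPERM and any other code, not only EINVAL.
    std::fprintf(stderr, "pthread_create failed: %s\n", std::strerror(err));
  }
  return err;
}
```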
James Xu a621bca303 Fix compile errors with musl>=1.2.3
Patch submitted on behalf of user AngryLoki:

The fix repeats a common pattern used for musl,
e.g.: https://github.com/void-linux/void-packages/blob/5ccf1c66a1df2d644e1a0db0a68fca321469c57e/srcpkgs/MangoHud/patches/0001-elfhacks-d_un.d_ptr-is-relative-on-non-glibc-systems.patch#L90.

Quoting:
d_un.d_ptr is relative on non glibc systems

elf(5) documents it this way, glibc diverts from this documentation

Change-Id: I815f88f127ef00c88ae827a8ad48df0d33c92467
2024-08-19 11:02:29 -04:00
Saleel Kudchadker 26e105d9ab Initial external logging API
New API to accept a file stream for logging

Co-authored-by: David Yat Sin <David.YatSin@amd.com>

Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-08-07 02:59:00 +00:00
David Yat Sin 2f05c2a273 Revert "Use pthread_setaffinity_np"
This reverts commit 1df7a44112e45b7fb447926778490f741601219a.

Change-Id: Ib386c8f944b6da0ef68ddd2be3f26013cd36ef5b
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin 1cee8656df Revert "Use pthread_attr_setaffinity_np when available"
This reverts commit ef95ccf81e59b8608861e8f2f256d981eee19df7.

Reason for revert: Causing performance regressions on some systems

Change-Id: I82951350cafbd57c495852d6f90023a3373f04f6
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:27:09 -05:00
David Yat Sin 57b93e02a4 Use pthread_attr_setaffinity_np when available
If pthread_attr_setaffinity_np function exists use it instead of
pthread_setaffinity_np as pthread_setaffinity_np seems to fail to set
the affinity settings on some systems.

Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
2024-04-29 15:02:54 +00:00
Shweta.Khatri bc9cac97fe Fixing compilation errors related to MUSL libc
Fix Musl libc NULL errors and unsupported pthread funcs for compatibility.
Also ensures cleanup and error handling irrespective of CPU affinity override.

Fix submitted by github dev - AngryLoki
https://github.com/ROCm/ROCR-Runtime/issues/181

Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
2024-04-17 07:14:15 -04:00
David Yat Sin 8d3fee5095 Use HybridMutex for signal mutexes
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.

Change-Id: If53377033e749b0050727964c9303f09b02527cc
2024-01-16 21:29:39 +00:00
David Yat Sin 6333fdecf3 Use pthread_setaffinity_np
On some systems, pthread_attr_setaffinity_np does not exist, so we need
to use pthread_setaffinity_np on the thread after pthread_create.

Provided by Julian Samaroo on github

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143
Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb
2023-12-18 17:41:49 -05:00
David Yat Sin f07b8f2250 Use CPU_SET_S instead of CPU_SET
Fix incorrect use of CPU_SET on variable size cpu_set_t

Suggested by Christopher E. Moore on github
https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/130

Change-Id: I710b56683ba07c08dcd83c851bf72e4f127a0ad4
2023-12-04 15:05:22 +00:00
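
For context, the glibc `CPU_ALLOC` family produces a mask whose size is chosen at run time, and such a mask has to be manipulated with the `_S` macros, which take the size explicitly. The sketch below assumes glibc (`_GNU_SOURCE`). `make_full_cpu_set` is a hypothetical helper, not a function from the ROCr sources.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros in <sched.h>
#endif
#include <sched.h>
#include <cstddef>

// Builds a dynamically sized CPU mask with CPUs [0, num_cpus) set.
// Returns nullptr on allocation failure; release with CPU_FREE.
cpu_set_t* make_full_cpu_set(int num_cpus, size_t* set_size) {
  cpu_set_t* set = CPU_ALLOC(num_cpus);
  if (set == nullptr) return nullptr;
  *set_size = CPU_ALLOC_SIZE(num_cpus);
  CPU_ZERO_S(*set_size, set);
  for (int cpu = 0; cpu < num_cpus; ++cpu) {
    // Plain CPU_SET assumes sizeof(cpu_set_t), which is wrong for a
    // mask allocated with CPU_ALLOC.
    CPU_SET_S(cpu, *set_size, set);
  }
  return set;
}
```

The same size value is then passed to calls such as `pthread_setaffinity_np` or `sched_setaffinity`.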
Jeremy Newton 132a19e9c3 Fix non-x86 builds
I've just reverted some code to what it was in 5.5 by wrapping the new x86
specific bits with #if's, e.g.:
- CPUID is x86 specific
- mwait is x86 specific

Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 01:04:04 -04:00
Lancelot SIX 183f5d90aa linux os_thread: improve error handling
On Linux, the os_thread abstraction is built on top of pthread.  Many of
the pthread calls might fail and return error codes.  The error
conditions are only checked via assertions (if ever checked) which means
that when doing a release build, no error condition is checked.  The
same goes for dlsym/dlinfo and clock_gettime.

This commit improves the situation by checking the error conditions
and acting accordingly.  When the error condition is detected in a
function with a means to indicate some error to its caller, then this
patch prints some error message and returns.  If there is no way to
propagate the error up the call stack, print some error message and
abort the process.

For the os_info::os_info ctor, the only user is CreateThread, which
checks that the built thread is Valid().  If not, nullptr is returned to
the caller.

It could be possible to use exceptions when functions cannot pass
errors, but for now I only use abort as it is what assert would do with
a debug build.

Change-Id: I815703c3b95777cc29bb89a7d654ac879c14a759
2023-04-17 09:48:11 -04:00
David Yat Sin 0ed1568afc Add function to parse CPUID information
Used to detect whether mwaitx instruction is supported

Change-Id: I66fe906325aa523c8815133cf782df3a17a7edab
2023-02-22 16:55:42 +00:00
Shweta Khatri 8aac885318 Fixes hang due to change in order of initialization of libraries
Fixes hang due to change in order of initialization of libraries
that have cyclical dependencies and they call hsa_init() during their
initialization phase.
This implementation looks for a symbol called "HSA_AMD_TOOL_PRIORITY"
across all loaded shared libraries using dynamic section entries of the
loaded lib instead of using dlopen and dlsym for the same purpose.

Change-Id: I4865f2fd18dd186ec311a432ec38fbb5583805d2
2023-01-26 01:17:22 -05:00
Shweta Khatri 8751e65b79 Fixed callback method for dl_iterate_phdr api which is called for each loaded shared object
Simplified the callback method. Also fixed the way loaded shared objects were appended to a string vector,
which was not being passed to this callback method.

Change-Id: I68661dd73f61a11c42fa92f670e8e7b6ffcb5711
2022-11-21 19:00:34 -05:00
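
The mechanism this commit refers to can be sketched like this: `dl_iterate_phdr` forwards an opaque `data` pointer to the callback, and that pointer is the usual way to hand the callback a container to fill. The function names below are hypothetical and the sketch is not the ROCr callback.

```cpp
#include <link.h>
#include <cstddef>
#include <string>
#include <vector>

// Invoked once per loaded object. `data` is the pointer that was passed
// to dl_iterate_phdr, here a vector that receives the object names.
static int collect_object_name(struct dl_phdr_info* info, size_t /*size*/,
                               void* data) {
  auto* names = static_cast<std::vector<std::string>*>(data);
  // The main program is usually reported with an empty name; skip it.
  if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
    names->emplace_back(info->dlpi_name);
  }
  return 0;  // a non-zero return would stop the iteration
}

// Returns the names of the shared objects loaded in this process.
std::vector<std::string> loaded_shared_objects() {
  std::vector<std::string> names;
  dl_iterate_phdr(collect_object_name, &names);
  return names;
}
```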
David Yat Sin dd255d31b8 Fix uninitialized variable warning
Fix warning when using valgrind

Change-Id: Ie59eaa990b9b5d339a178a2c6f9f4fac0e34e925
2022-09-08 09:10:00 -04:00
David Yat Sin df3fe8c2fb Add env variable to disable CPU affinity override
New environment variable HSA_OVERRIDE_CPU_AFFINITY_DEBUG to
enable/disable overriding CPU affinity.

Default value is enabled(1).

This is a temporary variable and may be removed in the future.

Change-Id: Id6a7c611730471ddc276ca333fde1e57046bf32a
2022-08-19 11:07:49 -04:00
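
A minimal sketch of reading such a switch, assuming that only the literal value "0" disables the override and that an unset variable means enabled. The parsing rule and the function name are assumptions for illustration. The commit message only states the variable name and the default of 1.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true unless HSA_OVERRIDE_CPU_AFFINITY_DEBUG is set to "0".
bool cpu_affinity_override_enabled() {
  const char* value = std::getenv("HSA_OVERRIDE_CPU_AFFINITY_DEBUG");
  if (value == nullptr) return true;  // unset: default is enabled (1)
  return std::strcmp(value, "0") != 0;
}
```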
skhatri e7fc301aa7 Adding support for rocrtracer tools loading without environment variable
During the HSA initialization stage, ROCr now searches all loaded libraries
for a symbol "HSA_AMD_TOOL_PRIORITY" and adds all those libraries to
the tools library init list.  Tools libraries listed in HSA_TOOLS_LIB
env variable are also loaded in the given order and take priority
over HSA_AMD_TOOL_PRIORITY.

Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82
2022-06-23 20:08:30 -04:00
Sean Keely 0ee82742a7 Switch to CLOCK_BOOTTIME for HSA system clock.
This is consistent with KFD and has significantly better latency.
KFD is taking this as the definition of the SystemClockCounter.

Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5
2022-05-05 15:27:29 -04:00
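
For reference, reading this clock on Linux looks roughly like the sketch below. `read_boottime_ns` is a hypothetical helper. `CLOCK_BOOTTIME` is Linux-specific and, unlike `CLOCK_MONOTONIC`, keeps counting while the system is suspended.

```cpp
#include <time.h>
#include <cstdint>

// Reads CLOCK_BOOTTIME in nanoseconds.
// Returns false if clock_gettime reports an error.
bool read_boottime_ns(uint64_t* out_ns) {
  timespec ts;
  if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0) return false;
  *out_ns = static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
            static_cast<uint64_t>(ts.tv_nsec);
  return true;
}
```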
Sean Keely df55cb0450 Rework memory locks to allow device parallelism in alloc/free.
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel.  Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.

The fragment allocator now requires separate protection and is protected with a
mutex at the device level.  Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
within Allocate.  This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim.  Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache.  So some device
level serialization is required in at least some paths.

Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
2021-11-24 19:22:05 -06:00
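
One possible reading of the locking scheme above, reduced to a toy tracker: slow calls run under the lock in shared (r) mode so they can overlap, and the map is modified only in short exclusive (w) sections that make no slow calls. Everything here is an assumption for illustration. `MemoryTracker` is hypothetical, and `std::malloc`/`std::free` stand in for the long-running driver calls. The real ROCr structures are not shown in the commit message.

```cpp
#include <pthread.h>
#include <cstddef>
#include <cstdlib>
#include <map>

class MemoryTracker {
 public:
  MemoryTracker() { pthread_rwlock_init(&lock_, nullptr); }
  ~MemoryTracker() { pthread_rwlock_destroy(&lock_); }
  MemoryTracker(const MemoryTracker&) = delete;
  MemoryTracker& operator=(const MemoryTracker&) = delete;

  void* Allocate(size_t size) {
    // Shared mode: several threads can be inside the slow call at once.
    pthread_rwlock_rdlock(&lock_);
    void* ptr = std::malloc(size);  // stand-in for a slow driver call
    pthread_rwlock_unlock(&lock_);
    if (ptr == nullptr) return nullptr;

    // Exclusive mode: brief map update, no slow calls while held.
    pthread_rwlock_wrlock(&lock_);
    regions_[ptr] = size;
    pthread_rwlock_unlock(&lock_);
    return ptr;
  }

  // Returns false if ptr was not allocated through this tracker.
  bool Free(void* ptr) {
    pthread_rwlock_wrlock(&lock_);
    bool known = regions_.erase(ptr) == 1;
    pthread_rwlock_unlock(&lock_);
    if (!known) return false;

    pthread_rwlock_rdlock(&lock_);
    std::free(ptr);  // stand-in for a slow driver call
    pthread_rwlock_unlock(&lock_);
    return true;
  }

  size_t TrackedCount() {
    pthread_rwlock_rdlock(&lock_);
    size_t n = regions_.size();
    pthread_rwlock_unlock(&lock_);
    return n;
  }

 private:
  pthread_rwlock_t lock_;
  std::map<void*, size_t> regions_;
};
```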
Sean Keely 3127d1ffdc Ensure ROCr created threads have no CPU affinity.
Change-Id: I53828dbaf055b65b61bdd11f0eadfcc806596821
2021-04-19 19:47:06 -05:00
Ramesh Errabolu fa13208698 Add rocr namespace to core header and impl files
Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360
2020-06-19 22:34:21 -04:00
Sean Keely ce19721c88 Update copyright date.
Change-Id: If4bf4c20cf051878bfe759080bb7345d884dd53d
2020-06-19 22:34:01 -04:00
Ramesh Errabolu 45958c727d Extend ROCr to surface UUID of GPU devices that support
Change-Id: I478db68d69a01578770403fa695f9e6391637573
2020-04-08 19:19:22 -05:00
Sean Keely 0c0e634458 PTHREAD_STACK_MIN may differ from system parameters.
Restrict stack adjustment to non-default stack requests and allow
stack growth within reason (20MB cutoff).

Change-Id: I320280c711402ac29683e94c7246b7c32c797611
2019-06-17 21:04:17 -05:00
Sean Keely 9f81bdfbe1 Add exception and error safety for CreateThread.
Change-Id: I82aaf64e039ca9614b4948deec1f87147f56279a
2019-05-24 22:39:55 -04:00
Sean Keely a913549190 Correct pthread join/detach handling.
Joined threads can not be joined more than once nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.

Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65
2019-05-16 12:14:06 -05:00
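
The constraint described above (a pthread may be joined once, and a joined thread must not be detached) can be hidden behind a small wrapper that remembers whether the join already happened. `ThreadHandle` below is a hypothetical sketch of that idea, not the ROCr os_thread class, and it is not safe for concurrent calls on the same handle.

```cpp
#include <pthread.h>

class ThreadHandle {
 public:
  // Returns 0 on success or the pthread_create error code.
  int Start(void* (*entry)(void*), void* arg) {
    int err = pthread_create(&thread_, nullptr, entry, arg);
    started_ = (err == 0);
    joined_ = false;
    return err;
  }

  // May be called any number of times; only the first call joins.
  bool Wait() {
    if (!started_) return false;
    if (joined_) return true;
    if (pthread_join(thread_, nullptr) != 0) return false;
    joined_ = true;
    return true;
  }

  // Releases the thread. Detaches only if it was never joined, because
  // detaching an already joined thread is invalid.
  void Close() {
    if (started_ && !joined_) pthread_detach(thread_);
    started_ = false;
  }

 private:
  pthread_t thread_{};
  bool started_ = false;
  bool joined_ = false;
};
```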
Sean Keely ca4c884306 Report library load errors in debug builds.
Change-Id: I24e63b15ad74fb86ecfe839f543800c2140c09d9
2017-12-05 18:49:33 -05:00
Sean Keely 30fce248c6 Enable use of CLOCK_MONOTONIC_RAW for post 4.4 kernels.
Change-Id: I3c1f27c7e639df5128c36d81f715fa16e6c1cf13
2017-09-20 14:28:23 -04:00
hthangir 87d2df3da3 Use non-RAW version in clock_getres to workaround bug in older kernels
Change-Id: Ice0606a42cd7054f0804baf4af3521ffae3b7d50
2017-09-14 13:56:15 -05:00
Laurent Morichetti 0f05ef73ac Include <errno.h> for EBUSY
Change-Id: I9fa3417445866f3ce37af2169f623afa8e92e873
2017-08-31 07:32:51 -07:00
Sean Keely 29b5b5c029 Correct handling of slow clocks under linux.
Change-Id: I9a1b08d5457caa6739220603bbd37b00febc64d7
2017-07-12 12:49:49 -04:00
Sean Keely 426d41e27c Adjust signal sleep to reflect null kernel latency. Performance tested on Gromacs.
Change-Id: I3851148ee8544b15d840f2c26ca73a83f8d0df2e
2017-03-09 15:20:53 -05:00
James Edwards (xN/A) TX 7d2bc9d113 Separate open source core runtime code from DK makefiles.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1250152]
2016-03-22 18:10:13 -05:00
James Edwards (xN/A) TX 7d1e6c3a57 Remove opensrc test files.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1249961]
2016-03-22 13:39:51 -05:00
James Edwards (xN/A) TX c9ffe0004e Check open source core runtime code into perforce. This includes license and README files.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1249136]
2016-03-20 15:39:40 -05:00