- Clean up detection by using visual studio macros to detect arch; I
didn't list all possible ARM platforms (can be done later if desired)
- Fixed two incorrect uses of !defined(ATI_ARCH_ARM) to instead use
defined(ATI_ARCH_X86), as they contain X86 specific code
- Fixed one use of __ARM_ARCH_7A__ to use ATI_ARCH_ARM instead
This is an improvement to the fixes in the last patch for SWDEV-323669
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I8568167293c34ad5331902105877f3ab6e25acb3
- Fix a crash with AMD_CPU_AFFINITY=1 as numa_bitmask_alloc isnt the
right api to allocate bitmask
- Do not set affinity for ROCr thread. It worsens performance rather
than any improvement.
- Fix regression from my previous change for event handler.
Change-Id: I3ea75adc2a6333f29752283eddd5b555e9b58cc5
Set affinity to the closest node of the current GPU. This reduces
the latency to fetch kernel args since device would query the CPU cache
of core which did the dispatch. This behavior is controlled with
AMD_CPU_AFFINITY env var(disabled by default)
Change-Id: I65afba62cb818ea25a311b88d1c0dd5c51330292
Setting AMD_CPU_AFFINITY = 1 will make runtime honor core affinity that
the process may set. This is disabled by default as it can prevent
worker thread or any thread that runtime creates from getting scheduled
thus affecting performance.
Change-Id: Ibe4cc95e7b99caee5ce750b7bf66e09e999cc9a3
The os.hpp header gets added to the include path of legacy llvm via the compiler lib. Having "windows.h" included causes a lot conflicts with LLVM headers, as they forward declare many Windows types. Best to not include it here.
Change-Id: I60c44a8d28660368f1a4a95741e1053ef3528fa1
Replace amd::Atomic with std::atomic. Remove make_atomic uses by
converting the variable to std::atomic and making sure the memory
order is relaxed when synchronizes-with is not needed.
Delete utils/atomic.hpp.
Change-Id: I0b36db8d604a8510ac6e36b32885fd16a1b8ccfa
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be
referenced but they are not initialized in that early time. The patch
will use constexpr to initialze global constant varables in compile
time.
Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
e.g.:
warning: expression does not compute the number of elements in this
array; element type is '__cpu_mask' (aka 'unsigned long'), not
'uint32_t' (aka 'unsigned int') [-Wsizeof-array-div]
for (uint i = 0; i < sizeof(mask_.__bits) / sizeof(uint32_t); ++i) {
__bits is a __cpu_mask, which is a 64-bit type. These were accessed
through uint32_t pointers so the loop bound should have been
correct. These operations can be done directly on the 64-bit type so
we can leave the array size pattern, and eliminate the casts.
The case in getNextSet should probably be rephrased in terms of
__cpu_mask to avoid the pointer casting, but this is tricker than the
other cases so I used the easy option to quiet the warning.
Change-Id: I1332584fad58439ccd9d369589519a9918e1678e