Adding virtual memory management APIs to rocclr.
The HIP layer will handle virtual allocs on devices.
Change-Id: Ia978f105c2c3fed3959c77580ba228e845105754
[ROCm/clr commit: b5f555f9ec]
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
By default, caches are flushed with a system scope release.
- finish() should check whether caches are flushed and, if not, queue a
marker
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
[ROCm/clr commit: 8eeaa998c0]
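A minimal sketch of the cache-state idea described in the commit above, under assumptions: every name here (CacheState, ReleaseScope, Device, Queue, markers_queued) is hypothetical and only illustrates the logic; it is not the rocclr code. The device keeps one shared state recording whether the last submitted packet released at system scope, and finish() queues a marker only when it did not.

```cpp
// Hypothetical names throughout; this is an illustration, not the rocclr implementation.
enum class CacheState { Flushed, Dirty };
enum class ReleaseScope { None, Agent, System };

struct Device {
  // One state per device, shared by every queue created on it.
  CacheState cache_state = CacheState::Flushed;
};

struct Queue {
  Device& dev;
  int markers_queued = 0;  // counts the extra markers finish() had to queue

  // Record how a submitted packet affects the device-wide cache state:
  // only a system scope release leaves the caches flushed.
  void submit(ReleaseScope release) {
    dev.cache_state = (release == ReleaseScope::System) ? CacheState::Flushed
                                                        : CacheState::Dirty;
  }

  // Queue a system scope marker only if the caches are not already flushed.
  void finish() {
    if (dev.cache_state != CacheState::Flushed) {
      ++markers_queued;
      submit(ReleaseScope::System);
    }
  }
};
```

In this sketch a second finish() call in a row queues nothing, since the first one already left the state flushed.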
Remove assert for kernel arg size, because COv5 reports a value
bigger than the actual usage in most cases
Change-Id: I8e15bc45a9e21b58a5894f9977511ca84408ce61
[ROCm/clr commit: 2be0b1e612]
- Clean up detection by using Visual Studio macros to detect arch; I
didn't list all possible ARM platforms (can be done later if desired)
- Fixed two incorrect uses of !defined(ATI_ARCH_ARM) to instead use
defined(ATI_ARCH_X86), as they contain x86-specific code
- Fixed one use of __ARM_ARCH_7A__ to use ATI_ARCH_ARM instead
This is an improvement to the fixes in the last patch for SWDEV-323669
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I8568167293c34ad5331902105877f3ab6e25acb3
[ROCm/clr commit: 00efdc1cd6]
With COv5, the local size calculation must occur before
the runtime programs the kernel arguments
Change-Id: I0726c6529bde69b8fcf5360aa83986cf84e04168
[ROCm/clr commit: caa6110c29]
Adding opaque data handle to memory. This is used to look up the HIP object associated with it.
Change-Id: I1bbb14a915bed79c6c3593a29a627778c7aaf13a
[ROCm/clr commit: 867346520f]
Add ROCR memory detection and enable arena mem object for possible
access in HIP
Change-Id: Icf86ac789176bfee4ea8d36b0970a817d4c6a2f7
[ROCm/clr commit: 28597ec5b5]
- Fix a crash with AMD_CPU_AFFINITY=1, as numa_bitmask_alloc isn't the
right API to allocate the bitmask
- Do not set affinity for the ROCr thread. It degrades performance rather
than improving it.
- Fix regression from my previous change for event handler.
Change-Id: I3ea75adc2a6333f29752283eddd5b555e9b58cc5
[ROCm/clr commit: 802c2c8a9f]
- Queue handler for hipEventRecord (aka marker_ts_) only if there is a
callback associated with it.
Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a
[ROCm/clr commit: 3c3c0ca4c5]
Update timeout for hostcall wait for signal. If the timeout is small, it
checks frequently enough to affect performance for certain applications
which may be CPU bound.
Change-Id: I0a879559e4ad111b09a994a5b82a6faf6e4fea3f
[ROCm/clr commit: 9292abb2d8]
It can be too early to allocate memory at the beginning of
Device::create() under PAL
Change-Id: I4bd76db7be3f6fb246243ea68022d8b0f860471d
[ROCm/clr commit: 3af3fe10de]
CMake assumes we're bundling on x86, but for GNU-compatible compilers,
we should rely on the compiler target to set the build arch.
For non-GNU compilers, just fall back to assuming x86 (no change).
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: Iee9794e6f7c3973c781ddaf740ded77f34712c4f
[ROCm/clr commit: f2e5ef5617]