Add ROCR memory detection and enable arena mem object for possible
access in HIP
Change-Id: Icf86ac789176bfee4ea8d36b0970a817d4c6a2f7
[ROCm/clr commit: 28597ec5b5]
- Fix a crash with AMD_CPU_AFFINITY=1, as numa_bitmask_alloc isn't the
right API to allocate the bitmask.
- Do not set affinity for the ROCr thread; it worsens performance
rather than improving it.
- Fix a regression in the event handler from my previous change.
Change-Id: I3ea75adc2a6333f29752283eddd5b555e9b58cc5
[ROCm/clr commit: 802c2c8a9f]
- Queue the handler for hipEventRecord (aka marker_ts_) only if a
callback is associated with it.
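A minimal sketch of the check, assuming hypothetical `Event` and `HandlerQueue` types (not the clr classes): the completion handler is queued only when a callback is attached.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a recorded event; not the clr class.
struct Event {
  std::function<void()> callback;  // empty unless the user attached one
};

// Hypothetical queue of completion handlers processed by a runtime thread.
struct HandlerQueue {
  std::vector<Event*> pending;
};

// Record a marker; schedule the completion handler only when a callback
// exists, so plain event records add no handler work.
void recordMarker(Event& ev, HandlerQueue& handlers) {
  // ... submit the timestamp marker to the hardware queue here ...
  if (ev.callback) {
    handlers.pending.push_back(&ev);
  }
}
```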
Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a
[ROCm/clr commit: 3c3c0ca4c5]
Update the timeout for the hostcall wait on a signal. If the timeout
is small, the wait wakes up often enough to hurt the performance of
certain applications that may be CPU bound.
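A sketch of a timed wait, assuming a hypothetical `HostSignal` built on `std::condition_variable` rather than the ROCr signal API. Every timeout that expires before the signal is raised is an extra wakeup, so a smaller timeout means more CPU time taken from the application.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical signal the hostcall listener waits on (not the ROCr API).
struct HostSignal {
  std::mutex m;
  std::condition_variable cv;
  bool raised = false;
};

// Wait until the signal is raised, waking up every `timeout` to re-check.
// Returns how many waits timed out without the signal; each one is a
// wakeup that costs CPU time even though there is no work to do.
unsigned waitForSignal(HostSignal& s, std::chrono::microseconds timeout) {
  unsigned timedOutWakeups = 0;
  std::unique_lock<std::mutex> lock(s.m);
  while (!s.raised) {
    if (!s.cv.wait_for(lock, timeout, [&] { return s.raised; })) {
      ++timedOutWakeups;  // timed out, signal still not raised
    }
  }
  return timedOutWakeups;
}
```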
Change-Id: I0a879559e4ad111b09a994a5b82a6faf6e4fea3f
[ROCm/clr commit: 9292abb2d8]
Under PAL, it can be too early to allocate memory at the beginning of
Device::create().
Change-Id: I4bd76db7be3f6fb246243ea68022d8b0f860471d
[ROCm/clr commit: 3af3fe10de]
CMake assumes we're bundling on x86, but for GNU-compatible compilers
we should rely on the compiler target to set the build arch.
For non-GNU compilers, just fall back to assuming x86 (no change).
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: Iee9794e6f7c3973c781ddaf740ded77f34712c4f
[ROCm/clr commit: f2e5ef5617]
Remove the guarantee from AddMemObj, since it can be called multiple
times for different devices.
Change-Id: I49dd76068b3c4c709f17541159052302dcdb374d
[ROCm/clr commit: 3bf1d5ac97]
Currently COMGR doesn't provide global variable sizes, so the runtime
parses the ELF binary directly. Skip this parsing for HIP, which can
save 5% of hipModuleLoad() time.
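For reference, a sketch of the kind of direct lookup being skipped: finding a global variable's size in an ELF64 symbol table with glibc's `<elf.h>`. The function name and the assumption that `.symtab` and `.strtab` are already located are illustrative, not taken from clr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <elf.h>
#include <optional>
#include <string>

// Look up the size of a global variable by scanning an ELF64 symbol table.
// `symtab`/`count` describe the .symtab entries and `strtab` the matching
// string table; both are assumed to be already mapped and validated.
std::optional<uint64_t> globalVarSize(const Elf64_Sym* symtab, size_t count,
                                      const char* strtab,
                                      const std::string& name) {
  for (size_t i = 0; i < count; ++i) {
    const Elf64_Sym& sym = symtab[i];
    // Global variables are data objects; skip functions, sections, etc.
    if (ELF64_ST_TYPE(sym.st_info) != STT_OBJECT) continue;
    if (name == strtab + sym.st_name) return sym.st_size;
  }
  return std::nullopt;
}
```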
Change-Id: I47540d1e957bdb0c2406b6b848222de2920b2504
[ROCm/clr commit: 2664d8cf9e]
Pass the active queue for transfers in the cache coherency layer.
This allows the device transfer queue to be used only when no active
queue is available, because using the device transfer queue from
another active queue may cause a deadlock.
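The queue choice reduces to a fallback; a sketch with a hypothetical `Queue` type and function name:

```cpp
#include <cassert>

struct Queue {};  // hypothetical stand-in for a device queue

// Pick the queue that performs a cache-coherency transfer. Prefer the
// caller's active queue; fall back to the device's dedicated transfer queue
// only when no active queue exists, since submitting through the transfer
// queue while another queue is active may deadlock.
Queue* selectTransferQueue(Queue* activeQueue, Queue* deviceTransferQueue) {
  return activeQueue != nullptr ? activeQueue : deviceTransferQueue;
}
```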
Change-Id: Ifbe7e0303b77dbf6eeda3939ffbc25a3df7472de
[ROCm/clr commit: 95d55fdfa8]
If the reported GlobalMemCacheLine is 0, the runtime may run into an
infinite loop, because KernelSegmentAlignment is set to the cache
line size.
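A sketch of the guard, assuming a hypothetical fallback of 64 bytes and an illustrative stepping loop (not the clr code) that shows why an alignment of 0 hangs: the loop never advances.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fallback used when the device reports a cache-line size of 0.
constexpr size_t kDefaultCacheLine = 64;

// Choose the kernel segment alignment from the reported cache-line size,
// never letting it be 0.
size_t kernelSegmentAlignment(size_t reportedCacheLine) {
  return reportedCacheLine != 0 ? reportedCacheLine : kDefaultCacheLine;
}

// Round `offset` up to a multiple of `alignment` by stepping in units of
// `alignment`. With alignment == 0 and offset > 0 this loop would never
// terminate, which is the kind of hang a zero cache-line size can cause.
size_t alignUp(size_t offset, size_t alignment) {
  size_t aligned = 0;
  while (aligned < offset) aligned += alignment;
  return aligned;
}
```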
Change-Id: Ide547940cc0407f16fab10ee210b4fd3ae4eaafc
[ROCm/clr commit: 041ddc0c1c]
OpenCL 2.2 requires SPIR-V, which the runtime doesn't support.
Make sure the PAL backend doesn't report any SPIR-V support.
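One way a backend could avoid advertising SPIR-V is to drop `cl_khr_il_program` (the extension for creating programs from an intermediate language such as SPIR-V) from its extension string. This is an assumption about the mechanism, sketched with a hypothetical helper:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Remove one extension name from a space-separated OpenCL extension string,
// preserving the order of the remaining entries.
std::string removeExtension(const std::string& extensions,
                            const std::string& name) {
  std::istringstream in(extensions);
  std::string ext;
  std::string result;
  while (in >> ext) {
    if (ext == name) continue;  // drop the unsupported extension
    if (!result.empty()) result += ' ';
    result += ext;
  }
  return result;
}
```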
Change-Id: I8d179069674205b54f7d20d149bcb675bee5cdb0
[ROCm/clr commit: 0bf395af39]
Metadata in code object version 5 extends the CO3 and CO4 metadata.
Detect the new fields and program them when setting up the kernel
arguments.
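As an example of new CO5 fields, the hidden arguments `hidden_block_count_x`, `hidden_group_size_x` and `hidden_remainder_x` (and their Y/Z counterparts) describe the dispatch per dimension. A sketch, assuming the block count is the number of full work-groups and the remainder is the size of the trailing partial one, as in the AMDGPU metadata description; `HiddenDimArgs` and `hiddenArgsForDim` are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Values for three CO5 hidden kernel arguments in one dimension:
//   hidden_block_count_*: number of full work-groups
//   hidden_group_size_*:  work-group size
//   hidden_remainder_*:   size of the trailing partial work-group (0 if none)
struct HiddenDimArgs {
  uint32_t blockCount;
  uint16_t groupSize;
  uint16_t remainder;
};

// `groupSize` is assumed to be nonzero.
HiddenDimArgs hiddenArgsForDim(uint32_t gridSize, uint16_t groupSize) {
  return HiddenDimArgs{gridSize / groupSize, groupSize,
                       static_cast<uint16_t>(gridSize % groupSize)};
}
```

For a 1000-work-item grid with 256-wide groups this gives 3 full groups and a remainder of 232.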
Change-Id: I27e58df77320ad00f4f16d35912668db803826af
[ROCm/clr commit: be6a06384e]