- hipStreamWaitEvent may not resolve streams
- Correct usage of flag passed to streamWait function
Change-Id: I2ee163615d303b98937c1035d60da283cce6f677
[ROCm/clr commit: 940347ad42]
- This change tries to avoid the extra synchronization packets we might
insert because we didn't track the completion signals for every command.
We track the currently enqueued command until it exits the enqueue stage.
We also record the exit scope to know whether we flushed the caches
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
passed as the argument
Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc
[ROCm/clr commit: e03e4f3b5d]
Since hipMemMap can be called for multiple device handles on the same virtual memory, the same is true for hipMemUnmap, meaning that virtual memory can be "partially unmapped".
An unmap call can therefore target a specific part of the reserved address range, in which case only the designated subbuffer should be released. If unmap is called on the entire reserved range, all subbuffers should be released.
The main point is that every hsa_amd_vmem_map needs a corresponding hsa_amd_vmem_unmap. Otherwise, if the entire memory is unmapped by a single unmap call, HSA will report the memory as "in use" when an attempt is made to delete it.
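The bookkeeping described above can be sketched as a small model. This is only an illustration under stated assumptions: ReservedRange, Map, Unmap and CanFree are hypothetical names, not the clr data structures, and the calls to hsa_amd_vmem_map / hsa_amd_vmem_unmap are represented by comments rather than made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical model of one reserved virtual range that holds several
// mapped sub-ranges (subbuffers). Not the actual clr implementation.
class ReservedRange {
 public:
  ReservedRange(std::uintptr_t base, std::size_t size) : base_(base), size_(size) {}

  // Stands in for one hsa_amd_vmem_map call on [addr, addr + size).
  void Map(std::uintptr_t addr, std::size_t size) {
    assert(addr >= base_ && addr + size <= base_ + size_);
    mapped_[addr] = size;
  }

  // Releases every mapping fully contained in [addr, addr + size) and
  // returns how many per-mapping unmap calls that required.
  std::size_t Unmap(std::uintptr_t addr, std::size_t size) {
    std::size_t released = 0;
    auto it = mapped_.lower_bound(addr);
    while (it != mapped_.end() && it->first + it->second <= addr + size) {
      it = mapped_.erase(it);  // one hsa_amd_vmem_unmap per earlier map
      ++released;
    }
    return released;
  }

  // The reservation is safe to delete only when no mapping is outstanding.
  bool CanFree() const { return mapped_.empty(); }

 private:
  std::uintptr_t base_;
  std::size_t size_;
  std::map<std::uintptr_t, std::size_t> mapped_;  // start address -> size
};
```

With three mappings in the range, unmapping the middle one releases a single subbuffer, while unmapping the whole range afterwards releases the remaining two, one unmap per original map.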
Change-Id: I039308eafb820decfb1c09f60347f26cdad1a362
[ROCm/clr commit: 3ec1d2d2f1]
- Use getBuffer/releaseBuffer in BlitManager
- Cleanup XferBuffer as we use ManagedBuffer for both reads/writes
Change-Id: I2661b85dd012763b17a38a743fec1b1d79125f67
[ROCm/clr commit: 37d606d193]
- If any kernel uses the device heap, the launch needs to be preceded by an
init kernel. Save on the extra barrier packet launch/flush between the
init heap kernel and the user kernel
Change-Id: I8ebc6246188200e5f673dc464bc76a53bcb8b7c6
[ROCm/clr commit: ca530c660b]
1) Add Linker APIs to runtime to support SPIRV linking
2) Migrate internal implementations to runtime and share with rtc
3) Add support for bundled and unbundled SPIRV code object linking.
Change-Id: Ic1fd4431f842a208a2468e8aec54a65b5fa6b0e3
[ROCm/clr commit: 5930f047bb]
This change removes the stream callback from hipStreamWaitEvent and
uses a stream memory wait operation instead. This allows
hipStreamWaitEvent to be non-blocking on the host.
Change-Id: Ie5530febda5a5bcb5daa0db8a01249d6b137fd43
[ROCm/clr commit: 721c5800ca]
- Add custom compare to the map of queues, which will help with
the round-robin selection
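The commit does not say what the comparator orders by. As a rough sketch only, assuming the goal is a stable iteration order that does not depend on pointer values, a map keyed by queue pointer with a custom compare could look like this. Queue, QueueIdLess and QueuePool are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical queue type; `id` is an assumed creation index.
struct Queue {
  int id;
};

// Custom compare: order queues by id instead of by pointer address, so
// the map iterates in a deterministic order.
struct QueueIdLess {
  bool operator()(const Queue* a, const Queue* b) const { return a->id < b->id; }
};

class QueuePool {
 public:
  void Add(Queue* q) { users_.emplace(q, 0); }

  // Round-robin: pick the queue that follows the last one handed out,
  // wrapping around at the end of the map.
  Queue* Next() {
    assert(!users_.empty());
    auto it = last_ ? users_.upper_bound(last_) : users_.begin();
    if (it == users_.end()) it = users_.begin();
    ++it->second;  // count how many users share this queue
    last_ = it->first;
    return last_;
  }

 private:
  std::map<Queue*, std::size_t, QueueIdLess> users_;
  Queue* last_ = nullptr;
};
```

Because the ordering comes from the comparator, the selection sequence is the same regardless of the order in which queues were added or where they were allocated.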
Change-Id: Ie67a820bfb1a5b484a1b3edced967eed94228bb8
[ROCm/clr commit: ba8e740be4]
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.
Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
[ROCm/clr commit: 296dce5570]
Compute doesn't support IB chaining, but RGP may collect
perf counters, which require more space in CB.
Increase CB size if RGP is enabled.
Change-Id: Iaa0a620ead8541a679b0dfe5e5711af5afdba545
[ROCm/clr commit: 63cf3057ba]
- Use correct header in device_library_decl
- Use std:: instead of __hip_internal:: for host compilation
- Hide device-specific code behind the __clang__ and __HIP__ checks
Change-Id: I2f3647e00555ed0e79f9954a459c41394c3cd49b
[ROCm/clr commit: c3f49c8788]
- Also add a cache, which allows compiled code objects to be reused
instead of compiling again. This should improve performance on
multi-GPU systems.
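A minimal sketch of such a cache follows. The key used here (source text plus target ISA name) and the names CodeObjectCache, Compiler and Get are assumptions made for illustration; the commit does not describe the actual key or interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using CodeObject = std::vector<char>;  // stand-in for a compiled binary

// Hypothetical cache: devices that share an ISA reuse one compiled code
// object instead of triggering a compile each.
class CodeObjectCache {
 public:
  using Compiler =
      std::function<CodeObject(const std::string& source, const std::string& isa)>;

  explicit CodeObjectCache(Compiler compile) : compile_(std::move(compile)) {}

  std::shared_ptr<const CodeObject> Get(const std::string& source,
                                        const std::string& isa) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto key = std::make_pair(source, isa);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // hit: no compile needed
    std::shared_ptr<const CodeObject> obj =
        std::make_shared<CodeObject>(compile_(source, isa));
    cache_.emplace(std::move(key), obj);
    return obj;
  }

 private:
  Compiler compile_;
  std::mutex mutex_;
  std::map<std::pair<std::string, std::string>, std::shared_ptr<const CodeObject>> cache_;
};
```

On a system with several identical GPUs, only the first request per ISA pays the compile cost; later requests return the shared object.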
Change-Id: Ib135d616c076b77f8aaf28de275d408b38021d89
[ROCm/clr commit: 0391aec14a]
This patch makes two functional changes:
* Use GPU timing for internal markers for HIP.
* Measure CPU time closer to GPU timer, to reduce delta between GPU/CPU timestamp measurements.
There are some smaller non-functional updates:
* Fix typo: waifForFence -> waitForFence
* Remove unused drmProfiling
Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a
[ROCm/clr commit: 179801a750]
Resolved an issue where hipEventSynchronize and hipStreamWaitEvent APIs
did not function correctly for events created with the hipEventInterprocess flag.
The bug caused the event to be incorrectly marked as "recorded,"
leading to these APIs failing to wait for the event as expected.
Change-Id: Ic9fdfaab2393beb93d6e0b83661545e902a63499
[ROCm/clr commit: 1cdfbfd270]
- Fix regression for D2H pinned copies which adds systemscope release.
- Skip the CPU wait for D2H unpinned copies, as we can pass the barrier's
signal to the ROCr copy.
- Fix an old bug in sdmaEngineRetainCount_ logic
- Improve logging
Change-Id: If074bddb05564b15949b0d5f9bf12acd3692174e
[ROCm/clr commit: 4c95ee5e1e]
Make ocltst -m tests/ocltst/liboclruntime.so -t OCLMemoryInfo
pass in emu where GPU memory is very big.
Cherry pick
https://gerrit-git.amd.com/c/compute/ec/clr/+/1014858
Change-Id: I0228c5e87ce7c366983fd4af71c25e7f8161c2c7
[ROCm/clr commit: de83d7a6ae]
hipGetLastError should return the error produced by any of the previous
API calls in the same host thread, to match the CUDA behavior, whereas
hipExtGetLastError returns the error from the immediately preceding API call.
The Ext API was added earlier to support existing HIP apps that
rely on the current behavior of hipGetLastError.
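The difference between the two calls can be modeled per host thread as below. This is a sketch of the semantics described in this message, not the runtime's code: Err and ThreadErrorState are hypothetical names, and resetting the stored value on read is an assumption carried over from the CUDA convention.

```cpp
#include <cassert>

// Hypothetical stand-ins for hipError_t values.
enum class Err { Success = 0, InvalidValue = 1 };

// Models the error state one host thread would keep (for example in
// thread-local storage).
struct ThreadErrorState {
  Err sticky = Err::Success;     // last failure from any earlier call
  Err last_call = Err::Success;  // result of the most recent call only

  // Invoked with the result of every API call.
  Err Record(Err result) {
    last_call = result;
    if (result != Err::Success) sticky = result;
    return result;
  }

  // Models hipGetLastError: reports a failure even if later calls succeeded.
  Err GetLastError() {
    Err e = sticky;
    sticky = Err::Success;  // assumed reset on read
    return e;
  }

  // Models hipExtGetLastError: reports only the immediately preceding call.
  Err ExtGetLastError() {
    Err e = last_call;
    last_call = Err::Success;  // assumed reset on read
    return e;
  }
};

// Usage: a failing call followed by a successful one.
inline void Example() {
  ThreadErrorState tls;
  tls.Record(Err::InvalidValue);
  tls.Record(Err::Success);
  assert(tls.ExtGetLastError() == Err::Success);
  assert(tls.GetLastError() == Err::InvalidValue);
}
```

An application that checks errors only after a batch of calls would see the earlier failure through the first accessor and miss it through the second.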
Change-Id: I61e95b1fc136cc761e2434e02187b7ed2598b733
[ROCm/clr commit: 4b443f8133]
BatchMemop should be positioned before the image support kernels
because, when the device has no image support, the total number of
kernels is determined by BlitLinearTotal.
Change-Id: I8e53caf744ba54259ac04bad1762eef21806f3f2
[ROCm/clr commit: 3e01da3dac]