hipGraph uses VMM by default when allocating memory.
However, the handle of the physical memory is also added to Memobj by
default. Since the Memobj tracks the whole address range from handle to
handle + size, the system would need to reserve that whole address
range. If the range has not been reserved by the system, clr may
resolve an address to the wrong Memobj.
This patch removes the handle from the Memobj to fix this potential
issue.
Change-Id: I2da38e6b2d11d0d48e1afe66c46899500c290624
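
The lookup hazard described above can be illustrated with a minimal sketch, assuming a table keyed by start address that matches any address inside [start, start + size). RangeTable and its members are hypothetical names for illustration only and are not clr code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a memory-object table keyed by start address.
// None of these names come from clr; they only illustrate range lookups.
struct RangeTable {
  struct Entry {
    std::size_t size;
    int id;
  };
  std::map<std::uintptr_t, Entry> entries;

  void add(std::uintptr_t start, std::size_t size, int id) {
    entries[start] = Entry{size, id};
  }

  // Returns the id of the entry whose [start, start + size) contains addr,
  // or -1 when no entry covers it.
  int find(std::uintptr_t addr) const {
    auto it = entries.upper_bound(addr);
    if (it == entries.begin()) return -1;
    --it;
    return (addr - it->first < it->second.size) ? it->second.id : -1;
  }
};
```

If a handle value of 0x1000 with size 0x2000 is registered without that range being reserved, an unrelated address such as 0x1800 resolves to the handle's entry. Leaving the handle out of the table avoids the false match.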
- Refactor blit code and clean ASAN instrumentation
- Use unified function for rocr copy
- Enable shader copy path for unpinned writeBuffer/readBuffer paths
- Set GPU_FORCE_BLIT_COPY_SIZE=16, so BLIT copy is used for
pinned copies or unpinned H2D/D2H copies smaller than 16 KB
Change-Id: I42045cca79234b340dbf53dafb93044199736ae4
Returning early when the thread is not alive causes memory leaks:
neither doorbell_ nor urilocator is released in that case.
This change alters the logic so that the HostcallListener releases
its memory regardless of the thread status.
Change-Id: Ie912360ec0e2ee257de9937b1a8d7375e6aebd83
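
The HostcallListener change above can be sketched roughly as follows. This is a simplified model under stated assumptions: the Listener type, its members, and the use of std::unique_ptr are hypothetical and do not reflect the actual clr class.

```cpp
#include <memory>

// Hypothetical model of a listener that owns two resources and tracks a
// worker thread. Names are illustrative, not the clr HostcallListener.
struct Listener {
  std::unique_ptr<int> doorbell = std::make_unique<int>(0);
  std::unique_ptr<int> locator = std::make_unique<int>(0);
  bool threadAlive = false;
  bool threadStopped = false;

  void terminate() {
    // Only the thread shutdown depends on the thread being alive.
    if (threadAlive) {
      threadStopped = true;  // stands in for signalling and joining the thread
      threadAlive = false;
    }
    // Resource release runs unconditionally, so nothing leaks when the
    // thread never started or has already exited.
    doorbell.reset();
    locator.reset();
  }
};
```

The point of the structure is that the liveness check guards only the thread shutdown, not the cleanup that follows it.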
This change replaces some asserts, which are only active in debug
mode, with standard error handling.
Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
Change-Id: I112f9e56f921abd72daf0d11e4ecdcb7b1a9f9e6
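
As an illustration of the assert replacement described above, here is a minimal sketch. CheckStatus and fillBuffer are hypothetical; the actual change uses the runtime's own error codes and functions, which the message does not list.

```cpp
#include <cstddef>

// Hypothetical status codes; the real change uses the runtime's own errors.
enum class CheckStatus { Success, InvalidValue };

// Before: assert(dst != nullptr && size != 0); the check vanished in
// release builds. After: the same conditions are checked in every build
// and reported to the caller.
CheckStatus fillBuffer(unsigned char* dst, std::size_t size,
                       unsigned char value) {
  if (dst == nullptr || size == 0) {
    return CheckStatus::InvalidValue;
  }
  for (std::size_t i = 0; i < size; ++i) {
    dst[i] = value;
  }
  return CheckStatus::Success;
}
```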
=> If the null stream has not been created, skip null stream creation during sync
=> Do a CPU wait on blocking streams and on the null stream, if it exists
Change-Id: I90d6ced6a2dd1782ba58f3fed4e3608fc0efa55a
hipMallocAsync/hipFreeAsync APIs should return an error stating that
the operation is not supported if a stream is actively capturing
and is different from the passed stream.
Change-Id: I2a1b8260c5eb22d99a936ac529d6788a83f81a17
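
The rule above can be modelled with a small sketch, assuming the runtime can scan a list of known streams. Stream, AsyncStatus, and checkAsyncAllowed are hypothetical names; the actual HIP error code returned is not stated in the message.

```cpp
#include <vector>

// Hypothetical types; they only model the rule described above.
enum class AsyncStatus { Success, NotSupported };

struct Stream {
  bool capturing = false;
};

// Returns NotSupported when any stream other than `target` is actively
// capturing; otherwise the async allocation or free may proceed.
AsyncStatus checkAsyncAllowed(const std::vector<const Stream*>& streams,
                              const Stream* target) {
  for (const Stream* s : streams) {
    if (s != target && s->capturing) {
      return AsyncStatus::NotSupported;
    }
  }
  return AsyncStatus::Success;
}
```

A call on the capturing stream itself still succeeds in this model, since only captures on other streams are rejected.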
1) When kernelParam_.func is nullptr, validation fails in the
setParam call and memory is not allocated for kernelParams.
2) The destructor path then segfaults trying to free the kernelParams memory.
3) The copy of params is done only after function validation succeeds.
Change-Id: I6338e0c89f259632e4115f0508e2f240bc207fd9
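
The ordering described in the three points above can be sketched as follows. KernelNode and setParams are hypothetical and only model the idea of validating first, copying second, and keeping the destructor safe when no copy was made; they are not the clr graph node class.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical node type; it models the ordering fix, not the clr class.
struct KernelNode {
  const void* func = nullptr;
  void* params = nullptr;  // stays null until validation succeeds

  KernelNode() = default;
  KernelNode(const KernelNode&) = delete;
  KernelNode& operator=(const KernelNode&) = delete;

  // Returns false without allocating when the inputs are invalid.
  bool setParams(const void* f, const void* src, std::size_t bytes) {
    if (f == nullptr || src == nullptr || bytes == 0) {
      return false;  // validation failed: nothing allocated, nothing copied
    }
    void* copy = std::malloc(bytes);
    if (copy == nullptr) return false;
    std::memcpy(copy, src, bytes);  // copy only after validation passed
    std::free(params);              // free(nullptr) is a no-op
    func = f;
    params = copy;
    return true;
  }

  // Safe even if setParams never succeeded, because params is still null.
  ~KernelNode() { std::free(params); }
};
```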
hsa_amd_profiling_async_copy_enable takes 45us on the first call. Disable SDMA profiling when enqueuing captured kernel packets and for the accumulate command.
Change-Id: I80b51a58c46bccc9c1025e9331515f57c97b5a2a
This reverts commit efce2f77c4.
Reason for revert: Even though this change is valid, it would break backward compatibility.
Change-Id: I9c7cab83198c8d5c8485b11194099162e3e7a874
Currently amd::Monitor can work in FILO mode for active waits,
which can delay the wakeup of some threads. That may be a problem
for the current sysmem pool design.
Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69
1) For Dynamic CO variables, free the device pointer in the
DynCO destructor instead of the DeviceVar destructor.
2) For Static CO fat binary removal,
only call hipFree for valid device vars instead of for all devices.
Change-Id: I84291f5371b2c05d1d0bcdb4f9c6bd122e7c9b21
Runtime may use checkGpuTime() for the wait and not just for the GPU time queries. Hence, the call can't be skipped if profiling isn't enabled.
More changes are required for this optimization.
Change-Id: I79e8918312e755d75f0d26685f2fdc604a8ffb18
Modified hipFuncSetAttribute to handle pointers to dynamic functions
returned by hipModuleGetFunction.
Change-Id: I54b98f9d31a79630dd7edcd363fad81f1d89219b
- Remove binning logic. Although useful, it doesn't work in the current
scenario because there is no upper limit on the size of an allocation. If
an app or framework uses the entire VRAM and then creates suballocations,
binning would result in failure.
Change-Id: Icc27c13e433bb4a1f03e82028d8718488b43bfa5
Replaced clGetExtensionFunctionAddress calls with
clGetExtensionFunctionAddressForPlatform to ensure
interoperability with distribution ICD loaders.
Change-Id: I560a62459f2ad222750e65e869b98d6b6ec56665