Addresses scenarios such as a parameter mismatch in a memcpy node,
a difference in the node count, and a difference in the dependencies
of the nodes.
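
The three scenarios above can be sketched as a comparison between two graphs. This is a minimal illustration in plain C++, not the runtime's code: `Node`, `UpdateResult`, and `CheckUpdate` are hypothetical names, and the commit message does not say which API performs these checks or which error codes it returns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for graph nodes (not the runtime's types).
enum class NodeType { Kernel, Memcpy };

struct Node {
  NodeType type;
  std::size_t copyBytes;          // only meaningful for Memcpy nodes
  std::vector<std::size_t> deps;  // indices of the nodes this node depends on
};

enum class UpdateResult { Ok, NodeCountMismatch, DependencyMismatch, ParamMismatch };

// Compare an existing graph against an updated one in the three ways the
// commit message lists: node count, node dependencies, memcpy parameters.
UpdateResult CheckUpdate(const std::vector<Node>& oldGraph,
                         const std::vector<Node>& newGraph) {
  if (oldGraph.size() != newGraph.size()) {
    return UpdateResult::NodeCountMismatch;
  }
  for (std::size_t i = 0; i < oldGraph.size(); ++i) {
    if (oldGraph[i].deps != newGraph[i].deps) {
      return UpdateResult::DependencyMismatch;
    }
    if (oldGraph[i].type != newGraph[i].type) {
      return UpdateResult::ParamMismatch;
    }
    if (oldGraph[i].type == NodeType::Memcpy &&
        oldGraph[i].copyBytes != newGraph[i].copyBytes) {
      return UpdateResult::ParamMismatch;
    }
  }
  return UpdateResult::Ok;
}
```

The order of the checks (count first, then dependencies, then parameters) is an assumption made for the sketch.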
Change-Id: I31c6516fb27cc1007809f1b50306fdb0c2356ccc
[ROCm/clr commit: f16d336e32]
HIP headers such as hip_math_constants.h, library_types.h,
hip_common.h, channel_descriptor.h, device_functions.h, hip_complex.h,
hip_texture_types.h, math_functions.h, and surface_types.h
are added to HIPRTC.
Change-Id: I4a4c198449ceb609c3ff55e00b43056c1f085431
[ROCm/clr commit: d7d0f11318]
Add trap handler code to the runtime and compile/load it during
device initialization. The current trap handler interface in
PAL is obsolete and a new interface will be provided later.
Change-Id: I1fa702c5d1f2e6731f781369c980d546cf422328
[ROCm/clr commit: e1d34cb24f]
Three separate for loops iterate over all graph nodes for UpdateStream,
FillCommands and EnqueueCommands, which causes a performance drop for large graphs.
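
One way to picture the cost is to count node visits. The sketch below is plain C++ with a hypothetical `GraphNode` whose methods only record that they were called. It assumes the three steps have no cross-node ordering constraints, which the commit message does not confirm, so the fused loop is an illustration and not necessarily the change that was made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph node; the counters only record calls.
struct GraphNode {
  int updated = 0;
  int filled = 0;
  int enqueued = 0;
  void UpdateStream() { ++updated; }
  void FillCommands() { ++filled; }
  void EnqueueCommands() { ++enqueued; }
};

// Three traversals: every node is visited three times.
std::size_t LaunchThreePasses(std::vector<GraphNode>& nodes) {
  std::size_t visits = 0;
  for (auto& n : nodes) { n.UpdateStream(); ++visits; }
  for (auto& n : nodes) { n.FillCommands(); ++visits; }
  for (auto& n : nodes) { n.EnqueueCommands(); ++visits; }
  return visits;
}

// One traversal: every node is visited once and all three steps run together.
std::size_t LaunchSinglePass(std::vector<GraphNode>& nodes) {
  std::size_t visits = 0;
  for (auto& n : nodes) {
    n.UpdateStream();
    n.FillCommands();
    n.EnqueueCommands();
    ++visits;
  }
  return visits;
}
```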
Change-Id: I077accf3a4680d5d944b73200fd6498a7a48f25c
[ROCm/clr commit: 530dc6de2a]
When a kernel function expects no parameters, no error should be returned
if both the kernelParams and extra arguments are set to null.
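
The rule can be sketched as a small validation helper. `Status` and `ValidateKernelArgs` are hypothetical names used only for illustration. The sketch covers only the case described above and does not model any other argument checks the runtime may perform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes; the real runtime uses its own error enum.
enum class Status { Success, InvalidValue };

// Having neither kernelParams nor extra is an error only when the kernel
// actually expects parameters.
Status ValidateKernelArgs(std::size_t expectedParamCount,
                          void** kernelParams, void** extra) {
  const bool noArgsGiven = (kernelParams == nullptr && extra == nullptr);
  if (noArgsGiven && expectedParamCount > 0) {
    return Status::InvalidValue;
  }
  return Status::Success;
}
```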
Change-Id: I5941bcc400b6fb380e623bdae0233ae3e4f73815
[ROCm/clr commit: 315082e554]
This reverts commit 58e62063f3.
Reason for revert: There are currently some outstanding issues with the COMPILE_SOURCE_WITH_DEVICE_LIBS Comgr action (https://ontrack-internal.amd.com/browse/SWDEV-386072). Once these LLVM issues have been resolved, we can safely re-apply this patch.
Change-Id: I8501967af8496ea50d6e4a97399e45db51bbed1e
[ROCm/clr commit: 19526e46e6]
- Acquire and Release scopes for AQL need to be balanced. These were
missing on the AMD_OPT_FLUSH=0 path.
Change-Id: Ibf4132eb96800f155d7b664359c790d68a353e60
[ROCm/clr commit: 42c9cd0320]
hipMemcpyArrayToArray, hipMemcpyFromArrayAsync, and hipMemcpyToArrayAsync
are deprecated in CUDA and are not declared in hip_runtime_api.h.
Removed their implementation from hip_memory.cpp.
Change-Id: I9720aec6241515c56c66b7e90a37b2ed53347eb2
[ROCm/clr commit: 2bc6661737]
This is related to SWDEV-410182, but it's not enough to fix it.
Functions from device-libs are precompiled into LLVM IR in a "target agnostic" way
(in reality, it's not 100% target agnostic, which brings us many headaches).
When linking builtins (like device-libs) from the command line, we use the flag
-mlink-builtin-bitcode. The difference between regular linking of bitcode and
this flag is that the latter propagates target-specific attributes. If these
attributes are not propagated, we can end up with inconsistent target attributes.
Comgr provides the action AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
for this exact reason. The old action is currently deprecated and this one should
be used instead.
Change-Id: I518415214debdf4fedf0b1d81456d6e9fb8a3d19
[ROCm/clr commit: f3dc04a50d]
Restore PAL platform destruction.
Update CmdAllocatorCreateInfo::AllocInfo for the new interface.
Change-Id: Iea418eed7ee26166039a4a9cc1999438856e9097
[ROCm/clr commit: bd00826446]
Use a large signal pool if a profiler is connected or profiling is
force-enabled. This is needed to mitigate signal creation overhead when
profiling, as signals are attached to every packet and a deeper batch may
show the overhead of signal allocation.
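
The selection described above amounts to a single condition. In this sketch the two pool sizes are made-up placeholder values and `SelectSignalPoolSize` is a hypothetical helper. The commit message does not state the actual sizes or where the decision is made.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder sizes, assumed for illustration only.
constexpr std::size_t kDefaultSignalPoolSize = 64;
constexpr std::size_t kLargeSignalPoolSize = 4096;

// Pick the large pool when signals will be attached to every packet, that is
// when a profiler is connected or profiling is force-enabled.
constexpr std::size_t SelectSignalPoolSize(bool profilerConnected,
                                           bool profilingForced) {
  return (profilerConnected || profilingForced) ? kLargeSignalPoolSize
                                                : kDefaultSignalPoolSize;
}
```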
Change-Id: I8034b8a20b55328b87d593bf044f59672f9653e8
[ROCm/clr commit: 1ec0ba3537]
This PR fixes a warning seen in HIP headers due to -Wignored-attributes.
The warning says that attributes for a variable must precede its definition.
This also fixes a potential issue with C-style headers where templates
were declared outside the __cplusplus guard.
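
The second point concerns headers that must also be usable from C. A minimal sketch of the usual layout follows, with hypothetical names (`my_add_i32`, `my_add`). It illustrates only the `__cplusplus` guard, not the attribute-ordering warning.

```cpp
#include <cassert>

// Header-style section: declarations visible to both C and C++.
#ifdef __cplusplus
extern "C" {
#endif

int my_add_i32(int a, int b);

#ifdef __cplusplus
}  // extern "C"

// Templates are C++-only, so they stay inside the __cplusplus guard.
// A C compiler including this header never sees them.
template <typename T>
T my_add(T a, T b) {
  return a + b;
}
#endif  // __cplusplus

// Source-file-style section: the C-callable function forwards to the template.
extern "C" int my_add_i32(int a, int b) { return my_add<int>(a, b); }
```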
Change-Id: I78bfd391717c7891afb442ef79812ea630c0ff49
[ROCm/clr commit: 8fe5100d16]
This reverts commit cab71e6e00.
Implement the right way to make ExternalSemaphores be signalled
only after prior work on the stream has finished.
Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa
[ROCm/clr commit: d433df4761]
Let ExternalSemaphores be signalled only after prior work on the
stream has finished.
Change-Id: I856917db905f68f55fdf484f5267f7fe8ea3117f
[ROCm/clr commit: 44a3935cda]
- Program a unique AQL index for the debugger. The logic manages an array of AQL packets per HW queue.
- Provide debug state to PAL
Change-Id: I38fa1f5435fa711fd1d44dc391f2e61eb2a25efa
[ROCm/clr commit: d97cc0abbd]
We currently have __half2 made up of unsigned short instead of __half.
This prevents users from operating seamlessly on individual components.
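
The difference can be illustrated with stand-in types. In this sketch `Half` is a hypothetical placeholder that stores a float, not the real 16-bit `__half`, and `Half2Raw` and `Half2Typed` are invented names for the two layouts. The point is only that typed components support arithmetic directly, while raw `unsigned short` components add bit patterns.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for __half: stores a float so the sketch stays short.
struct Half {
  float v;
  Half operator+(Half other) const { return Half{v + other.v}; }
};

// Layout described as current: components are raw 16-bit integers.
struct Half2Raw {
  std::uint16_t x;
  std::uint16_t y;
};

// Layout built from the half type itself: components are usable directly.
struct Half2Typed {
  Half x;
  Half y;
};

// Works as expected: adds the two half values.
Half SumComponents(const Half2Typed& h) { return h.x + h.y; }

// Adds the integer bit patterns, which is not a half-precision sum.
unsigned RawBitsSum(const Half2Raw& h) {
  return static_cast<unsigned>(h.x) + static_cast<unsigned>(h.y);
}
```

For example, 1.0 in IEEE half precision is the bit pattern 0x3C00, so adding two raw components gives 0x7800 rather than 0x4000, the encoding of 2.0.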
Change-Id: I856917db905f68055fdf484f526707fe8ea3117d
[ROCm/clr commit: 19afdf719e]
This patch did not consider the discussions in SWDEV-270908:
> "we found that in GeekBench5, forcing Wave64 instead of the default
> Wave32 compute policy yields big gains in every subtest except one"
This reverts commit 10e2958197.
Change-Id: Ice1728585b9d1b2c1b36a06cfa0b8c47cb2bfa49
[ROCm/clr commit: 06dcaae619]
It seems the Windows linker error seen with https://gerrit-git.amd.com/c/compute/ec/clr/+/886078
also exists on Linux, but only appears with the dynamic linker. This fixes both use cases.
Change-Id: I1bb6df151eb1e09f75af4d461ead28bcbaa5f236
[ROCm/clr commit: 62464ca4c3]