This is related to SWDEV-410182, but it's not enough to fix it.
Functions from device-libs are precompiled into LLVM IR in a "target agnostic" way
(in reality, it's not 100% target agnostic, which causes us many headaches).
When linking builtins (like device-libs) from the command line, we use the flag
-mlink-builtin-bitcode. The difference between regular bitcode linking and
this flag is that the latter propagates target-specific attributes. If these
attributes are not propagated, we can end up with inconsistent target attributes.
Comgr provides the action AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
for this exact reason. The old action is currently deprecated and this one should
be used.
Change-Id: I518415214debdf4fedf0b1d81456d6e9fb8a3d19
[ROCm/clr commit: f3dc04a50d]
Restore PAL platform destruction.
Update CmdAllocatorCreateInfo::AllocInfo for the new interface.
Change-Id: Iea418eed7ee26166039a4a9cc1999438856e9097
[ROCm/clr commit: bd00826446]
Use a large signal pool if a profiler is connected or profiling is
force-enabled. This mitigates signal creation overhead when profiling:
signals are attached to every packet, so deeper batches may expose the
cost of signal allocation.
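The selection described above can be sketched roughly as follows. The pool sizes and the names demo_signal_pool_size, kDemoSmallSignalPool and kDemoLargeSignalPool are hypothetical and only illustrate the idea; they are not the values or identifiers used by the runtime.

```cpp
#include <cstddef>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t kDemoSmallSignalPool = 64;
constexpr std::size_t kDemoLargeSignalPool = 4096;

// Pick the pool size up front so that profiling runs do not pay for
// signal creation while packets are being submitted.
constexpr std::size_t demo_signal_pool_size(bool profiler_connected,
                                            bool profiling_forced) {
    return (profiler_connected || profiling_forced) ? kDemoLargeSignalPool
                                                    : kDemoSmallSignalPool;
}
```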
Change-Id: I8034b8a20b55328b87d593bf044f59672f9653e8
[ROCm/clr commit: 1ec0ba3537]
This PR fixes a warning seen in HIP headers due to -Wignored-attributes.
The warning is that attributes for a variable must precede its definition.
This also fixes a potential issue with C-style headers where templates
were declared outside the __cplusplus guard.
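A minimal sketch of both patterns, assuming a GCC/Clang-style attribute syntax. The names demo_counter and demo_max are hypothetical and do not come from the HIP headers.

```cpp
// Hypothetical header fragment; names are illustrative, not from HIP.

// The attribute sits on the declaration that precedes the definition.
// Adding it on a redeclaration after the definition is the ordering
// that -Wignored-attributes complains about.
extern __attribute__((visibility("default"))) int demo_counter;
int demo_counter = 0;

#ifdef __cplusplus
// Templates are C++-only, so they stay inside the __cplusplus guard
// and a C translation unit including this header never sees them.
template <typename T>
inline T demo_max(T a, T b) { return a < b ? b : a; }
#endif
```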
Change-Id: I78bfd391717c7891afb442ef79812ea630c0ff49
[ROCm/clr commit: 8fe5100d16]
This reverts commit cab71e6e00.
Implement the right way to make ExternalSemaphores be signalled
only after prior work on the stream has finished.
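The intended ordering can be illustrated with a toy in-order stream. DemoStream is a hypothetical model, not the runtime's implementation: the point is only that the signal is queued behind earlier work instead of being applied immediately.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical in-order stream model; not the runtime's implementation.
class DemoStream {
public:
    // Queue a unit of work behind everything submitted earlier.
    void enqueue(std::function<void()> work) { pending_.push(std::move(work)); }

    // Signalling is itself queued, so it runs only after prior work.
    void signal_semaphore(bool& semaphore) {
        enqueue([&semaphore] { semaphore = true; });
    }

    // Execute all queued commands in submission order.
    void drain() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};
```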
Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa
[ROCm/clr commit: d433df4761]
Let ExternalSemaphores be signalled only after prior work on the
stream has finished.
Change-Id: I856917db905f68f55fdf484f5267f7fe8ea3117f
[ROCm/clr commit: 44a3935cda]
- Program a unique AQL index for the debugger. The logic manages an AQL array of packets per HW queue.
- Provide debug state to PAL
Change-Id: I38fa1f5435fa711fd1d44dc391f2e61eb2a25efa
[ROCm/clr commit: d97cc0abbd]
Currently, __half2 is made up of unsigned short instead of __half.
This prevents users from operating seamlessly on individual components.
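To illustrate the difference, here is a small sketch using stand-in types. DemoHalf, DemoHalf2Raw and DemoHalf2 are hypothetical and are not the HIP definitions of __half and __half2; they only show why components of the half type are easier to work with than raw unsigned short.

```cpp
#include <cstdint>

// Hypothetical stand-in for __half; not the HIP definition.
struct DemoHalf {
    std::uint16_t bits;  // raw IEEE 754 binary16 bit pattern
};

// A function that operates on a single half value (checks the sign bit).
inline bool demo_is_negative(DemoHalf h) { return (h.bits & 0x8000u) != 0; }

// Layout with raw components: every use needs a manual wrap.
struct DemoHalf2Raw {
    unsigned short x, y;
};

// Layout with half-typed components.
struct DemoHalf2 {
    DemoHalf x, y;
};

inline bool demo_any_negative_raw(DemoHalf2Raw v) {
    // Raw bits must be converted back to the half type first.
    return demo_is_negative(DemoHalf{v.x}) || demo_is_negative(DemoHalf{v.y});
}

inline bool demo_any_negative(DemoHalf2 v) {
    // Components can be passed directly.
    return demo_is_negative(v.x) || demo_is_negative(v.y);
}
```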
Change-Id: I856917db905f68055fdf484f526707fe8ea3117d
[ROCm/clr commit: 19afdf719e]
This patch did not consider the discussions in SWDEV-270908:
> "we found that in GeekBench5, forcing Wave64 instead of the default
> Wave32 compute policy yields big gains in every subtest except one"
This reverts commit 10e2958197.
Change-Id: Ice1728585b9d1b2c1b36a06cfa0b8c47cb2bfa49
[ROCm/clr commit: 06dcaae619]
It seems the Windows linker error seen with https://gerrit-git.amd.com/c/compute/ec/clr/+/886078
also exists on Linux, but only appears with the dynamic linker. This fixes both use cases.
Change-Id: I1bb6df151eb1e09f75af4d461ead28bcbaa5f236
[ROCm/clr commit: 62464ca4c3]
This reverts commit 41fe945a7a.
HIP_PATH is still needed on Windows, as amd_build.py injects a
dependency on the HIP_PATH variable in hip-targets.cmake
Change-Id: I57d9aff3e8046a7381affaf5adc5b130d5702190
[ROCm/clr commit: 5262693da0]
- Return an error code instead of segfaulting during kernel launch when the library isn't built for the current GPU arch
- Fix string print bug
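The first item can be pictured with a simplified lookup. DemoStatus, DemoKernel, DemoRegistry and demo_launch are hypothetical names for illustration only; the actual launch path and error codes are not shown here.

```cpp
#include <map>
#include <string>

// Hypothetical status codes; not the runtime's error enum.
enum class DemoStatus { Success, NoBinaryForGpu };

struct DemoKernel {
    int id;
};

// Hypothetical registry: GPU architecture name -> kernel built for it.
using DemoRegistry = std::map<std::string, DemoKernel>;

inline DemoStatus demo_launch(const DemoRegistry& registry,
                              const std::string& gpu_arch) {
    auto it = registry.find(gpu_arch);
    if (it == registry.end()) {
        // No code object for this architecture: report it to the caller
        // instead of dereferencing a missing kernel.
        return DemoStatus::NoBinaryForGpu;
    }
    // A real launch would dispatch it->second here.
    return DemoStatus::Success;
}
```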
Change-Id: I91aaa50150fee220317cd2eb4e075d825cc448b2
[ROCm/clr commit: 76c8b3157b]