- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for memory fill operations
- Switch fill memory to 16-byte writes by default
- Allow limiting the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
[ROCm/clr commit: f1dc81f427]
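
The fillBuffer change above can be illustrated with a small host-side model. This is only a sketch: it assumes the kernel covers the buffer with a grid-stride loop when the workgroup count is capped, which the commit message does not confirm. `Chunk16` and `fillLimited` are hypothetical names, not CLR code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 16-byte element, mirroring the 16-byte write granularity.
struct alignas(16) Chunk16 { std::uint32_t w[4]; };

// Hypothetical host-side model of a fill launched with a capped number of
// workgroups. Each simulated work-item starts at its global id and advances
// by the total number of launched work-items (assumed grid-stride loop).
// Returns the number of workgroups that were "launched".
inline std::size_t fillLimited(std::vector<Chunk16>& buf, Chunk16 pattern,
                               std::size_t wgSize, std::size_t maxWorkgroups) {
  assert(wgSize > 0 && maxWorkgroups > 0);
  const std::size_t needed = (buf.size() + wgSize - 1) / wgSize;
  const std::size_t groups =
      std::max<std::size_t>(1, std::min(needed, maxWorkgroups));
  const std::size_t stride = groups * wgSize;
  for (std::size_t gid = 0; gid < stride; ++gid) {            // each work-item
    for (std::size_t i = gid; i < buf.size(); i += stride) {  // its strided share
      buf[i] = pattern;
    }
  }
  return groups;
}
```

With 1000 elements, a workgroup size of 64 and a cap of 4, the model launches 4 workgroups instead of 16 and still writes every element.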
- Report kernel names for optimized graph path
- Refactor code so that we store profiling info in Accumulate command
Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af
[ROCm/clr commit: b056686607]
__smid() needs to use both HW_ID and XCC_ID for gfx940, gfx941, and
gfx942. Previously, we only did this for gfx940 and thus XCC_ID
was incorrectly not passed back on the other two architectures.
Change-Id: I9fb13b6cef3280e15463443a180174629d03f8b2
[ROCm/clr commit: a0f29b454c]
The size of the "long" type is platform dependent, which causes hash value
overflow on implementations where "long" is 4 bytes. This change addresses
that scenario.
Change-Id: I4e3c0df457e35b139dcc496d832210ba2cb849ba
[ROCm/clr commit: 1f8ead914a]
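
The "long" issue above comes down to using a type whose width is not guaranteed. A minimal sketch of the usual remedy, assuming a fixed-width 64-bit accumulator is acceptable: the hash below is standard FNV-1a, chosen only for illustration, and is not necessarily the hash used in CLR.

```cpp
#include <cstdint>
#include <string>

// FNV-1a, 64-bit variant. std::uint64_t is 8 bytes on every platform,
// unlike "long", which is 4 bytes on some ABIs (for example 64-bit Windows).
inline std::uint64_t fnv1a64(const std::string& s) {
  std::uint64_t h = 0xcbf29ce484222325ull;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 0x100000001b3ull;  // FNV prime; unsigned multiply wraps modulo 2^64
  }
  return h;
}
```

Because the accumulator is unsigned and fixed-width, the result is identical regardless of how wide "long" is on the target.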
Make hipExternalMemoryGetMappedMipmappedArray() accept
hipArraySurfaceLoadStore.
Make hipCreateSurfaceObject() check the hipArraySurfaceLoadStore
flag. If the flag is hipArrayDefault, hipCreateSurfaceObject() also
succeeds, to prevent failures in the catch2/swissknife tests.
Change-Id: Ifb7db2db14e0c2208a9661cfa33887ec61ab26a5
[ROCm/clr commit: 1c442658ca]
For AVX builds, the start address of the values_ buffer in KernelParameters is
incorrect because it is computed based on 16-byte alignment.
Change-Id: I3b28ae02d2c9c0517d4a348d95ae8c6721bec83d
[ROCm/clr commit: c4f773db0b]
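
To see why a hard-coded 16-byte rounding can misplace a trailing buffer, consider the sketch below. It assumes the enclosing object is 32-byte aligned in AVX builds (matching 256-bit registers); the commit does not state the corrected alignment, and `alignUp` is a hypothetical helper, not CLR code.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Example: a 40-byte header followed by a trailing buffer.
// Rounding to 16 yields offset 48, which is not a multiple of 32;
// rounding to the assumed 32-byte object alignment yields 64.
static_assert(alignUp(40, 16) == 48, "16-byte rounding");
static_assert(alignUp(40, 32) == 64, "32-byte rounding");
```

Deriving the alignment from the type (for example with `alignof`) rather than from a constant avoids this class of mismatch.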
Do not populate AMDGPU_TARGETS when there is no device
detected by amdgpu-arch on the machine
Change-Id: I87fd7081f9e0b72f057aab2eb639808d2f89cfb1
[ROCm/clr commit: e36020f339]
- Do not use an extra barrier to detect graph end. If it's a kernel node, we
can use a completion signal for the last packet. Saves roughly 6us per
graph launch for the Phantom testcase.
Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
[ROCm/clr commit: c3bd229f4f]
The compiler seems to be stricter about safe buffer programming in compiler
staging builds than in other component staging builds.
This seems to result in additional errors when -Werror is enabled
in MIGraphX tests.
Remove all the clang pragmas that ignore several types of warnings in all
the headers and add a single pragma that ignores all warnings using
#pragma clang diagnostic ignored "-Weverything" in hiprtc builtins.
Change-Id: I95f302bb285b2451b19dd5dfdb7df29164b0f750
[ROCm/clr commit: b5d286a6d3]
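
A blanket suppression like the one described above typically looks as follows. This is a sketch only: the push/pop scoping and the `sumFirst` function are assumptions for illustration, and the commit does not say whether the hiprtc builtins restore the diagnostic state afterwards.

```cpp
// Hypothetical header fragment; not the actual hiprtc builtins.
#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Weverything"
#endif

// Raw pointer indexing like this is the kind of code that clang's
// -Wunsafe-buffer-usage warning flags when it is enabled.
inline int sumFirst(const int* data, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) total += data[i];
  return total;
}

#if defined(__clang__)
#pragma clang diagnostic pop  // restore the caller's warning settings
#endif
```

Scoping the suppression with push and pop keeps it from leaking into user code that includes the header.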
Add support of HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.
Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
[ROCm/clr commit: 5f297d75d9]
Set flag with hipCtxCreate so that get flags works.
Validate hipHostGetDevicePointer for flags!=0.
Validate memcpy kind and accommodate the new type hipMemcpyDeviceToDeviceNoCU.
Match error code for hipGetChannelDesc.
Change-Id: If09a635ac01bc53f1fe2b7df3f3f9c1b0d69a0ab
[ROCm/clr commit: 8e8fb993de]
The build process was top-down before CLR (23.10) and has been bottom-up
since CLR (>=23.20), so the BUILD_SHARED_LIBS value has not been reflected
in the rocclr build process since CLR. With this change, BUILD_SHARED_LIBS
is set before rocclr compilation.
Change-Id: Ia2cd3b8148e9df2df222c1e734d927f2c029017e
[ROCm/clr commit: 1717078536]
Add __host__ and __device__ to a bunch of operators/functions to match CUDA
Fix some bugs seen in __hisinf
Change-Id: I9e67e3e3eb2083b463158f3e250e5221c89b2896
[ROCm/clr commit: 86bd518981]
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- An explicit TS object is now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and can thus
run into threading issues
Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
[ROCm/clr commit: 40f41f4d0b]
- Refactor code and cleanup logic for callback saving for event records
Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89
[ROCm/clr commit: 1338ff37e8]
- This matches the CUDA behavior
- The pitch and width checks removed are already covered in ihipmemcpy2D
Change-Id: I03a6921a78b5d89723830d8dde5865fdc6db0379
[ROCm/clr commit: 6d6465cb96]
Remove duplicated operators of hipComplexFloat and
hipComplexDouble.
If users need complex number multiplication and division,
they should call
hipCmulf() and hipCdivf() for hipComplexFloat,
hipCmul() and hipCdiv() for hipComplexDouble
SWDEV-428198 - Add missing operators
Add missing operators of vectors in host
Change-Id: Ie58d1642d579e7119997db49a9fd6a6641b666fd
[ROCm/clr commit: d4799b2a3f]
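
For the single-precision case in the commit above, the arithmetic behind the named multiply and divide functions can be sketched in plain C++. `Complexf`, `cmulf` and `cdivf` are hypothetical stand-ins so the example compiles without the HIP headers; real code would call hipCmulf() and hipCdivf() from hip/hip_complex.h instead.

```cpp
// Stand-in for a two-float complex value: x is the real part, y the imaginary.
struct Complexf { float x; float y; };

// (a.x + i a.y) * (b.x + i b.y)
inline Complexf cmulf(Complexf a, Complexf b) {
  return Complexf{a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x};
}

// (a.x + i a.y) / (b.x + i b.y), using the textbook formula.
// No scaling is applied, so extreme magnitudes may overflow or underflow.
inline Complexf cdivf(Complexf a, Complexf b) {
  const float d = b.x * b.x + b.y * b.y;
  return Complexf{(a.x * b.x + a.y * b.y) / d, (a.y * b.x - a.x * b.y) / d};
}
```

For example, multiplying 1+2i by 3+4i gives -5+10i, and dividing that product by 3+4i returns 1+2i.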
Move context allocation into Device::init() method to simplify the logic and handle
HIP_VISIBLE_DEVICES properly
Change-Id: I0fc6f37c7ae39bedbdad0290295d6794c66d6c54
[ROCm/clr commit: a49d633883]
- Support graphs with different types of nodes on a single
branch when the DEBUG_CLR_GRAPH_PACKET_CAPTURE flag is enabled
Change-Id: I149a8629769cd0d5849ffefb04f1352668a685b6
[ROCm/clr commit: 38d2c56784]
- Address corner cases that can arise with the new
hipMemcpyDeviceToDeviceNoCU enum
- Improve logging
Change-Id: I6035b901f8d616741054b7a5ff4f67956329ac57
[ROCm/clr commit: 5662d4037c]
If we submit a systemScope Barrier, we should reset this flag, as there
is no need to dispatch AQL again to flush caches/HDP
Change-Id: I55710feb4ba6650852e785b5cadfa64c6b9ce14e
[ROCm/clr commit: de2b06a7a7]