The new copy kernel can limit the number of launched workgoups.
It can copy in chunks of 16 bytes or 4 bytes.
Workgoup size is increased to 512 or 1024
Change-Id: Ic3fefa2d5bda6afebd1acc4d41ad310b138af6df
Defining int64_t, uint64_t, int32_t, uint32_t in HIPRTC
seem to result in conflicts with some apps as they use
their own definitions for these types. NVRTC also doesn't
define these. Hence remove them to match the behavior.
Change-Id: I77ef70e846950698cb00375f5d0501b907f01fe3
It seems that due to removal of vdev()->isHandlerPending(),
Marker queued to ensure finish is not enqueued and that cause
hung at waiting event for kernel enqueue command.
Change-Id: I364abb2dcb4897b11a7eb61b5d85013b69292792
- Add the new fillBuffer kernel, which allows to launch a limited
number of workgroups for memory fill operation
- Switch fill memory to 16 bytes write by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
- Report kernel names for optimized graph path
- Refactor code so that we store profiling info in Accumulate command
Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af
__smid() needs to use both HW_ID and XCC_ID for gfx940, gfx941, and
gfx942. Previously, we only did this for gfx940 and thus XCC_ID
was incorrectly not passed back on the other two architectures.
Change-Id: I9fb13b6cef3280e15463443a180174629d03f8b2
The "long" type size seems to be platform dependent, causing hash value
overflow on implementations where "long" is 4 bytes. This addresses the
scenario.
Change-Id: I4e3c0df457e35b139dcc496d832210ba2cb849ba
Make hipExternalMemoryGetMappedMipmappedArray() accept
hipArraySurfaceLoadStore.
Make hipCreateSurfaceObject() check hipArraySurfaceLoadStore
flag. If flag is hipArrayDefault, hipCreateSurfaceObject() will
also pass to prevent failure of catch2/swissknife tests.
Change-Id: Ifb7db2db14e0c2208a9661cfa33887ec61ab26a5
For avx build, the start address of values_ buffer in KernelParameters is not
correct as it is computed based on 16-byte alignment.
Change-Id: I3b28ae02d2c9c0517d4a348d95ae8c6721bec83d
- Do not use extra barrier to detect graph end. If its a kernel node we
can use a completion signal for the last packet. Saves roughly 6us for
Phantom testcase per graph launch.
Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
Compiler seem to be stricter in compiler staging builds related to
safe buffer programming when compared to other component staging builds.
This seem to result in additional errors when -Werror is enabled
in MIGraphX tests.
Removes all the clang pragmas to ignore several type of warnings in all
the headers and adds a single pragma which ignores all warnings using
#pragma clang diagnostic ignored "-Weverything" in hiprtc builtins.
Change-Id: I95f302bb285b2451b19dd5dfdb7df29164b0f750
Add support of HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.
Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
Set flag with hipCtxCreate so that get flags works.
Validate hipHostGetDevicePointer for flags!=0.
Validate mem cpy kind and accommodate new type hipMemcpyDeviceToDeviceNoCU.
Match error code for hipGetChannelDesc.
Change-Id: If09a635ac01bc53f1fe2b7df3f3f9c1b0d69a0ab
Build process was top down Pre CLR (23.10) vs bottom up since
CLR (>=23.20) and so BUILD_SHARED_LIBS value is not being reflected
in rocclr build process since CLR. With this change, BUILD_SHARED_LIBS
is set pre rocclr compilation.
Change-Id: Ia2cd3b8148e9df2df222c1e734d927f2c029017e
Add __host__ and __device__ to bunch of operator/function matching CUDA
Fix some bugs seen in __hisinf
Change-Id: I9e67e3e3eb2083b463158f3e250e5221c89b2896
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues
Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
- This matches the CUDA behavior
- The pitch and width checks removed are already covered in ihipmemcpy2D
Change-Id: I03a6921a78b5d89723830d8dde5865fdc6db0379
Remove duplicated operators of hipComplexFloat and
hipComplexDouble.
If users need complex number multiplication and division,
they should call
hipCmulf() and hipCdivf() for hipComplexFloat,
hipCmul() and hipCdiv() for hipComplexComplex
SWDEV-428198 - Add missing operators
Add missing operators of vectors in host
Change-Id: Ie58d1642d579e7119997db49a9fd6a6641b666fd
Move context allocation into Device::init() method to simplify the logic and handle
HIP_VISIBLE_DEVICES properly
Change-Id: I0fc6f37c7ae39bedbdad0290295d6794c66d6c54