Defining int64_t, uint64_t, int32_t, uint32_t in HIPRTC
seems to conflict with some apps that use their own
definitions of these types. NVRTC also doesn't define
these. Hence remove them to match that behavior.
Change-Id: I77ef70e846950698cb00375f5d0501b907f01fe3
[ROCm/clr commit: 629e279f72]
Make hipDeviceAttributeHostNativeAtomicSupported return
the right value on Windows
Change-Id: I149a8629769cd0d5848ffefb24f1752678a685b8
[ROCm/clr commit: 1faf1a4cb9]
It seems that, due to the removal of vdev()->isHandlerPending(),
the Marker queued to ensure finish is not enqueued, which causes
a hang when waiting on the event for a kernel enqueue command.
Change-Id: I364abb2dcb4897b11a7eb61b5d85013b69292792
[ROCm/clr commit: eecbc2e436]
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for memory fill operations
- Switch fill memory to 16-byte writes by default
- Allow limiting the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
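
A minimal host-side sketch of the idea, assuming a strided loop is how a capped number of workgroups covers the whole buffer. The names Chunk16 and fillWithLimitedWorkers are hypothetical and are not taken from the fillBuffer kernel; each "worker" here stands in for one workgroup.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One 16-byte element, standing in for a single 16-byte-wide store.
struct Chunk16 {
    std::uint32_t w[4];
};

static_assert(sizeof(Chunk16) == 16, "Chunk16 must model a 16-byte write");

// Hypothetical model of a fill that uses a capped number of workers.
// Each worker starts at its own index and advances by the total worker
// count, so every chunk is written exactly once regardless of the cap.
void fillWithLimitedWorkers(std::vector<Chunk16>& buffer,
                            const Chunk16& pattern,
                            std::size_t maxWorkers) {
    if (maxWorkers == 0) {
        return;  // nothing can be launched
    }
    for (std::size_t worker = 0; worker < maxWorkers; ++worker) {
        for (std::size_t i = worker; i < buffer.size(); i += maxWorkers) {
            buffer[i] = pattern;  // one 16-byte write per iteration
        }
    }
}
```

With a cap of 4 workers and a 37-chunk buffer, the workers take 10, 9, 9, and 9 chunks respectively; lowering the cap changes only how the work is divided, not the result.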
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
[ROCm/clr commit: f1dc81f427]
- Report kernel names for the optimized graph path
- Refactor code so that we store profiling info in Accumulate command
Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af
[ROCm/clr commit: b056686607]
__smid() needs to use both HW_ID and XCC_ID for gfx940, gfx941, and
gfx942. Previously, we only did this for gfx940 and thus XCC_ID
was incorrectly not passed back on the other two architectures.
Change-Id: I9fb13b6cef3280e15463443a180174629d03f8b2
[ROCm/clr commit: a0f29b454c]
The "long" type size seems to be platform dependent, causing hash value
overflow on implementations where "long" is 4 bytes. This addresses the
scenario.
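
A minimal sketch of the portable approach, assuming the fix amounts to holding the hash state in a fixed-width 64-bit type. FNV-1a is used here only as a stand-in, since the hash function involved is not named above, and hashText is a hypothetical helper.

```cpp
#include <cstdint>
#include <string>

// FNV-1a 64-bit constants. The hash in the actual change may differ;
// this one is chosen only because it needs a full 64-bit state.
const std::uint64_t kFnvOffsetBasis = 0xcbf29ce484222325ULL;
const std::uint64_t kFnvPrime = 0x100000001b3ULL;

// Hypothetical helper: std::uint64_t is 8 bytes on every platform,
// whereas "long" is 4 bytes on some (for example 64-bit Windows).
std::uint64_t hashText(const std::string& text) {
    std::uint64_t hash = kFnvOffsetBasis;
    for (char c : text) {
        hash ^= static_cast<unsigned char>(c);
        hash *= kFnvPrime;  // unsigned arithmetic wraps modulo 2^64
    }
    return hash;
}
```

Because the state is unsigned and exactly 64 bits wide, the same input produces the same value regardless of how wide "long" is on the target.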
Change-Id: I4e3c0df457e35b139dcc496d832210ba2cb849ba
[ROCm/clr commit: 1f8ead914a]
Make hipExternalMemoryGetMappedMipmappedArray() accept
hipArraySurfaceLoadStore.
Make hipCreateSurfaceObject() check the hipArraySurfaceLoadStore
flag. If the flag is hipArrayDefault, hipCreateSurfaceObject() will
also pass, to prevent failures in catch2/swissknife tests.
Change-Id: Ifb7db2db14e0c2208a9661cfa33887ec61ab26a5
[ROCm/clr commit: 1c442658ca]
For AVX builds, the start address of the values_ buffer in KernelParameters is
not correct because it is computed based on 16-byte alignment.
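
A minimal sketch of why 16-byte alignment can be insufficient, assuming the AVX code path expects 32-byte alignment (the requirement for aligned 256-bit loads and stores). The alignUp helper and the 40-byte header size are hypothetical and are not taken from KernelParameters.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical size of the data that precedes the value storage.
constexpr std::size_t kHeaderBytes = 40;

// With 16-byte alignment the buffer starts at offset 48,
// which is not a multiple of 32.
constexpr std::size_t kStart16 = alignUp(kHeaderBytes, 16);

// With 32-byte alignment the buffer starts at offset 64.
constexpr std::size_t kStart32 = alignUp(kHeaderBytes, 32);
```

Any start offset computed with the smaller alignment is valid for 32-byte access only by coincidence, which is why the computation has to use the alignment the build actually requires.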
Change-Id: I3b28ae02d2c9c0517d4a348d95ae8c6721bec83d
[ROCm/clr commit: c4f773db0b]
Do not populate AMDGPU_TARGETS when no device is
detected by amdgpu-arch on the machine
Change-Id: I87fd7081f9e0b72f057aab2eb639808d2f89cfb1
[ROCm/clr commit: e36020f339]
- Do not use an extra barrier to detect graph end. If it's a kernel node we
can use a completion signal for the last packet. Saves roughly 6us per
graph launch for the Phantom testcase.
Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
[ROCm/clr commit: c3bd229f4f]
The compiler seems to be stricter about safe buffer programming in
compiler staging builds than in other component staging builds.
This seems to result in additional errors when -Werror is enabled
in MIGraphX tests.
Remove all the clang pragmas that ignore several types of warnings in all
the headers and add a single pragma that ignores all warnings, using
#pragma clang diagnostic ignored "-Weverything", in hiprtc builtins.
Change-Id: I95f302bb285b2451b19dd5dfdb7df29164b0f750
[ROCm/clr commit: b5d286a6d3]
Add support for HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.
Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
[ROCm/clr commit: 5f297d75d9]
Set the flag with hipCtxCreate so that getting the flags works.
Validate hipHostGetDevicePointer for flags!=0.
Validate the memcpy kind and accommodate the new type hipMemcpyDeviceToDeviceNoCU.
Match the error code for hipGetChannelDesc.
Change-Id: If09a635ac01bc53f1fe2b7df3f3f9c1b0d69a0ab
[ROCm/clr commit: 8e8fb993de]
The build process was top-down before CLR (23.10) and has been bottom-up
since CLR (>=23.20), so the BUILD_SHARED_LIBS value has not been reflected
in the rocclr build process since CLR. With this change, BUILD_SHARED_LIBS
is set before rocclr compilation.
Change-Id: Ia2cd3b8148e9df2df222c1e734d927f2c029017e
[ROCm/clr commit: 1717078536]
Add __host__ and __device__ to a number of operators/functions to match CUDA
Fix some bugs seen in __hisinf
Change-Id: I9e67e3e3eb2083b463158f3e250e5221c89b2896
[ROCm/clr commit: 86bd518981]
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues
Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
[ROCm/clr commit: 40f41f4d0b]
- Refactor code and cleanup logic for callback saving for event records
Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89
[ROCm/clr commit: 1338ff37e8]
- This matches the CUDA behavior
- The pitch and width checks removed are already covered in ihipmemcpy2D
Change-Id: I03a6921a78b5d89723830d8dde5865fdc6db0379
[ROCm/clr commit: 6d6465cb96]
Remove duplicated operators of hipComplexFloat and
hipComplexDouble.
If users need complex number multiplication and division,
they should call
hipCmulf() and hipCdivf() for hipComplexFloat,
hipCmul() and hipCdiv() for hipComplexDouble.
SWDEV-428198 - Add missing operators
Add missing vector operators on the host
Change-Id: Ie58d1642d579e7119997db49a9fd6a6641b666fd
[ROCm/clr commit: d4799b2a3f]
Move context allocation into the Device::init() method to simplify the logic
and handle HIP_VISIBLE_DEVICES properly
Change-Id: I0fc6f37c7ae39bedbdad0290295d6794c66d6c54
[ROCm/clr commit: a49d633883]