__smid() needs to use both HW_ID and XCC_ID for gfx940, gfx941, and
gfx942. Previously, we only did this for gfx940 and thus XCC_ID
was incorrectly not passed back on the other two architectures.
Change-Id: I9fb13b6cef3280e15463443a180174629d03f8b2
[ROCm/clr commit: a0f29b454c]
The size of the "long" type is platform dependent, which causes hash value
overflow on implementations where "long" is 4 bytes. This change addresses
that case.
Change-Id: I4e3c0df457e35b139dcc496d832210ba2cb849ba
[ROCm/clr commit: 1f8ead914a]
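To illustrate the "long" size fix above (commit 1f8ead914a): a minimal sketch of a hash accumulated in a fixed-width 64-bit integer, so the result does not depend on whether "long" is 4 or 8 bytes. The function names and the multiplier 131 are hypothetical and chosen only for illustration; the actual hash used in the runtime is not shown in the commit message.

```cpp
#include <cstdint>

// Hypothetical polynomial string hash. The accumulator is std::uint64_t,
// which is 8 bytes on every platform, unlike "long" (4 bytes on LLP64
// platforms such as 64-bit Windows, 8 bytes on LP64 platforms such as Linux).
constexpr std::uint64_t hash_step(std::uint64_t h, unsigned char c) {
  return h * 131u + c;  // unsigned arithmetic wraps modulo 2^64, which is well defined
}

// Recursive form so the function stays a valid C++11 constexpr function.
constexpr std::uint64_t hash64(const char* s, std::uint64_t h = 0) {
  return *s == '\0'
             ? h
             : hash64(s + 1, hash_step(h, static_cast<unsigned char>(*s)));
}

static_assert(sizeof(std::uint64_t) == 8, "accumulator width must not vary by platform");
```

Using an unsigned fixed-width type also avoids signed overflow, which is undefined behavior in C++.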
Make hipExternalMemoryGetMappedMipmappedArray() accept
hipArraySurfaceLoadStore.
Make hipCreateSurfaceObject() check hipArraySurfaceLoadStore
flag. If the flag is hipArrayDefault, hipCreateSurfaceObject() also
succeeds, to prevent failures in the catch2/swissknife tests.
Change-Id: Ifb7db2db14e0c2208a9661cfa33887ec61ab26a5
[ROCm/clr commit: 1c442658ca]
For AVX builds, the start address of the values_ buffer in KernelParameters is
incorrect because it is computed based on 16-byte alignment.
Change-Id: I3b28ae02d2c9c0517d4a348d95ae8c6721bec83d
[ROCm/clr commit: c4f773db0b]
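To illustrate the AVX alignment fix above (commit c4f773db0b): a minimal sketch of how the start offset of a buffer placed after a header changes with the alignment used. The helper name align_up and the 40-byte header size are hypothetical, and the 32-byte figure is an assumption based on the alignment that 256-bit AVX aligned loads and stores require; the commit message does not state which alignment the fix uses.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// `alignment` must be a power of two for the mask trick to be valid.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical header size preceding the buffer, for illustration only.
constexpr std::size_t kHeaderSize = 40;

// Start offset of the buffer under each alignment assumption.
constexpr std::size_t kStart16 = align_up(kHeaderSize, 16);  // 48
constexpr std::size_t kStart32 = align_up(kHeaderSize, 32);  // 64
```

With these numbers the two assumptions give different start offsets (48 versus 64), which is the kind of mismatch that makes a buffer address wrong when the object itself is laid out with the stricter alignment.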
Do not populate AMDGPU_TARGETS when amdgpu-arch detects no device
on the machine.
Change-Id: I87fd7081f9e0b72f057aab2eb639808d2f89cfb1
[ROCm/clr commit: e36020f339]
- Do not use an extra barrier to detect graph end. If it's a kernel node, we
can use a completion signal for the last packet. Saves roughly 6us per
graph launch for the Phantom testcase.
Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
[ROCm/clr commit: c3bd229f4f]
The compiler seems to be stricter about safe buffer programming in compiler
staging builds than in other component staging builds.
This seems to result in additional errors when -Werror is enabled
in MIGraphX tests.
Remove all the clang pragmas that ignore several types of warnings in all
the headers and add a single pragma which ignores all warnings using
#pragma clang diagnostic ignored "-Weverything" in hiprtc builtins.
Change-Id: I95f302bb285b2451b19dd5dfdb7df29164b0f750
[ROCm/clr commit: b5d286a6d3]
Add support of HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.
Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
[ROCm/clr commit: 5f297d75d9]
Set flag with hipCtxCreate so that get flags works.
Validate hipHostGetDevicePointer for flags!=0.
Validate memcpy kind and accommodate new type hipMemcpyDeviceToDeviceNoCU.
Match error code for hipGetChannelDesc.
Change-Id: If09a635ac01bc53f1fe2b7df3f3f9c1b0d69a0ab
[ROCm/clr commit: 8e8fb993de]
The build process was top down before CLR (23.10) and has been bottom up since
CLR (>=23.20), so the BUILD_SHARED_LIBS value has not been reflected
in the rocclr build process since CLR. With this change, BUILD_SHARED_LIBS
is set before rocclr compilation.
Change-Id: Ia2cd3b8148e9df2df222c1e734d927f2c029017e
[ROCm/clr commit: 1717078536]
Add __host__ and __device__ to a number of operators/functions to match CUDA.
Fix some bugs seen in __hisinf.
Change-Id: I9e67e3e3eb2083b463158f3e250e5221c89b2896
[ROCm/clr commit: 86bd518981]
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues
Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
[ROCm/clr commit: 40f41f4d0b]
- Refactor code and cleanup logic for callback saving for event records
Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89
[ROCm/clr commit: 1338ff37e8]
- This matches the CUDA behavior
- The pitch and width checks removed are already covered in ihipmemcpy2D
Change-Id: I03a6921a78b5d89723830d8dde5865fdc6db0379
[ROCm/clr commit: 6d6465cb96]
Remove duplicated operators of hipComplexFloat and
hipComplexDouble.
If users need complex number multiplication and division,
they should call
hipCmulf() and hipCdivf() for hipComplexFloat,
hipCmul() and hipCdiv() for hipComplexDouble.
SWDEV-428198 - Add missing operators
Add missing operators of vectors in host
Change-Id: Ie58d1642d579e7119997db49a9fd6a6641b666fd
[ROCm/clr commit: d4799b2a3f]
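To illustrate the arithmetic behind the hipCmulf() and hipCdivf() calls recommended above (commit d4799b2a3f): a plain C++ sketch of single-precision complex multiplication and division. The struct ComplexF and the functions cmulf and cdivf are hypothetical stand-ins written for this example, not the HIP API; the real HIP functions may use a numerically more robust division algorithm than the textbook formula shown here.

```cpp
// Hypothetical stand-in for a single-precision complex value:
// x is the real part, y is the imaginary part.
struct ComplexF {
  float x;
  float y;
};

// (a.x + i a.y) * (b.x + i b.y)
constexpr ComplexF cmulf(ComplexF a, ComplexF b) {
  return ComplexF{a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x};
}

// (a.x + i a.y) / (b.x + i b.y), computed as a * conj(b) / |b|^2.
// Textbook formula without scaling, so it can overflow for very large inputs.
constexpr ComplexF cdivf(ComplexF a, ComplexF b) {
  return ComplexF{(a.x * b.x + a.y * b.y) / (b.x * b.x + b.y * b.y),
                  (a.y * b.x - a.x * b.y) / (b.x * b.x + b.y * b.y)};
}
```

For example, (1 + 2i) * (3 + 4i) gives -5 + 10i, and dividing that product by 3 + 4i returns 1 + 2i.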
Move context allocation into Device::init() method to simplify the logic and handle
HIP_VISIBLE_DEVICES properly
Change-Id: I0fc6f37c7ae39bedbdad0290295d6794c66d6c54
[ROCm/clr commit: a49d633883]
- Support graph with different types of nodes with single
branch when DEBUG_CLR_GRAPH_PACKET_CAPTURE flag is enabled
Change-Id: I149a8629769cd0d5849ffefb04f1352668a685b6
[ROCm/clr commit: 38d2c56784]
- Address corner cases that can arise with the new
hipMemcpyDeviceToDeviceNoCU enum
- Improve logging
Change-Id: I6035b901f8d616741054b7a5ff4f67956329ac57
[ROCm/clr commit: 5662d4037c]
If we submit a systemScope Barrier, we should reset this flag, as there
is no need to dispatch AQL again to flush caches/HDP.
Change-Id: I55710feb4ba6650852e785b5cadfa64c6b9ce14e
[ROCm/clr commit: de2b06a7a7]
Add hipMemcpyDeviceToDeviceNoCU to force a non-blit copy path. This
helps in cases where an app determines that the CUs may be busy and copies
with SDMA may be quicker.
Change-Id: I59b415dd8f6022c244e8d75f265464d5c635df1e
[ROCm/clr commit: f316a30e5d]
Fix wrong logic to get layer index;
Make layered image's layout match the CUDA spec;
Fix wrong comparison of element size.
Remove amd::BufferRect from ihipMemcpyAtoHCommand()
and ihipMemcpyHtoACommand().
Change-Id: Icc6a4233fbce2e9b2dc6feb79e6bfbd761684c7d
[ROCm/clr commit: 5a0085e516]
Pass LUID information from PAL to HIP device properties.
NodeMask is calculated based on the order of devices in PAL.
Change-Id: Ic53707b68b16f392e5284a0105bf1f3941b46653
[ROCm/clr commit: 694fe4bedb]
Remove duplicated operators of hipComplexFloat and
hipComplexDouble.
If users need complex number multiplication and division,
they should call
hipCmulf() and hipCdivf() for hipComplexFloat,
hipCmul() and hipCdiv() for hipComplexDouble.
Change-Id: Ie58d1642d578e7118997db40a9fd6a6641b666fd
[ROCm/clr commit: 0ca61130a2]