Also add the HIP_FORCE_SPIRV_CODEOBJECT flag, which forces the use of
SPIRV.
* use cache for already compiled code objects
* address review comments and use the two spirv isa names
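The caching idea can be sketched as a small memoizing wrapper. This is only an illustration under stated assumptions: `CodeObjectCache`, the `Compiler` callback, and the use of the source string as the cache key are all hypothetical and are not taken from the runtime.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: maps a source key to an already compiled code object.
class CodeObjectCache {
 public:
  // Stand-in for the real compile step (assumption: string in, binary out).
  using Compiler = std::function<std::string(const std::string&)>;

  explicit CodeObjectCache(Compiler compile) : compile_(std::move(compile)) {}

  // Returns the cached code object, compiling only on the first request.
  const std::string& get(const std::string& source) {
    auto it = cache_.find(source);
    if (it == cache_.end()) {
      it = cache_.emplace(source, compile_(source)).first;
      ++compileCount_;
    }
    return it->second;
  }

  // Number of times the compile step actually ran.
  int compileCount() const { return compileCount_; }

 private:
  Compiler compile_;
  std::unordered_map<std::string, std::string> cache_;
  int compileCount_ = 0;
};
```

Requesting the same key twice runs the compile callback once; a second key triggers a second compile.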
Fix PyTorch 2.5 issues by defining the reduce sync operations for type __half in amd_hip_fp16.h instead of
amd_warp_sync_functions.h, which is problematic when the __half definition is not included before that header.
Only define types not supported by CUDA if HIP_ENABLE_EXTRA_WARP_SYNC_TYPES is defined, to avoid portability issues.
Explicitly nulling the pointer causes us to report the error below
instead of keeping a dangling pointer around that will most likely lead
to a subsequent segfault.
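The effect of nulling the pointer can be sketched in plain C++. The names `Resource`, `releaseResource`, and `useResource` are hypothetical and do not come from the runtime; the sketch only shows how a null check turns a would-be dangling access into a reported error.

```cpp
#include <cassert>

// Hypothetical resource type, used only for illustration.
struct Resource {
  int value;
};

// Frees the resource and nulls the caller's pointer so that later uses
// can detect that it is gone instead of dereferencing freed memory.
void releaseResource(Resource*& res) {
  delete res;
  res = nullptr;  // explicit nulling: no dangling pointer is left behind
}

// Returns false (an error) when the resource is missing; otherwise reads it.
bool useResource(const Resource* res, int& out) {
  if (res == nullptr) {
    return false;  // reported error instead of a likely segfault
  }
  out = res->value;
  return true;
}
```

After `releaseResource`, any later `useResource` call on the same pointer fails the null check and returns an error rather than touching freed memory.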
This now has host conversions too, which come directly from Christopher's
work on fcbx.
Signed-off-by: Christopher M. Riedl
* add const to func parameter
* do not depend on builtins, use gfx950 detection
Currently, we check whether there is enough system RAM even when the allocation is not on the host. This logic is incorrect.
We should not check this size on Windows, because PAL performs its own memory allocation checks. See SWDEV-467263.
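A rough sketch of the intended decision, assuming a hypothetical `AllocRequest` type and a `passesHostMemoryCheck` helper that are not part of the runtime:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of an allocation request.
struct AllocRequest {
  std::size_t bytes;
  bool onHost;  // true when the allocation is backed by system RAM
};

// Returns true when the request may proceed.
// - Allocations that are not on the host skip the system RAM check.
// - On Windows the check is skipped as well, on the assumption that the
//   lower layer (PAL, per the description above) validates the allocation.
bool passesHostMemoryCheck(const AllocRequest& req, std::size_t freeSystemRam,
                           bool isWindows) {
  if (!req.onHost || isWindows) {
    return true;
  }
  return req.bytes <= freeSystemRam;
}
```

Only a host allocation on a non-Windows platform is compared against the free system RAM.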
Co-authored-by: Jimbo Xie <jiabaxie@amd.com>
Also removes asserts in cooperative groups shfl functions since
__hip_bfloat16 shfl is present now
Change-Id: I57578b6e68dccc10c2ddcd194e9cc18bc7732ce1
This reverts commit 57df1b348f.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: I86452adfed14655f72d90440a486089743cc6587
This reverts commit c07468e53c.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: Ifba0c8a248bc40deaa9c59b7f2901531300e5ea4
This reverts commit 9faaf20aae.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: I04af8603053338f08c396e78ff8a6715e641ca19
Using target_link_libraries does not properly link the hipRTC-header.o
into libhiprtc for static build. Change to use target_sources instead.
This does not affect the linkage in the shared build.
Change-Id: I626f9eacc1637b792a50e7ddddb5db09e704ac4a
Also part of SWDEV-510994.
1. Fix atomicMin/Max_system() for float and double.
2. Remove the gfx941 logic, since gfx941 is not supported.
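For readers unfamiliar with floating-point atomic min/max, a common technique is a compare-and-swap loop. The host-side sketch below uses `std::atomic` and is not the HIP device implementation; `atomicMinSketch` and `atomicMaxSketch` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Atomically stores min(current, value) into `target` and returns the value
// observed before the update. Host-side sketch; not the HIP implementation.
template <typename T>
T atomicMinSketch(std::atomic<T>& target, T value) {
  T observed = target.load();
  // compare_exchange_weak reloads `observed` on failure, so the loop
  // re-evaluates the comparison against the latest stored value.
  while (value < observed &&
         !target.compare_exchange_weak(observed, value)) {
  }
  return observed;
}

// Same idea for max: only swap while the new value is still larger.
template <typename T>
T atomicMaxSketch(std::atomic<T>& target, T value) {
  T observed = target.load();
  while (observed < value &&
         !target.compare_exchange_weak(observed, value)) {
  }
  return observed;
}
```

The loop exits either because the swap succeeded or because the stored value is already at least as small (or as large) as the candidate, so the same template works for both float and double.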
Change-Id: Iacfdc1bc13e8da2f5df8751bb315b37d33cea667
- hipStreamWaitEvent may not resolve streams
- Correct usage of flag passed to streamWait function
Change-Id: I2ee163615d303b98937c1035d60da283cce6f677
- This change tries to avoid the extra synchronization packets we may insert
because we did not track the completion signals for every command. We track
the current enqueued command until it exits the enqueue stage. We also
record the exit scope to know whether we flushed the caches
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
passed as the argument
Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc
Since hipMemMap can be called for multiple device handles on the same virtual memory, the same is true for hipMemUnmap, so virtual memory can be "partially unmapped".
When unmap is called for a specific part of the reserved address range, only the designated subbuffer should be released. When unmap is called on the entire reserved memory, all subbuffers should be released.
The main point is that every hsa_amd_vmem_map must have a corresponding hsa_amd_vmem_unmap. Otherwise, if the entire memory is unmapped by a single unmap call, HSA reports the memory as "in use" when an attempt is made to delete it.
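The pairing rule can be illustrated with a small bookkeeping sketch. `ReservedRange` is a hypothetical class, not the runtime's data structure, and it assumes that mapped subbuffers do not overlap; each erased entry stands in for one backend unmap call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical bookkeeping for one reserved virtual address range.
// Each entry stands for one mapped subbuffer: start address -> size.
class ReservedRange {
 public:
  // Records one mapped subbuffer (stand-in for one backend map call).
  void map(std::uintptr_t addr, std::size_t size) { mapped_[addr] = size; }

  // Releases every subbuffer that lies fully inside [addr, addr + size)
  // and returns how many were released.
  std::size_t unmap(std::uintptr_t addr, std::size_t size) {
    std::size_t released = 0;
    auto it = mapped_.lower_bound(addr);
    while (it != mapped_.end() && it->first + it->second <= addr + size) {
      it = mapped_.erase(it);  // stand-in for one backend unmap call
      ++released;
    }
    return released;
  }

  // True while at least one subbuffer is still mapped.
  bool inUse() const { return !mapped_.empty(); }

 private:
  std::map<std::uintptr_t, std::size_t> mapped_;
};
```

Unmapping one subbuffer leaves the range in use; unmapping the whole range releases each remaining subbuffer individually, so nothing is still mapped when the reservation is deleted.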
Change-Id: I039308eafb820decfb1c09f60347f26cdad1a362
1) Add linker APIs to the runtime to support SPIRV linking
2) Migrate internal implementations to the runtime and share them with rtc
3) Add support for bundled and unbundled SPIRV code object linking.
Change-Id: Ic1fd4431f842a208a2468e8aec54a65b5fa6b0e3
This change removes the stream callback from hipStreamWaitEvent and
uses a stream memory wait operation instead. This allows
hipStreamWaitEvent to be non-blocking on the host.
Change-Id: Ie5530febda5a5bcb5daa0db8a01249d6b137fd43
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.
Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
- Use correct header in device_library_decl
- use std:: instead of __hip_internal:: for host compilation
- hide device specific stuff behind __clang__ and __HIP__ check
Change-Id: I2f3647e00555ed0e79f9954a459c41394c3cd49b