1. Added usad, umulhi, urhadd
2. Corrected implementation of __hadd, __rhadd
3. TODO: __sad(). This is tricky because the ISA treats the operands as unsigned
Change-Id: Ibd2c2133b462f9393f3990355706386c79256bba
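For reference, the arithmetic these intrinsics are expected to match can be sketched on the host in plain C++. This is a minimal sketch that follows the CUDA definitions of the intrinsics; the ref_* and floor_half names are hypothetical and this is not the device implementation from this commit.

```cpp
#include <cassert>
#include <cstdint>

// Host-side reference models (hypothetical ref_* names), not the HIP device code.

// High 32 bits of the 64-bit product of two unsigned 32-bit values.
constexpr std::uint32_t ref_umulhi(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}

// |x - y| + z, with the final sum wrapping modulo 2^32.
constexpr std::uint32_t ref_usad(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x > y ? x - y : y - x) + z;
}

// Rounded unsigned average (x + y + 1) >> 1, computed in 64 bits to avoid overflow.
constexpr std::uint32_t ref_urhadd(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) + y + 1) >> 1);
}

// Floor of s / 2, written without shifting a negative value.
constexpr std::int64_t floor_half(std::int64_t s) {
    return s >= 0 ? s / 2 : -((-s + 1) / 2);
}

// Signed average (x + y) >> 1, rounding toward negative infinity.
constexpr std::int32_t ref_hadd(std::int32_t x, std::int32_t y) {
    return static_cast<std::int32_t>(floor_half(static_cast<std::int64_t>(x) + y));
}

// Signed rounded average (x + y + 1) >> 1.
constexpr std::int32_t ref_rhadd(std::int32_t x, std::int32_t y) {
    return static_cast<std::int32_t>(floor_half(static_cast<std::int64_t>(x) + y + 1));
}

// Compile-time spot checks of the models above.
static_assert(ref_umulhi(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu, "umulhi");
static_assert(ref_usad(3u, 10u, 5u) == 12u, "usad");
static_assert(ref_urhadd(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFFu, "urhadd");
static_assert(ref_hadd(-3, 0) == -2, "hadd");
static_assert(ref_rhadd(-3, 0) == -1, "rhadd");
```

Widening to 64 bits before adding or multiplying keeps the intermediate values from overflowing, which is the property the averaging intrinsics are meant to guarantee.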
1. Fixed build issues with new Integer intrinsics
2. Changed tests to behave exactly like the CUDA code
3. Some integer intrinsics still need to be supported
Change-Id: Ie6f4171259cf4da517436895d4f6f01e01f59b11
1. Because we use a holder data structure, moved all the cmp, math, and cvt APIs to the cpp file
2. All the tests passed
3. Add more extensive testing for half
Change-Id: I92c6399dace602a0a24432728e3f2a07124e6fb1
1. Added all type conversion intrinsics
2. NO TESTS have been added. (Will add in next commit)
3. Sanitized code in hip_runtime.h
4. Added passed() to hipTestHalf to make it pass on HIT
Change-Id: I0987963c802fc7ff4d7e07d7b88d86da35da53c9
1. They use SDWA + LLVM IR
2. Added these functions to test
3. Need to do exp, exp10, log, log10, rint
Change-Id: I06176acc6cb8bb054495310531777406a41b54e4
1. Added math functions for half precision
2. HRCP is not available due to device code linking errors; it will be enabled once they are fixed
3. Added math functions to half test file
Change-Id: Ie317ce70ef518a4fc3f27142143d01e0327f5df3
1. Added SDWA implementation inside IR file
2. Added device functions to header + used them in test
Change-Id: Ib4e059a58eee201cc82438689e3e9bc5f9d26653
1. Removed HIP_EXPERIMENTAL env variable so that device code will be accessed from LLVM IR
2. Removed soft support from headers and moved to hip_fp16.cpp
3. Added LLVM IR + inline asm to hip_ir.ll
4. Added test for fp16
5. Added barriers for hcc 3.5 and hcc 4.0 for half support
a. That is, hcc 4.0 can parse __fp16 but hcc 3.5 can't
b. HCC 4.0 code is implemented now, hcc 3.5 will be added later
Change-Id: Ic37859b2688ebb02e168bab643d1882bf4727952
- hipFunction_t is now returned by value. This eliminates dynamic
allocation / memory management complexity in the module. Removed
the kernel name so the structure is just 16 bytes now.
- Moved the hsa_executable_load_module and hsa_executable_freeze
calls to the hipModuleLoad and hipModuleLoadData calls.
- Apply sharedMemBytes in hipModuleLaunchKernel to group segment
size (not private).
Move HIP_COHERENT_HOST_ALLOC so it is read once at init time.
Add HIP_LAUNCH_BLOCKING_KERNELS, HIP_API_BLOCKING.
Update docs on debug and chicken bits.
Conflicts:
src/hip_hcc.cpp
1. Use -DHIP_FAST_MATH to compile precise math functions as fast math
2. Added double fast math functions for sqrt
3. Changed hipcc to parse -use_fast_math (not working)
4. Added passed tag to hipFloatMath test
Change-Id: I72884b2436b4efe61e9a9297346c1358fee38a2d
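The general idea of rerouting a precise math entry point through a compile-time define can be sketched as follows. This is only an illustration under stated assumptions: DEMO_FAST_MATH, the demo namespace, and every function name here are hypothetical stand-ins, and both variants simply forward to std::sqrt rather than to real device routines.

```cpp
#include <cassert>
#include <cmath>

namespace demo {

// Stand-ins for a precise routine and its fast counterpart (hypothetical names).
// Both forward to std::sqrt here; on a device they would differ in accuracy.
inline float sqrt_precise(float x) { return std::sqrt(x); }
inline float sqrt_fast(float x) { return std::sqrt(x); }

// Building with -DDEMO_FAST_MATH reroutes the public name to the fast variant.
#if defined(DEMO_FAST_MATH)
constexpr bool kFastMath = true;
inline float sqrtf_api(float x) { return sqrt_fast(x); }
#else
constexpr bool kFastMath = false;
inline float sqrtf_api(float x) { return sqrt_precise(x); }
#endif

}  // namespace demo
```

Callers keep using the one public name, so switching between the two builds needs no source changes.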
1. Added fast math intrinsics for single precision data types
2. Added test to check the intrinsics
3. Added HIP_PRECISE_MATH macro to enable precise math when fast math is in use
Change-Id: Iadacbb6182c31252c5e3252854372d1b80dfd27b
1. Added fast math APIs for sin, cos, tan, sincos
2. Added test for trig math functions
3. Added logarithm fast math
4. Changed how hipGetDevice, hipDeviceGetCacheConfig emit errors
Change-Id: Ie6ab594ddd5853cbe85e39a2f6d3479a807fa323
1. Changed test macro to emit line numbers
2. Added getcacheconfig api test for nvcc path
3. Fixed hipFuncCache_t data type
TODO: With this commit, there are now two func cache data types
a. hipFuncCache_t for runtime API
b. hipFuncCache for driver API
Map these to a single data type
Change-Id: Ia47c9f5d7c2633638051bf17b1103048a1ede973
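One possible way to resolve the TODO above is to make the driver-API name an alias of the runtime-API type. The sketch below uses hypothetical demo* names, with enumerators modeled on CUDA's cudaFuncCache preferences; it is not the change that was eventually made.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical demo names; values follow CUDA's cudaFuncCache ordering.
enum demoFuncCache_t {
    demoFuncCachePreferNone,    // no preference
    demoFuncCachePreferShared,  // prefer larger shared memory
    demoFuncCachePreferL1,      // prefer larger L1 cache
    demoFuncCachePreferEqual    // prefer equal split
};

// The driver-API spelling becomes an alias of the runtime-API type,
// so both names refer to one data type.
typedef demoFuncCache_t demoFuncCache;

static_assert(std::is_same<demoFuncCache, demoFuncCache_t>::value,
              "both spellings must name the same type");
```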