1. Fixed build issues introduced by the previous commit
2. Created new header files to organize the data structures better
Change-Id: I704d82c196c1858ed7617d76e40612eb507d2aa0
[ROCm/clr commit: 5b2d4c0e60]
1. Included hip_texture.h in hip_runtime_api.h, since CUDA declares the array runtime APIs inside cuda_runtime_api.h
2. Added an nvcc backend for the hipArray runtime APIs
3. Not tested on the NVIDIA platform (expected to work)
Change-Id: I1a14aef41840e4f55e5535132e3443a918b55967
[ROCm/clr commit: a7fa600176]
1. Included math_functions.h in hip_runtime.h
2. Changed the overloaded operators from static to static inline
3. Added a vector types test for the GPU
4. Separated the __host__ and __device__ math functions in the headers
Change-Id: I499862fad5d7b10da686da9011d7ecefe523f8e2
[ROCm/clr commit: 02190736e3]
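Item 2 can be pictured with a minimal sketch, assuming a simplified vector type: operators defined in a header as `static inline`. `static` gives each translation unit its own internal-linkage copy, and `inline` typically keeps compilers from warning about copies a translation unit never calls (the commit does not state its motivation). The type `float2_sketch` is hypothetical and only stands in for HIP's real vector types.

```cpp
#include <cassert>

// Hypothetical stand-in for a HIP vector type such as float2.
struct float2_sketch {
    float x;
    float y;
};

// Header-style definitions: `static` = internal linkage per translation unit,
// `inline` = typically no "unused function" warning for unused copies.
static inline float2_sketch operator+(float2_sketch a, float2_sketch b) {
    return float2_sketch{a.x + b.x, a.y + b.y};
}

static inline float2_sketch operator*(float2_sketch a, float s) {
    return float2_sketch{a.x * s, a.y * s};
}
```

For example, `float2_sketch{1, 2} + float2_sketch{3, 4}` yields `{4, 6}`.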
1. Moved the half device functions around so that the documentation script can catch their signatures
2. Generated docs for the half-precision APIs
Change-Id: Iee27658e3a639fdb02af135e71841dc6427f15e2
[ROCm/clr commit: 706a032a29]
1. Commented out unsupported device math functions
2. Moved function signatures to the top of the implementation snippets
3. Added a script to generate Markdown documentation for the device math APIs
4. Added the file generated by the script, which should always be present
Change-Id: Ic579dd8b8fdffa6e1b4d4f5f3fd8a803f4dcaac7
[ROCm/clr commit: 3d4dcee35d]
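The reason signatures must sit at the top of each snippet is easiest to see with a line-based extractor. The sketch below is hypothetical (the commit does not show the actual script or its language): it emits one Markdown bullet for every line that begins with `__device__`, so a signature that does not start a line is silently missed.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Strip trailing whitespace.
static std::string rtrim(std::string s) {
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
        s.pop_back();
    }
    return s;
}

// Hypothetical extractor: one Markdown bullet per line that *starts* with
// "__device__". An indented signature (e.g. buried inside a snippet) is missed.
std::vector<std::string> extract_device_signatures(const std::string& header) {
    std::vector<std::string> bullets;
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("__device__", 0) != 0) continue;  // must begin at column 0
        std::string sig = rtrim(line);
        // Drop a trailing "{" (definition) or ";" (declaration).
        if (!sig.empty() && (sig.back() == '{' || sig.back() == ';')) sig.pop_back();
        bullets.push_back("- `" + rtrim(sig) + "`");
    }
    return bullets;
}
```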
1. Fixed compilation issues in the tests
2. Added missing intrinsics and math functions
3. Disabled some device functions because they cause linking errors with HCC
Change-Id: I79d52c4c7a539cc8ef40580247ad97ffcb975f09
[ROCm/clr commit: 41a46effef]
1. All fp32, fp64 math device/host functions should be in math_functions.h/.cpp
2. All fp32, fp64 fast math intrinsics for device/host functions should be in device_functions.h/.cpp
3. All the device code implementations should be in device_util.h/.cpp
4. Moved code and created new header files accordingly
5. Added math_functions.cpp/.h
6. Changed the #ifndef include guards so that same-named headers, such as hip/hip_runtime.h and hip/hcc_detail/hip_runtime.h, do not conflict
7. Updated the tests for the code changes so that they include the appropriate headers
8. Added math_functions.cpp to CMakeLists.txt
9. Some tests are still broken, mostly those for host math functions; they will be fixed in the next commit
10. TODO: fix compilation issues for host math functions
Change-Id: I7a17637d7e294a7d224ffba932c1a08668febd26
[ROCm/clr commit: d23b6b8694]
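Item 6 concerns include guards. If both hip/hip_runtime.h and hip/hcc_detail/hip_runtime.h used a guard derived only from the file name (say `HIP_RUNTIME_H`), whichever header was included second would be skipped entirely. A minimal sketch, with both "headers" pasted into one file to stay self-contained; the guard macro names are hypothetical, since the commit does not say which names were chosen.

```cpp
#include <cassert>

// --- hip/hcc_detail/hip_runtime.h (sketch) ---
// Encoding the directory in the guard name keeps it distinct from the guard
// of hip/hip_runtime.h; a bare HIP_RUNTIME_H in both would make the second skip.
#ifndef HIP_INCLUDE_HIP_HCC_DETAIL_HIP_RUNTIME_H
#define HIP_INCLUDE_HIP_HCC_DETAIL_HIP_RUNTIME_H
inline int hcc_detail_runtime_marker() { return 1; }
#endif

// --- hip/hip_runtime.h (sketch) ---
#ifndef HIP_INCLUDE_HIP_HIP_RUNTIME_H
#define HIP_INCLUDE_HIP_HIP_RUNTIME_H
inline int hip_runtime_marker() { return 2; }
#endif

// The guards differ, so both "headers" contributed their declarations.
inline bool both_headers_seen() {
#if defined(HIP_INCLUDE_HIP_HCC_DETAIL_HIP_RUNTIME_H) && defined(HIP_INCLUDE_HIP_HIP_RUNTIME_H)
    return hcc_detail_runtime_marker() == 1 && hip_runtime_marker() == 2;
#else
    return false;
#endif
}
```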
1. Added usad, umulhi and urhadd
2. Corrected the implementation of __hadd and __hradd
3. TODO: __sad(). It is tricky because the ISA treats the operands as unsigned
Change-Id: Ibd2c2133b462f9393f3990355706386c79256bba
[ROCm/clr commit: 9ca135ac2e]
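For checking device results, it can help to have host reference versions of the semantics CUDA documents for these integer intrinsics. The sketch below uses hypothetical `ref_*` names (not HIP APIs). `ref_sad` also illustrates the __sad() TODO: with signed operands of different signs, such as x = -1 and y = 1, the signed |x - y| is 2, while reinterpreting both as unsigned gives 0xFFFFFFFE.

```cpp
#include <cassert>
#include <cstdint>

// usad: |x - y| + z on unsigned operands.
std::uint32_t ref_usad(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x > y ? x - y : y - x) + z;
}

// umulhi: the most significant 32 bits of the 64-bit product x * y.
std::uint32_t ref_umulhi(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}

// urhadd: (x + y + 1) >> 1, widened so the intermediate sum cannot overflow.
std::uint32_t ref_urhadd(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) + y + 1) >> 1);
}

// hadd (integer form): (x + y) >> 1 on signed operands, without overflow.
std::int32_t ref_hadd(std::int32_t x, std::int32_t y) {
    return static_cast<std::int32_t>((static_cast<std::int64_t>(x) + y) >> 1);
}

// sad: |x - y| + z with *signed* x and y. Treating x and y as unsigned
// gives a different answer whenever their signs differ.
std::uint32_t ref_sad(std::int32_t x, std::int32_t y, std::uint32_t z) {
    std::int64_t d = static_cast<std::int64_t>(x) - y;
    return static_cast<std::uint32_t>(d < 0 ? -d : d) + z;
}
```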
1. Fixed build issues with the new integer intrinsics
2. Changed the tests to behave exactly like the CUDA code
3. Some integer intrinsics still need to be supported
Change-Id: Ie6f4171259cf4da517436895d4f6f01e01f59b11
[ROCm/clr commit: f0ea51c786]
1. Because we use a holder data structure, moved all the cmp, math and cvt APIs to the .cpp file
2. All tests pass
3. Add more extensive testing for half
Change-Id: I92c6399dace602a0a24432728e3f2a07124e6fb1
[ROCm/clr commit: e95456eee8]
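The holder type itself is not shown in the commit. A minimal sketch, assuming the holder carries the raw IEEE 754 binary16 bit pattern, shows why cmp APIs can then live in a .cpp file as plain integer code. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder: a half value carried as its raw binary16 bits.
struct HalfHolder {
    std::uint16_t bits;
};

// NaN: exponent field all ones and a nonzero mantissa.
bool half_isnan(HalfHolder h) {
    return (h.bits & 0x7C00u) == 0x7C00u && (h.bits & 0x03FFu) != 0;
}

// Map the sign-magnitude encoding to an integer that orders the same way as
// the encoded values; +0 and -0 both map to 0.
static int half_order_key(HalfHolder h) {
    int magnitude = h.bits & 0x7FFF;
    return (h.bits & 0x8000u) ? -magnitude : magnitude;
}

// Ordered comparisons: false whenever either operand is NaN.
bool half_eq(HalfHolder a, HalfHolder b) {
    return !half_isnan(a) && !half_isnan(b) && half_order_key(a) == half_order_key(b);
}

bool half_lt(HalfHolder a, HalfHolder b) {
    return !half_isnan(a) && !half_isnan(b) && half_order_key(a) < half_order_key(b);
}
```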
1. Added all type conversion intrinsics
2. NO TESTS have been added yet (they will be added in the next commit)
3. Sanitized code in hip_runtime.h
4. Added passed() to hipTestHalf so that it passes on HIT
Change-Id: I0987963c802fc7ff4d7e07d7b88d86da35da53c9
[ROCm/clr commit: d496576b55]
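When testing the type conversion intrinsics, a host reference for float/half conversion is handy. The sketch below works on raw binary16 bits and implements round-to-nearest-even for float to half (the `_rn` rounding mode); it is a software reference under that assumption, not the device implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// half -> float is exact: every binary16 value is representable as a float.
float half_bits_to_float(std::uint16_t h) {
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t mant = h & 0x3FFu;
    if (exp == 0) {  // zero or subnormal: value = mant * 2^-24
        float v = std::ldexp(static_cast<float>(mant), -24);
        return sign ? -v : v;
    }
    std::uint32_t out;
    if (exp == 0x1Fu) {  // infinity or NaN
        out = sign | 0x7F800000u | (mant << 13);
    } else {  // normal: rebias the exponent from 15 to 127
        out = sign | ((exp + 112u) << 23) | (mant << 13);
    }
    float f;
    std::memcpy(&f, &out, sizeof f);
    return f;
}

// float -> half with round-to-nearest-even.
std::uint16_t float_to_half_bits_rn(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    std::uint32_t sign = (x >> 16) & 0x8000u;
    std::uint32_t absx = x & 0x7FFFFFFFu;
    if (absx >= 0x7F800000u)  // infinity stays infinity; NaN becomes a quiet NaN
        return static_cast<std::uint16_t>(sign | (absx > 0x7F800000u ? 0x7E00u : 0x7C00u));
    if (absx >= 0x477FF000u)  // >= 65520 rounds past 65504 (max half) to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    if (absx <= 0x33000000u)  // <= 2^-25 rounds to (signed) zero
        return static_cast<std::uint16_t>(sign);
    std::uint32_t e = absx >> 23;
    std::uint32_t h, rem, halfway;
    if (e >= 113u) {  // normal half: drop the low 13 mantissa bits
        h = ((e - 112u) << 10) | ((absx & 0x7FFFFFu) >> 13);
        rem = absx & 0x1FFFu;
        halfway = 0x1000u;
    } else {  // subnormal half: include the implicit leading 1, then shift
        std::uint32_t m = (absx & 0x7FFFFFu) | 0x800000u;
        std::uint32_t shift = 126u - e;  // between 14 and 24
        h = m >> shift;
        rem = m & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }
    if (rem > halfway || (rem == halfway && (h & 1u)))
        ++h;  // a carry into the exponent field is still a correct encoding
    return static_cast<std::uint16_t>(sign | h);
}
```

Ties go to the even mantissa: 1 + 2^-11 rounds down to 1.0 (0x3C00), while 1 + 3 * 2^-11 rounds up to 0x3C02.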
1. These functions are implemented with SDWA and LLVM IR
2. Added these functions to the test
3. Still to do: exp, exp10, log, log10, rint
Change-Id: I06176acc6cb8bb054495310531777406a41b54e4
[ROCm/clr commit: eff68c989a]
1. Added math functions for half precision
2. HRCP is not available because of device code linking errors; it will be enabled once they are fixed
3. Added the math functions to the half test file
Change-Id: Ie317ce70ef518a4fc3f27142143d01e0327f5df3
[ROCm/clr commit: fe38e9652b]
1. Added the SDWA implementation to the IR file
2. Added the device functions to the header and used them in the test
Change-Id: Ib4e059a58eee201cc82438689e3e9bc5f9d26653
[ROCm/clr commit: eeef055469]
1. Removed the HIP_EXPERIMENTAL environment variable so that device code is taken from the LLVM IR
2. Removed the soft (software) half support from the headers and moved it to hip_fp16.cpp
3. Added LLVM IR and inline asm to hip_ir.ll
4. Added a test for fp16
5. Added barriers (compiler-version guards) for hcc 3.5 and hcc 4.0 for half support
a. This is needed because hcc 4.0 can parse __fp16 but hcc 3.5 can't
b. The hcc 4.0 code is implemented now; hcc 3.5 support will be added later
Change-Id: Ic37859b2688ebb02e168bab643d1882bf4727952
[ROCm/clr commit: c286bf6f8a]
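Item 5 can be sketched as a preprocessor barrier: `__fp16` must only appear in the branch compiled by a compiler that can parse it, because the preprocessor discards the other branch before parsing. `HYPOTHETICAL_HCC_MAJOR` is a placeholder (the commit does not say how the compiler version is detected), and the raw-bits fallback only stands in for the hcc 3.5 path, which the commit says will be added later.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder for however the build exposes the hcc version (hypothetical).
#ifndef HYPOTHETICAL_HCC_MAJOR
#define HYPOTHETICAL_HCC_MAJOR 3
#endif

#if HYPOTHETICAL_HCC_MAJOR >= 4
// Newer compiler: __fp16 parses, so half can be built on the native type.
typedef __fp16 half_storage_sketch;
#define HALF_SKETCH_NATIVE 1
#else
// Older compiler: __fp16 would be a parse error, so it never appears in this
// branch; carry the raw 16-bit pattern instead.
typedef std::uint16_t half_storage_sketch;
#define HALF_SKETCH_NATIVE 0
#endif

static_assert(sizeof(half_storage_sketch) == 2, "half storage must be 16 bits");

inline bool half_sketch_is_native() { return HALF_SKETCH_NATIVE != 0; }
```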
All are marked as HIP_UNSUPPORTED.
IMPORTANT:
1. libraryPropertyType_t has no cuda prefix. TO_DO: a new matcher is needed.
2. All libraries (cublas, cufft, cusolver, cusparse, nvgraph) have used these types since CUDA 8.0.
[ROCm/clr commit: fd0c56a767]
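Why a prefix rule misses libraryPropertyType_t can be shown with plain string checks. This is a hypothetical illustration, not the tool's actual matcher (which is not shown here): a rule keyed on the `cuda` prefix recognizes names like cudaDataType_t but not libraryPropertyType_t, so the unprefixed name needs its own exact-name entry, recorded here as unsupported to mirror HIP_UNSUPPORTED.

```cpp
#include <cassert>
#include <map>
#include <string>

// Prefix rule: recognizes identifiers that start with "cuda".
bool prefix_rule_matches(const std::string& name) {
    return name.rfind("cuda", 0) == 0;
}

// Hypothetical exact-name table for CUDA identifiers without the prefix.
// An empty HIP spelling means "unsupported" in this sketch.
const std::map<std::string, std::string>& unprefixed_cuda_names() {
    static const std::map<std::string, std::string> table = {
        {"libraryPropertyType_t", ""},  // marked unsupported
    };
    return table;
}

enum class Match { None, Prefix, ExactName };

Match classify(const std::string& name) {
    if (prefix_rule_matches(name)) return Match::Prefix;
    if (unprefixed_cuda_names().count(name) != 0) return Match::ExactName;
    return Match::None;
}
```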
Includes some tricky manipulation of the locks for contexts and streams.
The issue is that stealing a stream requires locking the context to
walk the streams and find a victim. To avoid deadlock, we can't
have a stream locked when we lock the context. This implementation
releases the stream lock, then acquires the context lock and selects the
victim.
A more stable implementation might copy the stream list
from a context so that no lock is required to walk all streams.
A std::shared_ptr could be used to prevent the streams from being
deallocated during the walk.
[ROCm/clr commit: b29fbf736d]
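Both approaches can be sketched with standard C++ primitives. The `Stream` and `Context` types and the "fewest pending operations" victim heuristic are hypothetical; only the lock ordering and the snapshot idea come from the message above. In the snapshot variant the copy itself still takes a brief context lock, but the walk does not, and the `shared_ptr` copies keep each stream alive even if another thread removes it meanwhile.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical types; only the locking pattern is the point.
struct Stream {
    std::mutex mtx;
    std::atomic<int> pending{0};  // stand-in metric for choosing a victim
};

struct Context {
    std::mutex mtx;  // protects `streams`
    std::vector<std::shared_ptr<Stream>> streams;
};

// Current approach: the caller holds its own stream lock. Release it before
// taking the context lock, so no thread waits for the context lock while
// holding a stream lock.
std::shared_ptr<Stream> steal_victim(Context& ctx, Stream& self,
                                     std::unique_lock<std::mutex>& selfLock) {
    selfLock.unlock();
    std::shared_ptr<Stream> victim;
    {
        std::lock_guard<std::mutex> ctxGuard(ctx.mtx);
        for (const auto& s : ctx.streams) {
            if (s.get() == &self) continue;
            if (!victim || s->pending.load() < victim->pending.load()) victim = s;
        }
    }  // context lock released here
    selfLock.lock();  // state may have changed while unlocked; caller must recheck
    return victim;
}

// Suggested alternative: copy the list under a brief lock, then walk the
// copy with no lock held; the shared_ptr copies keep the streams alive.
std::shared_ptr<Stream> steal_victim_snapshot(Context& ctx, const Stream& self) {
    std::vector<std::shared_ptr<Stream>> snapshot;
    {
        std::lock_guard<std::mutex> ctxGuard(ctx.mtx);
        snapshot = ctx.streams;
    }
    std::shared_ptr<Stream> victim;
    for (const auto& s : snapshot) {
        if (s.get() == &self) continue;
        if (!victim || s->pending.load() < victim->pending.load()) victim = s;
    }
    return victim;
}
```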