+ in hipMemcpyDtoDAsync: cuMemcpyDtoD -> cuMemcpyDtoDAsync
+ in hipMemcpyDtoHAsync: cuMemcpyDtoH -> cuMemcpyDtoHAsync
P.S.
"The types CUstream and cudaStream_t are identical and may be used interchangeably", thus explicit c-like type cast is not needed, aka CUstream(stream).
1. Split hip_ir.ll to hip_hc.ll and hip_hc_gfx803.ll
a. hip_hc.ll contains arch generic ir implementations
b. hip_hc_gfx803.ll contains gfx803 (fiji, polaris) specific ir
2. HIPCC can now parse --amdgpu-target=*.
a. Usage: hipcc --amdgpu-target=gfx803 --amdgpu-target=gfx701
b. TODO: Convert to --amdgpu-target=gfx803,gfx701
3. With LLC in HCC able to generate native f16 isa, removed inline half asm math ops
4. Fixed threadfence and threadfence_block to use functions in rocdl
Change-Id: Ic9a9e3e04139b0d75d2c2a263c030ca77adc1019
1. Fixed build issues produced from previous commit
2. Create new header files to manage data structures better
Change-Id: I704d82c196c1858ed7617d76e40612eb507d2aa0
1. Added hip_texture.h to hip_runtime_api.h as cuda does declare array runtime apis inside cuda_runtime_api.h
2. Added nvcc backend for hipArray runtime apis
3. Didn't test on nvidia platform (should work)
Change-Id: I1a14aef41840e4f55e5535132e3443a918b55967
1. Added math_functions.h to hip_runtime.h
2. Changed operator overloading classifier static to static inline
3. Added vector types test for gpu
4. Seperated __host__ and __device__ for math functions in headers
Change-Id: I499862fad5d7b10da686da9011d7ecefe523f8e2
1. Moved half device functions around so that script can catch the signatures
2. Generated docs for half precision apis
Change-Id: Iee27658e3a639fdb02af135e71841dc6427f15e2
1. Commented out unsupported device math functions
2. Moved function signatures to the top of implementation snippets
3. Added script to generate markdown documentation for device math apis
4. Added the generated file from the script which should be present everytime
Change-Id: Ic579dd8b8fdffa6e1b4d4f5f3fd8a803f4dcaac7
1. Fixed compilation issues for tests
2. Added missing intrinsics + math functions
3. Disabled some device functions as they are causing linking error with HCC
Change-Id: I79d52c4c7a539cc8ef40580247ad97ffcb975f09
1. All fp32, fp64 math device/host functions should be in math_functions.h/.cpp
2. All fp32, fp64 fast math intrinsics for device/host functions should be in device_functions.h/.cpp
3. All the device code implementations should be in device_util.h/.cpp
4. Hence, made changes appropriately by moving code and creating new header files
5. Added math_functions.cpp/.h
6. Changed #ifndef signature to make sure no conflicts between headers with same names in hip/hip_runtime.h and hip/hcc_detail/hip_runtime.h
7. Changed tests to fit the code changes, making them to include appropriate headers
8. Added math_functions.cpp to CMakeLists.txt
9. Some of the tests are still broken, mostly host math functions will fix them in next commit
10. TODO: FIX compilation issues for host math functions
Change-Id: I7a17637d7e294a7d224ffba932c1a08668febd26
1. Added usad, umulhi, urhadd
2. Corrected implementation of __hadd, __hradd
3. TODO: __sad(). It gets tricky as ISA sees them as unsigned
Change-Id: Ibd2c2133b462f9393f3990355706386c79256bba
1. Fixed build issues with new Integer intrinsics
2. Changed tests to work exactly as CUDA code
3. Still some integer intrinsics need to be supported
Change-Id: Ie6f4171259cf4da517436895d4f6f01e01f59b11
1. As we use holder data structure, we move all the cmp, math, cvt apis to cpp file
2. All the tests passed
3. Add more extensive testing for half
Change-Id: I92c6399dace602a0a24432728e3f2a07124e6fb1
1. Added all type conversion intrinsics
2. NO TESTS have been added. (Will add in next commit)
3. Sanatized code in hip_runtime.h
4. Added passed() to hipTestHalf to make it pass on HIT
Change-Id: I0987963c802fc7ff4d7e07d7b88d86da35da53c9
1. They use SDWA + LLVM IR
2. Added these functions to test
3. Need to do exp, exp10, log, log10, rint
Change-Id: I06176acc6cb8bb054495310531777406a41b54e4
1. Added math functions for half precision
2. HRCP is not available due to device code linking errors, will be enabled once it is fixed
3. Added math functions to half test file
Change-Id: Ie317ce70ef518a4fc3f27142143d01e0327f5df3
1. Added SDWA implementation inside IR file
2. Added device functions to header + used them in test
Change-Id: Ib4e059a58eee201cc82438689e3e9bc5f9d26653
1. Removed HIP_EXPERIMENTAL env variable so that device code will be accessed from LLVM IR
2. Removed soft support from headers and moved to hip_fp16.cpp
3. Added LLVM IR + inline asm to hip_ir.ll
4. Added test for fp16
5. Added barriers for hcc 3.5 and hcc 4.0 for half support
a. Which means, hcc 4.0 can parse __fp16 but hcc 3.5 cant
b. HCC 4.0 code is implemented now, hcc 3.5 will be added later
Change-Id: Ic37859b2688ebb02e168bab643d1882bf4727952