Squashed commit of the following:
commit c111b5bd10d7c2a5b0b1ad8b07f6e81185b47b39
Author: Wen-Heng (Jack) Chung <whchung@gmail.com>
Date: Sat Mar 4 17:06:46 2017 +0800
Use __device__ for all variables and functions to be used in kernel path
Abolish __device and adopt [[hc]] in HIP implementation, so __device__ can be
used on all HIP applications, no matter they are variables or functions.
Change-Id: I20ca25857ce3bc3e42a5ebf65cafea2c8492f4c7
commit 30c0e4e4701bbf6bd9a7182e0320a71ff73d3a83
Author: Wen-Heng (Jack) Chung <whchung@gmail.com>
Date: Thu Mar 2 12:14:11 2017 +0800
XXX FIXME get around LDS spills caused in Promote-free HCC
hipDynamicShared2 uses all 64KB of LDS for computation. But in Promote-free HCC
there are cases where LDS spills would occur, which would make the test case to
hang.
In this workaround commit we reduce the size of dynamic LDS used to get around
this known issue, and will revert this commit when LDS spills are resolved in
HCC.
Change-Id: If648b36200a4f9143951a8129192bcb7ed0bef5e
commit e803173be2d73e2f132a7ff7f61e7a20b4083d34
Author: Wen-Heng (Jack) Chung <whchung@gmail.com>
Date: Wed Mar 1 21:41:41 2017 +0800
Fix math functions which take pointer arguments
Change-Id: I332c997e640edbc44824691e2a9434c6b3dadefa
commit de590c469e213c42090ff83dbd060f25bb1d6047
Author: Wen-Heng (Jack) Chung <whchung@gmail.com>
Date: Wed Mar 1 18:38:54 2017 +0800
Changes to cope with Promote-free HCC
- abolish usage of address_space GNU attribute
- use __device in file-scope global variables which would be accessed by GPU kernels
- temporarily disable some math functions which take pointer arguments
Change-Id: I730311dee848e20e763e35cd3980317fce0dce0d
Change-Id: I1f6b970b53b9401eeaaab08f04a7b9fed0fb8cf0
1. Added math_functions.h to hip_runtime.h
2. Changed operator overloading classifier static to static inline
3. Added vector types test for gpu
4. Seperated __host__ and __device__ for math functions in headers
Change-Id: I499862fad5d7b10da686da9011d7ecefe523f8e2
1. Fixed compilation issues for tests
2. Added missing intrinsics + math functions
3. Disabled some device functions as they are causing linking error with HCC
Change-Id: I79d52c4c7a539cc8ef40580247ad97ffcb975f09
1. All fp32, fp64 math device/host functions should be in math_functions.h/.cpp
2. All fp32, fp64 fast math intrinsics for device/host functions should be in device_functions.h/.cpp
3. All the device code implementations should be in device_util.h/.cpp
4. Hence, made changes appropriately by moving code and creating new header files
5. Added math_functions.cpp/.h
6. Changed #ifndef signature to make sure no conflicts between headers with same names in hip/hip_runtime.h and hip/hcc_detail/hip_runtime.h
7. Changed tests to fit the code changes, making them to include appropriate headers
8. Added math_functions.cpp to CMakeLists.txt
9. Some of the tests are still broken, mostly host math functions will fix them in next commit
10. TODO: FIX compilation issues for host math functions
Change-Id: I7a17637d7e294a7d224ffba932c1a08668febd26
1. Fixed build issues with new Integer intrinsics
2. Changed tests to work exactly as CUDA code
3. Still some integer intrinsics need to be supported
Change-Id: Ie6f4171259cf4da517436895d4f6f01e01f59b11
1. Added all type conversion intrinsics
2. NO TESTS have been added. (Will add in next commit)
3. Sanatized code in hip_runtime.h
4. Added passed() to hipTestHalf to make it pass on HIT
Change-Id: I0987963c802fc7ff4d7e07d7b88d86da35da53c9
1. They use SDWA + LLVM IR
2. Added these functions to test
3. Need to do exp, exp10, log, log10, rint
Change-Id: I06176acc6cb8bb054495310531777406a41b54e4
1. Added math functions for half precision
2. HRCP is not available due to device code linking errors, will be enabled once it is fixed
3. Added math functions to half test file
Change-Id: Ie317ce70ef518a4fc3f27142143d01e0327f5df3
1. Added SDWA implementation inside IR file
2. Added device functions to header + used them in test
Change-Id: Ib4e059a58eee201cc82438689e3e9bc5f9d26653
1. Removed HIP_EXPERIMENTAL env variable so that device code will be accessed from LLVM IR
2. Removed soft support from headers and moved to hip_fp16.cpp
3. Added LLVM IR + inline asm to hip_ir.ll
4. Added test for fp16
5. Added barriers for hcc 3.5 and hcc 4.0 for half support
a. Which means, hcc 4.0 can parse __fp16 but hcc 3.5 cant
b. HCC 4.0 code is implemented now, hcc 3.5 will be added later
Change-Id: Ic37859b2688ebb02e168bab643d1882bf4727952
1. Use -DHIP_FAST_MATH to make precise math functions compiled to fast math
2. Added double fast math functions for sqrt
3. Changed hipcc to parse -use_fast_math (not working)
4. Added passed tag to hipFloatMath test
Change-Id: I72884b2436b4efe61e9a9297346c1358fee38a2d
1. Added fast math intrinsics for single precision data types
2. Added test to check the intrinsics
3. Added HIP_PRECISE_MATH macro to enable precise math on fast math
Change-Id: Iadacbb6182c31252c5e3252854372d1b80dfd27b
1. Added fast math apis for sin, cos, tan, sincos
2. Added test for trig math functions
3. Added logarithm fast math
4. Changed how hipGetDevice, hipDeviceGetCacheConfig emit errors
Change-Id: Ie6ab594ddd5853cbe85e39a2f6d3479a807fa323
1. hipDeviceGetLimit API for HCC path is added
2. Test for hipDeviceGetLimit API is added
3. The feature added only supports querying heap size
4. Corrected indents for malloc and free device functions
5. Removed redundant data structures
6. Added g_heap_malloc_size to store the heap size
Change-Id: If48d1b0ce9270e994f1c542cc283ddbb14746bbb
1. Refactored code to use HCC internal APIs rather than HCC copy APIs
2. Added hipMemcpyToSymbolAsync
3. Added test for hipMemcpyToSymbolAsync
4. Added new error hipErrorInvalidSymbol
Change-Id: I0e359b2d0ff5d682bbccdf9c2923e16b35e39497
1. Currently works only for __attribute__((addrspace(1))
2. Need to pass in string for name of the variable
3. Added test to check functionality
Change-Id: I4c3cc1bf151cb5423e4aef59fcc4ad5693b31641
1. Added feature for __threadfence and __threadfence_block
2. Added feature for using LLVM IR files directly while compilation
3. Added test for threadfence and threadfence_block
Change-Id: Ib7e5d89b4cca1a135952b317e5809cd05b56a3c9