- Contexts across threads are listed under the device.
- Device reset cleans up all contexts and re-initializes _primaryCtx.
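A minimal sketch of the bookkeeping described above, assuming a device object that owns a mutex-protected list of shared context handles. Context, Device, createContext, contextCount and _contexts are hypothetical names, not the HIP runtime's types; only _primaryCtx echoes the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a HIP context (not the real runtime type).
struct Context {
  explicit Context(int contextId) : id(contextId) {}
  int id;
};

// Hypothetical device object: every context, whichever thread created it,
// is registered here so that a device reset can find and release all of them.
class Device {
 public:
  Device() { createPrimaryLocked(); }

  // Safe to call from any thread; the new context is listed under the device.
  std::shared_ptr<Context> createContext() {
    std::lock_guard<std::mutex> lock(_mutex);
    auto ctx = std::make_shared<Context>(_nextId++);
    _contexts.push_back(ctx);
    return ctx;
  }

  // Device reset: drop every registered context, then re-create the primary one.
  void reset() {
    std::lock_guard<std::mutex> lock(_mutex);
    _contexts.clear();
    createPrimaryLocked();
  }

  std::size_t contextCount() const {
    std::lock_guard<std::mutex> lock(_mutex);
    return _contexts.size();
  }

  std::shared_ptr<Context> primaryCtx() const {
    std::lock_guard<std::mutex> lock(_mutex);
    assert(_primaryCtx && "primary context exists after construction");
    return _primaryCtx;
  }

 private:
  // Caller must hold _mutex (or be the constructor).
  void createPrimaryLocked() {
    _primaryCtx = std::make_shared<Context>(_nextId++);
    _contexts.push_back(_primaryCtx);
  }

  mutable std::mutex _mutex;
  std::vector<std::shared_ptr<Context>> _contexts;
  std::shared_ptr<Context> _primaryCtx;
  int _nextId = 0;
};
```

In this sketch shared_ptr only drops the device's own references on reset; a real runtime would also release the underlying GPU resources.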
Change-Id: Ie1cfbb26d43a8dc6869be3e6ebaf7344ce374643
1. Changed the test to assert that the hipFunction values are the same
2. Added better memory management for hipModule
Change-Id: I10d7aef13c215a2211e262f3c79017f26a17d9a7
1. Split hip_ir.ll to hip_hc.ll and hip_hc_gfx803.ll
a. hip_hc.ll contains arch generic ir implementations
b. hip_hc_gfx803.ll contains gfx803 (fiji, polaris) specific ir
2. HIPCC can now parse --amdgpu-target=*.
a. Usage: hipcc --amdgpu-target=gfx803 --amdgpu-target=gfx701
b. TODO: Convert to --amdgpu-target=gfx803,gfx701
3. Since LLC in HCC can now generate native f16 ISA, removed the inline-asm half math ops
4. Fixed threadfence and threadfence_block to use the functions in rocdl
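hipcc itself is a script, so the following is not its code; it is a minimal C++ sketch of the target-collection logic from item 2, assuming repeated --amdgpu-target=<arch> flags and also accepting the comma-separated form from the TODO. parseAmdgpuTargets is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects GPU targets from repeated
// --amdgpu-target=<arch> flags, also accepting a comma-separated list
// such as --amdgpu-target=gfx803,gfx701. Other arguments are ignored.
std::vector<std::string> parseAmdgpuTargets(const std::vector<std::string>& args) {
  static const std::string kPrefix = "--amdgpu-target=";
  std::vector<std::string> targets;
  for (const auto& arg : args) {
    // Skip anything that does not start with the flag prefix.
    if (arg.compare(0, kPrefix.size(), kPrefix) != 0) continue;
    std::stringstream list(arg.substr(kPrefix.size()));
    std::string target;
    while (std::getline(list, target, ',')) {
      if (!target.empty()) targets.push_back(target);  // ignore stray commas
    }
  }
  return targets;
}
```

Supporting both spellings in one loop would let the comma form from the TODO be added without breaking the repeated-flag usage.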
Change-Id: Ic9a9e3e04139b0d75d2c2a263c030ca77adc1019
1. Added math_functions.h to hip_runtime.h
2. Changed the qualifier on operator overloads from static to static inline
3. Added vector types test for gpu
4. Separated __host__ and __device__ for math functions in headers
Change-Id: I499862fad5d7b10da686da9011d7ecefe523f8e2
1. Fixed compilation issues for tests
2. Added missing intrinsics + math functions
3. Disabled some device functions that cause linking errors with HCC
Change-Id: I79d52c4c7a539cc8ef40580247ad97ffcb975f09
1. All fp32, fp64 math device/host functions should be in math_functions.h/.cpp
2. All fp32, fp64 fast math intrinsics for device/host functions should be in device_functions.h/.cpp
3. All the device code implementations should be in device_util.h/.cpp
4. Moved code and created new header files accordingly
5. Added math_functions.cpp/.h
6. Changed the #ifndef include-guard macros so that headers with the same name (hip/hip_runtime.h and hip/hcc_detail/hip_runtime.h) do not conflict
7. Updated tests to match the code changes so that they include the appropriate headers
8. Added math_functions.cpp to CMakeLists.txt
9. Some tests are still broken, mostly for host math functions; these will be fixed in the next commit
10. TODO: FIX compilation issues for host math functions
Change-Id: I7a17637d7e294a7d224ffba932c1a08668febd26
1. Since we use a holder data structure, moved all the cmp, math, and cvt APIs to the .cpp file
2. All the tests passed
3. Added more extensive testing for half
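The holder's layout is not shown here; the sketch below assumes it simply wraps the raw IEEE 754 binary16 bits and implements cvt and cmp by widening to float, which is exact for every half value. HalfHolder, halfToFloat, halfLess and halfEqual are hypothetical names.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical holder: the raw binary16 bit pattern.
struct HalfHolder {
  std::uint16_t bits;
};

// cvt: binary16 -> binary32. Exact, since float can represent every half.
float halfToFloat(HalfHolder h) {
  const std::uint32_t sign = (h.bits >> 15) & 0x1u;
  const std::uint32_t exponent = (h.bits >> 10) & 0x1Fu;
  const std::uint32_t mantissa = h.bits & 0x3FFu;
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity or NaN.
    magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(mantissa | 0x400u),
                           static_cast<int>(exponent) - 25);
  }
  return sign ? -magnitude : magnitude;
}

// cmp: compare through float, so NaN is unordered and +0 == -0.
bool halfLess(HalfHolder a, HalfHolder b) { return halfToFloat(a) < halfToFloat(b); }
bool halfEqual(HalfHolder a, HalfHolder b) { return halfToFloat(a) == halfToFloat(b); }
```

Math ops could follow the same pattern (widen, compute in float, round back), but rounding back needs a float-to-half conversion that this sketch does not include.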
Change-Id: I92c6399dace602a0a24432728e3f2a07124e6fb1
1. Added all type conversion intrinsics
2. NO TESTS have been added. (Will add in next commit)
3. Sanitized code in hip_runtime.h
4. Added passed() to hipTestHalf to make it pass on HIT
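Which conversion intrinsics were added is not listed. Assuming they include the CUDA-style float-to-int family with rn/rz/ru/rd rounding suffixes, their semantics can be sketched on the host with <cmath>. float2int_rn and friends below are hypothetical host emulations, not the device intrinsics, and they do not handle inputs outside the int range.

```cpp
#include <cmath>

// Round to nearest, ties to even. std::nearbyint honours the current
// floating-point rounding mode; this assumes the default FE_TONEAREST.
int float2int_rn(float x) { return static_cast<int>(std::nearbyint(x)); }

// Round toward zero.
int float2int_rz(float x) { return static_cast<int>(std::trunc(x)); }

// Round up (toward +infinity).
int float2int_ru(float x) { return static_cast<int>(std::ceil(x)); }

// Round down (toward -infinity).
int float2int_rd(float x) { return static_cast<int>(std::floor(x)); }
```

Host references like these are a convenient oracle for the device-side tests promised for the next commit.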
Change-Id: I0987963c802fc7ff4d7e07d7b88d86da35da53c9
1. These functions use SDWA + LLVM IR
2. Added these functions to test
3. Still to do: exp, exp10, log, log10, rint
Change-Id: I06176acc6cb8bb054495310531777406a41b54e4
1. Added SDWA implementation inside IR file
2. Added device functions to header + used them in test
Change-Id: Ib4e059a58eee201cc82438689e3e9bc5f9d26653
1. Removed HIP_EXPERIMENTAL env variable so that device code will be accessed from LLVM IR
2. Removed soft support from headers and moved to hip_fp16.cpp
3. Added LLVM IR + inline asm to hip_ir.ll
4. Added test for fp16
5. Added barriers for hcc 3.5 and hcc 4.0 for half support
a. This means hcc 4.0 can parse __fp16, but hcc 3.5 can't
b. HCC 4.0 code is implemented now, hcc 3.5 will be added later
Change-Id: Ic37859b2688ebb02e168bab643d1882bf4727952
Includes some tricky manipulation of the locks for contexts and streams.
The issue is that stealing a stream requires locking the context to
walk its streams and find a victim. To avoid deadlock, no stream may
be locked while we lock the context. This implementation therefore
releases the stream lock, then acquires the context lock and selects
the victim.
A more robust implementation might copy the stream list from the
context so that no lock is required to walk all streams. A
std::shared_ptr could be used to prevent the streams from being
deallocated during the walk.
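A minimal sketch combining the lock-ordering rule above with the suggested snapshot approach, assuming streams are held by shared_ptr in the context. Stream, Context, pendingOps, snapshotStreams and selectVictim are hypothetical names, and the "fewest pending ops" victim criterion is an illustrative assumption.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical types; not the HIP runtime's stream or context classes.
struct Stream {
  std::mutex mutex;
  std::atomic<int> pendingOps{0};
};

struct Context {
  std::mutex mutex;
  std::vector<std::shared_ptr<Stream>> streams;

  // Copy the list under the context lock; the shared_ptr copies keep every
  // stream alive while the caller walks the snapshot without any lock.
  std::vector<std::shared_ptr<Stream>> snapshotStreams() {
    std::lock_guard<std::mutex> lock(mutex);
    return streams;
  }
};

// Lock-order rule: never hold a stream lock while acquiring the context lock.
// The caller holds selfLock on self; it is released while the victim is
// chosen and re-acquired before returning.
std::shared_ptr<Stream> selectVictim(Context& ctx, Stream& self,
                                     std::unique_lock<std::mutex>& selfLock) {
  selfLock.unlock();
  std::shared_ptr<Stream> victim;
  for (const auto& s : ctx.snapshotStreams()) {
    if (s.get() == &self) continue;  // never steal from ourselves
    if (!victim || s->pendingOps.load() < victim->pendingOps.load()) {
      victim = s;
    }
  }
  selfLock.lock();
  return victim;  // null when there is no other stream
}
```

Because the stream lock is dropped while the victim is chosen, the caller should re-validate its own state after selectVictim returns.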
- hipFunction_t is now returned by value. This eliminates dynamic
allocation / memory management complexity in the module. Removed the
kernel name, so the structure is now just 16 bytes.
- Moved the hsa_executable_load_module and hsa_executable_freeze
calls to the hipModuleLoad and hipModuleLoadData calls.
- Applied sharedMemBytes in hipModuleLaunchKernel to the group segment
size (not the private segment size).
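The exact fields of the 16-byte structure are not given here; the sketch below assumes a 64-bit kernel-object handle plus two 32-bit segment sizes, which is one layout that fits 16 bytes without a name string. KernelSymbol, LaunchSegments and computeSegments are hypothetical names. It also shows sharedMemBytes being added to the group segment rather than the private segment.

```cpp
#include <cstdint>

// Hypothetical value-type kernel handle: cheap to copy and return by value.
struct KernelSymbol {
  std::uint64_t object;              // kernel object handle
  std::uint32_t groupSegmentSize;    // static group (LDS) usage in bytes
  std::uint32_t privateSegmentSize;  // per-work-item private usage in bytes
};
static_assert(sizeof(KernelSymbol) == 16, "no name string, so 16 bytes");

struct LaunchSegments {
  std::uint32_t group;
  std::uint32_t priv;
};

// Dynamic shared memory requested at launch extends the group segment;
// the private segment size is left unchanged.
LaunchSegments computeSegments(const KernelSymbol& k, std::uint32_t sharedMemBytes) {
  return {k.groupSegmentSize + sharedMemBytes, k.privateSegmentSize};
}
```

Keeping the handle a trivially copyable 16-byte struct is what allows returning it by value with no allocation to manage.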