Commit Graph

870 commits

Author SHA1 Message Date
Aaron Enye Shi 0cfaa52d15 Guard rcp rounded implementation as well
Since rcp implementations of non-default rounded versions are not correct or supported in OCML, guard them using the same macro OCML_BASIC_ROUNDED_OPERATIONS. Also update the docs and tests.
2018-11-06 19:53:28 +00:00
Aaron Enye Shi e59c33250a Guard the OCML rounded operations instead
Instead of commenting all these functions out, guard the functions with a macro OCML_BASIC_ROUNDED_OPERATIONS.
2018-11-06 16:32:14 +00:00
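The guarding approach described in this entry can be sketched as follows. This is a minimal illustration, assuming the macro is supplied at compile time; `add_default`, `add_rz`, `has_rounded_ops`, and `add_preferring_rz` are hypothetical names and not part of the HIP or OCML API.

```cpp
#include <cassert>

// Hypothetical default-rounded operation; always compiled.
inline float add_default(float a, float b) { return a + b; }

#if defined(OCML_BASIC_ROUNDED_OPERATIONS)
// Hypothetical round-toward-zero variant, compiled only when the guard macro
// is defined (for example with -DOCML_BASIC_ROUNDED_OPERATIONS).
// Placeholder body: this sketch does not implement directed rounding.
inline float add_rz(float a, float b) { return a + b; }
#endif

// Reports at compile time whether the guarded variants are available.
constexpr bool has_rounded_ops() {
#if defined(OCML_BASIC_ROUNDED_OPERATIONS)
    return true;
#else
    return false;
#endif
}

// Callers test the same macro before naming a guarded function.
inline float add_preferring_rz(float a, float b) {
#if defined(OCML_BASIC_ROUNDED_OPERATIONS)
    return add_rz(a, b);
#else
    return add_default(a, b);
#endif
}

// Small usage example: works whether or not the macro is defined.
inline void example_usage() {
    assert(add_preferring_rz(1.0f, 2.0f) == 3.0f);
}
```

Compared with commenting the functions out, a guard keeps the code in the tree and lets a build opt back in with a single definition.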
Aaron Enye Shi 2b108a2cfd Remove non-working non-default-rounded math apis
In ROCm-Device-Libs, they have dropped the non-default-rounded versions of add, sub, mul, div, sqrt and fma. Therefore, ocml has removed the rte, rtp, rtn, and rtz counterparts. This will remove the same math APIs in HIP for _ru, _rd, _rn, and _rz.
2018-11-05 22:34:16 +00:00
Siu Chi Chan 0ff408a56c Move the global arrays for hip malloc/free
from a header into a source file so that
there is only one copy in an executable,
which avoids wasting static memory on the host

Change-Id: Id5b62766f77809c8d7b47892cb7149c490dcbdb9
2018-11-01 16:20:35 -04:00
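The header-versus-source distinction behind this entry can be sketched in a single file. This is an illustration only: `kSlots`, `g_alloc_table`, and `record_allocation` are hypothetical names, and the real arrays in HIP may be shaped differently.

```cpp
#include <cassert>
#include <cstddef>

// --- What would live in the header ---
// A `static` array defined in a header gives every translation unit that
// includes it a private copy. An `extern` declaration only announces the
// array and allocates no storage.
constexpr std::size_t kSlots = 4;
extern void* g_alloc_table[kSlots];

// --- What would live in exactly one source file ---
// The single definition; every translation unit shares this storage.
void* g_alloc_table[kSlots] = {};

// Stores `p` in the first free slot and returns its index, or -1 if full.
inline int record_allocation(void* p) {
    assert(p != nullptr);
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (g_alloc_table[i] == nullptr) {
            g_alloc_table[i] = p;
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With the definition in one source file, the linker sees one array regardless of how many files include the header.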
Jeff Daily bb447dd76e typedef struct hipFuncAttributes now C compatible.
Fixes #591.
Fixes #694.
2018-10-31 14:12:45 +05:30
Maneesh Gupta 0b86b421c1 Merge pull request #714 from aaronenyeshi/remove-static-hip-device
Remove libhip_device.a static library
2018-10-31 14:08:29 +05:30
Maneesh Gupta 7f3e7b5334 Merge pull request #713 from scchan/master-hcc_early_finalize_support
Support more than one device code blobs in a single .kernel section
2018-10-31 14:04:06 +05:30
Aaron Enye Shi 817b27d530 Remove libhip_device.a static library
Move the remaining function definitions from device_util.cpp to the hip_runtime.h header. We can now remove the static library completely, as device_util.cpp was the last file in the hip::hip_device target.
2018-10-25 19:52:35 +00:00
Alex Voicu e4181b85be hipLaunchKernel, hipLaunchParm are deprecated, and shall be removed. 2018-10-25 13:32:17 +01:00
Maneesh Gupta f6f160fa6b Merge pull request #723 from mangupta/fix_double_shfl
Fix logic for double variants of __shfl*
2018-10-25 06:01:38 +05:30
Maneesh Gupta 19404e603d Fix logic for double variants of __shfl*
Change-Id: I604f00b54cf4bd9c5f26ca6fa680fca5e9629417
2018-10-24 12:39:09 +05:30
Maneesh Gupta 0703a2d0f0 Make HIP functional again with HCC from ROCm 1.9.x
Change-Id: I214acdfd0b79dcf783993e44fe31baee64fd4dc3
2018-10-24 10:41:56 +05:30
Maneesh Gupta 4a00b244a3 Merge pull request #705 from ROCm-Developer-Tools/feature_minimal_changes_for_hc_next
Feature minimal changes for hc next
2018-10-19 06:58:31 +05:30
Siu Chi Chan 30ce25e627 Support more than one bundles in a single .kernel section
When compiling with Early Finalization enabled in HCC,
the resulting .kernel section of the host object may now
contain more than one device code bundle. This change
teaches the HIP runtime to correctly extract all the
bundles from the .kernel section.
2018-10-18 17:00:27 -04:00
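Extracting several bundles from one section can be sketched as a scan for a start marker. This is a rough sketch under stated assumptions: the section is treated as an in-memory byte string, each bundle is assumed to begin with a magic marker passed in by the caller, and `find_bundle_offsets` is a hypothetical helper. An actual runtime would presumably parse each bundle header rather than rely on marker search alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the byte offset of every occurrence of `magic` in `section`.
// Each offset is taken as the start of one bundle; a bundle is assumed to
// extend to the next marker or to the end of the section.
inline std::vector<std::size_t> find_bundle_offsets(const std::string& section,
                                                    const std::string& magic) {
    assert(!magic.empty());
    std::vector<std::size_t> offsets;
    std::size_t pos = section.find(magic);
    while (pos != std::string::npos) {
        offsets.push_back(pos);
        // Continue searching after the marker just found.
        pos = section.find(magic, pos + magic.size());
    }
    return offsets;
}
```

A caller that previously read a single bundle at offset 0 would instead iterate over the returned offsets and process each bundle in turn.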
Alex Voicu 5ccaf2fa7d Dumb workaround is still needed, so add it back. 2018-10-18 15:33:46 +01:00
Alex Voicu fe959f7bd7 Re-sync with upstream. 2018-10-18 12:27:03 +01:00
Maneesh Gupta 1a5025c57e Merge pull request #688 from aaronenyeshi/fix-sinf-cosf-ocml
Use sinf and cosf from ocml device libs
2018-10-18 16:39:20 +05:30
Maneesh Gupta d133493669 Merge pull request #692 from whchung/hip-reinit-take2
HIP program state re-initialization logic (take 2)
2018-10-18 12:06:41 +05:30
Maneesh Gupta c24b06fa0a Merge pull request #703 from mangupta/stream_create_with_priority
Implementation for stream priority
2018-10-17 10:53:43 +05:30
Maneesh Gupta dbe4431d98 Merge pull request #702 from aaronenyeshi/fix-missing-irif-lib
Replace IRIF fences with atomic_work_item_fence
2018-10-17 10:53:27 +05:30
Maneesh Gupta 64d1cf86b7 Add missing hipHostRegister flags on nvcc path
Change-Id: I69f09204d9c544935104d4168ab8d3626666a623
2018-10-15 15:30:24 +05:30
Alex Voicu 5312336ce2 Minimal should mean minimal. 2018-10-11 00:21:41 +01:00
Alex Voicu 3e4dbd32a1 Address Aaron's comments 2018-10-11 00:03:01 +01:00
Alex Voicu 4bc40551b5 Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_minimal_changes_for_hc_next 2018-10-10 11:44:09 +01:00
Alex Voicu ca375cb8c5 Re-sync with upstream. 2018-10-10 11:43:49 +01:00
Maneesh Gupta da64156fb2 Implementation for stream priority
- Requires ROCm 1.9.x or higher
- Requires HCC with PR#886 merged

Change-Id: Id7c95ea091ee610e80c9ad815f1cb989cba570ca
2018-10-05 16:27:46 +05:30
Aaron Enye Shi 0787f74ac2 Replace IRIF fences with atomic_work_item_fence 2018-10-04 21:47:28 +00:00
Aaron Enye Shi 5dd35576f6 Fix hip_vector_types.h for long long vectors
There was a missing long in the declaration for [u]longlongN types.
2018-10-03 13:57:52 -04:00
Wen-Heng (Jack) Chung dab1a0f9db HIP program state re-initialization logic
This commit supports kernels that are loaded dynamically, through means such as
dlopen(), after the HIP runtime initializes.
2018-09-26 19:48:47 +00:00
Aaron Enye Shi 77c07d4118 Use sinf and cosf from ocml device libs
Using the llvm_amdgcn builtins fails to produce accurate values, so we should move to the ocml device library versions.
2018-09-25 19:31:39 +00:00
Maneesh Gupta 3d67c9f952 Merge pull request #614 from ROCm-Developer-Tools/fma
Add overloading resolution functions for fma
2018-09-20 13:38:03 +05:30
Yaxun Sam Liu a5c961e26c Silence warnings about duplicate static keyword
static is already in __DEVICE__, so should be removed.
2018-09-19 10:39:45 -04:00
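The duplicate-specifier situation can be illustrated with a stand-in macro. This is a sketch only: `SKETCH_DEVICE` and `twice` are hypothetical, and the real `__DEVICE__` macro may expand to something different.

```cpp
#include <cassert>

// Hypothetical macro that already carries the `static` storage class.
#define SKETCH_DEVICE static inline

// Writing `static SKETCH_DEVICE int twice(int x)` would expand to
// `static static inline int twice(int x)`, which compilers diagnose as a
// duplicate specifier. Dropping the explicit `static` leaves exactly one.
SKETCH_DEVICE int twice(int x) { return 2 * x; }

// Small usage example.
inline void example_usage() {
    assert(twice(21) == 42);
}
```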
Yaxun Sam Liu bd622a4b4a Add fma function with float and _Float16 arguments 2018-09-19 09:59:33 -04:00
Yaxun Sam Liu cf184460e9 Fix build failure of hipTestHalf and hipTestIncludeMath for hip-clang 2018-09-18 21:00:15 -04:00
Maneesh Gupta 9ee70fca8a Merge pull request #672 from iotamudelta/fp16_fix
Only LLVM6 and higher contain the necessary intrinsics.
2018-09-18 08:43:33 +05:30
Maneesh Gupta 32787fa1fc Merge pull request #674 from mangupta/fix_dtests_on_nvcc
[dtests] Fix hipTestClock, hipTestNew, hipTestGlobalVariable, hipSimpleAtomicsTest & hipTestIncludeMath tests on nvcc path
2018-09-18 07:50:52 +05:30
Maneesh Gupta 5cf281071d Merge pull request #677 from yxsamliu/fix-launch-decay
Fix hipLaunchKernelGGL for hip-clang
2018-09-18 07:50:37 +05:30
Yaxun Sam Liu cdfd82f1de Disable device code for gcc in hip_memory.h
This device code should only be seen by HCC or hip-clang. It caused a build failure
for the HIP-VDI runtime and should be disabled for gcc.
2018-09-17 16:50:42 -04:00
Yaxun Sam Liu fc228c7ea6 Fix hipLaunchKernelGGL for hip-clang
Do not decay function pointer type of the kernel argument passed to hipLaunchKernelGGL
and hipLaunchKernel, otherwise some type information is lost which may cause
type inference failure for the template.

This issue caused a compilation error in FeatureLPPooling in Caffe2/PyTorch, which this
patch fixes.
2018-09-17 11:20:41 -04:00
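The decay mentioned in this entry is a general C++ rule that can be shown in isolation. The sketch below is not the patch itself: `kernel`, `deduced_as_pointer_by_value`, and `deduced_as_function_by_ref` are hypothetical names used only to contrast the two deduction modes.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical kernel-like function.
inline void kernel(int*) {}

// By-value parameter: a function argument decays to a function pointer,
// so F is deduced as void (*)(int*).
template <typename F>
constexpr bool deduced_as_pointer_by_value(F) {
    return std::is_pointer<F>::value;
}

// Forwarding reference: F is deduced as a reference to the function type,
// so the undecayed function type void(int*) is still recoverable.
template <typename F>
constexpr bool deduced_as_function_by_ref(F&&) {
    return std::is_function<typename std::remove_reference<F>::type>::value;
}

// Small usage example contrasting the two deductions.
inline void example_usage() {
    assert(deduced_as_pointer_by_value(kernel));
    assert(deduced_as_function_by_ref(kernel));
}
```

Taking the argument by reference is one way to keep the original type visible to later template deduction; whether the patch uses this exact mechanism is not stated in the log.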
Maneesh Gupta cef5261fa9 Add mappings for __clock* in nvcc_detail/hip_runtime.h
Change-Id: Ibcecf52f3e69298268d921efc036090544fa0ed0
2018-09-17 15:23:30 +05:30
Alex Voicu c6720e882b Align with HC Next. 2018-09-17 11:50:29 +03:00
Maneesh Gupta 66f863d1f3 Merge branch 'master' into support-malloc 2018-09-17 10:17:25 +05:30
Maneesh Gupta cb348421d7 Merge pull request #650 from ROCm-Developer-Tools/hip-clang-new
Support placement new in hip-clang
2018-09-15 11:21:01 +05:30
Maneesh Gupta 8fe4e22b19 Merge pull request #665 from aaronenyeshi/fix-min-funcs
Use templates for min to prevent ambiguity
2018-09-14 13:21:38 +05:30
Aaron Enye Shi 6b811ca6d1 Fix Tensorflow ambiguous min issue 2018-09-13 23:16:20 +00:00
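One way a template can remove this kind of ambiguity is sketched below. It assumes the problem arises from mixed-type calls against fixed overloads; the namespace `sketch` and this particular `min` are illustrative and may differ from what the commit actually does.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// With only fixed overloads such as min(int, int) and min(float, float),
// a mixed call like min(1, 2.5f) is ambiguous: each overload needs exactly
// one conversion, so neither is a better match.
// A single template that compares in the common type accepts mixed arguments.
template <typename T, typename U>
typename std::common_type<T, U>::type min(T a, U b) {
    using R = typename std::common_type<T, U>::type;
    const R ra = static_cast<R>(a);
    const R rb = static_cast<R>(b);
    return rb < ra ? rb : ra;
}

}  // namespace sketch

// Small usage example with mixed argument types.
inline void example_usage() {
    assert(sketch::min(1, 2.5f) == 1.0f);
}
```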
Johannes M Dieterich cf12a9c049 Only LLVM6 and higher contain the necessary intrinsics. 2018-09-13 13:55:43 -05:00
Maneesh Gupta aed5ad31ba Merge pull request #669 from ROCm-Developer-Tools/feature_automatic_cast
Remove potential for mismatch between runtime passed actuals and defined formals
2018-09-13 07:54:22 +05:30
Maneesh Gupta 411e53a665 Merge pull request #661 from yxsamliu/add-empty-printf
Add empty printf for hip-clang
2018-09-13 07:54:03 +05:30
Aaron Enye Shi 894cbdd749 Avoid AMP-restrict call to CPU-restrict 2018-09-12 14:54:31 +00:00
Alex Voicu cdfea3ef7b Remove potential for mismatch between runtime passed actuals and defined formals. 2018-09-12 10:30:48 +01:00