rocm-systems

Author	SHA1	Message	Date
Evgeny Mankov	4e02b285d6	[HIP][doc] NVIDIA-nvcc -> HIP-nvcc [ROCm/clr commit: `3df22b2fde`]	2019-10-28 22:46:33 +03:00
Evgeny Mankov	935dd4ce94	[HIP][doc] AMD-hcc -> HIP-hcc [ROCm/clr commit: `d312bce79d`]	2019-10-28 21:41:12 +03:00
Evgeny Mankov	20b127bf45	[HIP][doc] Fix typo: AMD-clang -> HIP-clang HIP-clang is already used below instead of AMD-clang [ROCm/clr commit: `6284b041e5`]	2019-10-28 21:19:21 +03:00
Evgeny Mankov	bcc9d88b20	[HIPIFY][tests] Rename the ambiguous call as well [ROCm/clr commit: `f68bee02f5`]	2019-10-25 16:07:31 +03:00
Evgeny Mankov	91732f98c0	[HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h [ROCm/clr commit: `9529e1d91d`]	2019-10-25 16:04:20 +03:00
Alex Voicu	f22391c362	Add missing operators, fix GCC compilation. (#1589) [ROCm/clr commit: `40522e2b6a`]	2019-10-25 15:44:24 +05:30
Alex Voicu	acbee5a48b	Fix deadlock, remove old __sync_* use. (#1584) This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code). [ROCm/clr commit: `f909a393ff`]	2019-10-25 15:44:17 +05:30
Rahul Garg	9c599a3581	[dtest] Fix hipMemset2D test (#1579) Reverts changes made in #1399. This is a RT api test. For testing hipMemAllocPitch, a new test should be written and that should use correct memset API. [ROCm/clr commit: `66a3c874c8`]	2019-10-25 15:44:05 +05:30
Rahul Garg	7ea7a9c3b7	Add hipMemcpy2DfromArray (#1510) Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync. [ROCm/clr commit: `14b870d1ce`]	2019-10-25 15:43:33 +05:30
Rahul Garg	6760e4065e	Update profiling doc (#1576) [ROCm/clr commit: `ff8d3fa446`]	2019-10-24 17:51:55 +05:30
Jatin Chaudhary	e7f4cf4487	Adding New Analyze Target Merging with cppcheck (#1583) [ROCm/clr commit: `f53b1a1755`]	2019-10-24 17:46:06 +05:30
Rahul Garg	7f429afe2e	Add HIP checks in texture driver sample (#1581) [ROCm/clr commit: `170c4f0270`]	2019-10-24 17:45:51 +05:30
gandryey	21a2925ee7	Hip vdi profiling header (#1577) Add HIP-VDI profiling interface for GPU timing collection. [ROCm/clr commit: `f25692b399`]	2019-10-24 17:45:42 +05:30
Alex Voicu	5b917afa5f	Make CAS loops use the TTAS idiom. (#1573) * Make CAS loops use the TTAS idiom. * More efficient re-formulation of TTAS. * Fix typo. * The typo was not quite a typo [ROCm/clr commit: `26914ec76e`]	2019-10-24 17:45:20 +05:30
satyanveshd	ad1e409a24	Fix occupany APIs (#1560) Addresses SWDEV-205006 [ROCm/clr commit: `6c5fbf9b4a`]	2019-10-24 17:44:47 +05:30
searlmc1	510be4b5dc	Improve performance of v2 arg handling (#1539) * Improve performance of v2 arg handling * Missing change to `std::string` [ROCm/clr commit: `15a699688e`]	2019-10-24 17:44:05 +05:30
Alex Voicu	fb411b56c2	Improve scalar access into vector types. (#1531) The improvement is based on the ideas here: https://t0rakka.silvrback.com/simd-scalar-accessor. It yields significantly better ISA when the base's .xyzw members are used. [ROCm/clr commit: `84d5b399f6`]	2019-10-24 17:43:49 +05:30
Aryan Salmanpour	9e0eaef846	[hip] add support for implicit kernel argument for multi-grid sync (#1456) * [hip] add support for implicit kernel argument for multi-grid sync * modified code for calculating the prev_sum * change the impCoopArg type to size_t * add memory clean up * launch init_gws and main kernels into two separate loops [ROCm/clr commit: `93c688a0c9`]	2019-10-24 17:43:30 +05:30
Rahul Garg	764135d242	Merge pull request #1559 from vsytch/win10_aligned_alloc Fixes for hipMemcpy_simple on Windows [ROCm/clr commit: `465581612e`]	2019-10-23 13:10:59 -07:00
Evgeny Mankov	50d72e13ca	[HIPIFY][cmake][#1571] Take into account building hipify-clang as a part of building HIP while installing [Algorithm] [Release] If CMAKE_INSTALL_PREFIX is set by the user: If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise CMAKE_INSTALL_PREFIX is used unchanged. If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT): If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise use PROJECT_BINARY_DIR/bin for installation. [Debug] If CMAKE_INSTALL_PREFIX is set by the user: CMAKE_INSTALL_PREFIX is used unchanged. If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT): use CMAKE_CURRENT_SOURCE_DIR/bin for installation. Standalone build left unchanged: CMAKE_INSTALL_PREFIX is used if set. [ROCm/clr commit: `2435567e70`]	2019-10-23 18:54:45 +03:00
Evgeny Mankov	0896e41987	[HIPIFY] Disable delayed template parsing By implicit unconditional passing -fno-delayed-template-parsing option (which appeared in LLVM 3.8.0, thus doesn't need compatibility wrapping) to hipify-clang. [Reason] To parse uncalled template functions otherwise they are not parsed without calling, thus not hipified. Affects cub_03.cu test, which has uncalled global template function. [ROCm/clr commit: `7ab06b3892`]	2019-10-22 19:07:37 +03:00
Evgeny Mankov	82222bf945	[HIPIFY][#1569] Fix [ROCm/clr commit: `e2191e23e6`]	2019-10-22 11:08:37 +03:00
Evgeny Mankov	e3cf10192c	[HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA major.minor version [Reason] To support maximum CUDA features in offline tests + Add defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600 restriction for atomicAdd on doubles in atomics.cu. So if LLVM < 7 and --cuda-gpu-arch doesn't work, __CUDA_ARCH__ is unset too (350 by default in clang); if LLVM >= 7 --cuda-gpu-arch is used and __CUDA_ARCH__ is set based on it. [ROCm/clr commit: `3233a845f6`]	2019-10-21 17:50:00 +03:00
Evgeny Mankov	de849a44e7	[HIPIFY][perl] Support of 'using namespace cub' [ROCm/clr commit: `9633cdbd8a`]	2019-10-21 17:15:05 +03:00
Evgeny Mankov	665a200247	[HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA version [Reason] To support maximum CUDA features in offline tests + Add CUDA_VERSION >= 800 restriction for atomics.cu [TODO] Find a way to use or exclude atomicAdd for doubles if LLVM < 7, because LLVM 6.0.1 and older do not use --cuda-gpu-arch in clang's Driver code at all (option is only declared) [ROCm/clr commit: `9fc7afa738`]	2019-10-21 15:51:25 +03:00
Evgeny Mankov	3a45daed0a	[HIPIFY][tests] Set -I for CUDA path instead of --cuda-path for LLVM < 4 [ROCm/clr commit: `ff6057d1ff`]	2019-10-20 20:08:56 +03:00
Evgeny Mankov	e07be75489	[HIPIFY][tests] Exclude all CUB tests if CUDA_CUB_ROOT_DIR is not set [ROCm/clr commit: `5bf1ff19ff`]	2019-10-20 20:03:18 +03:00
Vladislav Sytchenko	33acfa17c1	Remove extra #endif. [ROCm/clr commit: `432380aa5d`]	2019-10-18 16:40:29 -04:00
Evgeny Mankov	bb20336fa6	[HIPIFY][tests] Test clean-up [ROCm/clr commit: `44a897a146`]	2019-10-18 18:55:52 +03:00
Evgeny Mankov	85281b1d86	[HIPIFY][CUB][#1460] Add "using namespace cub" translation support + Add cub_03.cu [ROCm/clr commit: `86f6756b02`]	2019-10-18 18:51:40 +03:00
Evgeny Mankov	a392a050d6	Merge pull request #1558 from aaronenyeshi/fix-hipify-cmake-version [HIPIFY][cmake] Make CMakeLists use default 3.5.1 for Ubuntu 16.04 [ROCm/clr commit: `eb6690bbba`]	2019-10-18 06:39:35 +03:00
Rahul Garg	30759e7c9b	Merge pull request #1550 from yxsamliu/new-launch Add -fhip-new-launch-api to hipcc for HIP/VDI [ROCm/clr commit: `07eed1e5bf`]	2019-10-17 19:07:32 -07:00
Vladislav Sytchenko	54eddfc8f0	_aligned_malloc() on Windows first takes size, then alignment, which is the opposite of how the similar function behaves on Linux. Memory allocated by it also has to be freed using _aligned_free(), unlike Linux where we can use regular free(). Edit aligned_alloc() macro and add a aligned_free() one to align with the above behaviour. [ROCm/clr commit: `f4440817cb`]	2019-10-17 18:58:32 -04:00
Aaron Enye Shi	489e3dda9a	[HIPIFY][cmake] Make CMakeLists use default 3.5.1 for Ubuntu 16.04 [ROCm/clr commit: `b3ea58abe7`]	2019-10-17 21:21:24 +00:00
Evgeny Mankov	9fb60fa36a	[HIPIFY][doc] Update README.md + Versions, testing [ROCm/clr commit: `1165e6bd71`]	2019-10-17 22:26:48 +03:00
Rahul Garg	714314fa66	Revert "hipcc defaults to code object v3 (#1298)" This reverts commit `e5a2ba9602`. [ROCm/clr commit: `446718f990`]	2019-10-17 13:27:28 -04:00
Evgeny Mankov	c8238e1fd4	[HIPIFY][cmake] Add install rule for clang-resource-headers + Fix: set destination for all installing files to ${CMAKE_INSTALL_PREFIX} [ROCm/clr commit: `8c3dff7ab9`]	2019-10-17 15:05:55 +03:00
Rahul Garg	685a4cd182	Merge pull request #1544 from vsytch/master QoL changes to the hipMemset family [ROCm/clr commit: `a21fe1443b`]	2019-10-16 18:54:20 -07:00
Evgeny Mankov	b357301610	[HIPIFY][CUB][#1460] Add cub:: namespace support in TemplateInstantiation of cudaLaunchKernel + Update cub_02.cu test accordingly [ROCm/clr commit: `e557563947`]	2019-10-16 19:02:13 +03:00
Vladislav Sytchenko	577bac5de8	hipMemset2D and hipMemset3D tests should be passing by default. [ROCm/clr commit: `86d0c5fa5a`]	2019-10-16 11:02:38 -04:00
Evgeny Mankov	97f10790eb	[HIPIFY] Refactor a couple of matcher functions + Separate out GetSubstrLocation function for finding substr SourceLocation in a given SourceRange [ROCm/clr commit: `0a20048759`]	2019-10-16 13:43:56 +03:00
Evgeny Mankov	643a8bcf5b	[HIPIFY][CUB][#1460] Implement cubFunctionTemplateDecl matcher + Add cub_02.cu test + Partial fixes #1460 [ROCm/clr commit: `5555d46e66`]	2019-10-16 13:08:11 +03:00
kjayapra-amd	97c823d552	Use the correct return type in runTest in 11_texture_driver sample. (#1546) Fixes SWDEV-203394. Currently in runTest() returns true, even if the texture reference copy does not happen. Using the existing testResult Flag to return from runTest(). [ROCm/clr commit: `9d571e3c9e`]	2019-10-16 10:52:15 +05:30
vsytch	4b8d8034cf	Update hipMathFunctions, hipTestHalf and hipTestNativeHalf tests to support Navi10 and Navi14. (#1545) [ROCm/clr commit: `c2aadd4d12`]	2019-10-16 10:51:48 +05:30
kpyzhov	19f22b468b	[hipcc] Temporary add -D_OPENMP to clang options to workaround cmake issue (#1540) * Temporary add -D_OPENMP to clang options in hipcc to allow using CMake OpenMP detection with hip-clang (until updated CMake version is available). [ROCm/clr commit: `9773f94c71`]	2019-10-16 10:51:28 +05:30
Nick Curtis	a7d6c03e17	Guard against division by zero for no VGPR usage (e.g., in an empty kernel) (#1528) * guard against division by zero for no VGPR usage (e.g., in an empty kernel) * fix bracket format * clean up parenthesis [ROCm/clr commit: `d16963c9d5`]	2019-10-16 10:49:56 +05:30
Jatin Chaudhary	1ec284d333	Adding code object manager to rtc (#1526) Adding Code Object Manager file to rtc to resolve address of Bundled_code_object in libhiprtc.so [ROCm/clr commit: `b3351561c5`]	2019-10-16 10:49:16 +05:30
Xiaozhu Meng	e7fb74b07f	Fix struct declaration for C (#1524) This change is necessary for HPCToolkit to use Roctracer to produce code centric profiling view. [ROCm/clr commit: `f9b8a01c77`]	2019-10-16 10:48:55 +05:30
Yaxun (Sam) Liu	abfe4248da	Add -fhip-new-launch-api to hipcc for HIP/VDI [ROCm/clr commit: `739530d53b`]	2019-10-15 21:47:33 -04:00
Vladislav Sytchenko	2bc49fb55c	In the hipMemset2D and hipMemset3D tests synchronize with the default stream after performing an async memset. [ROCm/clr commit: `cc5abec092`]	2019-10-15 17:15:49 -04:00
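
Note on commit 54eddfc8f0: the message describes two platform differences, namely that `_aligned_malloc()` on Windows takes the size first and the alignment second, and that its memory must be released with `_aligned_free()` rather than `free()`. A minimal sketch of a wrapper that hides both differences might look like the following. The names `portable_aligned_alloc` and `portable_aligned_free` are hypothetical and not taken from the HIP sources (the commit itself edits an `aligned_alloc()` macro and adds an `aligned_free()` one); the rounding of the size to a multiple of the alignment is an added assumption to satisfy `std::aligned_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#ifdef _WIN32
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper (illustration only): allocate `size` bytes aligned to
// `alignment`, which must be a non-zero power of two.
inline void* portable_aligned_alloc(std::size_t alignment, std::size_t size) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Assumption: round the size up to a multiple of the alignment, which
    // std::aligned_alloc expects.
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
#ifdef _WIN32
    return _aligned_malloc(rounded, alignment);     // size first, then alignment
#else
    return std::aligned_alloc(alignment, rounded);  // alignment first, then size
#endif
}

// Hypothetical helper (illustration only): release memory obtained from
// portable_aligned_alloc with the matching deallocation function.
inline void portable_aligned_free(void* ptr) {
#ifdef _WIN32
    _aligned_free(ptr);  // memory from _aligned_malloc must not go to free()
#else
    std::free(ptr);
#endif
}
```

Callers would pair the two helpers, for example `void* p = portable_aligned_alloc(64, 100);` followed by `portable_aligned_free(p);`, so that the argument order and the matching free function stay in one place.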


3655 Commits