rocm-systems

Author	SHA1	Message	Date
Aryan Salmanpour	015895a265	[HIP] add cooperative kernel launch APIs on NVCC (#1929 )	2020-03-17 14:01:11 +05:30
Joseph Greathouse	55e55e78bb	Fix maxSharedMemoryPerMultiProcessor attribute (#1927 ) The maxSharedMemoryPerMultiProcessor attribute is meant to describe the number of bytes of shared memory (LDS space in AMD terminology) in each SM (CU in AMD terminology). For instance, on AMD GPUs this is often 64KB per CU, and on some Nvidia GPUs it's 96KB per SM. This shared memory is a different address space from the normal global memory. However, the current HIP-HCC properties fill this in with a size that matches the totalGlobalMem property. This gives a drastically too-high calculation for the amount of LDS space that each CU has -- tens of GBs vs. 10s of KBs. This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor property from the HSA pool that describes how much workgroup-local space is available on each CU. The HSA runtime eventually pulls this from the topology information about LDSSizeInKB, defined as "Size of Local Data Store in Kilobytes per SIMD". Previously, this HSA query was used to fill in the value of the sharedMemPerBlock property. On today's AMD GPUs, we know that the amount of LDS available to the workgroup is identical to the amount of LDS space in the CU. However, in the future this may differ. As such, this patch changes around the order and fills in the "PerMultiProcessor" property from the HSA query (since that's what the query is defined to return), and then separately fills in the "PerBlock" property as we know it.	2020-03-17 14:00:51 +05:30
Joseph Greathouse	bf04d7380a	Fix errors in occupancy calculation function (#1926 ) Fix two errors in hipOccupancyMaxActiveBlocksPerMultiprocessor. 1) Fix a possible segfault if the user passed in a null pointer for the numBlocks value. 2) Handle the situation when the user is asking for a block size that is larger than what the target device can hold within a single block.	2020-03-17 14:00:38 +05:30
Yaxun (Sam) Liu	7aa9611689	Let hipcc not pass -mllvm option to HIP-Clang on Windows (#1924 ) Currently there is a clang bug on Windows causing duplicate -mllvm options in clang -cc1. Temporarily disable -mllvm options for HIP-Clang on Windows until the bug is fixed. Change-Id: I3a4393ba7745989398dc6c6001722837dad18704	2020-03-17 14:00:20 +05:30
Maneesh Gupta	eee5cc8621	Annotate `__constant__` (#1901 )	2020-03-17 13:59:44 +05:30
mhbliao	774035d869	[hip] Improve the portability of the header for vector type support. (#1873 ) - Need to check the availability of `__has_attribute` builtin macro instead of compiler versions. That's more reliable and portable among various compilers. - Provides a very basic support of vectors for unknown compilers.	2020-03-17 13:59:24 +05:30
Evgeny Mankov	821c60a3d9	Merge pull request #1916 from asalmanp/refactor_cooperative_APIs [HIP] Refactor cooperative APIs	2020-03-12 19:12:50 +03:00
Evgeny Mankov	70f5646f8a	Merge pull request #1908 from asalmanp/prop_mulit_coop [HIP] add hip specific properties for cooperative kernel multi device	2020-03-12 19:12:11 +03:00
Alex Voicu	1c5f526e6b	Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_robust_constant	2020-03-12 14:20:26 +00:00
Maneesh Gupta	0726abf424	Expose support for non-returning atomic FADD (#1909 ) Change-Id: If5359488324477315a9bd4f308a75f606c065b39	2020-03-11 14:33:15 +05:30
srinivamd	65a790bc08	return hipSuccess when count is zero (#1900 )	2020-03-11 14:32:54 +05:30
Evgeny Mankov	f98ce58e06	Merge pull request #1925 from arghdos/patch-3 Fix incorrect shfl_xor for Windows	2020-03-11 00:58:11 +03:00
Aryan Salmanpour	b663fccf0b	[HIP] return an error if blockDim exceeds maxThreadsPerBlock	2020-03-10 15:26:53 -04:00
Evgeny Mankov	da8669ea03	Merge pull request #1922 from emankov/HIP [HIP][doc] Update README.md	2020-03-10 22:20:00 +03:00
Nick Curtis	09edc7e49c	Fix incorrect shfl_xor for Windows copy/paste error, need __shfl_xor w/ lane_mask	2020-03-10 12:04:05 -05:00
Evgeny Mankov	fea3017168	[HIP][doc] Update README.md	2020-03-10 18:04:01 +03:00
Evgeny Mankov	cabadddc7b	Merge pull request #1919 from ssahasra/declare-printf separate printf declaration for vdi/clang	2020-03-10 12:26:54 +03:00
Aryan Salmanpour	5494f5b247	[HIP] fix formatting/code clean up and fix a bug	2020-03-09 16:03:59 -04:00
Sameer Sahasrabuddhe	09130b3b92	separate printf declaration for vdi/clang There are now two implementations of printf in HIP: 1. The implementation for HCC is controlled by the HC_FEATURE_PRINTF macro, and it works only with the HCC compiler used in combination with the HCC runtime. 2. The implementation for hip-clang requires the VDI runtime, and is always enabled with that combination.	2020-03-09 09:40:05 +05:30
Alex Voicu	c7f7ada0e9	Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_robust_constant	2020-03-08 18:00:14 +00:00
Aryan Salmanpour	4844fbdf0a	[HIP] Refactor cooperative APIs	2020-03-06 18:30:12 -05:00
Aryan Salmanpour	7e45c54ea6	move new enums to the end to maintain compatibility	2020-03-06 11:38:44 -05:00
Evgeny Mankov	5c036520b1	Merge pull request #1914 from emankov/hipify-clang [HIPIFY][doc] Update README.md: LLVM 10.0.0-rc3 is supported	2020-03-06 18:20:36 +03:00
Evgeny Mankov	dd5f3fd282	[HIPIFY][doc] Update README.md: LLVM 10.0.0-rc3 is supported + Add -DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN=ON for LLVM 10.0.0 or newer + Supported versions update	2020-03-06 18:17:05 +03:00
Alex Voicu	44e5834c8e	Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_robust_constant	2020-03-06 12:33:31 +02:00
Maneesh Gupta	4a40010ac6	Expose support for non-returning atomic FADD Change-Id: If5359488324477315a9bd4f308a75f606c065b39	2020-03-05 10:30:52 +05:30
Evgeny Mankov	43d12b523c	Merge pull request #1905 from emankov/hipify-clang [HIP][cmake] Remove dependency from hipify-clang	2020-03-04 07:42:48 +03:00
Aryan Salmanpour	03797ae986	[HIP] add hip specific properties for cooperative kernel multi device	2020-03-03 13:25:36 -05:00
Evgeny Mankov	1561f61642	[HIP][cmake] Remove dependency from hipify-clang [Reason] Upcoming hipify-clang's splitting out into a new repository https://github.com/ROCm-Developer-Tools/HIPIFY.	2020-03-03 12:07:13 +03:00
Alex Voicu	27480ff5a2	Annotate `__constant__`	2020-02-28 22:54:00 +02:00
Jatin Chaudhary	d29ad50464	[dtests] __shfl_up and __shfl_down tests (#1899 )	2020-02-28 16:48:15 +05:30
Siu Chi Chan	57edf48191	improve code object loading error message (#1889 )	2020-02-28 16:47:40 +05:30
saleelk	3e1f41c165	Fix HIPRTC headers to export C style symbols (#1879 )	2020-02-28 16:47:29 +05:30
Rahul Garg	6c5fa32815	Remove deprecated HIP markers (#1876 )	2020-02-28 16:47:15 +05:30
Rahul Garg	edc97f3073	Add hipDrvOccupancyMaxActiveBlocksPerMultiprocessor[WithFlags] (#1854 ) Equivalent to cuOccupancyMaxActiveBlocksPerMultiprocessor[WithFlags].	2020-02-28 16:46:55 +05:30
jiabaxie	af90312867	Cleaned up error messages for HipEnvVarDriver test (#1825 ) There were several error messages that appeared even if the hipEnvVarDriver.exe test passed and executed successfully. These are now cleaned up. The following are those instances: * When popen searches for the directed_test directory but does not find it, it outputs an error, then finds hipEnvVar at the same level. With the fix, the test outputs an error only if both searches for hipEnvVar fail. * When assertions are used in the latter half of the test, conditions are set specifically to hide the devices, resulting in "No Hip Device detected" messages in the latter half of the test. The fix suppresses these errors, since the test intentionally finds no devices there. The assertions themselves are untouched. HipEnvVarDriver.cpp has also been refactored. Reading HipEnvVar now happens in a helper function shared by getDeviceNumber and getDevicePCIBusNumRemote, as the code to read HipEnvVar was very similar in both.	2020-02-28 16:46:12 +05:30
Alex Voicu	d830dad3be	Address post-staging issues in #1809 (#1894 ) Fixes SWDEV-223910 and SWDEV-223663	2020-02-27 16:21:12 +05:30
Maneesh Gupta	71e1f87f7e	bump version to 3.2 (#1898 ) - Bump version to 3.2 - [ci] Enable tests on ROCm 3.1	2020-02-27 16:18:31 +05:30
Nick Curtis	b7dd073d93	fix long shuffle implementations for windows (#1895 ) Fixes for SWDEV-223694	2020-02-26 15:53:56 +05:30
Yaxun (Sam) Liu	69404d8e78	Fix hipcc for extra -mllvm option (#1885 )	2020-02-26 15:53:43 +05:30
Sarbojit2019	c1a70707e0	[HIPIFY] Add back missing execute permission to hipify-perl (#1881 ) hipify-perl script lost its executable permission hence "samples/0_Intro/square" was failing. Fixes SWDEV 223433.	2020-02-19 13:48:20 +05:30
eshcherb	82ec3c1c5b	adding hipExtModuleLaunchKernel to tracing layer (#1880 )	2020-02-19 13:47:49 +05:30
Alex Voicu	9b4f39e1d8	Tweak synchronous memcpy implementation (#1809 ) The existing one can have issues on certain systems, therefore this limits use of direct memcpy via largeBAR to sizes where it is unequivocally better. Also addresses SWDEV-220030 and SWDEV-222237.	2020-02-18 20:50:27 +05:30
Yaxun (Sam) Liu	92cc29ae2b	Let HIP-Clang inline all functions by default (#1875 ) This is a quick workaround to match HCC behavior for performance, since inlining usually results in more optimization opportunities and therefore better performance. We will fine-tune the inline threshold later.	2020-02-17 22:49:26 +05:30
Rahul Garg	8c5e5e435b	Fix hipMemcpy3D (#1798 ) Fixes #1790 and #1791. hipMemcpy3D still requires further refactoring for different input and output combinations.	2020-02-17 19:35:35 +05:30
Maneesh Gupta	854afef281	[dtests] Fix random timeout failures in hipModuleLoadDataMultThreaded (#1877 ) Limit the max threads that are launched to 16.	2020-02-17 11:16:20 +05:30
vsytch	56b8b0d80e	Add missing __hip_pinned_shadow__ attributes to the texture global vars. (#1866 )	2020-02-15 09:52:25 +05:30
Maneesh Gupta	e7120dd876	Use deque instead of vector for code readers so that the iterators and references will be stable (#1851 ) * Use deque instead of vector for code readers so that the iterators and references will be stable * Fix compile error * Assign the iterator * Add multithreaded test * Make threads a multiple of hardware concurrency * Output on failure * Add setDevice to try and initialize the context on cuda * Create context for cuda * Set context on each thread * Reduce threads on cuda * Skip test on cuda * Try to initialize the primary context on cuda * Push ctx to the stack as current * Revert "Push ctx to the stack as current" This reverts commit `bff8cbe950`. * Revert "Try to initialize the primary context on cuda" This reverts commit `fd98514113`. * updated test for nvidia path * Add c++11 option for nvcc Co-authored-by: satyanveshd <53337087+satyanveshd@users.noreply.github.com>	2020-02-15 09:51:24 +05:30
Nick Curtis	797a929a65	Implement long / long long shuffles (#1829 ) Implement additional data-types for shuffles (long and long long). Based upon the double implementation.	2020-02-15 09:51:09 +05:30
Siu Chi Chan	f2ab87d872	Disabling HCC code object v3 generation by default. Some PyTorch unit tests have regressions. Disabling cov3 to allow more time to debug and unblock PyTorch. Change-Id: Iba7f425ef3499c20c42ec45d9152b5d27ce97d03	2020-02-14 19:39:27 -05:00

4425 Commits