Commit graph

960 commits

Author SHA1 Message Date
cdevadas 35f0fb2916 Increased the number of implicit-kernarg bytes to 56 (#1217)
[ROCm/clr commit: fc0aca2a7d]
2019-07-19 04:45:34 +00:00
wkwchau d20537e595 Fixed bug in determining max block size in hipOccupancyMaxPotentialBlockSize (#1235)
[ROCm/clr commit: 6ec476e50a]
2019-07-18 03:19:29 +00:00
ansurya 8b8946f78b Add Max Texture 1D,2D,3D device properties (#1226)
* Add Max Texture 1D,2D,3D device properties

* Corrected testcase to use enums defined in hipDeviceAttribute_t

* Added texture 1D,2D and 3D support for NVIDIA path


[ROCm/clr commit: 00aa42e05f]
2019-07-18 03:18:50 +00:00
Rahul Garg d92afe2277 Fix HIP_VISIBLE_DEVICES order (#1184)
* Fix HIP_VISIBLE_DEVICES order

* Fix device IDs mismatch

* Fix review comments: loop order and device range check

* Handle incomplete VISIBLE device env variable

* Revert "Handle incomplete VISIBLE device env variable"


[ROCm/clr commit: d2e8cdc8fb]
2019-07-18 03:18:04 +00:00
Aryan Salmanpour a4992850d8 [hip] fix a bug where the argument layout for a given kernel was parsed multiple times (#1232)
[ROCm/clr commit: 8b90a5d274]
2019-07-17 07:29:07 +00:00
Evgeny Mankov 24af494e97 [HIP] Fix segfault on uninitialized struct members in hipArrayCreate and hipArray3DCreate
[ROCm/clr commit: 299fbd4842]
2019-07-12 16:38:26 +03:00
Evgeny Mankov b5f0cdaa7b [HIP][HIPIFY] Split HIP_ARRAY_DESCRIPTOR struct to HIP_ARRAY_DESCRIPTOR and HIP_ARRAY3D_DESCRIPTOR
[Reason] To be compatible with CUDA [#1133]

Update HIP code, hipify-clang, tests and docs

[TODO] Add support of the corresponding functions on nvcc fallback path


[ROCm/clr commit: f0832fd968]
2019-07-11 14:58:16 +03:00
Jatin Chaudhary 6e8edf8890 Adding bounds check before hipMemset (#1190)
* Adding bounds check in ihipMemset

* Adding ihipMemPtrGetInfo to hipMemPtrGetInfo


[ROCm/clr commit: fcb0a3d4e2]
2019-07-08 11:00:38 +00:00
Aryan Salmanpour 4b06a21504 [hip] Move _criticalData of ihipStream_t class to private section and use criticalData() to access it (#1177)
[ROCm/clr commit: 7e48231252]
2019-07-04 00:42:19 +00:00
Maneesh Gupta 154b861905 Added missing NULL checks and corrected API return values (#1188)
* Added missing NULL checks and corrected API return values as per validation

* Added missing NULL checks


[ROCm/clr commit: a220a8e8e9]
2019-07-03 08:51:39 +00:00
Anusha Godavarthy Surya 27722855b4 Added missing NULL checks
[ROCm/clr commit: 1a7c7e3b06]
2019-06-27 20:19:30 +05:30
Anusha Godavarthy Surya e643bae27d Added missing NULL checks and corrected API return values as per validation
[ROCm/clr commit: 4989452413]
2019-06-27 00:19:05 +05:30
wkwchau 7662c1a650 Fixed bug in hipOccupancyMaxPotentialBlockSize for the SGPRs limitation of gfx8 devices (#1176)
[ROCm/clr commit: 3742f24477]
2019-06-26 15:18:00 +05:30
Aaron Enye Shi 553caedb5c Fix dlpi_name info empty when using GCC on ub18 (#1181)
This fixes a bug where g++ on Ubuntu 18.04 produces failing executables, unlike g++ on Ubuntu 16.04 and clang++. While creating function names on Ubuntu 18.04, dl_phdr_info seems to provide a non-zero value for dlpi_addr on the initial iteration and an empty string in dlpi_name. This causes a failure when linking with g++, since the empty string prevents the kernel function from being loaded. clang++ and GCC on Ubuntu 16.04 provide a zero value for dlpi_addr. To fix this, we verify that both addr and name exist, so that /proc/self/exe can be properly loaded.

[ROCm/clr commit: f87b900f96]
2019-06-25 06:32:29 +05:30
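The check described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the HIP source: `PhdrInfo` and `path_to_load` are hypothetical names, and the assumption is that the runtime falls back to /proc/self/exe whenever the address or the name reported by dl_phdr_info is missing.

```python
from dataclasses import dataclass

SELF_EXE = "/proc/self/exe"


@dataclass
class PhdrInfo:
    """Hypothetical stand-in for the two dl_phdr_info fields named in the commit."""
    dlpi_addr: int
    dlpi_name: str


def path_to_load(info: PhdrInfo) -> str:
    """Pick the file to read code objects from (illustrative logic only)."""
    # Trust the reported name only when both the address and the name are present.
    if info.dlpi_addr != 0 and info.dlpi_name:
        return info.dlpi_name
    # Otherwise assume the entry describes the main executable.
    return SELF_EXE
```

With this shape, the Ubuntu 18.04 case (non-zero address, empty name) takes the same fallback path as the zero-address case seen with clang++.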
Aryan Salmanpour 362445220a [hip] implement the hipExtLaunchMultiKernelMultiDevice API (#1165)
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API

* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introduced in HCC for ROCm 2.5

* modified code based on the requested changes

* changes to lock all streams before launching kernels for each device and unlock them after the dispatches

* check each stream to be valid before starting to lock all the streams


[ROCm/clr commit: d6ad690cb6]
2019-06-20 05:59:05 +05:30
wkwchau 81b5ea1c4a Implement the hipOccupancyMaxPotentialBlockSize function (#1162)
* Implement the hipOccupancyMaxPotentialBlockSize function

* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()

* Add test for hipOccupancyMaxPotentialBlockSize in Module API

* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()

* fixed hipOccupancyMaxPotentialBlockSize test build issue

* Fix hipOccupancyMaxPotentialBlockSize dtest

* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest

* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"

This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.

* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC

* move extern declaration of ihipGetDeviceProperties to hip_module.cpp

* Update the limitation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy


[ROCm/clr commit: 28c34ead70]
2019-06-20 05:58:29 +05:30
Maneesh Gupta 0b3f5d4524 Merge pull request #1167 from eshcherb/hip_prof_refactoring_190611
prof layer includes refactoring

[ROCm/clr commit: 3b3118d459]
2019-06-19 13:36:33 +05:30
Rahul Garg effbc8b212 HACK for SWDEV-173477/SWDEV-190701
[ROCm/clr commit: 107734f7ad]
2019-06-13 18:15:31 -07:00
Evgeny 214c01e6bf prof layer includes refactoring
[ROCm/clr commit: c6600ba26b]
2019-06-11 20:13:29 -05:00
Maneesh Gupta b4fb2b0ab4 Merge pull request #1140 from scchan/program_state_stage_2-rebase-20190524
migrate more program_state logic from header into shared library (phase II)

[ROCm/clr commit: 1d5d923d36]
2019-06-05 16:09:01 +05:30
Maneesh Gupta 3d6944e0db Merge branch 'master' into implicit-kernarg
[ROCm/clr commit: d4fa74ff09]
2019-06-04 13:24:19 +05:30
Maneesh Gupta 1a9326b2dd Merge pull request #1155 from gargrahul/fix_kernel_lp_dim_trace
Fix wrong grid dim shown in trace

[ROCm/clr commit: 40a09318e4]
2019-06-04 13:21:39 +05:30
Maneesh Gupta 9f35c7bf43 Merge pull request #1130 from lmoriche/master
Add support for code object v3

[ROCm/clr commit: 4b3d59a93e]
2019-06-04 13:20:52 +05:30
cdevadas 8de283ef77 Runtime changes to append implicit kernel arguments.
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and are completely abstracted from the user code. Enabled for both hip-clang and hip-hcc.


[ROCm/clr commit: 214ec53da3]
2019-06-04 10:45:49 +05:30
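As a rough picture of what "appending empty bytes to the kernarg area" means, here is a minimal Python sketch. The function name and the byte-string representation of the kernarg buffer are assumptions made for illustration; only the size of 48 bytes comes from the commit message.

```python
IMPLICIT_KERNARG_BYTES = 48  # size stated in the commit message above


def append_implicit_kernargs(kernarg: bytes,
                             extra: int = IMPLICIT_KERNARG_BYTES) -> bytes:
    """Return the user kernarg buffer followed by `extra` zeroed bytes.

    Hypothetical helper: the real runtime works on its own buffer type.
    """
    # bytes(n) creates n zero bytes; user-supplied arguments stay untouched in front.
    return kernarg + bytes(extra)
```

The user-visible arguments keep their offsets, which is consistent with the commit's statement that the change is abstracted from user code.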
Rahul Garg a8de3fafba Fix wrong grid dim shown in trace
[ROCm/clr commit: 7a2e3b6a1c]
2019-05-31 22:30:24 +05:30
Siu Chi Chan bafd29662c replace std::vector for kernarg
[ROCm/clr commit: 1fb9ab2d44]
2019-05-24 17:27:43 -04:00
Siu Chi Chan 304a1e2dbe move executable_cache into program_state.cpp
[ROCm/clr commit: 1a2d332e76]
2019-05-24 17:27:25 -04:00
Siu Chi Chan 305eb4239e remove executables() from program_state
[ROCm/clr commit: e2c0122892]
2019-05-24 17:27:01 -04:00
Siu Chi Chan 0cae3e06c1 moving agent_globals_impl into hip_module
[ROCm/clr commit: 6852be819f]
2019-05-24 16:43:38 -04:00
Laurent Morichetti 4c402ccfaf Add support for code object v3
Use the code object manager library to parse the code object metadata. Both
code object v2 and v3 formats are now supported for HCC generated binaries.


[ROCm/clr commit: de89102528]
2019-05-23 18:03:32 -07:00
Evgeny Mankov 204043c6e0 [HIP][HIPIFY] Make hipMemcpyParam2D coherent with cuMemcpy2D
+ Makes hip_Memcpy2D struct compatible with CUDA_MEMCPY2D struct
+ Add hipMemcpyParam2D support in nvcc fallback path
+ Update hipify-clang, tests and docs accordingly


[ROCm/clr commit: 9cb3e9aa5e]
2019-05-22 18:31:39 +03:00
Alex Voicu a4a3132c64 Add HIPRTC, glorious ersatz for NVRTC (#1097)
* Add ersatz for NVRTC.

* Fix extraneous paren and use correct namespace.

* Use lowerCamelCase (yuck, yuck) consistently.

* Link against FS when building hiprtc lib.

* Correctly mark Manipulators. Fix dual compile.

* Add unit tests. Extend HIT to accept linker options.

* Make sure the HIPRTC library is installed.

* Better logging. Try to auto-detect the target.

* Stop specifying the target explicitly.

* Add missing flavour of `hipModuleLaunchKernel`.

* Program was already destroyed.

* Don't use `--genco`. Fix mangled name trimming.

* Fix HIPRTC breakage due to upstream noise.

* [dtests] Replace RUN -> TEST in hiprtc tests

Change-Id: Ie499e92dfe4e5c94634b1c2b76cf52d241bcfea3

* [hit] Set HIP_PATH to HIP_ROOT_DIR for all tests

Change-Id: Ib0ad1f99bc71c03e363e055dd508a7a4a210680a


[ROCm/clr commit: a538eb705a]
2019-05-16 18:28:54 +05:30
Wenkai Du 3d75b10e0b Use NUMA distance for hop count calculation
[ROCm/clr commit: 56d2dc0022]
2019-05-15 21:50:35 +00:00
Maneesh Gupta e0e30536e6 Merge pull request #1083 from gargrahul/fix_hip_impl_visible_agents
Maintain HIP_VISIBLE_DEVICES for kernel launch

[ROCm/clr commit: c9fdb42b91]
2019-05-13 14:20:18 +05:30
Rahul Garg d44e800a17 Add fine grained host memory lock support (#1095)
* Add fine grained host memory lock support

* Fix default flag check


[ROCm/clr commit: e1f3dc0c80]
2019-05-13 11:48:26 +05:30
Siu Chi Chan 76f535b4ce migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit f8d108a815.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 10048a5631.

* Revert "improve program state commentary"

This reverts commit 5233d41c6c.

* Revert "load program state once per agent"

This reverts commit 9cee2c5311.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuff from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure out what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime


[ROCm/clr commit: 05a1b696da]
2019-05-12 19:24:03 +05:30
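Two of the bullets above (caching executables by ELF name, ISA, and agent, and protecting that cache with a global mutex) describe a common pattern that can be sketched as below. This is a Python illustration under stated assumptions: `get_executable` and the `loader` callback are hypothetical and do not correspond to functions in program_state.cpp.

```python
import threading

# One process-wide lock guarding one process-wide cache (names are illustrative).
_cache_lock = threading.Lock()
_executable_cache = {}


def get_executable(elf_name, isa, agent, loader):
    """Return the cached executable for (elf_name, isa, agent), loading it once.

    `loader` is a hypothetical callable that performs the expensive load.
    """
    key = (elf_name, isa, agent)
    with _cache_lock:
        if key not in _executable_cache:
            _executable_cache[key] = loader(elf_name, isa, agent)
        return _executable_cache[key]
```

Because the key includes the ELF name, a shared library opened later with dlopen adds new entries without forcing entries that are already cached to be reloaded.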
Maneesh Gupta 30c7ed3e28 Merge pull request #1081 from mangupta/swdev-181624
Implement hipExtGetLinkTypeAndHopCount for ROCm devices

[ROCm/clr commit: c6c5e4cee8]
2019-05-07 16:15:41 +05:30
wkwchau 7eaaf6f1ae Return hipErrorInsufficientDriver status when CPU device not found (#1064)
* Return hipErrorInsufficientDriver status when CPU device not found - no exception thrown

* Return hipErrorInsufficientDriver status when CPU device not found


[ROCm/clr commit: ebf986dcee]
2019-05-07 15:58:25 +05:30
Rahul Garg 3f65bec096 Maintain HIP_VISIBLE_DEVICES for kernel launch
[ROCm/clr commit: 3be54a903c]
2019-05-07 05:09:02 +05:30
Maneesh Gupta f657eba4a5 Implement hipExtGetLinkTypeAndHopCount for ROCm devices
Change-Id: Ie5bb4f640ac6d189c7fceeab22627a7494fd10bd


[ROCm/clr commit: 2f43f110d9]
2019-05-06 15:54:31 +05:30
Sameer Sahasrabuddhe 4f69390332 minor cleanup: eliminate repetition
[ROCm/clr commit: c74a97f756]
2019-04-25 20:41:16 +05:30
Rahul Garg d69edbbb7f Add hipMallocManaged default functional support (#1036)
* Add hipMallocManaged default functional support

* Fix build error

* Add dtest


[ROCm/clr commit: 94769fc8dd]
2019-04-24 16:50:03 +05:30
Yaxun (Sam) Liu cb81018121 Fix missing arg in HIP_INIT_API
[ROCm/clr commit: 710e633bdd]
2019-04-18 16:18:31 -04:00
Maneesh Gupta dac817873f Merge pull request #1019 from scchan/lazy_binding
minor workaround for lazy binding

[ROCm/clr commit: 22660bed74]
2019-04-16 08:36:10 +05:30
Jeff Daily cf4e198a91 In hipFree, synchronize owner of memory (#1018)
* In hipFree, if memory is associated with a device, synchronize that device's streams.

This changes the behavior from synchronizing the currently set TLS device.

* All devices sync in hipFree for _appId=-1 case.

* Revert "All devices sync in hipFree for _appId=-1 case."

This reverts commit 1efb34d6a8426661e45bc5f763422a1147aeac10.

* add HIP_SYNC_FREE env var


[ROCm/clr commit: cf8fb43e6b]
2019-04-16 08:35:55 +05:30
Yaxun (Sam) Liu d8acabf24c Fix regression on multi-gpu due to PR#997
[ROCm/clr commit: 5c67ee11f4]
2019-04-05 22:54:41 -04:00
Siu Chi Chan f6837a4e7f minor workaround for lazy binding
[ROCm/clr commit: b5045af7e9]
2019-04-02 17:28:06 -04:00
Yaxun Sam Liu 12ac74bad1 hip-clang: fix kernel not found on multi-gpu
__hipRegisterFunction is called by .init functions during program initialization.
It calls hipModuleGetFunction to locate kernel symbols in code objects. hipModuleGetFunction
assumes the current device when locating kernel symbols. This works for HCC but not for hip-clang,
since hip-clang needs to locate kernel symbols for different devices without switching
between devices.

This patch introduces a new hsa agent parameter to ihipModuleGetFunction, which allows
__hipRegisterFunction to choose the correct hsa agent when locating kernel symbols. By
default it uses this_agent(), therefore this patch has no impact on HCC.


[ROCm/clr commit: 8f5c812a68]
2019-03-31 10:08:20 -04:00
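The idea in the commit above, an extra agent parameter that defaults to the current agent, can be sketched as follows. This is a Python illustration, not the HIP implementation: the dictionary layout, `module_get_function`, and the string agent identifiers are all assumptions; only the defaulting behaviour mirrors the commit message.

```python
_current_agent = "gpu0"  # hypothetical identifier for the currently selected device


def this_agent():
    """Stand-in for the runtime's notion of the current agent."""
    return _current_agent


def module_get_function(code_objects, kernel_name, agent=None):
    """Look up a kernel symbol for `agent`, defaulting to the current agent.

    `code_objects` maps an agent identifier to that agent's symbol table.
    """
    target = this_agent() if agent is None else agent
    symbols = code_objects.get(target, {})
    if kernel_name not in symbols:
        raise KeyError("kernel %r not found for agent %r" % (kernel_name, target))
    return symbols[kernel_name]
```

Callers that omit `agent` see the old behaviour, while a registration routine can pass each agent explicitly without switching the current device.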
Wen-Heng (Jack) Chung cfe930f9d6 Make hipModuleGetGlobal be in HIP runtime so it can be discovered at runtime (#981)
* Make hipModuleGetGlobal be in HIP runtime so it can be discovered at runtime

In HIP PR #929, quite a few HIP public APIs were made as inline functions with
hidden visibility. It was necessary to support applications with shared
libraries with GPU kernels launched via hipLaunchKernelGGL(), after HIP runtime
is initialized.

In empirical tests, the implementation has proved to be excessive,
especially for hipModuleGetGlobal(). The function is used by another
type of client application, which relies on the existence of this function
within the HIP runtime so that global symbols from HSA code objects loaded dynamically
at runtime can be retrieved programmatically.

This commit moves hipModuleGetGlobal() back to src/hip_module.cpp, and makes it
visible and not inline, to fulfill the requirements of the aforementioned
applications. It does not change the behavior of applications depending on
hipLaunchKernelGGL().

* Add HIP_INIT_API into the implementation of hipModuleGetGlobal

Address review comments.

* Fix failing HIP unit tests


[ROCm/clr commit: 04915cea2f]
2019-03-29 03:45:04 +00:00
Maneesh Gupta d99bc4c540 Merge pull request #992 from gargrahul/handle_d2d_memcpy2d
Handle D2D in memcpy2D

[ROCm/clr commit: f9f4cee347]
2019-03-28 04:41:36 +00:00