커밋 그래프

147 커밋

작성자 SHA1 메시지 날짜
satyanveshd e38db9fb6f Match Occupancy APIs syntax with CUDA (#1625)
* Match Occupancy APIs syntax with CUDA and fix tests using these APIs


[ROCm/hip commit: fa98798b63]
2020-01-29 13:05:53 -08:00
Siu Chi Chan fcf07e0b04 Detect when an explicit printf buffer flush is required (#1766)
* Detect when an explicit printf buffer flush is required
in a device/stream synchronization function.

* hip_module.cpp: add missing hc_am.hpp header


[ROCm/hip commit: f4555c835a]
2020-01-07 09:06:38 -08:00
Aryan Salmanpour ffea90f865 [hip] refactoring cooperative kernel launch APIs (#1737)
This PR is a follow-up on PR# #1698 and it makes two more APIs (hipLaunchCooperativeKernel/hipLaunchCooperativeKernelMultiDevice) inline so that they can work correctly with lazy binding.

[ROCm/hip commit: 6968aeb841]
2019-12-30 12:42:17 +05:30
Alex Voicu 1f5ecc0f6a Fix late-coming issues. (#1724)
Implementation for hipMemcpyWithStream.


[ROCm/hip commit: 75a11330aa]
2019-12-23 19:11:24 +05:30
Aryan Salmanpour abe7531676 [hip] refactoring hipExtLaunchMultiKernelMultiDevice API (#1698)
[Background] it was found that if lazy linking used for a library that calls hipExtLaunchMultiKernelMultiDevice API then this API can get the wrong program_state object for looking up device kernels leading to a "No device code available" error in this API.

To fix this issue, the API was refactored to be inline and get and pass the correct program_state to an internal hip API to request a multi-device kernel launch.

[ROCm/hip commit: 68cc787781]
2019-12-04 11:50:51 +05:30
Rahul Garg 6968362d99 Rename hip/hip_hcc.h to hip/hip_ext.h (#1341)
* Rename hip/hip_hcc.h to hip/hip_ext.h

* Deprecate hip_hcc.h


[ROCm/hip commit: 579a4f36fa]
2019-11-07 13:17:10 +05:30
Rahul Garg 70449cfa92 Revert "Fix occupany APIs (#1560)"
This reverts commit 4f23f9cb18.


[ROCm/hip commit: e4a1e44162]
2019-10-29 11:41:08 -07:00
satyanveshd 4f23f9cb18 Fix occupany APIs (#1560)
Addresses SWDEV-205006 

[ROCm/hip commit: af351d7e1b]
2019-10-24 17:44:47 +05:30
searlmc1 4d668d5a52 Improve performance of v2 arg handling (#1539)
* Improve performance of v2 arg handling

* Missing change to `std::string`


[ROCm/hip commit: c4a51f3679]
2019-10-24 17:44:05 +05:30
Aryan Salmanpour 9ab561dd66 [hip] add support for implicit kernel argument for multi-grid sync (#1456)
* [hip] add support for implicit kernel argument for multi-grid sync

* modified code for calculating the prev_sum

* change the impCoopArg type to size_t

* add memory clean up

* launch init_gws and main kernels into two separate loops


[ROCm/hip commit: 359dc79101]
2019-10-24 17:43:30 +05:30
Nick Curtis d2e9718d23 Guard against division by zero for no VGPR usage (e.g., in an empty kernel) (#1528)
* guard against division by zero for no VGPR usage (e.g., in an empty kernel)

* fix bracket format

* clean up parenthesis


[ROCm/hip commit: 73ca2b0083]
2019-10-16 10:49:56 +05:30
Siu Chi Chan 0f9074b568 fix kernel descriptor bug with code object v3
Change-Id: I9306b2baf36d338e36c5ab1226f74373a61a5ae0


[ROCm/hip commit: dcf70ff9a2]
2019-10-03 10:56:35 -04:00
Jeff Daily dcd73a1a87 hipModuleUnload should remove global variables from memtracker (#1464)
[ROCm/hip commit: 56f67e5e36]
2019-09-30 10:41:20 +05:30
Aryan Salmanpour 9e9a505b39 [hip] add initial support for hipLaunchCooperativeKernelMultiDevice API (#1368)
* [hip] add initial support for hipLaunchCooperativeKernelMultiDevice API

* fix formatting


[ROCm/hip commit: bac52d3729]
2019-09-16 08:31:17 +00:00
Sarbojit2019 74a3171c6b [HIP] Reclaiming hipLaunchKernel API (#1353)
* [HIP] Reclaiming hipLaunchKernel API

* Reclaiming hipLaunchKernel : Incorporated review comments

* Incorporated review comments

* Removed hipLaunchKernel Macro from nvcc path


[ROCm/hip commit: 5c4f78bac3]
2019-08-29 01:02:41 +00:00
Aryan Salmanpour 0fc745b3a6 [hip] add initial implementation for hipLaunchCooperativeKernel API (#1339)
* [hip] add initial implementation for hipLaunchCooperativeKernel API

* [hip] use total number of work groups to initialize the GWS resource

* [hip] use only one argument for init_gws kernel

* [hip] use the device associated with the stream for checking the device properties


[ROCm/hip commit: 5066700ace]
2019-08-23 09:19:35 +00:00
Rahul Garg 3c8f84a5c3 Fix undefined identifier issue for hipExtModuleLaunchKernel
[ROCm/hip commit: 3dd0e988b1]
2019-08-14 16:46:32 -04:00
Rahul Garg d429ba57e1 Add support for hipFuncGetAttribute (#1279)
* Add support for hipFunGetAttribute

* Support NVCC path

* Test using sample module_api_global

* Try fixing CI build failure due to hip_prof_gen scan

* Fix for CI build issue

* Resolve conflict

* Rebase and resolve conflicts with master

* Fix build error

* Fix NVCC path build error


[ROCm/hip commit: 6ce86f409d]
2019-08-08 08:27:41 +00:00
Jeff Daily 9b44993343 consolidate thread local storage (#915)
* all thread local access now through single struct

* clean up old commented-out code, more use of GET_TLS()

* fewer calls to GET_TLS by passing tls as a funtion argument

* revert unnecessary change to printf

* fix failing tests due to TLS change

* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor


[ROCm/hip commit: 1eb3dbf065]
2019-08-05 09:51:02 +00:00
Rahul Garg 8b597565c4 Fix missing logstatus in hipFuncGetAttributes
[ROCm/hip commit: 474bf0effc]
2019-08-02 11:51:34 +05:30
wkwchau 7676b86f12 Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOc… (#1240)
* Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs

* Taking into account of SGPR usage to determine the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()


[ROCm/hip commit: 4b18b321f7]
2019-08-01 08:58:48 +00:00
Rahul Garg 617a6d43dc Add hip init in hipExtLaunchMultiKernelMultiDevice (#1263)
* Add hip init in hipExtLaunchMultiKernelMultiDevice

* Add more logstatus for multiple return paths

* Fix missing i in function name


[ROCm/hip commit: b9e6d72ee6]
2019-07-31 15:42:29 +00:00
Rahul Garg d7973153ca Add HIP init in hipFuncGetAttributes (#1262)
* Add HIP init in hipFuncGetAttributes

* [dtest]Remove explicit hip init call in hipFuncGetAttributes dtest


[ROCm/hip commit: 0517c30507]
2019-07-31 15:42:08 +00:00
cdevadas a02f3a3655 Increased the number of implicit-kernarg bytes to 56 (#1217)
[ROCm/hip commit: d5dba47804]
2019-07-19 04:45:34 +00:00
wkwchau e61b8cec28 Fixed bug of determine max block size in hipOccupancyMaxPotentialBlockSize (#1235)
[ROCm/hip commit: 38254caf7a]
2019-07-18 03:19:29 +00:00
wkwchau 3c963cc0e1 Fixed bug in hipOccupancyMaxPotentialBlockSize for the SGPRs limitation of gfx8 devices (#1176)
[ROCm/hip commit: 47f16264ed]
2019-06-26 15:18:00 +05:30
Aryan Salmanpour 45fa752888 [hip] implement the hipExtLaunchMultiKernelMultiDevice API (#1165)
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API

* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5

* modified code based on the requested changes

* changes to lock all streams before launching kernels for each device and unlock them after the dispatches

* check each stream to be valid before starting to lock all the streams


[ROCm/hip commit: 96dc74897d]
2019-06-20 05:59:05 +05:30
wkwchau 40bd111519 Implement the hipOccupancyMaxPotentialBlockSize function (#1162)
* Implement the hipOccupancyMaxPotentialBlockSize function

* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()

* Add test for hipOccupancyMaxPotentialBlockSize in Module API

* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()

* fixed hipOccupancyMaxPotentialBlockSize test build issue

* Fix hipOccupancyMaxPotentialBlockSize dtest

* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest

* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"

This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.

* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC

* move extern declaration of ihipGetDeviceProperties to hip_module.cpp

* Update the limiation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy


[ROCm/hip commit: d492f1fd6b]
2019-06-20 05:58:29 +05:30
Rahul Garg 884d0fef76 HACK for SWDEV-173477/SWDEV-190701
[ROCm/hip commit: bc528b1e8b]
2019-06-13 18:15:31 -07:00
Maneesh Gupta 58caf3c615 Merge pull request #1140 from scchan/program_state_stage_2-rebase-20190524
migrate more program_state logic from header into shared library (phase II)

[ROCm/hip commit: 7013f87885]
2019-06-05 16:09:01 +05:30
Maneesh Gupta cedc88d40f Merge branch 'master' into implicit-kernarg
[ROCm/hip commit: 080e2c16ec]
2019-06-04 13:24:19 +05:30
Maneesh Gupta 40076bca45 Merge pull request #1155 from gargrahul/fix_kernel_lp_dim_trace
Fix wrong grid dim shown in trace

[ROCm/hip commit: c99d011898]
2019-06-04 13:21:39 +05:30
cdevadas 5dac708dbb Runtime changes to append implicit kernel arguments.
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and it is completely abstracted from the user code. Enabled it for both hip-clang and hip-hcc.


[ROCm/hip commit: 9c03a5f948]
2019-06-04 10:45:49 +05:30
Rahul Garg ccd7b1f120 Fix wrong grid dim shown in trace
[ROCm/hip commit: a489f583bb]
2019-05-31 22:30:24 +05:30
Siu Chi Chan 44943f5cd9 remove executables() from program_state
[ROCm/hip commit: 80fec2b477]
2019-05-24 17:27:01 -04:00
Siu Chi Chan b9b076a958 moving agent_globals_impl into hip_module
[ROCm/hip commit: 4239cfcf02]
2019-05-24 16:43:38 -04:00
Laurent Morichetti 03fec15b7c Add support for code object v3
Use the code object manager library to parse the code object metadata. Both
code object v2 and v3 formats are now supported for HCC generated binaries.


[ROCm/hip commit: 73f931bdbd]
2019-05-23 18:03:32 -07:00
Alex Voicu d5a3acfd69 Add HIPRTC, glorious ersatz for NVRTC (#1097)
* Add ersatz for NVRTC.

* Fix extraneous paren and use correct namespace.

* Use lowerCamelCase (yuck, yuck) consistently.

* Link against FS when building hiprtc lib.

* Correctly mark Manipulators. Fix dual compile.

* Add unit tests. Extend HIT to accept linker options.

* Make sure the HIPRTC library is installed.

* Better logging. Try to auto-detect the target.

* Stop specifying the target explicitly.

* Add missing flavour of `hipModuleLaunchKernel`.

* Program was already destroyed.

* Don't use `--genco`. Fix mangled name trimming.

* Fix HIPRTC breakage due to upstream noise.

* [dtests] Replace RUN -> TEST in hiprtc tests

Change-Id: Ie499e92dfe4e5c94634b1c2b76cf52d241bcfea3

* [hit] Set HIP_PATH to HIP_ROOT_DIR for all tests

Change-Id: Ib0ad1f99bc71c03e363e055dd508a7a4a210680a


[ROCm/hip commit: ccfb764a59]
2019-05-16 18:28:54 +05:30
Siu Chi Chan d0252dfa79 migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit 62e96cb4cf.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 882006555b.

* Revert "improve program state commentary"

This reverts commit fb2beb0c88.

* Revert "load program state once per agent"

This reverts commit 21f5e142f5.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuffs from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime


[ROCm/hip commit: f5eb91d53d]
2019-05-12 19:24:03 +05:30
Yaxun (Sam) Liu d4bce6c019 Fix missing arg in HIP_INIT_API
[ROCm/hip commit: bb5c620b13]
2019-04-18 16:18:31 -04:00
Yaxun (Sam) Liu cf4bdb8b55 Fix regression on multi-gpu due to PR#997
[ROCm/hip commit: 271fdc4e4d]
2019-04-05 22:54:41 -04:00
Yaxun Sam Liu 5072c98f32 hip-clang: fix kernel not found on multi-gpu
__hipRegisterFunction is called during by .init functions during program initialization.
It calls hipModuleGetFunction to locate kernel symbol in code objects. hipModuleGetFunction
assumes current device when locating kernel symbols. This works for HCC but not for hip-clang,
since hip-clang needs to locate kernel symbols for different devices without switching
between devices.

This patch introduces a new hsa agent parameter to ihipModuleGetFunction, which allows
__hipRegisterFunction to choose the correct hsa agent when locating kernel symbols. By
default it uses this_agent(), therefore this patch has no impact on HCC.


[ROCm/hip commit: 98b9e92908]
2019-03-31 10:08:20 -04:00
Wen-Heng (Jack) Chung 1cc94f9369 Make hipModuleGetGlobal be in HIP runtime so it can be discovered at runtime (#981)
* Make hipModuleGetGlobal be in HIP runtime so it can be discovered at runtime

In HIP PR #929, quite a few HIP public APIs were made as inline functions with
hidden visibility. It was necessary to support applications with shared
libraries with GPU kernels launched via hipLaunchKernelGGL(), after HIP runtime
is initialized.

In empirical tests, the implementation has been proved to be a bit too
excessive, especially for hipModuleGetGlobal(). The function is used by another
type of client applications which relies on the existence of this function
within HIP runtime so global symbols from HSA code objects loaded dynamically
at runtime can be retrieved programmtically.

This commit moves hipModuleGetGlobal() back to src/hip_module.cpp, and makes it
visible and not inline, to fulfill requirements for applications
aforementioned. It does not change the behavior of applications depending on
hipLaunchKernelGGL().

* Add HIP_INIT_API into the implementation of hipModuleGetGlobal

Address review comments.

* Fix failing HIP unit tests


[ROCm/hip commit: 4b7177ac42]
2019-03-29 03:45:04 +00:00
Jeff Daily 21f5e142f5 load program state once per agent
[ROCm/hip commit: c9117de8eb]
2019-03-27 18:19:10 +00:00
Evgeny 36b5313d65 tracing callback layer update
[ROCm/hip commit: 31475c5ac8]
2019-03-14 22:43:52 -05:00
Siu Chi Chan 15061ddfcc Fix memory leak introduced by previous change to Agent_global.
Make Agent_global manage the lifetime of the name string


[ROCm/hip commit: bf1d48bf78]
2019-03-11 19:51:32 +00:00
Aaron Enye Shi 1e07be3ab3 Fix Agent_global variables failing hipTestDeviceSymbol
Issue: Header uses std::vector<Agent_global> agent_globals which is created by hip_module.cpp
  - Move iterator fails to copy Agent_global from library source into header version
  - Due to different versions of std::string name in struct Agent_global
Fix: Change Agent_global to use char* name instead of std::string name


[ROCm/hip commit: 00d24d254d]
2019-03-11 19:51:25 +00:00
Aaron Enye Shi 4b87bd25e8 Fix hash_for undefined reference in hipTestConstant test
Issue: mismatch undefined symbols in different user env
  - Binary expects modified return value std::string&
  - Fails to match libhip_hcc.so: return value is std::string& but doesn't match modified C++ env
Fix: Change return value to char*, create new key std::string in header from char*


[ROCm/hip commit: 23e9968752]
2019-03-11 19:51:18 +00:00
Maneesh Gupta 3f5e937afc Merge pull request #949 from gargrahul/single_stream_concurrent_kernels
Add extension for kernel concurrency on same stream

[ROCm/hip commit: 352b17346c]
2019-03-06 17:34:54 +05:30
Alex Voicu 0c16497abd dlopen() fixes (#929)
* Initial attempt to switch over to internally linked state.

* Add missing CMake update.

* hipLaunchKernelGGLImpl must be inline as well. Ensure internal linkage.

* Ensure global retrieval uses internally linked state.

* Hide HC in the implementation. Minimise ADL woes.

* Strange software exists, and must be catered to.

* Use a less spammy mechanism for ensuring internal linkage / non-export.

* Remove leftover internal detail.


[ROCm/hip commit: ea0fcf3e61]
2019-03-06 17:31:44 +05:30