提交線圖

157 次程式碼提交

作者 SHA1 備註 日期
jglaser b5e683a35d Implement accurate max block size in hipFuncGetAttributes() (#1676)
This PR takes ensures that the maxThreadsPerBlock returned by hipFuncGetAttributes is both a multiple of the warp size and that the register usage of the maximum block does not exceed the number of available registers.

Fixes #1662
2020-03-18 11:20:06 +05:30
Joseph Greathouse bf04d7380a Fix errors in occupancy calculation function (#1926)
Fix two errors in hipOccupancyMaxActiveBlocksPerMultiprocessor.
1) Fix a possible segfault if the user passed in a null pointer for
   the numBlocks value.
2) Handle the situation when the user is asking for a block size
   that is larger than what the target device can hold within a
   single block.
2020-03-17 14:00:38 +05:30
Aryan Salmanpour b663fccf0b [HIP] return an error if blockDim exceeds maxThreadsPerBlock 2020-03-10 15:26:53 -04:00
Aryan Salmanpour 5494f5b247 [HIP] fix formatting/code clean up and fix a bug 2020-03-09 16:03:59 -04:00
Aryan Salmanpour 4844fbdf0a [HIP] Refactor cooperative APIs 2020-03-06 18:30:12 -05:00
Rahul Garg edc97f3073 Add hipDrvOccupancyMaxActiveBlocksPerMultiprocessor[WithFlags] (#1854)
Equivalent to cuOccupancyMaxActiveBlocksPerMultiprocessor[WithFlags].
2020-02-28 16:46:55 +05:30
Sarbojit2019 1109cbff83 [hip] Fix for bug introduced in #1770 when blockSize is non-power of 2 (#1864)
Fixes SWDEV-222161
2020-02-13 14:22:46 +05:30
Maneesh Gupta f8e1c01900 Revert "Match Occupancy APIs syntax with CUDA (#1625)" (#1857)
Reverting this for now till we figure out how to avoid the build
breakage.

This reverts commit fa98798b63.
2020-02-10 10:45:28 +05:30
Siu Chi Chan bff8e15e13 Fix C-style hipLaunchKernel (#1835)
* Fix bug in LaunchKernel test
Instead of passing the address of the gpu buffer, pass the address
of the pointer that holds the address of the gpu buffer

* Fix hipLaunchKernel's kernarg buffer construction.
The hipLaunchKernel implementation should rely on ihipModuleLaunchKernel
to construct the kernarg buffer correctly based on kernel metadata.

* Fix a bug in get_functions where the Kernel_descriptor wasn't constructed with the correct kernarg layout information.

* Fix a bug in kernarg layout parsing dealing with kernel without any arg

* teach ihipModuleLaunchKernel to handle kernel without any arg

* Add a more interesting test
2020-02-04 19:37:16 +05:30
Sarbojit2019 13316f724f Added overflow check in kernel launch (#1770) 2020-02-04 09:02:16 +05:30
satyanveshd fa98798b63 Match Occupancy APIs syntax with CUDA (#1625)
* Match Occupancy APIs syntax with CUDA and fix tests using these APIs
2020-01-29 13:05:53 -08:00
Siu Chi Chan f4555c835a Detect when an explicit printf buffer flush is required (#1766)
* Detect when an explicit printf buffer flush is required
in a device/stream synchronization function.

* hip_module.cpp: add missing hc_am.hpp header
2020-01-07 09:06:38 -08:00
Aryan Salmanpour 6968aeb841 [hip] refactoring cooperative kernel launch APIs (#1737)
This PR is a follow-up on PR# #1698 and it makes two more APIs (hipLaunchCooperativeKernel/hipLaunchCooperativeKernelMultiDevice) inline so that they can work correctly with lazy binding.
2019-12-30 12:42:17 +05:30
Alex Voicu 75a11330aa Fix late-coming issues. (#1724)
Implementation for hipMemcpyWithStream.
2019-12-23 19:11:24 +05:30
Aryan Salmanpour 68cc787781 [hip] refactoring hipExtLaunchMultiKernelMultiDevice API (#1698)
[Background] it was found that if lazy linking used for a library that calls hipExtLaunchMultiKernelMultiDevice API then this API can get the wrong program_state object for looking up device kernels leading to a "No device code available" error in this API.

To fix this issue, the API was refactored to be inline and get and pass the correct program_state to an internal hip API to request a multi-device kernel launch.
2019-12-04 11:50:51 +05:30
Rahul Garg 579a4f36fa Rename hip/hip_hcc.h to hip/hip_ext.h (#1341)
* Rename hip/hip_hcc.h to hip/hip_ext.h

* Deprecate hip_hcc.h
2019-11-07 13:17:10 +05:30
Rahul Garg e4a1e44162 Revert "Fix occupany APIs (#1560)"
This reverts commit af351d7e1b.
2019-10-29 11:41:08 -07:00
satyanveshd af351d7e1b Fix occupany APIs (#1560)
Addresses SWDEV-205006
2019-10-24 17:44:47 +05:30
searlmc1 c4a51f3679 Improve performance of v2 arg handling (#1539)
* Improve performance of v2 arg handling

* Missing change to `std::string`
2019-10-24 17:44:05 +05:30
Aryan Salmanpour 359dc79101 [hip] add support for implicit kernel argument for multi-grid sync (#1456)
* [hip] add support for implicit kernel argument for multi-grid sync

* modified code for calculating the prev_sum

* change the impCoopArg type to size_t

* add memory clean up

* launch init_gws and main kernels into two separate loops
2019-10-24 17:43:30 +05:30
Nick Curtis 73ca2b0083 Guard against division by zero for no VGPR usage (e.g., in an empty kernel) (#1528)
* guard against division by zero for no VGPR usage (e.g., in an empty kernel)

* fix bracket format

* clean up parenthesis
2019-10-16 10:49:56 +05:30
Siu Chi Chan dcf70ff9a2 fix kernel descriptor bug with code object v3
Change-Id: I9306b2baf36d338e36c5ab1226f74373a61a5ae0
2019-10-03 10:56:35 -04:00
Jeff Daily 56f67e5e36 hipModuleUnload should remove global variables from memtracker (#1464) 2019-09-30 10:41:20 +05:30
Aryan Salmanpour bac52d3729 [hip] add initial support for hipLaunchCooperativeKernelMultiDevice API (#1368)
* [hip] add initial support for hipLaunchCooperativeKernelMultiDevice API

* fix formatting
2019-09-16 08:31:17 +00:00
Sarbojit2019 5c4f78bac3 [HIP] Reclaiming hipLaunchKernel API (#1353)
* [HIP] Reclaiming hipLaunchKernel API

* Reclaiming hipLaunchKernel : Incorporated review comments

* Incorporated review comments

* Removed hipLaunchKernel Macro from nvcc path
2019-08-29 01:02:41 +00:00
Aryan Salmanpour 5066700ace [hip] add initial implementation for hipLaunchCooperativeKernel API (#1339)
* [hip] add initial implementation for hipLaunchCooperativeKernel API

* [hip] use total number of work groups to initialize the GWS resource

* [hip] use only one argument for init_gws kernel

* [hip] use the device associated with the stream for checking the device properties
2019-08-23 09:19:35 +00:00
Rahul Garg 3dd0e988b1 Fix undefined identifier issue for hipExtModuleLaunchKernel 2019-08-14 16:46:32 -04:00
Rahul Garg 6ce86f409d Add support for hipFuncGetAttribute (#1279)
* Add support for hipFunGetAttribute

* Support NVCC path

* Test using sample module_api_global

* Try fixing CI build failure due to hip_prof_gen scan

* Fix for CI build issue

* Resolve conflict

* Rebase and resolve conflicts with master

* Fix build error

* Fix NVCC path build error
2019-08-08 08:27:41 +00:00
Jeff Daily 1eb3dbf065 consolidate thread local storage (#915)
* all thread local access now through single struct

* clean up old commented-out code, more use of GET_TLS()

* fewer calls to GET_TLS by passing tls as a funtion argument

* revert unnecessary change to printf

* fix failing tests due to TLS change

* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
2019-08-05 09:51:02 +00:00
Rahul Garg 474bf0effc Fix missing logstatus in hipFuncGetAttributes 2019-08-02 11:51:34 +05:30
wkwchau 4b18b321f7 Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOc… (#1240)
* Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs

* Taking into account of SGPR usage to determine the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
2019-08-01 08:58:48 +00:00
Rahul Garg b9e6d72ee6 Add hip init in hipExtLaunchMultiKernelMultiDevice (#1263)
* Add hip init in hipExtLaunchMultiKernelMultiDevice

* Add more logstatus for multiple return paths

* Fix missing i in function name
2019-07-31 15:42:29 +00:00
Rahul Garg 0517c30507 Add HIP init in hipFuncGetAttributes (#1262)
* Add HIP init in hipFuncGetAttributes

* [dtest]Remove explicit hip init call in hipFuncGetAttributes dtest
2019-07-31 15:42:08 +00:00
cdevadas d5dba47804 Increased the number of implicit-kernarg bytes to 56 (#1217) 2019-07-19 04:45:34 +00:00
wkwchau 38254caf7a Fixed bug of determine max block size in hipOccupancyMaxPotentialBlockSize (#1235) 2019-07-18 03:19:29 +00:00
wkwchau 47f16264ed Fixed bug in hipOccupancyMaxPotentialBlockSize for the SGPRs limitation of gfx8 devices (#1176) 2019-06-26 15:18:00 +05:30
Aryan Salmanpour 96dc74897d [hip] implement the hipExtLaunchMultiKernelMultiDevice API (#1165)
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API

* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5

* modified code based on the requested changes

* changes to lock all streams before launching kernels for each device and unlock them after the dispatches

* check each stream to be valid before starting to lock all the streams
2019-06-20 05:59:05 +05:30
wkwchau d492f1fd6b Implement the hipOccupancyMaxPotentialBlockSize function (#1162)
* Implement the hipOccupancyMaxPotentialBlockSize function

* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()

* Add test for hipOccupancyMaxPotentialBlockSize in Module API

* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()

* fixed hipOccupancyMaxPotentialBlockSize test build issue

* Fix hipOccupancyMaxPotentialBlockSize dtest

* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest

* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"

This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.

* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC

* move extern declaration of ihipGetDeviceProperties to hip_module.cpp

* Update the limiation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy
2019-06-20 05:58:29 +05:30
Rahul Garg bc528b1e8b HACK for SWDEV-173477/SWDEV-190701 2019-06-13 18:15:31 -07:00
Maneesh Gupta 7013f87885 Merge pull request #1140 from scchan/program_state_stage_2-rebase-20190524
migrate more program_state logic from header into shared library (phase II)
2019-06-05 16:09:01 +05:30
Maneesh Gupta 080e2c16ec Merge branch 'master' into implicit-kernarg 2019-06-04 13:24:19 +05:30
Maneesh Gupta c99d011898 Merge pull request #1155 from gargrahul/fix_kernel_lp_dim_trace
Fix wrong grid dim shown in trace
2019-06-04 13:21:39 +05:30
cdevadas 9c03a5f948 Runtime changes to append implicit kernel arguments.
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and it is completely abstracted from the user code. Enabled it for both hip-clang and hip-hcc.
2019-06-04 10:45:49 +05:30
Rahul Garg a489f583bb Fix wrong grid dim shown in trace 2019-05-31 22:30:24 +05:30
Siu Chi Chan 80fec2b477 remove executables() from program_state 2019-05-24 17:27:01 -04:00
Siu Chi Chan 4239cfcf02 moving agent_globals_impl into hip_module 2019-05-24 16:43:38 -04:00
Laurent Morichetti 73f931bdbd Add support for code object v3
Use the code object manager library to parse the code object metadata. Both
code object v2 and v3 formats are now supported for HCC generated binaries.
2019-05-23 18:03:32 -07:00
Alex Voicu ccfb764a59 Add HIPRTC, glorious ersatz for NVRTC (#1097)
* Add ersatz for NVRTC.

* Fix extraneous paren and use correct namespace.

* Use lowerCamelCase (yuck, yuck) consistently.

* Link against FS when building hiprtc lib.

* Correctly mark Manipulators. Fix dual compile.

* Add unit tests. Extend HIT to accept linker options.

* Make sure the HIPRTC library is installed.

* Better logging. Try to auto-detect the target.

* Stop specifying the target explicitly.

* Add missing flavour of `hipModuleLaunchKernel`.

* Program was already destroyed.

* Don't use `--genco`. Fix mangled name trimming.

* Fix HIPRTC breakage due to upstream noise.

* [dtests] Replace RUN -> TEST in hiprtc tests

Change-Id: Ie499e92dfe4e5c94634b1c2b76cf52d241bcfea3

* [hit] Set HIP_PATH to HIP_ROOT_DIR for all tests

Change-Id: Ib0ad1f99bc71c03e363e055dd508a7a4a210680a
2019-05-16 18:28:54 +05:30
Siu Chi Chan f5eb91d53d migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit a3d118eaa8.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 8a548bf40b.

* Revert "improve program state commentary"

This reverts commit 7aada87cbd.

* Revert "load program state once per agent"

This reverts commit c9117de8eb.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuffs from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
2019-05-12 19:24:03 +05:30
Yaxun (Sam) Liu bb5c620b13 Fix missing arg in HIP_INIT_API 2019-04-18 16:18:31 -04:00