提交图

375 次代码提交

作者 SHA1 备注 提交日期
ansurya 8e496c09d9 Add Max Texture 1D,2D,3D device properties (#1226)
* Add Max Texture 1D,2D,3D device properties

* Corrected testcase to use enums defined in hipDeviceAttribute_t

* Added texture 1D,2D and 3D support for NVIDIA path
2019-07-18 03:18:50 +00:00
Rahul Garg 1dcf618d20 Fix HIP_VISIBLE_DEVICES order (#1184)
* Fix HIP_VISIBLE_DEVICES order

* Fix device IDs mismatch

* Fix review comments- loop order and device range check

* Handle incomplete VISIBLE device env variable

* Revert "Handle incomplete VISIBLE device env variable"
2019-07-18 03:18:04 +00:00
Aryan Salmanpour 999f45fc11 [hip] Move _criticalData of ihipStream_t class to private section and use criticalData() to access it (#1177) 2019-07-04 00:42:19 +00:00
Aryan Salmanpour 96dc74897d [hip] implement the hipExtLaunchMultiKernelMultiDevice API (#1165)
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API

* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5

* modified code based on the requested changes

* changes to lock all streams before launching kernels for each device and unlock them after the dispatches

* check each stream to be valid before starting to lock all the streams
2019-06-20 05:59:05 +05:30
Siu Chi Chan 00824be34c move executable_cache into program_state.cpp 2019-05-24 17:27:25 -04:00
Maneesh Gupta 693bd556d4 Merge pull request #1083 from gargrahul/fix_hip_impl_visible_agents
Maintain HIP_VISIBLE_DEVICES for kernel launch
2019-05-13 14:20:18 +05:30
Siu Chi Chan f5eb91d53d migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit a3d118eaa8.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 8a548bf40b.

* Revert "improve program state commentary"

This reverts commit 7aada87cbd.

* Revert "load program state once per agent"

This reverts commit c9117de8eb.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuffs from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
2019-05-12 19:24:03 +05:30
wkwchau 29b3b46b42 Return hipErrorInsufficientDriver status when CPU device not found (#1064)
* Return hipErrorInsufficientDriver status when CPU device not found - no exception thrown

* Return hipErrorInsufficientDriver status when CPU device not found
2019-05-07 15:58:25 +05:30
Rahul Garg 620a07102d Maintain HIP_VISIBLE_DEVICES for kernel launch 2019-05-07 05:09:02 +05:30
Sameer Sahasrabuddhe abb9375707 minor cleanup: eliminate repetition 2019-04-25 20:41:16 +05:30
Jeff Daily 2b3037a6ea In hipFree, synchronize owner of memory (#1018)
* In hipFree, if memory is associated with a device, synchronize that device's streams.

This changes the behavior from synchronizing the currently set TLS device.

* All devices sync in hipFree for _appId=-1 case.

* Revert "All devices sync in hipFree for _appId=-1 case."

This reverts commit 1efb34d6a8426661e45bc5f763422a1147aeac10.

* add HIP_SYNC_FREE env var
2019-04-16 08:35:55 +05:30
Maneesh Gupta eb03d50de9 Merge pull request #962 from gargrahul/add_2d_copy_fallback
Add 2D fallback to use copy kernel
2019-03-25 07:46:43 +00:00
Rahul Garg 9bbfbceb64 2D Fallback needs hcc workweek 19101 or higher 2019-03-25 12:07:28 +05:30
Siu Chi Chan 24d08beef8 reimplement HIP_INIT as hip_impl::hip_init(), add hip_init() to some of the inlined API (#966)
* reimplement HIP_INIT as a function, expose it as hip_impl::hip_init()
so that it could be called from hipLaunchKernelGGL and other inlined
HIP functions

* Don't call hip_init from ihipPreLaunchKernel
2019-03-20 05:11:15 +00:00
Rahul Garg 918d7e3a40 Add 2D fallback to use copy kernel 2019-03-14 13:03:06 +05:30
Alex Voicu ea0fcf3e61 dlopen() fixes (#929)
* Initial attempt to switch over to internally linked state.

* Add missing CMake update.

* hipLaunchKernelGGLImpl must be inline as well. Ensure internal linkage.

* Ensure global retrieval uses internally linked state.

* Hide HC in the implementation. Minimise ADL woes.

* Strange software exists, and must be catered to.

* Use a less spammy mechanism for ensuring internal linkage / non-export.

* Remove leftover internal detail.
2019-03-06 17:31:44 +05:30
Maneesh Gupta 0dd26b4f63 Merge pull request #608 from gargrahul/add_pinned_2d_sdma_copy
Added support for pinned 2D SDMA copy
2018-12-12 07:44:16 +05:30
Maneesh Gupta 7ce082415b Merge pull request #773 from fronteer/master
Support of printing process ID for HIP tracing
2018-11-23 11:16:22 +05:30
Qianfeng Zhang 81cf7cabfa Add support of printing process ID for HIP Tracing 2018-11-22 18:58:06 +08:00
Evgeny e5ba097afd renaming HIP_INIT_CB_API to HIP_INIT_API 2018-11-13 15:33:26 +00:00
Evgeny b8b1637ef7 adding activity prof layer 2018-11-13 15:33:26 +00:00
Yaxun Sam Liu f5d8842f6a Add HIP_DUMP_CODE_OBJECT 2018-10-26 14:14:00 -04:00
Yaxun Sam Liu 1299b65e15 Add HIP_DB=fatbin for debugging fat binary issues 2018-08-17 11:53:45 -04:00
Rahul Garg 1e57764378 Added support for pinned 2D SDMA copy 2018-07-31 14:05:35 +05:30
Sarunya Pumma 8111fd3b8b Remove device mapping from shareWithAll memory
When shareWithAll memory (e.g., host memory) is allocated, set appId
in hc::AmPointerInfo to -1 to indicate that this memory is not mapped
to any device.  Peer checking in ihipStream_t::canSeeMemory is not
necessary if memory is shared with all devices.  Thus, it is skipped.

Note that earlier host memory is always mapped to device 0 and HIP
always performs peer checking for all kinds of hipMemcpy.  Since the
peer checking process requires context locking, hipMemcpy from/to host
memory always grabs device 0's context lock.  Therefore, if there is
another thread holding the context lock of device 0 (e.g.,
hipDeviceSynchronize on device 0), hipMemcpy will have to wait for the
lock until it can actually perform memcpy.  This can significantly
deteriorate execution performance.

Signed-off-by: Sarunya Pumma <sarunya.pumma@amd.com>
2018-07-28 23:15:16 -07:00
Maneesh Gupta 7311b60220 Merge pull request #491 from scchan/fix_wait
callback handling: don't need to wait for the thread to become ready
2018-06-06 14:38:25 +05:30
Siu Chi Chan a1f3b587fb remove the _ready flag in ihipStreamCallback_t and the mutex that protects it. 2018-06-04 17:29:04 -04:00
Rahul Garg 1a02bc364f Add integrated device property 2018-06-02 13:11:16 +05:30
Maneesh Gupta 1ba06f63c4 Apply .clangformat to all repo source files
Change-Id: I7e79c6058f0303f9a98911e3b7dd2e8596079344
2018-03-12 11:29:03 +05:30
Maneesh Gupta cebb070d30 Implement hipStreamAddCallback
Change-Id: Ib851e4d86ba9c8406ca37b88162ea483ccbc9d36
2017-12-19 16:06:14 +05:30
Ben Sander 9bba97fdcc Fix some cppcheck style issues. 2017-12-01 20:45:34 +00:00
Pierre 6baaed8e48 Fix missing MARKER_END
Logging status of hipCtxSynchronize was missing
Test if hip profiling is active for MARKER_END in ihipPostLaunchKernel
Add MARKER_END after the completion of a kernel launched through
the "grid launch"
2017-11-13 16:13:19 -05:00
Ben Sander 4a2e6f8955 Make hipEvent_t thread safe.
Support re-recording of same event by different threads.

- Add criticalData structure to hipEvent_t, similar to mechanism used
  for streams, contexts, device.  Events are always locked
  after streams to avoid deadlock.
- ihipEvent_t::locked_copyCrit can be used to copy critical state
  including marker.  The critical state in the event can then
  be re-recorded.
- refactor hipEventElapsedTime.  Remmove stale debug code, native signal
  refs.
2017-11-06 23:49:25 +00:00
Ben Sander 09d866a639 Merge pull request #237 from bensander/use_ctxptr_for_p2p
Use ctxptr for p2p
2017-11-01 18:55:25 +01:00
Ben Sander 7e908bdec8 Add ns-level timer for HIP API routines
Refactor some miuses of ihipLogStatus, these should only be in top-level
HIP APIs and should be paired with HIP_API_INIT calls.
2017-10-30 20:20:51 +00:00
Ben Sander 2e8ec71e40 Merge pull request #222 from bensander/fix_device_prop
Fix device prop
2017-10-30 17:58:48 +01:00
Ben Sander d610f16c47 Check for null copyEngine before looking at peers. 2017-10-30 16:58:03 +00:00
Ben Sander 7d30f32332 Fix bug with peer-to-peer combined with context API
- Store context inside the tracker rather than using int deviceID that
  was always mapped to primary context
- IsPeerWatcher now based on device IDs rather than specific peers.
2017-10-26 19:44:22 +00:00
Aditya Atluri 5d646d0fe3 Enhance debug for copy pointers
- show more pointer tracking fields
- show pointer info before and after "tailoring'
2017-10-26 19:44:22 +00:00
Siu Chi Chan 5b9ce032d6 replace __hcc_workweek__ with HC_FEATURE_PRINTF flag 2017-10-23 18:30:08 -04:00
Ben Sander dd24983571 Remove printf 2017-10-20 13:24:04 -07:00
Ben Sander acf89b43d4 Update device properties.
- clear properties to defined initial state.
- enable some property flags which are now supported.
2017-10-20 15:52:13 +00:00
Ben Sander c9f906c2ce Modify device properties to use pool API.
- Also better error code checking
2017-10-20 14:49:29 +00:00
Siu Chi Chan ccef1cbd6e hipDeviceReset(): make sure to reinitialize the printf buffer in hcc RT 2017-10-18 16:26:13 -04:00
Wen-Heng (Jack) Chung c74d3fe2cb Bump device major version from 2 to 3
This would significantly improve performance for certain apps in kernel
selection logic.
2017-09-15 15:47:39 +00:00
Ben Sander fbd22c3e49 Merge branch 'master' into hip_init_alloc 2017-09-14 11:53:33 -05:00
Ben Sander cea80cd8b3 Add HIP_INIT_ALLOC to init allocated memory. 2017-09-13 23:31:48 +00:00
Ben Sander 4ac6d643c1 hipStreamQuery uses av::is_empty. Add HIP_FORCE_NULL_STREAM. 2017-08-31 03:00:14 +00:00
Ben Sander 882dab4536 Refactor hipStreamWaitEvent
- Null streams use same flow as non-null.
- Add HIP_SYNC_STREAM_WAIT
- Resolve null stream.
2017-08-31 03:00:14 +00:00
Ben Sander 6ff74d0e97 Lock streams when waiting on event completion or querying event safety. 2017-08-28 18:40:16 -05:00