Γράφημα Υποβολών

410 Υποβολές

Συγγραφέας SHA1 Μήνυμα Ημερομηνία
Alex Xie d890d77da4 SWDEV-221166 - Detect support for large bar access through HIP runtime API
Change-Id: Iaa9756c1b5e40c1ab5afb38e44a6699fa5f6c13f
2020-05-01 20:39:52 -04:00
Jeff Daily ef596cd088 add IPC event support (#1996) 2020-04-17 10:31:22 +05:30
Aryan Salmanpour cf8589b8c8 [HIP] add support for NoPreSync/NoPostSync flags for Cooperative MultiDevice launch API (#1990) 2020-04-13 14:02:52 +05:30
Jeff Daily 01d661b159 fix hipStreamAddCallback, block future work on stream (#1934) 2020-03-19 16:16:04 +05:30
Joseph Greathouse 18e6c529bc Fix detection of support for cooperative groups (#1932)
Query ROCr to see if we have the proper lower-level support for
cooperative groups -- GWS support through the firmware, driver,
thunk, and ROCr. ROCr does these checks for us, and presents a
query that allows us to see if GWS entries are available for use.
If so, then we have all the lower-level technologies needed, and
we should enable cooperative groups support for HIP.
2020-03-17 14:01:44 +05:30
Joseph Greathouse 55e55e78bb Fix maxSharedMemoryPerMultiProcessor attribute (#1927)
The maxSharedMemoryPerMultiProcessor attribute is meant to describe
the number of bytes of shared memory (LDS space in AMD terminology)
in each SM (CU in AMD terminology). For instance, on AMD GPUs this
is often 64KB per CU, and some Nvidia GPUs it's 96KB per SM.

This shared memory is a different address space from the normal
global memory. However, the current HIP-HCC properties fill this
in with a size that matches the totalGlboalMem property. This gives
a drastically too-high calculation for the amount of LDS space that
each CU has -- tens of GBs vs. 10s of KBs.

This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor
property from the HSA pool that describes how much workgroup-local
space is available on each CU. The HSA runtime eventually pulls
this from the topology information about LDSSizeInKB, defined as
"Size of Local Data Store in Kilobytes per SIMD".

Previously, this HSA query was used to fill in the value of the
sharedMemPerBlock property. On today's AMD GPUs, we know that
the amount of LDS avaialble to the workgroup is identical to the
amount of LDS space in the CU. However, in the future this may
differ. As such, this patch changes around the order and fills
in the "PerMultiProcessor" property from the HSA query (since
what's what the query is defined to return), and then separately
fills in the "PerBlock" property as we know it.
2020-03-17 14:00:51 +05:30
Evgeny Mankov 821c60a3d9 Merge pull request #1916 from asalmanp/refactor_cooperative_APIs
[HIP] Refactor cooperative APIs
2020-03-12 19:12:50 +03:00
Aryan Salmanpour 5494f5b247 [HIP] fix formatting/code clean up and fix a bug 2020-03-09 16:03:59 -04:00
Aryan Salmanpour 4844fbdf0a [HIP] Refactor cooperative APIs 2020-03-06 18:30:12 -05:00
Aryan Salmanpour 03797ae986 [HIP] add hip specific properties for cooperative kernel multi device 2020-03-03 13:25:36 -05:00
Rahul Garg 6c5fa32815 Remove deprecated HIP markers (#1876) 2020-02-28 16:47:15 +05:30
ansurya 8c6934223b Reduce GPU copying based on arch it runs on (#1751)
Implements SWDEV-213230.
2020-02-13 14:21:51 +05:30
Aryan Salmanpour 6e867eacb6 [HIP][HIPIFY] Add some missing flags for cooperative launch and occupancy APIs 2020-01-30 15:05:53 -05:00
vsytch f72a669487 Add missing texturePitchAlignment member to the hipDeviceProp_t struct. (#1802)
* Add missing texturePitchAlignment member to the hipDeviceProp_t struct.

* Add missing hipDeviceAttributeTexturePitchAlignment enumerator to the hipDeviceAttribute_t enum.

* Initialize texturePitchAlignment to 256. This works for gfx9+, but is technically overaligned in most cases for pre-gfx9.

* Add the texturePitchAlignment property to the NVCC path.
2020-01-27 16:37:00 -08:00
Siu Chi Chan f4555c835a Detect when an explicit printf buffer flush is required (#1766)
* Detect when an explicit printf buffer flush is required
in a device/stream synchronization function.

* hip_module.cpp: add missing hc_am.hpp header
2020-01-07 09:06:38 -08:00
Evgeny Mankov 0dadb23327 Merge pull request #1759 from emankov/master
[HIP] Unify hipError_t (Step 2)
2019-12-30 19:21:09 +03:00
Evgeny Mankov 4921678b6c [HIP] Clean-up deprecated HIP error codes
hipErrorMemoryAllocation -> hipErrorOutOfMemory
hipErrorInitializationError -> hipErrorNotInitialized
hipErrorMapBufferObjectFailed -> hipErrorMapFailed
hipErrorInvalidResourceHandle -> hipErrorInvalidHandle
2019-12-23 17:01:35 +03:00
Alex Voicu 75a11330aa Fix late-coming issues. (#1724)
Implementation for hipMemcpyWithStream.
2019-12-23 19:11:24 +05:30
Sarbojit2019 153a959280 Revert [HIP] Fixed hipStreamAddCallback (#1674)
This reverts commit 45613311d7.
Addresses SWDEV#212675.
2019-11-20 11:55:46 +05:30
Jeff Daily 3a7eb694f5 hipStreamSynchronize can skip marker if stream is empty (#1667) 2019-11-19 09:42:43 -08:00
Sarbojit2019 45613311d7 [HIP] Fixed hipStreamAddCallback [SWDEV#165185] (#1425)
Fixed hipStreamAddCallback() as requested in SWDEV#165185
Added unit test to test the behavior
2019-11-07 13:18:12 +05:30
Rahul Garg 579a4f36fa Rename hip/hip_hcc.h to hip/hip_ext.h (#1341)
* Rename hip/hip_hcc.h to hip/hip_ext.h

* Deprecate hip_hcc.h
2019-11-07 13:17:10 +05:30
Jeff Daily 85080905c0 hipEventRecord only needs one lock; remove locked_eventIsReady 2019-11-06 15:56:32 +00:00
Rahul Garg 96530cba3b Fix PCI Domain ID query (#1424)
* Fix PCI Domain ID query

* Update BDF comment
2019-10-07 14:11:52 +05:30
satyanveshd 3d661e4706 Reimplement hipMemGetInfo (#1447)
Addresses SWDEV-136570. hipMemGetInfo changed to compute free memory based on information from kfd instead of relying on hc::am_tracker.
2019-10-01 12:40:36 +05:30
Sarbojit2019 0fa42af08c [HIP] Add tccDriver info in hipDeviceProp
Fixes #1433.
2019-09-26 13:53:33 +05:30
ansurya ceb734b917 Added new device attributes (#1377)
* Added new device attributes

* updated comment

* updated with new device attributes supported
2019-09-16 08:31:30 +00:00
Jeff Daily 8384f487ad fix bug where HIP_DB=1 seg faults at startup (#1388) 2019-09-05 10:04:19 +00:00
Sarbojit2019 0722704f35 Updated hipErrorString and CUDAErrorTohipError (#1365) 2019-08-29 01:02:59 +00:00
Siu Chi Chan 83af327ef2 Compile HIP runtime with hidden visibility by default (#1303)
* add default visibility to most APIs in program_state

* remove unwanted C++ headers

* Add symbol visibility pragmas and compiler flags

* Add visibility attribute to APIs in channel_descriptor and hip_hcc

* remove unused headers

* simplify build flags with hcc

* add pragma visibility hidden to functional_grid_launch

* [CMake] add gfx908 back
2019-08-08 08:33:04 +00:00
Alex Voicu fbbed603ff Fix hip_throw. (#1285)
* Fix hip_throw.

* Fix typo

* No, really fix typo
2019-08-05 09:52:22 +00:00
Jeff Daily 1eb3dbf065 consolidate thread local storage (#915)
* all thread local access now through single struct

* clean up old commented-out code, more use of GET_TLS()

* fewer calls to GET_TLS by passing tls as a funtion argument

* revert unnecessary change to printf

* fix failing tests due to TLS change

* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
2019-08-05 09:51:02 +00:00
wkwchau aaec4f73a6 Added CooperativeLaunch and CooperativeMultiDeviceLaunch flag and property for hipDeviceGetAttribute() and hipGetDeviceProperties() (#1247) 2019-08-02 10:00:25 +00:00
wkwchau e7447d5809 Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttribu… (#1238)
* Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttributeHdpRegFlushCntl

* Added NVCC blocker for the hip*FlushCntl test cases
2019-08-01 16:03:35 +00:00
Jeff Daily f096a3239e remove stream locks where it is safe to do so 2019-07-22 17:38:51 +00:00
ansurya 8e496c09d9 Add Max Texture 1D,2D,3D device properties (#1226)
* Add Max Texture 1D,2D,3D device properties

* Corrected testcase to use enums defined in hipDeviceAttribute_t

* Added texture 1D,2D and 3D support for NVIDIA path
2019-07-18 03:18:50 +00:00
Rahul Garg 1dcf618d20 Fix HIP_VISIBLE_DEVICES order (#1184)
* Fix HIP_VISIBLE_DEVICES order

* Fix device IDs mismatch

* Fix review comments- loop order and device range check

* Handle incomplete VISIBLE device env variable

* Revert "Handle incomplete VISIBLE device env variable"
2019-07-18 03:18:04 +00:00
Aryan Salmanpour 999f45fc11 [hip] Move _criticalData of ihipStream_t class to private section and use criticalData() to access it (#1177) 2019-07-04 00:42:19 +00:00
Aryan Salmanpour 96dc74897d [hip] implement the hipExtLaunchMultiKernelMultiDevice API (#1165)
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API

* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5

* modified code based on the requested changes

* changes to lock all streams before launching kernels for each device and unlock them after the dispatches

* check each stream to be valid before starting to lock all the streams
2019-06-20 05:59:05 +05:30
Siu Chi Chan 00824be34c move executable_cache into program_state.cpp 2019-05-24 17:27:25 -04:00
Maneesh Gupta 693bd556d4 Merge pull request #1083 from gargrahul/fix_hip_impl_visible_agents
Maintain HIP_VISIBLE_DEVICES for kernel launch
2019-05-13 14:20:18 +05:30
Siu Chi Chan f5eb91d53d migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit a3d118eaa8.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 8a548bf40b.

* Revert "improve program state commentary"

This reverts commit 7aada87cbd.

* Revert "load program state once per agent"

This reverts commit c9117de8eb.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuffs from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
2019-05-12 19:24:03 +05:30
wkwchau 29b3b46b42 Return hipErrorInsufficientDriver status when CPU device not found (#1064)
* Return hipErrorInsufficientDriver status when CPU device not found - no exception thrown

* Return hipErrorInsufficientDriver status when CPU device not found
2019-05-07 15:58:25 +05:30
Rahul Garg 620a07102d Maintain HIP_VISIBLE_DEVICES for kernel launch 2019-05-07 05:09:02 +05:30
Sameer Sahasrabuddhe abb9375707 minor cleanup: eliminate repetition 2019-04-25 20:41:16 +05:30
Jeff Daily 2b3037a6ea In hipFree, synchronize owner of memory (#1018)
* In hipFree, if memory is associated with a device, synchronize that device's streams.

This changes the behavior from synchronizing the currently set TLS device.

* All devices sync in hipFree for _appId=-1 case.

* Revert "All devices sync in hipFree for _appId=-1 case."

This reverts commit 1efb34d6a8426661e45bc5f763422a1147aeac10.

* add HIP_SYNC_FREE env var
2019-04-16 08:35:55 +05:30
Maneesh Gupta eb03d50de9 Merge pull request #962 from gargrahul/add_2d_copy_fallback
Add 2D fallback to use copy kernel
2019-03-25 07:46:43 +00:00
Rahul Garg 9bbfbceb64 2D Fallback needs hcc workweek 19101 or higher 2019-03-25 12:07:28 +05:30
Siu Chi Chan 24d08beef8 reimplement HIP_INIT as hip_impl::hip_init(), add hip_init() to some of the inlined API (#966)
* reimplement HIP_INIT as a function, expose it as hip_impl::hip_init()
so that it could be called from hipLaunchKernelGGL and other inlined
HIP functions

* Don't call hip_init from ihipPreLaunchKernel
2019-03-20 05:11:15 +00:00
Rahul Garg 918d7e3a40 Add 2D fallback to use copy kernel 2019-03-14 13:03:06 +05:30