410 Commits

Author SHA1 Message Date
Alex Xie 376810dfcc SWDEV-221166 - Detect support for large bar access through HIP runtime API
Change-Id: Iaa9756c1b5e40c1ab5afb38e44a6699fa5f6c13f


[ROCm/hip commit: d890d77da4]
2020-05-01 20:39:52 -04:00
Jeff Daily e428d45eb2 add IPC event support (#1996)
[ROCm/hip commit: ef596cd088]
2020-04-17 10:31:22 +05:30
Aryan Salmanpour 462b9245ea [HIP] add support for NoPreSync/NoPostSync flags for Cooperative MultiDevice launch API (#1990)
[ROCm/hip commit: cf8589b8c8]
2020-04-13 14:02:52 +05:30
Jeff Daily a2915ae475 fix hipStreamAddCallback, block future work on stream (#1934)
[ROCm/hip commit: 01d661b159]
2020-03-19 16:16:04 +05:30
Joseph Greathouse 62b9e8f1b2 Fix detection of support for cooperative groups (#1932)
Query ROCr to see if we have the proper lower-level support for
cooperative groups -- GWS support through the firmware, driver,
thunk, and ROCr. ROCr does these checks for us, and presents a
query that allows us to see if GWS entries are available for use.
If so, then we have all the lower-level technologies needed, and
we should enable cooperative groups support for HIP.

[ROCm/hip commit: 18e6c529bc]
2020-03-17 14:01:44 +05:30
Joseph Greathouse 0d31e51bc9 Fix maxSharedMemoryPerMultiProcessor attribute (#1927)
The maxSharedMemoryPerMultiProcessor attribute is meant to describe
the number of bytes of shared memory (LDS space in AMD terminology)
in each SM (CU in AMD terminology). For instance, on AMD GPUs this
is often 64KB per CU, and on some Nvidia GPUs it's 96KB per SM.

This shared memory is a different address space from the normal
global memory. However, the current HIP-HCC properties fill this
in with a size that matches the totalGlobalMem property. This gives
a drastically too-high calculation for the amount of LDS space that
each CU has -- tens of GBs vs. tens of KBs.

This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor
property from the HSA pool that describes how much workgroup-local
space is available on each CU. The HSA runtime eventually pulls
this from the topology information about LDSSizeInKB, defined as
"Size of Local Data Store in Kilobytes per SIMD".

Previously, this HSA query was used to fill in the value of the
sharedMemPerBlock property. On today's AMD GPUs, we know that
the amount of LDS available to the workgroup is identical to the
amount of LDS space in the CU. However, in the future this may
differ. As such, this patch changes around the order and fills
in the "PerMultiProcessor" property from the HSA query (since
that's what the query is defined to return), and then separately
fills in the "PerBlock" property as we know it.

[ROCm/hip commit: 55e55e78bb]
2020-03-17 14:00:51 +05:30
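
A minimal sketch of the fill order described in the commit above, not the actual patch. `SharedMemProps` and `fillSharedMemProps` are hypothetical names introduced for illustration; only the two field names come from the commit message, and `ldsBytesPerCu` stands in for whatever the HSA pool query returns.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two shared-memory fields discussed above.
struct SharedMemProps {
    std::size_t maxSharedMemoryPerMultiProcessor;  // bytes of LDS per CU
    std::size_t sharedMemPerBlock;                 // bytes of LDS usable by one workgroup
};

// ldsBytesPerCu is assumed to be the size reported by the HSA pool query.
SharedMemProps fillSharedMemProps(std::size_t ldsBytesPerCu) {
    SharedMemProps props;
    // Step 1: the query is defined per CU, so it fills the per-multiprocessor field.
    props.maxSharedMemoryPerMultiProcessor = ldsBytesPerCu;
    // Step 2: filled in separately. Today a workgroup can use the whole LDS of
    // its CU, so the values match; this line is the one to change if they diverge.
    props.sharedMemPerBlock = props.maxSharedMemoryPerMultiProcessor;
    return props;
}
```

Calling `fillSharedMemProps(64 * 1024)` models the 64KB-per-CU case mentioned in the message; both fields come out as 65536 bytes rather than the size of global memory.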
Evgeny Mankov 8b0983389c Merge pull request #1916 from asalmanp/refactor_cooperative_APIs
[HIP] Refactor cooperative APIs

[ROCm/hip commit: 821c60a3d9]
2020-03-12 19:12:50 +03:00
Aryan Salmanpour c39d9f8f7b [HIP] fix formatting/code clean up and fix a bug
[ROCm/hip commit: 5494f5b247]
2020-03-09 16:03:59 -04:00
Aryan Salmanpour c25dd0ca3d [HIP] Refactor cooperative APIs
[ROCm/hip commit: 4844fbdf0a]
2020-03-06 18:30:12 -05:00
Aryan Salmanpour df81136734 [HIP] add hip specific properties for cooperative kernel multi device
[ROCm/hip commit: 03797ae986]
2020-03-03 13:25:36 -05:00
Rahul Garg c34c9a4b4d Remove deprecated HIP markers (#1876)
[ROCm/hip commit: 6c5fa32815]
2020-02-28 16:47:15 +05:30
ansurya fb53186682 Reduce GPU copying based on arch it runs on (#1751)
Implements SWDEV-213230.

[ROCm/hip commit: 8c6934223b]
2020-02-13 14:21:51 +05:30
Aryan Salmanpour 78a36b4fb8 [HIP][HIPIFY] Add some missing flags for cooperative launch and occupancy APIs
[ROCm/hip commit: 6e867eacb6]
2020-01-30 15:05:53 -05:00
vsytch 1af35a6044 Add missing texturePitchAlignment member to the hipDeviceProp_t struct. (#1802)
* Add missing texturePitchAlignment member to the hipDeviceProp_t struct.

* Add missing hipDeviceAttributeTexturePitchAlignment enumerator to the hipDeviceAttribute_t enum.

* Initialize texturePitchAlignment to 256. This works for gfx9+, but is technically overaligned in most cases for pre-gfx9.

* Add the texturePitchAlignment property to the NVCC path.


[ROCm/hip commit: f72a669487]
2020-01-27 16:37:00 -08:00
Siu Chi Chan fcf07e0b04 Detect when an explicit printf buffer flush is required (#1766)
* Detect when an explicit printf buffer flush is required
in a device/stream synchronization function.

* hip_module.cpp: add missing hc_am.hpp header


[ROCm/hip commit: f4555c835a]
2020-01-07 09:06:38 -08:00
Evgeny Mankov 6ac4d18753 Merge pull request #1759 from emankov/master
[HIP] Unify hipError_t (Step 2)

[ROCm/hip commit: 0dadb23327]
2019-12-30 19:21:09 +03:00
Evgeny Mankov abef353b5b [HIP] Clean-up deprecated HIP error codes
hipErrorMemoryAllocation -> hipErrorOutOfMemory
hipErrorInitializationError -> hipErrorNotInitialized
hipErrorMapBufferObjectFailed -> hipErrorMapFailed
hipErrorInvalidResourceHandle -> hipErrorInvalidHandle


[ROCm/hip commit: 4921678b6c]
2019-12-23 17:01:35 +03:00
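
For code that still refers to the deprecated names, the four renames listed above can be captured in a small lookup table. This is only an illustrative sketch working on identifier strings (for example in a migration script); `renamedHipErrors` and `modernHipErrorName` are hypothetical helpers, not part of HIP or HIPIFY.

```cpp
#include <map>
#include <string>

// Deprecated name -> replacement, exactly as listed in the commit message.
const std::map<std::string, std::string>& renamedHipErrors() {
    static const std::map<std::string, std::string> table = {
        {"hipErrorMemoryAllocation", "hipErrorOutOfMemory"},
        {"hipErrorInitializationError", "hipErrorNotInitialized"},
        {"hipErrorMapBufferObjectFailed", "hipErrorMapFailed"},
        {"hipErrorInvalidResourceHandle", "hipErrorInvalidHandle"},
    };
    return table;
}

// Returns the replacement for a deprecated name, or the name unchanged
// when it is not one of the four renamed codes.
std::string modernHipErrorName(const std::string& name) {
    const auto& table = renamedHipErrors();
    const auto it = table.find(name);
    return it == table.end() ? name : it->second;
}
```

Names that were not renamed pass through untouched, so the helper can be applied to every error identifier found in a source file.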
Alex Voicu 1f5ecc0f6a Fix late-coming issues. (#1724)
Implementation for hipMemcpyWithStream.


[ROCm/hip commit: 75a11330aa]
2019-12-23 19:11:24 +05:30
Sarbojit2019 6679cd3998 Revert [HIP] Fixed hipStreamAddCallback (#1674)
This reverts commit fa1e44aa0e.
Addresses SWDEV#212675.

[ROCm/hip commit: 153a959280]
2019-11-20 11:55:46 +05:30
Jeff Daily 492248ef12 hipStreamSynchronize can skip marker if stream is empty (#1667)
[ROCm/hip commit: 3a7eb694f5]
2019-11-19 09:42:43 -08:00
Sarbojit2019 fa1e44aa0e [HIP] Fixed hipStreamAddCallback [SWDEV#165185] (#1425)
Fixed hipStreamAddCallback() as requested in SWDEV#165185
Added unit test to test the behavior


[ROCm/hip commit: 45613311d7]
2019-11-07 13:18:12 +05:30
Rahul Garg 6968362d99 Rename hip/hip_hcc.h to hip/hip_ext.h (#1341)
* Rename hip/hip_hcc.h to hip/hip_ext.h

* Deprecate hip_hcc.h


[ROCm/hip commit: 579a4f36fa]
2019-11-07 13:17:10 +05:30
Jeff Daily f9ad564380 hipEventRecord only needs one lock; remove locked_eventIsReady
[ROCm/hip commit: 85080905c0]
2019-11-06 15:56:32 +00:00
Rahul Garg 4f899d487c Fix PCI Domain ID query (#1424)
* Fix PCI Domain ID query

* Update BDF comment


[ROCm/hip commit: 96530cba3b]
2019-10-07 14:11:52 +05:30
satyanveshd bfb64c43a4 Reimplement hipMemGetInfo (#1447)
Addresses SWDEV-136570. hipMemGetInfo changed to compute free memory based on information from kfd instead of relying on hc::am_tracker.


[ROCm/hip commit: 3d661e4706]
2019-10-01 12:40:36 +05:30
Sarbojit2019 7f7a5a3712 [HIP] Add tccDriver info in hipDeviceProp
Fixes #1433.

[ROCm/hip commit: 0fa42af08c]
2019-09-26 13:53:33 +05:30
ansurya b5549f4397 Added new device attributes (#1377)
* Added new device attributes

* updated comment

* updated with new device attributes supported


[ROCm/hip commit: ceb734b917]
2019-09-16 08:31:30 +00:00
Jeff Daily 191482e1e4 fix bug where HIP_DB=1 seg faults at startup (#1388)
[ROCm/hip commit: 8384f487ad]
2019-09-05 10:04:19 +00:00
Sarbojit2019 73e5c52d0d Updated hipErrorString and CUDAErrorTohipError (#1365)
[ROCm/hip commit: 0722704f35]
2019-08-29 01:02:59 +00:00
Siu Chi Chan 83269bea32 Compile HIP runtime with hidden visibility by default (#1303)
* add default visibility to most APIs in program_state

* remove unwanted C++ headers

* Add symbol visibility pragmas and compiler flags

* Add visibility attribute to APIs in channel_descriptor and hip_hcc

* remove unused headers

* simplify build flags with hcc

* add pragma visibility hidden to functional_grid_launch

* [CMake] add gfx908 back


[ROCm/hip commit: 83af327ef2]
2019-08-08 08:33:04 +00:00
Alex Voicu 4509df8151 Fix hip_throw. (#1285)
* Fix hip_throw.

* Fix typo

* No, really fix typo


[ROCm/hip commit: fbbed603ff]
2019-08-05 09:52:22 +00:00
Jeff Daily 9b44993343 consolidate thread local storage (#915)
* all thread local access now through single struct

* clean up old commented-out code, more use of GET_TLS()

* fewer calls to GET_TLS by passing tls as a function argument

* revert unnecessary change to printf

* fix failing tests due to TLS change

* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor


[ROCm/hip commit: 1eb3dbf065]
2019-08-05 09:51:02 +00:00
wkwchau b663dbc5ce Added CooperativeLaunch and CooperativeMultiDeviceLaunch flag and property for hipDeviceGetAttribute() and hipGetDeviceProperties() (#1247)
[ROCm/hip commit: aaec4f73a6]
2019-08-02 10:00:25 +00:00
wkwchau c666fdaa08 Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttribu… (#1238)
* Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttributeHdpRegFlushCntl

* Added NVCC blocker for the hip*FlushCntl test cases


[ROCm/hip commit: e7447d5809]
2019-08-01 16:03:35 +00:00
Jeff Daily 4094b62407 remove stream locks where it is safe to do so
[ROCm/hip commit: f096a3239e]
2019-07-22 17:38:51 +00:00
ansurya df4dee39f4 Add Max Texture 1D,2D,3D device properties (#1226)
* Add Max Texture 1D,2D,3D device properties

* Corrected testcase to use enums defined in hipDeviceAttribute_t

* Added texture 1D,2D and 3D support for NVIDIA path


[ROCm/hip commit: 8e496c09d9]
2019-07-18 03:18:50 +00:00
Rahul Garg 7629cdd2cf Fix HIP_VISIBLE_DEVICES order (#1184)
* Fix HIP_VISIBLE_DEVICES order

* Fix device IDs mismatch

* Fix review comments- loop order and device range check

* Handle incomplete VISIBLE device env variable

* Revert "Handle incomplete VISIBLE device env variable"


[ROCm/hip commit: 1dcf618d20]
2019-07-18 03:18:04 +00:00
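
The commit above mentions preserving the listed order and checking the device range. A rough sketch of what such parsing could look like follows; it is not the HIP runtime's code. `parseVisibleDevices` is a hypothetical helper, and stopping at the first malformed or out-of-range entry is an assumption of this sketch, not something the commit message states.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Parses a comma-separated device list (the kind of value HIP_VISIBLE_DEVICES
// holds) and keeps the entries in the order they were written.
// ASSUMPTION: parsing stops at the first entry that is not a valid index
// in [0, deviceCount).
std::vector<int> parseVisibleDevices(const std::string& value, int deviceCount) {
    std::vector<int> visible;
    std::istringstream in(value);
    std::string token;
    while (std::getline(in, token, ',')) {
        int id = -1;
        try {
            id = std::stoi(token);
        } catch (const std::exception&) {
            break;  // not a number (or empty): stop here
        }
        if (id < 0 || id >= deviceCount) break;  // range check
        visible.push_back(id);
    }
    return visible;
}
```

With three devices present, the value `"2,0,1"` yields the list 2, 0, 1 in that order, so the first visible device is physical device 2 rather than device 0.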
Aryan Salmanpour a2655fd90a [hip] Move _criticalData of ihipStream_t class to private section and use criticalData() to access it (#1177)
[ROCm/hip commit: 999f45fc11]
2019-07-04 00:42:19 +00:00
Aryan Salmanpour 45fa752888 [hip] implement the hipExtLaunchMultiKernelMultiDevice API (#1165)
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API

* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introduced in HCC for ROCm 2.5

* modified code based on the requested changes

* changes to lock all streams before launching kernels for each device and unlock them after the dispatches

* check each stream to be valid before starting to lock all the streams


[ROCm/hip commit: 96dc74897d]
2019-06-20 05:59:05 +05:30
Siu Chi Chan 12d457cb4d move executable_cache into program_state.cpp
[ROCm/hip commit: 00824be34c]
2019-05-24 17:27:25 -04:00
Maneesh Gupta 384b4554a2 Merge pull request #1083 from gargrahul/fix_hip_impl_visible_agents
Maintain HIP_VISIBLE_DEVICES for kernel launch

[ROCm/hip commit: 693bd556d4]
2019-05-13 14:20:18 +05:30
Siu Chi Chan d0252dfa79 migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit 62e96cb4cf.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 882006555b.

* Revert "improve program state commentary"

This reverts commit fb2beb0c88.

* Revert "load program state once per agent"

This reverts commit 21f5e142f5.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuff from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure out what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime


[ROCm/hip commit: f5eb91d53d]
2019-05-12 19:24:03 +05:30
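
The last few bullets of the commit above describe an executables cache keyed on ELF name, ISA, and agent, guarded by a global mutex and handed out by reference. The sketch below shows that general shape only; `Executable`, `CacheKey`, and `cachedExecutable` are hypothetical names, the agent is reduced to an integer handle, and the real program_state code may differ substantially.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <tuple>

// Hypothetical placeholder for a loaded and frozen executable.
struct Executable {
    int loadCount;  // how many times the (simulated) load ran for this key
};

// Key: ELF name, ISA name, agent handle (an integer stand-in here).
using CacheKey = std::tuple<std::string, std::string, std::uint64_t>;

// Returns a reference to the cached entry, loading it on first use only.
// std::map never moves its nodes, so the reference stays valid after later inserts.
Executable& cachedExecutable(const std::string& elf, const std::string& isa,
                             std::uint64_t agent) {
    static std::mutex cacheMutex;                   // one global lock for the cache
    static std::map<CacheKey, Executable> cache;
    std::lock_guard<std::mutex> lock(cacheMutex);
    const CacheKey key(elf, isa, agent);
    auto it = cache.find(key);
    if (it == cache.end()) {
        it = cache.emplace(key, Executable{1}).first;  // simulated one-time load
    }
    return it->second;
}
```

Two lookups with the same ELF name, ISA, and agent return the same object, which is the property that avoids reloading executables after a shared library is dlopened; a different agent gets its own entry.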
wkwchau 236834c679 Return hipErrorInsufficientDriver status when CPU device not found (#1064)
* Return hipErrorInsufficientDriver status when CPU device not found - no exception thrown

* Return hipErrorInsufficientDriver status when CPU device not found


[ROCm/hip commit: 29b3b46b42]
2019-05-07 15:58:25 +05:30
Rahul Garg f8386e274a Maintain HIP_VISIBLE_DEVICES for kernel launch
[ROCm/hip commit: 620a07102d]
2019-05-07 05:09:02 +05:30
Sameer Sahasrabuddhe c3db7b929a minor cleanup: eliminate repetition
[ROCm/hip commit: abb9375707]
2019-04-25 20:41:16 +05:30
Jeff Daily a0172ca884 In hipFree, synchronize owner of memory (#1018)
* In hipFree, if memory is associated with a device, synchronize that device's streams.

This changes the behavior from synchronizing the currently set TLS device.

* All devices sync in hipFree for _appId=-1 case.

* Revert "All devices sync in hipFree for _appId=-1 case."

This reverts commit 1efb34d6a8426661e45bc5f763422a1147aeac10.

* add HIP_SYNC_FREE env var


[ROCm/hip commit: 2b3037a6ea]
2019-04-16 08:35:55 +05:30
Maneesh Gupta 8c6b5bf266 Merge pull request #962 from gargrahul/add_2d_copy_fallback
Add 2D fallback to use copy kernel

[ROCm/hip commit: eb03d50de9]
2019-03-25 07:46:43 +00:00
Rahul Garg 63e926cec7 2D Fallback needs hcc workweek 19101 or higher
[ROCm/hip commit: 9bbfbceb64]
2019-03-25 12:07:28 +05:30
Siu Chi Chan 99c4f43a6c reimplement HIP_INIT as hip_impl::hip_init(), add hip_init() to some of the inlined API (#966)
* reimplement HIP_INIT as a function, expose it as hip_impl::hip_init()
so that it could be called from hipLaunchKernelGGL and other inlined
HIP functions

* Don't call hip_init from ihipPreLaunchKernel


[ROCm/hip commit: 24d08beef8]
2019-03-20 05:11:15 +00:00
Rahul Garg a3fb908a0a Add 2D fallback to use copy kernel
[ROCm/hip commit: 918d7e3a40]
2019-03-14 13:03:06 +05:30