- Added NVCC support for module APIs
- Changed hipFunction and hipModule data types to hipFunction_t and hipModule_t
- Created new intenal ihipModuleGetFunction as it is used twice
- Changed test to match with the new data types
Change-Id: I300a1c7fd40ed7065b1b8b9de97e3a06b96ed729
- Corrected the hipModule.cpp test to minimal code
- Added hipModuleUnload API
- Added hipModuleUnload API test
Change-Id: I9c40337043d7972a570b795e1bfc104bd2c4d8aa
- The module kernel launch is now in sync with commands in its stream
- Moved launch kernel inside ihipStream
Change-Id: Ic00cfcf4882bf81b6203c36881a52575ea68b529
- symbol handle is added to hipFunction
- executable handle is added to hipModule
- This way, the APIs doesn't need to track the values
Change-Id: I7cf05329cf79fe946319d7746bd9f5503268fda4
- hipLaunchModuleKernel maps to cuLaunchKernel
- Whole lot of new error codes added for the use of driver api
- KernelParams arguments is not yet supported
- hipLaunchModuleKernel is a synchronous api (will change eventually)
- All the commands in a stream will wait on host when hipLaunchModuleKernel is called on it
Change-Id: Ib4a4fae1db06fbb3a81d5a5575b026aa821264ed
1. Added 2 new driver apis, hipModuleLoad, hipModuleGetFunction
Change-Id: If464a7fad178121e3da791c7ac9e17ebc01a9cd0
Issues: When a sample written with them shows Aborted (core dumped) when exiting
APIs: hipInit, hipCtxCreate.
Track TLS default ctx. Set deviceID now changes the ctx.
Add first context test.
Change-Id: If1cb9989b5a04a36147e25e84904336c7b6f3d88
Change ihipDevice_t -> ihipCtx_t (new)
Change ihipGetTlsDefaultDevice->ihipGetTlsDefaultCtx
Some other changes from device->ctx where appropriate.
Change-Id: I5c4ae93b2fd42c6303aa23d748eb166b7431925d
1. Before, the signal pool is increased depending on the usage
2. After, a static number of signals are allocated to the pool
Only these are used by hip in a stream
3. If the signals required are more than the pool size, the
stream has to wait to make sure all the signals are available
4. Once they are available, the stream can use them
5. Removed HIP_NUM_SIGNALS_PER_STREAM because of redundancy with HIP_STREAM_SIGNALS
6. Increased signal count from 2 to 32.
Future Work: Dynamically increase the pool size depending on the number of
streams allocated by the application. And, null stream should have more signals
Change-Id: I6be36e084f26bb04766fabf776c7210aee0f9e91
Remove dead depFutures, enqueueBarrier call.
Rename some parms to reflect usage.
Add comments to better explain tricky parts of sync code.
Change-Id: I763296421d9c2b3b58fc8cef5f010b12ab49553c
1. The number of kernels that can use signals are increased to 128
2. The kernel count is now specific to the stream
Change-Id: Ie6d1aa3f437aad8f08c3333fe48bd3f46e551e60
1. The patch uses HIP signal pools to sync between copy and kernel commands
2. The hsa_signal_create is removed
3. Left the redundant enqueueBarrier method just in case
Change-Id: I3dff3e8ee57fff3cd49bec802ff735ed128e5ca1
- stubs and documentation in include/hcc_details/hip_runtime.h
- stubs with "no-op" in src/hip_memory.cpp
- document update in hip_kernel_language.md, add suggestions to
disable L1 and L2 caches when using the threadfence routines.
Change-Id: Ic0753170f802003055bca9d7476d7f48817b98b7
ihipStream_t::copySync use GPU agent in memory async copy API, even
if the src/dst memory does not belong to GPU, which cause the hsa
runtime to choose a slower copy engine.
SWDEV-95191
Change-Id: If3cab3d493c0c96ed63721cdcf28247a1193887c