1. The current implementation checks both the environment variable and the value in hipconfig, and reports an error.
2. The new implementation gives the value in hipconfig the highest priority.
3. If hipconfig is not present, it falls back to the environment variables.
To devs: there is no need to switch environment variables when moving between different HCC and HIP combinations.
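The lookup order described above can be sketched roughly as follows. This is an illustration only, not the actual hipcc/hipconfig code: the name resolveSetting, the KeyValues maps standing in for the hipconfig contents and the process environment, and the fallback to the environment when hipconfig is present but lacks the key are all assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-ins for the hipconfig contents and the
// process environment; a real tool would read a file and call getenv().
using KeyValues = std::map<std::string, std::string>;

// Looks up `key`, giving hipconfig the highest priority.
// `hipconfig` is nullptr when no hipconfig is present.
// Returns true and writes the value to `out` when the key is found.
bool resolveSetting(const KeyValues* hipconfig,
                    const KeyValues& env,
                    const std::string& key,
                    std::string& out) {
    assert(!key.empty());  // precondition of this sketch

    // 1. hipconfig wins whenever it is present and defines the key.
    if (hipconfig != nullptr) {
        auto it = hipconfig->find(key);
        if (it != hipconfig->end()) {
            out = it->second;
            return true;
        }
        // Assumption: a key missing from hipconfig also falls back to env.
    }

    // 2. Otherwise fall back to the environment variables.
    auto it = env.find(key);
    if (it != env.end()) {
        out = it->second;
        return true;
    }
    return false;
}
```

With a key defined in both places, resolveSetting(&cfg, env, key, value) yields the hipconfig value; passing nullptr for hipconfig yields the environment value.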
Change-Id: I6cdf37e1429d7f07be3a68c7e5ba1533d832962b
Fixes bug “HIPIFY: nested macro is not hipified”
https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/issues/33
Example:
#include "cuda_runtime.h"
#define MY_MACRO(func, flags) (func, flags)
...
cudaEvent_t *event = NULL;
MY_MACRO(cudaEventCreateWithFlags(event, cudaEventDisableTiming), NULL);
where cudaEventDisableTiming is a #define'd numeric literal, and thus a macro nested inside the MY_MACRO arguments:
#define cudaEventDisableTiming 0x02 /**< Event will not record timing data */
After hipifying, the current output is:
MY_MACRO(hipEventCreateWithFlags(event, cudaEventDisableTiming), NULL);
Should be:
MY_MACRO(hipEventCreateWithFlags(event, hipEventDisableTiming), NULL);
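To illustrate the expected result, here is a toy renamer that rewrites every whole identifier, including those that appear inside macro arguments. It is only a sketch of the desired behavior and not how hipify is implemented; the function name renameIdentifiers and the rename table are assumptions, and string literals and comments are not treated specially.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Replaces each whole identifier in `src` that has an entry in `renames`.
// Identifiers nested inside macro invocations are handled like any other,
// because the scan works on the raw text, not on expanded code.
std::string renameIdentifiers(const std::string& src,
                              const std::map<std::string, std::string>& renames) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalpha(c) || c == '_') {
            // Scan to the end of the identifier: [A-Za-z_][A-Za-z0-9_]*
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
                ++j;
            }
            assert(j > i);  // an identifier is never empty here
            const std::string ident = src.substr(i, j - i);
            auto it = renames.find(ident);
            out += (it != renames.end()) ? it->second : ident;
            i = j;
        } else {
            // Punctuation, whitespace and digits are copied through unchanged.
            out += src[i];
            ++i;
        }
    }
    return out;
}
```

Given a table that maps cudaEventCreateWithFlags and cudaEventDisableTiming to their hip counterparts, the MY_MACRO line from the example comes out with both names replaced.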
APIs: hipInit, hipCtxCreate.
Track the TLS default ctx. Setting the deviceID now changes the ctx.
Add first context test.
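A minimal sketch of the TLS default ctx idea, assuming one primary context per device. The names Ctx, sketchInit, sketchSetDevice and sketchGetTlsDefaultCtx are hypothetical stand-ins, not the real HIP internals, and defaulting to device 0 is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the runtime's context type.
struct Ctx {
    int deviceId;
};

// One primary context per device, filled in once by sketchInit().
static std::vector<Ctx> g_primaryCtx;

// Each thread tracks its own default context.
static thread_local Ctx* tls_defaultCtx = nullptr;

// Creates the primary contexts; must be called exactly once, before use.
void sketchInit(int deviceCount) {
    assert(g_primaryCtx.empty());
    g_primaryCtx.reserve(static_cast<std::size_t>(deviceCount));
    for (int i = 0; i < deviceCount; ++i) {
        g_primaryCtx.push_back(Ctx{i});
    }
}

// Setting the device switches the calling thread's default context.
// Returns false and leaves the context unchanged for an invalid device ID.
bool sketchSetDevice(int deviceId) {
    if (deviceId < 0 || deviceId >= static_cast<int>(g_primaryCtx.size())) {
        return false;
    }
    tls_defaultCtx = &g_primaryCtx[static_cast<std::size_t>(deviceId)];
    return true;
}

// Returns the calling thread's default context.
// Assumption: a thread that never selected a device defaults to device 0.
Ctx* sketchGetTlsDefaultCtx() {
    if (tls_defaultCtx == nullptr && !g_primaryCtx.empty()) {
        tls_defaultCtx = &g_primaryCtx[0];
    }
    return tls_defaultCtx;
}
```

Because the pointer is thread_local, a device change in one thread does not affect the default ctx seen by other threads.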
Change-Id: If1cb9989b5a04a36147e25e84904336c7b6f3d88
Change ihipDevice_t -> ihipCtx_t (new)
Change ihipGetTlsDefaultDevice -> ihipGetTlsDefaultCtx
Some other changes from device->ctx where appropriate.
Change-Id: I5c4ae93b2fd42c6303aa23d748eb166b7431925d
1. CMake will create a .hip-config file in the bin directory
Future work: hipcc needs to be changed to read the file
Change-Id: Ia7dc48d43787921d5af4ab07d7a5befbcf904465
1. Previously, the signal pool grew depending on usage
2. Now, a fixed number of signals is allocated to the pool
Only these are used by HIP in a stream
3. If more signals are required than the pool holds, the
stream has to wait until the signals are available
4. Once they are available, the stream can use them
5. Removed HIP_NUM_SIGNALS_PER_STREAM because it is redundant with HIP_STREAM_SIGNALS
6. Increased the signal count from 2 to 32.
Future work: dynamically increase the pool size depending on the number of
streams allocated by the application. Also, the null stream should have more signals
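The fixed-size pool behavior can be sketched as below. This is a simplified model under stated assumptions, not the runtime's code: the class name FixedSignalPool and its methods are hypothetical, signals are represented by plain integer slots, and "waiting until a signal is available" is modeled with a condition variable rather than by waiting on HSA signals.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// A pool with a fixed number of signal slots, allocated once up front.
class FixedSignalPool {
public:
    explicit FixedSignalPool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i) {
            free_.push_back(static_cast<int>(i));
        }
    }

    // Blocks the caller until a slot is free, then hands it out.
    int acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        const int id = free_.back();
        free_.pop_back();
        return id;
    }

    // Non-blocking variant: returns -1 when the pool is exhausted.
    int tryAcquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) {
            return -1;
        }
        const int id = free_.back();
        free_.pop_back();
        return id;
    }

    // Returns a slot to the pool and wakes one waiting caller, if any.
    void release(int id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(id);
        }
        available_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<int> free_;  // indices of the slots currently unused
};
```

A stream in this model would construct the pool once (for example with 32 slots, matching the new signal count) and never grow it; exhausting the pool makes acquire() wait instead of allocating more.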
Change-Id: I6be36e084f26bb04766fabf776c7210aee0f9e91
Remove dead depFutures, enqueueBarrier call.
Rename some parameters to reflect usage.
Add comments to better explain tricky parts of sync code.
Change-Id: I763296421d9c2b3b58fc8cef5f010b12ab49553c
1. The number of kernels that can use signals is increased to 128
2. The kernel count is now specific to the stream
Change-Id: Ie6d1aa3f437aad8f08c3333fe48bd3f46e551e60
1. The patch uses HIP signal pools to sync between copy and kernel commands
2. The hsa_signal_create call is removed
3. The redundant enqueueBarrier method is left in place just in case
Change-Id: I3dff3e8ee57fff3cd49bec802ff735ed128e5ca1
- stubs and documentation in include/hcc_details/hip_runtime.h
- stubs with "no-op" in src/hip_memory.cpp
- documentation update in hip_kernel_language.md: add suggestions to
disable the L1 and L2 caches when using the threadfence routines.
Change-Id: Ic0753170f802003055bca9d7476d7f48817b98b7