1. The variable is brought outside the conditional so that its scope is increased
Change-Id: I2d2689553e67930050fe5b3648739f0f72c3bbc8
[ROCm/hip commit: 3be747c41e]
1. Current implementation checks both env var and value in hipconfig and reports error
2. New implementation gives value in hipconfig with highest priority
3. If hipconfig is not present, fall back to env variables.
To Devs: No need to switch between environment variables for different HCC + different HIP.
Change-Id: I6cdf37e1429d7f07be3a68c7e5ba1533d832962b
[ROCm/hip commit: ef68f2f293]
Fixes bug “HIPIFY: nested macro is not hipified”
https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/issues/33
Example:
#include "cuda_runtime.h"
#define MY_MACRO(func, flags) (func, flags)
...
cudaEvent_t *event = NULL;
MY_MACRO(cudaEventCreateWithFlags(event, cudaEventDisableTiming), NULL);
where cudaEventDisableTiming is a defined numeric literal and thus a nested MACRO:
#define cudaEventDisableTiming 0x02 /**< Event will not record timing data */
After hipifying now:
MY_MACRO(hipEventCreateWithFlags(event, cudaEventDisableTiming), NULL);
Should be:
MY_MACRO(hipEventCreateWithFlags(event, hipEventDisableTiming), NULL);
[ROCm/hip commit: 2aacb02358]
Change ihipDevice_t -> ihipCtx_t (new)
Change ihipGetTlsDefaultDevice->ihipGetTlsDefaultCtx
Some other changes from device->ctx where appropriate.
Change-Id: I5c4ae93b2fd42c6303aa23d748eb166b7431925d
[ROCm/hip commit: 2dc3d3238b]
Replace with direct pointer to device. Cleaner, and prep
for transition to contexts.
Change-Id: I0e550f34412923d46c541c0a14bb7d29c3fd4b11
[ROCm/hip commit: e7d7c5cbe8]
1. CMake will create .hip-config file in bin directory
Future Work: Need to make changes to hipcc to read the file
Change-Id: Ia7dc48d43787921d5af4ab07d7a5befbcf904465
[ROCm/hip commit: 9c45d9eaed]
1. Before, the signal pool is increased depending on the usage
2. After, a static number of signals are allocated to the pool
Only these are used by hip in a stream
3. If the signals required are more than the pool size, the
stream has to wait to make sure all the signals are available
4. Once they are available, the stream can use them
5. Removed HIP_NUM_SIGNALS_PER_STREAM because of redundancy with HIP_STREAM_SIGNALS
6. Increased signal count from 2 to 32.
Future Work: Dynamically increase the pool size depending on the number of
streams allocated by the application. And, null stream should have more signals
Change-Id: I6be36e084f26bb04766fabf776c7210aee0f9e91
[ROCm/hip commit: 9062ebcf3a]
Remove dead depFutures, enqueueBarrier call.
Rename some parms to reflect usage.
Add comments to better explain tricky parts of sync code.
Change-Id: I763296421d9c2b3b58fc8cef5f010b12ab49553c
[ROCm/hip commit: 02dd7a7399]
1. Did not change the logic in allocSignal
2. Added guard to wait on signal limit
Change-Id: I78f29097e6a584b3c3d78319dac19869067bd1fe
[ROCm/hip commit: 1859c6e515]
1. The number of kernels that can use signals are increased to 128
2. The kernel count is now specific to the stream
Change-Id: Ie6d1aa3f437aad8f08c3333fe48bd3f46e551e60
[ROCm/hip commit: 53d7629a85]
1. The patch uses HIP signal pools to sync between copy and kernel commands
2. The hsa_signal_create is removed
3. Left the redundant enqueueBarrier method just in case
Change-Id: I3dff3e8ee57fff3cd49bec802ff735ed128e5ca1
[ROCm/hip commit: 4bdf26a82e]
- stubs and documentation in include/hcc_details/hip_runtime.h
- stubs with "no-op" in src/hip_memory.cpp
- document update in hip_kernel_language.md, add suggestions to
disable L1 and L2 caches when using the threadfence routines.
Change-Id: Ic0753170f802003055bca9d7476d7f48817b98b7
[ROCm/hip commit: f31668fee4]