Table comparing syntax for different compute APIs
| Term | CUDA | HIP | OpenCL |
|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `cl_event` |
| Memory | `void *` | `void *` | `cl_mem` |
| | grid | grid | NDRange |
| | block | block | work-group |
| | thread | thread | work-item |
| | warp | warp | sub-group |
| Thread index | `threadIdx.x` | `threadIdx.x` | `get_local_id(0)` |
| Block index | `blockIdx.x` | `blockIdx.x` | `get_group_id(0)` |
| Block dim | `blockDim.x` | `blockDim.x` | `get_local_size(0)` |
| Grid dim | `gridDim.x` | `gridDim.x` | `get_num_groups(0)` |
| Device Kernel | `__global__` | `__global__` | `__kernel` |
| Device Function | `__device__` | `__device__` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` / `hipLaunchKernelGGL` / `<<< >>>` | `clEnqueueNDRangeKernel` |
| Global Memory | `__device__` (variables; pointers are unqualified) | `__device__` (variables; pointers are unqualified) | `__global` |
| Group Memory | `__shared__` | `__shared__` | `__local` |
| Constant | `__constant__` | `__constant__` | `__constant` |
| | `__syncthreads` | `__syncthreads` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `cos(f)` |
| Fast Math | `__cosf(f)` | `__cosf(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `float4` |
Notes
The indexing functions (starting with thread index) show the terminology for a 1D grid. Some APIs use the reverse order of xyz / 012 indexing for 3D grids.
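
To make the 1D indexing terms concrete, here is a minimal host-only C++ sketch of how they are commonly combined into a flat global index. It is an emulation, not device code: `Index1D`, `global_index`, `grid_size`, and `emulate_launch` are hypothetical names introduced for illustration, and the struct fields merely stand in for the built-in variables listed in the table. For OpenCL, the same arithmetic matches `get_global_id(0)` only under the assumption of a zero global work offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for the built-in index values in the
// table; a real kernel reads these from the runtime instead.
struct Index1D {
    unsigned threadIdx_x;  // CUDA/HIP threadIdx.x, OpenCL get_local_id(0)
    unsigned blockIdx_x;   // CUDA/HIP blockIdx.x,  OpenCL get_group_id(0)
    unsigned blockDim_x;   // CUDA/HIP blockDim.x,  OpenCL get_local_size(0)
    unsigned gridDim_x;    // CUDA/HIP gridDim.x,   OpenCL get_num_groups(0)
};

// Flat 1D position of a thread (work-item) within the whole grid (NDRange).
inline unsigned global_index(const Index1D& i) {
    return i.blockIdx_x * i.blockDim_x + i.threadIdx_x;
}

// Total number of threads (work-items) in the 1D grid.
inline unsigned grid_size(const Index1D& i) {
    return i.gridDim_x * i.blockDim_x;
}

// Emulates a 1D launch on the host: visits every (block, thread) pair and
// counts how many times each global index is produced.
inline std::vector<unsigned> emulate_launch(unsigned grid_dim, unsigned block_dim) {
    std::vector<unsigned> hits(static_cast<std::size_t>(grid_dim) * block_dim, 0);
    for (unsigned b = 0; b < grid_dim; ++b) {
        for (unsigned t = 0; t < block_dim; ++t) {
            Index1D idx{t, b, block_dim, grid_dim};
            assert(global_index(idx) < grid_size(idx));  // index stays in range
            ++hits[global_index(idx)];
        }
    }
    return hits;
}
```

For example, `emulate_launch(4, 64)` models 4 blocks of 64 threads and produces each of the 256 global indices exactly once, which is the property a 1D kernel relies on when it uses this index to address one array element per thread.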