Documentation review and update
Change-Id: If40d096646014d70a198db3532758028abe6a93f
[ROCm/hip commit: b1b099941d]
@@ -94,7 +94,7 @@ Differences or limitations of HIP APIs as compared to CUDA APIs should be clearl
- Code Indentation:
- Tabs should be expanded to spaces.
- Use 4 spaces indentation.
- Capitaliziation and Naming
- Capitalization and Naming
- Prefer camelCase for HIP interfaces and internal symbols. Note HCC uses _ for separator.
This guideline is not yet consistently followed in HIP code - eventual compliance is aspirational.
- Member variables should begin with a leading "_". This allows them to be easily distinguished from other variables or functions.
@@ -110,6 +110,7 @@ Differences or limitations of HIP APIs as compared to CUDA APIs should be clearl
doFooElse();
}
'''
- namespace should be on same line as { and separated by a space.
- Single-line if statement should still use {/} pair (even though C++ does not require).
- Miscellaneous
- All references in function parameter lists should be const.
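
The style rules in this hunk can be illustrated with a small C++ sketch. It is only an illustration: `styleDemo`, `SampleCounter`, `addLabel`, and `_labelCount` are hypothetical names and do not come from the HIP sources.

```cpp
#include <string>

// The namespace name and '{' share a line, separated by a space.
namespace styleDemo {

// Hypothetical class used only to demonstrate the conventions.
class SampleCounter {
public:
    // camelCase interface name; the reference parameter is const.
    void addLabel(const std::string& label) {
        // A single-line if still uses a {/} pair.
        if (!label.empty()) {
            _labelCount++;
        }
    }

    int labelCount() const { return _labelCount; }

private:
    // Member variables begin with a leading "_".
    int _labelCount = 0;
};

}  // namespace styleDemo
```

Indentation uses 4 spaces with no tab characters, matching the indentation rules listed earlier in the guideline.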
@@ -669,20 +669,23 @@ The following C++ features are not supported:
- Try/catch

## Kernel Compilation
HIP now supports compiling C++/HIP kernels to binary. Eventhough HIP does not support fatbinary (yet), the user can specify the target for which the binary can be generated. The file format for binary is `.co` which means Code Object. The following command builds the binary using `hipcc`.
hipcc now supports compiling C++/HIP kernels to binary code objects.
The user can specify the target for which the binary can be generated. HIP/HCC does not yet support fat binaries so only a single target may be specified.
The file format for binary is `.co` which means Code Object. The following command builds the code object using `hipcc`.

`hipcc --genisa --target-isa=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE]`
`hipcc --genco --target-isa=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE]`
```[TARGET GPU] = fiji/hawaii
[INPUT FILE] = Name of the file containing kernels
[OUTPUT FILE] = Name of the generated code object file```

Note that the kernel file should have `int main(){}` at the end it so that the binary is generated. This happens because HCC generates binaries at linking time rather than compilation
Note that the kernel file should have `int main(){}` at the end of it so that the binary is generated. This happens because HCC generates binaries at linking time rather than compilation.

You need 3 things to run kernel in binary.
1. Kernel Binary
2. Name of kernel binary
3. Name of the kernel

We already got first two of them. In order to get name of the kernel, try `objdump -x [OUTPUT FILE]`. OUTPUT FILE is file generated by hipcc during kernel compilation. The output from objdump has symbol to the kernel whose name is mangled with `grid_launch_parm`, `__functor`, `__cxxamp_trampoline`. An example of how it looks is `ZN12_GLOBAL__N_137_Z3Cpy16grid_launch_parmPfS0__functor19__cxxamp_trampolineEiiiiiiPKfPf` where `Cpy` is the name of the kernel written in C++.
To load a kernel into HIP, we need both the code object and the name of the kernel stored within the code object.
In order to get name of the kernel, use:
```
$ objdump -x [CODE_OBJECT_FILE]
```
CODE_OBJECT_FILE is the file generated by hipcc during kernel compilation. The objdump output contains a symbol for the kernel whose name is mangled with `grid_launch_parm`, `__functor`, and `__cxxamp_trampoline`. An example is `ZN12_GLOBAL__N_137_Z3Cpy16grid_launch_parmPfS0__functor19__cxxamp_trampolineEiiiiiiPKfPf`, where `Cpy` is the name of the kernel written in C++. On the HIP/hcc path, this mangled name must be passed to the hipLoadKernelModule API.
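
When scanning objdump output, it can help to map a mangled symbol back to the kernel name written in the source. The sketch below is a plain C++ helper, not part of HIP: `extractKernelName` is a hypothetical name, and it assumes the symbol embeds a simple `_Z<length><name>` fragment as in the example above. Nested or namespaced kernel names would need a full demangler.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a HIP API): recover the source-level kernel name
// from a mangled symbol by locating the first "_Z<length><name>" fragment.
// Returns an empty string if no such fragment is found.
std::string extractKernelName(const std::string& mangled) {
    const std::size_t start = mangled.find("_Z");
    if (start == std::string::npos) {
        return "";
    }

    // Parse the decimal length that follows "_Z".
    std::size_t pos = start + 2;
    std::size_t length = 0;
    while (pos < mangled.size() &&
           std::isdigit(static_cast<unsigned char>(mangled[pos]))) {
        length = length * 10 + static_cast<std::size_t>(mangled[pos] - '0');
        ++pos;
    }

    // Reject fragments with no length or a length that overruns the symbol.
    if (length == 0 || pos + length > mangled.size()) {
        return "";
    }
    return mangled.substr(pos, length);
}
```

The helper only goes from the mangled symbol to the readable name. The full mangled string is still what has to be supplied when loading the kernel on the HIP/hcc path.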
@@ -12,7 +12,7 @@ There are two possible ways to transfer data from Host to Device (H2D) and Devic

#### On Large BAR Setup

There are two possible ways to transfer data from Host to Device (H2D)
There are three possible ways to transfer data from Host to Device (H2D)
* Using Staging Buffers
* Using PinInPlace
* Direct Memcpy
@@ -24,12 +24,9 @@ There are two possible ways to transfer data from Host to Device (H2D)
Some GPUs may not be able to directly access host memory, and in these cases we need to
stage the copy through an optimized pinned staging buffer, to implement H2D and D2H copies. The copy is broken into buffer-sized chunks to limit the size of the buffer and also to provide better performance by overlapping the CPU copies with the DMA copies.
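
The chunking described above can be sketched as a host-only simulation in standard C++. This is not the HIP runtime implementation: `stagedCopy` and its parameters are hypothetical, an ordinary `std::vector` stands in for the pinned staging buffer, and the second `memcpy` stands in for the DMA transfer. The sketch runs the two steps back to back, so it does not show the CPU/DMA overlap, which would need at least two staging buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only simulation of a staged copy. Copies `size` bytes
// from `src` to `dst` through a fixed-size staging buffer, one buffer-sized
// chunk at a time, and returns the number of chunks used.
std::size_t stagedCopy(void* dst, const void* src, std::size_t size,
                       std::size_t stagingBytes) {
    if (stagingBytes == 0) {
        return 0;
    }

    // Stand-in for the pinned staging buffer.
    std::vector<unsigned char> staging(stagingBytes);
    const unsigned char* in = static_cast<const unsigned char*>(src);
    unsigned char* out = static_cast<unsigned char*>(dst);

    std::size_t chunks = 0;
    for (std::size_t offset = 0; offset < size; offset += stagingBytes) {
        const std::size_t chunk = std::min(stagingBytes, size - offset);
        // CPU copy from unpinned host memory into the staging buffer.
        std::memcpy(staging.data(), in + offset, chunk);
        // Stand-in for the DMA copy from the staging buffer to the device.
        std::memcpy(out + offset, staging.data(), chunk);
        ++chunks;
    }
    return chunks;
}
```

Because the buffer size is fixed, the memory cost stays bounded however large the transfer is. The cost is one extra CPU copy per chunk.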

PinInPlace is another algorithm which pins the host memory "in-place", and copies it with the DMA
engine.
PinInPlace is another algorithm which pins the host memory "in-place", and copies it with the DMA engine.

By default staging buffers are used for unpinned memory transfers, however other ways can be used by enabling few environment variables (so no need to build the code again!!!)

Following environment variables can be used:
By default staging buffers are used for unpinned memory transfers. Environment variables allow control over the unpinned copy algorithm and parameters:

- HIP_PININPLACE - This environment variable forces the use of PinInPlace logic for all unpinned memory copies

@@ -726,7 +726,7 @@ hipError_t hipHostAlloc(void** ptr, size_t size, unsigned int flags) __attribute
hipError_t hipHostGetDevicePointer(void** devPtr, void* hstPtr, unsigned int flags) ;

/**
* @brief Get flags associated with host pointer
* @brief Return flags associated with host pointer
*
* @param[out] flagsPtr Memory location to store flags
* @param[in] hostPtr Host Pointer allocated through hipHostMalloc
@@ -1186,13 +1186,12 @@ hipError_t hipCtxGetSharedMemConfig ( hipSharedMemConfig * pConfig );
hipError_t hipCtxSynchronize ( void );

/**
* @brief Get flags used for creating current/default context.
* @brief Return flags used for creating default context.
*
* @param [out] flags
*
* @returns #hipSuccess.
*/

hipError_t hipCtxGetFlags ( unsigned int* flags );

/**