SWDEV-436821 Update hip samples Readme files
Change-Id: I6bf3a72eac4a4242cb2dbf4e6eee73e0e1bef2ef
[ROCm/hip-tests commit: 76dd8ea569]
This commit is contained in:
@@ -4,3 +4,26 @@ Show an application written directly in HIP which uses platform-specific check o
|
||||
an instruction that only exists on the AMD platform.
|
||||
|
||||
See related [blog](http://gpuopen.com/platform-aware-coding-inside-hip/) demonstrating platform specialization.
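For readers without AMD hardware, the operation that this sample specializes can be sketched as a plain host function. This is a minimal reference, assuming the usual meaning of an unsigned bit-field extract (take `width` bits starting at bit `offset`); the name `bit_extract_ref` is hypothetical and is not part of the HIP API or of the sample source.

```cpp
#include <cassert>
#include <cstdint>

// Host-side reference for an unsigned bit-field extract: returns `width` bits
// of `src`, starting at bit `offset` (bit 0 is the least significant bit).
// Hypothetical helper for illustration only, not a HIP API.
inline std::uint32_t bit_extract_ref(std::uint32_t src, unsigned offset, unsigned width) {
    assert(offset < 32 && "offset must address a bit inside a 32-bit word");
    if (width == 0) {
        return 0;
    }
    // Shifting a 32-bit value by 32 is undefined, so handle the full-word case separately.
    const std::uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (src >> offset) & mask;
}
```

A platform-specific build could compare the result of a device intrinsic against a reference like this one when checking the kernel output.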
|
||||
|
||||
- Steps to build this sample:
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
|
||||
- Execute File
|
||||
```
|
||||
$ ./bit_extract
|
||||
|
||||
pch size: 11743288
|
||||
__hipGetPCH succeeded!
|
||||
info: running on device #0
|
||||
info: allocate host mem ( 7.63 MB)
|
||||
info: allocate device mem ( 7.63 MB)
|
||||
info: copy Host2Device
|
||||
info: launch 'bit_extract_kernel'
|
||||
info: copy Device2Host
|
||||
info: check result
|
||||
PASSED!
|
||||
```
|
||||
|
||||
@@ -0,0 +1,19 @@
# module_api

- Steps to build this sample

```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```

- Execute Code
```
$ ./launchKernelHcc.hip.out
PASSED!
$ ./runKernel.hip.out
PASSED!
$ ./defaultDriver.hip.out
PASSED!
```
@@ -0,0 +1,17 @@
# module_api_global

- Steps to build this sample
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```

- Execute Code
```
$ ./runKernel1.hip.out
PASSED!
Shared Size Bytes = 0
Num Regs = 3
PASSED!
```
@@ -3,19 +3,16 @@
|
||||
The simple test below is an example that shows how to use hipify-perl to port CUDA code to HIP:
|
||||
|
||||
- Add hip/bin path to the PATH
|
||||
|
||||
```
|
||||
$ export PATH=$PATH:[MYHIP]/bin
|
||||
```
|
||||
|
||||
- Define environment variable
|
||||
|
||||
```
|
||||
$ export HIP_PATH=[MYHIP]
|
||||
```
|
||||
|
||||
- Build executable file
|
||||
|
||||
```
|
||||
$ cd ~/hip/samples/0_Intro/square
|
||||
mkdir -p build && cd build
|
||||
|
||||
@@ -0,0 +1,26 @@
# hipDispatchLatency.cpp

- Steps to build this sample
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```

- Execute Code
```
$ ./hipDispatchEnqueueRateMT 1 0
Thread ID : 0 , hipModuleLaunchKernel enqueue rate: 0.8 us, std: 0.1 us

$ ./hipDispatchEnqueueRateMT 1 1
Thread ID : 0 , hipLaunchKernelGGL enqueue rate: 1.0 us, std: 0.1 us

$ ./hipDispatchLatency
hipModuleLaunchKernel enqueue rate: 0.8 us, std: 0.1 us

hipLaunchKernelGGL enqueue rate: 1.0 us, std: 0.1 us

Timing around single dispatch latency: 8.1 us, std: 4.7 us

Batch dispatch latency: 1.4 us, std: 0.0 us
```
@@ -4,3 +4,82 @@ Simple tool that prints properties for each device (from hipGetDeviceProperties)
|
||||
Properties includes all of the architectural feature flags for each device.
|
||||
|
||||
Also demonstrates how to use platform-specific compilation path (testing `__HIP_PLATFORM_AMD__` or `__HIP_PLATFORM_NVIDIA__`)
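As a rough illustration of a platform-specific compilation path, the sketch below branches on the two macros named above. The function name `hip_platform_name` is hypothetical and is not taken from the hipInfo source; when neither macro is defined (for example, when the file is built with a plain host compiler) it falls back to a generic label.

```cpp
#include <cassert>
#include <cstring>

// Returns a label for the platform selected at compile time.
// Hypothetical helper for illustration; only the two macro names come from HIP.
inline const char* hip_platform_name() {
#if defined(__HIP_PLATFORM_AMD__)
    return "amd";      // AMD compilation path
#elif defined(__HIP_PLATFORM_NVIDIA__)
    return "nvidia";   // NVIDIA compilation path
#else
    return "unknown";  // neither macro defined, e.g. a plain host build
#endif
}
```

Because the branch is resolved by the preprocessor, code for the other platform is never compiled, so each path may use headers or intrinsics that exist only on its own platform.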
|
||||
|
||||
|
||||
- Steps to build this sample
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
|
||||
- Execute Code
|
||||
```
|
||||
$ ./hipInfo
|
||||
--------------------------------------------------------------------------------
|
||||
device# 0
|
||||
Name:
|
||||
pciBusID: 103
|
||||
pciDeviceID: 0
|
||||
pciDomainID: 0
|
||||
multiProcessorCount: 64
|
||||
maxThreadsPerMultiProcessor: 2560
|
||||
isMultiGpuBoard: 0
|
||||
clockRate: 1800 Mhz
|
||||
memoryClockRate: 1000 Mhz
|
||||
memoryBusWidth: 4096
|
||||
totalGlobalMem: 31.98 GB
|
||||
totalConstMem: 2147483647
|
||||
sharedMemPerBlock: 64.00 KB
|
||||
canMapHostMemory: 1
|
||||
regsPerBlock: 65536
|
||||
warpSize: 64
|
||||
l2CacheSize: 8388608
|
||||
computeMode: 0
|
||||
maxThreadsPerBlock: 1024
|
||||
maxThreadsDim.x: 1024
|
||||
maxThreadsDim.y: 1024
|
||||
maxThreadsDim.z: 1024
|
||||
maxGridSize.x: 2147483647
|
||||
maxGridSize.y: 65536
|
||||
maxGridSize.z: 65536
|
||||
major: 9
|
||||
minor: 0
|
||||
concurrentKernels: 1
|
||||
cooperativeLaunch: 1
|
||||
cooperativeMultiDeviceLaunch: 1
|
||||
isIntegrated: 0
|
||||
maxTexture1D: 16384
|
||||
maxTexture2D.width: 16384
|
||||
maxTexture2D.height: 16384
|
||||
maxTexture3D.width: 16384
|
||||
maxTexture3D.height: 16384
|
||||
maxTexture3D.depth: 8192
|
||||
hostNativeAtomicSupported: 1
|
||||
isLargeBar: 1
|
||||
asicRevision: 1
|
||||
maxSharedMemoryPerMultiProcessor: 64.00 KB
|
||||
clockInstructionRate: 1000.00 Mhz
|
||||
arch.hasGlobalInt32Atomics: 1
|
||||
arch.hasGlobalFloatAtomicExch: 1
|
||||
arch.hasSharedInt32Atomics: 1
|
||||
arch.hasSharedFloatAtomicExch: 1
|
||||
arch.hasFloatAtomicAdd: 1
|
||||
arch.hasGlobalInt64Atomics: 1
|
||||
arch.hasSharedInt64Atomics: 1
|
||||
arch.hasDoubles: 1
|
||||
arch.hasWarpVote: 1
|
||||
arch.hasWarpBallot: 1
|
||||
arch.hasWarpShuffle: 1
|
||||
arch.hasFunnelShift: 0
|
||||
arch.hasThreadFenceSystem: 1
|
||||
arch.hasSyncThreadsExt: 0
|
||||
arch.hasSurfaceFuncs: 0
|
||||
arch.has3dGrid: 1
|
||||
arch.hasDynamicParallelism: 0
|
||||
gcnArchName: gfx906:sramecc+:xnack-
|
||||
peers:
|
||||
non-peers: device#0
|
||||
memInfo.total: 31.98 GB
|
||||
memInfo.free: 31.96 GB (100%)
|
||||
```
|
||||
|
||||
@@ -87,8 +87,19 @@ After, copying the data from device to memory, we will verify it with the one we
|
||||
Finally, we free the memory allocated earlier, using free() for host memory and `hipFree` for device memory.
|
||||
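The verification step in this walkthrough can be sketched on the host as a CPU reference transpose plus an element-wise comparison. This is a minimal sketch under assumptions: the matrix is stored row-major, and the helper names `transpose_reference` and `count_mismatches` as well as the tolerance are hypothetical rather than taken from the sample source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference: returns the transpose of a row-major `rows x cols` matrix.
inline std::vector<float> transpose_reference(const std::vector<float>& in,
                                              std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];  // element (r, c) moves to (c, r)
        }
    }
    return out;
}

// Counts the elements of `a` and `b` that differ by more than `eps`.
inline std::size_t count_mismatches(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    float eps = 1e-6f) {
    assert(a.size() == b.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > eps) {
            ++errors;
        }
    }
    return errors;
}
```

A result copied back from the device would pass the check when `count_mismatches` returns zero against the CPU reference.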
|
||||
## How to build and run:
|
||||
Use the make command and execute it using ./exe
|
||||
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
|
||||
- Build the sample using cmake
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
- Execute the sample
|
||||
```
|
||||
$ ./MatrixTranspose
|
||||
Device name Navi 14 [Radeon Pro W5500]
|
||||
PASSED!
|
||||
```
|
||||
|
||||
|
||||
## More Info:
|
||||
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
|
||||
|
||||
@@ -45,9 +45,21 @@ Index for the respective operand in the ordered fashion is provided by `%` follo
|
||||
Output constraints are specified by an `"="` prefix as shown above ("=v"). This indicates that the assembly will write to this operand, and the operand will then be made available as a return value of the asm expression. Input constraints do not have a prefix, just the constraint code. The constraint string `"0"` says to use the register assigned to the output as an input as well (it being the 0th constraint).
|
||||
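The same constraint syntax can be tried on a host CPU. The sketch below is an illustration under assumptions, not code from the sample: it uses an x86 `addl` instruction and the general-register constraint `"r"` instead of the GPU vector-register constraint `"v"`, and it falls back to plain C++ on other compilers or architectures. The function name `add_with_tied_operand` is hypothetical.

```cpp
#include <cassert>

// Adds two integers using GCC/Clang extended inline assembly on x86.
// "=r" : operand 0 is an output written by the asm, placed in any general register.
// "0"  : the first input must live in the same register as operand 0.
// "r"  : the second input may live in any general register.
inline int add_with_tied_operand(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    int result;
    __asm__("addl %2, %0"      // AT&T syntax: operand 0 += operand 2
            : "=r"(result)     // output constraint
            : "0"(a), "r"(b)   // input constraints; a is tied to operand 0
            : "cc");           // the add instruction modifies the condition flags
    return result;
#else
    return a + b;  // portable fallback when the x86 path is unavailable
#endif
}
```

Tying the input to operand 0 is what lets a two-operand instruction read and overwrite the same register without an extra move.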
|
||||
## How to build and run:
|
||||
Use the make command and execute it using ./exe
|
||||
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
|
||||
|
||||
- Build the sample using cmake
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
- Execute the sample
|
||||
```
|
||||
$ ./inline_asm
|
||||
Device name
|
||||
hipMemcpyHostToDevice time taken = 1.057ms
|
||||
kernel Execution time = 0.509ms
|
||||
hipMemcpyDeviceToHost time taken = 1.254ms
|
||||
PASSED!
|
||||
```
|
||||
|
||||
## More Info:
|
||||
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
|
||||
|
||||
@@ -0,0 +1,21 @@
# texture_driver

- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./texture2dDrv
tex2dKernelChar test PASSED ...
tex2dKernelShort test PASSED ...
tex2dKernelInt test PASSED ...
tex2dKernelFloat test PASSED ...
tex2dKernelChar4 test PASSED ...
tex2dKernelShort4 test PASSED ...
tex2dKernelInt4 test PASSED ...
tex2dKernelFloat4 test PASSED ...
texture2dDrv PASSED ...
```
@@ -32,20 +32,21 @@ The macro supports specifying CLANG-specific, NVCC-specific compiler options usi
|
||||
Common options targeting both compilers can be specified after the ```HIPCC_OPTIONS``` keyword.
|
||||
|
||||
## How to build and run:
|
||||
Use the following commands to build and execute the sample
|
||||
|
||||
- Build sample using cmake
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
# For shared lib of hip rt,
|
||||
$ cmake ..
|
||||
# Or for static lib of hip rt,
|
||||
$ cmake -DCMAKE_PREFIX_PATH="/opt/rocm/llvm/lib/cmake" ..
|
||||
$ make
|
||||
```
|
||||
mkdir build
|
||||
cd build
|
||||
|
||||
For shared lib of hip rt,
|
||||
cmake ..
|
||||
Or for static lib of hip rt,
|
||||
cmake -DCMAKE_PREFIX_PATH="/opt/rocm/llvm/lib/cmake" ..
|
||||
|
||||
Then,
|
||||
make
|
||||
./MatrixTranspose
|
||||
- Execute the sample
|
||||
```
|
||||
$ ./MatrixTranspose
|
||||
Device name
|
||||
PASSED!
|
||||
```
|
||||
|
||||
## More Info:
|
||||
|
||||
@@ -0,0 +1,24 @@
# occupancy

- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./occupancy
Manual Configuration with block size 32
kernel Execution time = 0.433ms
Theoretical Occupancy is 40%

Automatic Configuation based on hipOccupancyMaxPotentialBlockSize
Suggested blocksize is 1024, Minimum gridsize is 128
kernel Execution time = 0.037ms
Theoretical Occupancy is 80%

Manual Test PASSED!

Automatic Test PASSED!
```
|
||||
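The percentages in the output above can be reproduced with simple arithmetic. The sketch below assumes that theoretical occupancy is the ratio of resident threads (active blocks per multiprocessor times block size) to the maximum number of threads a multiprocessor can hold. The function name is hypothetical, and the inputs used in the usage note are illustrative values, not numbers read from the sample source.

```cpp
#include <cassert>
#include <cmath>

// Theoretical occupancy in percent, assuming
//   occupancy = active_blocks * block_size / max_threads_per_multiprocessor.
// Hypothetical helper for illustration; not part of the HIP API.
inline double theoretical_occupancy_percent(int active_blocks_per_multiprocessor,
                                            int block_size,
                                            int max_threads_per_multiprocessor) {
    assert(active_blocks_per_multiprocessor >= 0);
    assert(block_size > 0);
    assert(max_threads_per_multiprocessor > 0);
    return 100.0 * active_blocks_per_multiprocessor * block_size /
           max_threads_per_multiprocessor;
}
```

For example, under the assumption of 2560 threads per multiprocessor, 2 resident blocks of 1024 threads give 80%, while 32 resident blocks of 32 threads give 40%. In a HIP program the active block count would typically come from a query such as `hipOccupancyMaxActiveBlocksPerMultiprocessor`.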
@@ -0,0 +1,15 @@
# gpu_arch

- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./gpuarch
success
```

## Note: This sample works on architectures gfx908 and above
@@ -1,179 +1,81 @@
|
||||
# Emitting Static Library
|
||||
|
||||
This sample shows how to generate a static library for a simple HIP application. We will evaluate two types of static libraries: the first type exports host functions in a static library generated with --emit-static-lib and is compatible with host linkers, and the second type exports device functions in a static library made with system ar.
|
||||
|
||||
Please refer to the hip_programming_guide for limitations.
|
||||
|
||||
## Static libraries with host functions
|
||||
|
||||
### Source files
|
||||
The static library source files may contain host functions and kernel `__global__` and `__device__` functions. Here is an example (please refer to the directory host_functions).
|
||||
|
||||
hipOptLibrary.cpp:
|
||||
```
|
||||
#define HIP_ASSERT(status) assert(status == hipSuccess)
|
||||
#define LEN 512
|
||||
|
||||
__global__ void copy(uint32_t* A, uint32_t* B) {
|
||||
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
|
||||
B[tid] = A[tid];
|
||||
}
|
||||
|
||||
void run_test1() {
|
||||
uint32_t *A_h, *B_h, *A_d, *B_d;
|
||||
size_t valbytes = LEN * sizeof(uint32_t);
|
||||
|
||||
A_h = (uint32_t*)malloc(valbytes);
|
||||
B_h = (uint32_t*)malloc(valbytes);
|
||||
for (uint32_t i = 0; i < LEN; i++) {
|
||||
A_h[i] = i;
|
||||
B_h[i] = 0;
|
||||
}
|
||||
|
||||
HIP_ASSERT(hipMalloc((void**)&A_d, valbytes));
|
||||
HIP_ASSERT(hipMalloc((void**)&B_d, valbytes));
|
||||
|
||||
HIP_ASSERT(hipMemcpy(A_d, A_h, valbytes, hipMemcpyHostToDevice));
|
||||
hipLaunchKernelGGL(copy, dim3(LEN/64), dim3(64), 0, 0, A_d, B_d);
|
||||
HIP_ASSERT(hipMemcpy(B_h, B_d, valbytes, hipMemcpyDeviceToHost));
|
||||
|
||||
for (uint32_t i = 0; i < LEN; i++) {
|
||||
assert(A_h[i] == B_h[i]);
|
||||
}
|
||||
|
||||
HIP_ASSERT(hipFree(A_d));
|
||||
HIP_ASSERT(hipFree(B_d));
|
||||
free(A_h);
|
||||
free(B_h);
|
||||
std::cout << "Test Passed!\n";
|
||||
}
|
||||
```
|
||||
|
||||
The above source file can be compiled into a static library, libHipOptLibrary.a, using the --emit-static-lib flag, like so:
|
||||
```
|
||||
hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a
|
||||
```
|
||||
|
||||
### Main source files
|
||||
The main() program source file may link with the above static library using either hipcc or a host compiler (such as g++). A simple source file that calls the host function inside libHipOptLibrary.a:
|
||||
|
||||
hipMain1.cpp:
|
||||
```
|
||||
extern void run_test1();
|
||||
|
||||
int main(){
|
||||
run_test1();
|
||||
}
|
||||
```
|
||||
|
||||
To link to the static library:
|
||||
|
||||
Using hipcc:
|
||||
```
|
||||
hipcc hipMain1.cpp -L. -lHipOptLibrary -o test_emit_static_hipcc_linker.out
|
||||
```
|
||||
Using g++:
|
||||
```
|
||||
ROCM_PATH is the path where ROCM is installed. default path is /opt/rocm.
|
||||
g++ hipMain1.cpp -L. -lHipOptLibrary -L<ROCM_PATH>/hip/lib -lamdhip64 -o test_emit_static_host_linker.out
|
||||
# Compile to assembly and create an executable from modified asm
|
||||
|
||||
This sample shows how to generate the assembly code for a simple HIP source application, then re-compiling it and generating a valid HIP executable.
|
||||
|
||||
This sample uses a previous HIP application sample, please see [0_Intro/square](https://github.com/ROCm-Developer-Tools/HIP/blob/master/samples/0_Intro/square).
|
||||
|
||||
## Compiling the HIP source into assembly
|
||||
Using HIP flags `-c -S` will help generate the host x86_64 and the device AMDGCN assembly code when paired with `--cuda-host-only` and `--cuda-device-only` respectively. In this sample we use these commands:
|
||||
```
|
||||
<ROCM_PATH>/hip/bin/hipcc -c -S --cuda-host-only -target x86_64-linux-gnu -o square_host.s square.cpp
|
||||
<ROCM_PATH>/hip/bin/hipcc -c -S --cuda-device-only --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx1010 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 square.cpp
|
||||
```
|
||||
|
||||
## Static libraries with device functions
|
||||
The device assembly will be output into separate files, one per `--offload-arch` target:
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx900.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx906.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx908.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx1010.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx1030.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx1100.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx1101.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx1102.s
|
||||
- square-hip-amdgcn-amd-amdhsa-gfx1103.s
|
||||
|
||||
### Source files
|
||||
The static library source files which contain only `__device__` functions need to be created using ar. Here is an example (please refer to the directory device_functions).
|
||||
You may modify `--offload-arch` flag to build other archs and choose to enable or disable xnack and sram-ecc.
|
||||
|
||||
hipDevice.cpp:
|
||||
**Note:** At this point, you may evaluate the assembly code, and make modifications if you are familiar with the AMDGCN assembly language and architecture.
|
||||
|
||||
## Compiling the assembly into a valid HIP executable
|
||||
If valid, the modified host and device assembly may be compiled into a HIP executable. The host assembly can be compiled into an object using this command:
|
||||
```
|
||||
#include <hip/hip_runtime.h>
|
||||
|
||||
__device__ int square_me(int A) {
|
||||
return A*A;
|
||||
}
|
||||
<ROCM_PATH>/hip/bin/hipcc -c square_host.s -o square_host.o
|
||||
```
|
||||
|
||||
The above source file may be compiled into a static library, libHipDevice.a, by first compiling into a relocatable object, and then placed in an archive using ar:
|
||||
However, the device assembly code will require a few extra steps. The device assemblies need to be compiled into device objects, then offload-bundled into a HIP fat binary using the clang-offload-bundler, then llvm-mc embeds the binary inside of a host object using the MC directives provided in `hip_obj_gen.mcin`. The output is a host object with an embedded device object. Here are the steps for device side compilation into an object:
|
||||
```
|
||||
hipcc hipDevice.cpp -c -fgpu-rdc -fPIC -o hipDevice.o
|
||||
ar rcsD libHipDevice.a hipDevice.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx900 square-hip-amdgcn-amd-amdhsa-gfx900.s -o square-hip-amdgcn-amd-amdhsa-gfx900.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx906 square-hip-amdgcn-amd-amdhsa-gfx906.s -o square-hip-amdgcn-amd-amdhsa-gfx906.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx908 square-hip-amdgcn-amd-amdhsa-gfx908.s -o square-hip-amdgcn-amd-amdhsa-gfx908.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1010 square-hip-amdgcn-amd-amdhsa-gfx1010.s -o square-hip-amdgcn-amd-amdhsa-gfx1010.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1030 square-hip-amdgcn-amd-amdhsa-gfx1030.s -o square-hip-amdgcn-amd-amdhsa-gfx1030.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1100 square-hip-amdgcn-amd-amdhsa-gfx1100.s -o square-hip-amdgcn-amd-amdhsa-gfx1100.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1101 square-hip-amdgcn-amd-amdhsa-gfx1101.s -o square-hip-amdgcn-amd-amdhsa-gfx1101.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1102 square-hip-amdgcn-amd-amdhsa-gfx1102.s -o square-hip-amdgcn-amd-amdhsa-gfx1102.o
|
||||
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1103 square-hip-amdgcn-amd-amdhsa-gfx1103.s -o square-hip-amdgcn-amd-amdhsa-gfx1103.o
|
||||
<ROCM_PATH>/llvm/bin/clang-offload-bundler -type=o -bundle-align=4096 -targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx900,hip-amdgcn-amd-amdhsa-gfx906,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx1010,hip-amdgcn-amd-amdhsa-gfx1030,hip-amdgcn-amd-amdhsa-gfx1100,hip-amdgcn-amd-amdhsa-gfx1101,hip-amdgcn-amd-amdhsa-gfx1102,hip-amdgcn-amd-amdhsa-gfx1103 -inputs=/dev/null,square-hip-amdgcn-amd-amdhsa-gfx900.o,square-hip-amdgcn-amd-amdhsa-gfx906.o,square-hip-amdgcn-amd-amdhsa-gfx908.o,square-hip-amdgcn-amd-amdhsa-gfx1010.o,square-hip-amdgcn-amd-amdhsa-gfx1030.o,square-hip-amdgcn-amd-amdhsa-gfx1100.o,square-hip-amdgcn-amd-amdhsa-gfx1101.o,square-hip-amdgcn-amd-amdhsa-gfx1102.o,square-hip-amdgcn-amd-amdhsa-gfx1103.o -outputs=offload_bundle.hipfb
|
||||
<ROCM_PATH>/llvm/bin/llvm-mc -triple x86_64-unknown-linux-gnu hip_obj_gen.mcin -o square_device.o --filetype=obj
|
||||
```
|
||||
|
||||
### Main source files
|
||||
The main() program source file can link with the static library using hipcc. A simple source file that calls the device function inside libHipDevice.a:
|
||||
**Note:** The `-bundle-align=4096` option only works with ROCm 4.0 and newer compilers. Also, the target architectures must match those used when compiling to assembly.
|
||||
|
||||
hipMain2.cpp:
|
||||
Finally, using the system linker, hipcc, or clang, link the host and device objects into an executable:
|
||||
```
|
||||
#include <hip/hip_runtime.h>
|
||||
#include <hip/hip_runtime_api.h>
|
||||
#include <iostream>
|
||||
|
||||
#define HIP_ASSERT(status) assert(status == hipSuccess)
|
||||
#define LEN 512
|
||||
|
||||
extern __device__ int square_me(int);
|
||||
|
||||
__global__ void square_and_save(int* A, int* B) {
|
||||
int tid = threadIdx.x + blockIdx.x * blockDim.x;
|
||||
B[tid] = square_me(A[tid]);
|
||||
}
|
||||
|
||||
void run_test2() {
|
||||
int *A_h, *B_h, *A_d, *B_d;
|
||||
A_h = new int[LEN];
|
||||
B_h = new int[LEN];
|
||||
for (unsigned i = 0; i < LEN; i++) {
|
||||
A_h[i] = i;
|
||||
B_h[i] = 0;
|
||||
}
|
||||
size_t valbytes = LEN*sizeof(int);
|
||||
|
||||
HIP_ASSERT(hipMalloc((void**)&A_d, valbytes));
|
||||
HIP_ASSERT(hipMalloc((void**)&B_d, valbytes));
|
||||
|
||||
HIP_ASSERT(hipMemcpy(A_d, A_h, valbytes, hipMemcpyHostToDevice));
|
||||
hipLaunchKernelGGL(square_and_save, dim3(LEN/64), dim3(64),
|
||||
0, 0, A_d, B_d);
|
||||
HIP_ASSERT(hipMemcpy(B_h, B_d, valbytes, hipMemcpyDeviceToHost));
|
||||
|
||||
for (unsigned i = 0; i < LEN; i++) {
|
||||
assert(A_h[i]*A_h[i] == B_h[i]);
|
||||
}
|
||||
|
||||
HIP_ASSERT(hipFree(A_d));
|
||||
HIP_ASSERT(hipFree(B_d));
|
||||
delete[] A_h;  // allocated with new[], so release with delete[] rather than free()
delete[] B_h;
|
||||
std::cout << "Test Passed!\n";
|
||||
}
|
||||
|
||||
int main(){
|
||||
// Run test that generates static lib with ar
|
||||
run_test2();
|
||||
}
|
||||
<ROCM_PATH>/hip/bin/hipcc square_host.o square_device.o -o square_asm.out
|
||||
```
|
||||
|
||||
To link to the static library:
|
||||
## How to build and run this sample:
|
||||
- Build the sample using cmake
|
||||
```
|
||||
hipcc libHipDevice.a hipMain2.cpp -fgpu-rdc -o test_device_static_hipcc.out
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
|
||||
## How to build and run this sample:
|
||||
Use the make command to build the static libraries, link with it, and execute it.
|
||||
- Change directory to either host or device functions folder.
|
||||
- To build the static library and link the main executable, use `make all`.
|
||||
- To execute, run the generated executable `./test_*.out`.
|
||||
|
||||
Alternatively, use these CMake commands.
|
||||
- Execute sample
|
||||
```
|
||||
cd device_functions
|
||||
mkdir -p build
|
||||
cd build
|
||||
cmake ..
|
||||
make
|
||||
./test_*.out
|
||||
$ ./square_asm.out
|
||||
info: running on device AMD Radeon Graphics
|
||||
info: allocate host mem ( 7.63 MB)
|
||||
info: allocate device mem ( 7.63 MB)
|
||||
info: copy Host2Device
|
||||
info: launch 'vector_square' kernel
|
||||
info: copy Device2Host
|
||||
info: check result
|
||||
PASSED!
|
||||
```
|
||||
On Windows, it is recommended to use Visual Studio's command prompt for this sample, because the sample requires the MS Librarian tool (LIB.exe).
Set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to hipcc, because the Visual Studio environment would otherwise use cl.exe as the default compiler.
|
||||
i.e. cmake.exe -GNinja -DCMAKE_CXX_COMPILER_ID=ROCMClang -DCMAKE_C_COMPILER_ID=ROCMClang -DCMAKE_PREFIX_PATH=%HIP_PATH% -DCMAKE_C_COMPILER=%HIP_PATH%/bin/hipcc.bat -DCMAKE_CXX_COMPILER=%HIP_PATH%/bin/hipcc.bat ..
|
||||
|
||||
## For More Infomation, please refer to the HIP FAQ.
|
||||
**Note:** Currently, defined arch is `gfx900`, `gfx906`, `gfx908`, `gfx1010`,`gfx1030`,`gfx1100`,`gfx1101`,`gfx1102` and `gfx1103`. Any undefined arch can be modified with make argument `GPU_ARCHxx`.
|
||||
|
||||
## For More Information, please refer to the HIP FAQ.
|
||||
|
||||
@@ -56,12 +56,16 @@ Finally, using the system linker, hipcc, or clang, link the host and device obje
|
||||
```
|
||||
|
||||
## How to build and run this sample:
|
||||
Use these make commands to compile into assembly, compile assembly into executable, and execute it.
|
||||
- To compile the HIP application into host and device assembly: `make src_to_asm`.
|
||||
- To compile the assembly files into an executable: `make asm_to_exec`.
|
||||
- To execute, run
|
||||
- Build the sample using cmake
|
||||
```
|
||||
./square_asm.out
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
|
||||
- Execute sample
|
||||
```
|
||||
$ ./square_asm.out
|
||||
info: running on device AMD Radeon Graphics
|
||||
info: allocate host mem ( 7.63 MB)
|
||||
info: allocate device mem ( 7.63 MB)
|
||||
|
||||
@@ -84,14 +84,16 @@ Finally, using the system linker, hipcc, or clang, link the host and device obje
|
||||
If you haven't modified the GPU archs, this executable should run on the defined `gfx900`, `gfx906`, `gfx908`, `gfx1010`, `gfx1030`, `gfx1100`, `gfx1101`, `gfx1102` and `gfx1103`.
|
||||
|
||||
## How to build and run this sample:
|
||||
Use these make commands to compile into LLVM IR, compile IR into executable, and execute it.
|
||||
- To compile the HIP application into host and device LLVM IR: `make src_to_ir`.
|
||||
- To disassemble the LLVM IR bitcode into human-readable LLVM IR: `make bc_to_ll`.
- To assemble the human-readable LLVM IR back into LLVM IR bitcode: `make ll_to_bc`.
|
||||
- To compile the LLVM IR files into an executable: `make ir_to_exec`.
|
||||
- To execute, run
|
||||
- Build the sample using cmake
|
||||
```
|
||||
./square_ir.out
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
|
||||
- Execute sample
|
||||
```
|
||||
$ ./square_ir.out
|
||||
info: running on device AMD Radeon Graphics
|
||||
info: allocate host mem ( 7.63 MB)
|
||||
info: allocate device mem ( 7.63 MB)
|
||||
|
||||
@@ -2,15 +2,14 @@
|
||||
I. Build
|
||||
|
||||
```
|
||||
mkdir -p build; cd build
|
||||
rm -rf *;
|
||||
CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
make
|
||||
$ mkdir build; cd build
|
||||
$ CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
$ make
|
||||
```
|
||||
|
||||
Note: if the test fails to run, users may need to add AMDGPU support as a command-line option, for example:
|
||||
```
|
||||
CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
|
||||
$ CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
|
||||
```
|
||||
|
||||
II. Test
|
||||
|
||||
@@ -8,28 +8,28 @@ I. Prepare
|
||||
|
||||
II. Build
|
||||
```
|
||||
mkdir -p build; cd build
|
||||
rm -rf *;
|
||||
CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
cmake ..
|
||||
make
|
||||
$ mkdir -p build; cd build
|
||||
$ rm -rf *;
|
||||
$ CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
$ cmake ..
|
||||
$ make
|
||||
```
|
||||
|
||||
Note: if the test fails, users may need to add AMD GPU support, for example:
|
||||
```
|
||||
CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
|
||||
$ CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
|
||||
```
|
||||
To enable compiler auto-detection of the GPU, users may need to add AMDGPU support as a command-line option if the test fails to run, for example:
|
||||
```
|
||||
CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=native ..
|
||||
$ CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=native ..
|
||||
```
|
||||
III. Test
|
||||
```
|
||||
./test_fortran
|
||||
$ ./test_fortran
|
||||
Succeeded testing Fortran!
|
||||
|
||||
./test_cpp
|
||||
$ ./test_cpp
|
||||
Device name AMD Radeon Graphics
|
||||
PASSED!
|
||||
```
|
||||
|
||||
@@ -66,8 +66,21 @@ Here the first parameter will store the time taken value, second parameter is th
|
||||
We can print the value of the time taken directly, since eventMs is a float variable.
|
||||
|
||||
## How to build and run:
|
||||
Use the make command and execute it using ./exe
|
||||
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
|
||||
- Build the sample using cmake
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
- Execute the sample
|
||||
```
|
||||
$ ./hipEvent
|
||||
Device name Navi 14 [Radeon Pro W5500]
|
||||
hipMemcpyHostToDevice time taken = 0.981ms
|
||||
kernel Execution time = 0.539ms
|
||||
hipMemcpyDeviceToHost time taken = 1.220ms
|
||||
PASSED!
|
||||
```
|
||||
|
||||
## More Info:
|
||||
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
|
||||
|
||||
@@ -2,14 +2,14 @@
|
||||
I. Build
|
||||
|
||||
```
|
||||
mkdir -p build; cd build
|
||||
rm -rf *;
|
||||
CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
make
|
||||
$ mkdir -p build; cd build
|
||||
$ rm -rf *;
|
||||
$ CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
$ make
|
||||
```
|
||||
To enable compiler auto-detection of the GPU, users may need to add AMDGPU support as a command-line option if the test fails to run, for example:
|
||||
```
|
||||
CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
|
||||
$ CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
|
||||
```
|
||||
II. Test
|
||||
|
||||
|
||||
@@ -2,10 +2,10 @@
|
||||
I. Build
|
||||
|
||||
```
|
||||
mkdir -p build; cd build
|
||||
rm -rf *;
|
||||
cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
make
|
||||
$ mkdir -p build; cd build
|
||||
$ rm -rf *;
|
||||
$ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
$ make
|
||||
```
|
||||
|
||||
II. Test
|
||||
|
||||
@@ -2,10 +2,10 @@
|
||||
I. Build
|
||||
|
||||
```
|
||||
mkdir -p build; cd build
|
||||
rm -rf *;
|
||||
CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
make
|
||||
$ mkdir -p build; cd build
|
||||
$ rm -rf *;
|
||||
$ CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
|
||||
$ make
|
||||
```
|
||||
|
||||
II. Test
|
||||
|
||||
@@ -28,8 +28,18 @@ Be careful while using shared memory, since all threads within the block can acc
|
||||
` __syncthreads();`
|
||||
|
||||
## How to build and run:
|
||||
Use the make command and execute it using ./exe
|
||||
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
|
||||
- Build the sample using cmake
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
- Execute the sample
|
||||
```
|
||||
$ ./sharedMemory
|
||||
Device name Navi 14 [Radeon Pro W5500]
|
||||
PASSED!
|
||||
```
|
||||
|
||||
## More Info:
|
||||
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
|
||||
|
||||
@@ -36,9 +36,18 @@ In this tutorial, we'll use `__shfl()` ops. In the same sourcecode, we used for
|
||||
Be careful when using shfl operations: values can only be exchanged between threads of the same warp.
|
||||
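The restriction to a single warp can be illustrated with a small CPU emulation. This is a sketch under assumptions, not the tutorial's kernel: it models one warp as a vector with one value per lane and follows the documented behavior of `__shfl(var, srcLane, width)`, where every lane reads the value held by lane `srcLane` of its own `width`-sized segment. The name `shfl_emulated` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a warp shuffle on the CPU. `lane_values[i]` is the value held by lane i.
// Each lane receives the value of lane (src_lane % width) within its own segment
// of `width` consecutive lanes, so data never crosses a segment boundary.
inline std::vector<int> shfl_emulated(const std::vector<int>& lane_values,
                                      int src_lane, int width) {
    assert(width > 0 && src_lane >= 0);
    const std::size_t w = static_cast<std::size_t>(width);
    assert(lane_values.size() % w == 0);
    const std::size_t src = static_cast<std::size_t>(src_lane) % w;

    std::vector<int> result(lane_values.size());
    for (std::size_t lane = 0; lane < lane_values.size(); ++lane) {
        const std::size_t segment_start = (lane / w) * w;  // first lane of this segment
        result[lane] = lane_values[segment_start + src];
    }
    return result;
}
```

With eight lanes, a width of 4 and source lane 1, the first four lanes all receive the value of lane 1 and the last four all receive the value of lane 5; no lane can read from the other segment.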
|
||||
## How to build and run:
|
||||
Use the make command and execute it using ./exe
|
||||
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
|
||||
|
||||
- Build the sample using cmake
|
||||
```
|
||||
$ mkdir build; cd build
|
||||
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
|
||||
$ make
|
||||
```
|
||||
- Execute the sample
|
||||
```
|
||||
$ ./shfl
|
||||
Device name Navi 14 [Radeon Pro W5500]
|
||||
PASSED!
|
||||
```
|
||||
## Requirement for NVIDIA
Make sure you have a device with compute capability 3.0 or higher in order to use warp shfl operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag to the Makefile when building this application.
|
||||
|
||||
|
||||
@@ -38,9 +38,18 @@ In the same sourcecode, we used for MatrixTranspose. We'll add the following:
With the help of this application, we can say that kernel code can be converted into multi-dimensional threads with ease.

## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which uses hcc on AMD and nvcc on NVIDIA.

- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./2dshfl
Device name Navi 14 [Radeon Pro W5500]
PASSED!
```
## Requirement for NVIDIA
Please make sure you have a device with compute capability 3.0 or higher in order to use warp shfl operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag in the Makefile when using this application.

@@ -38,9 +38,18 @@ The other important change is:
Here we replaced the 4th parameter with the amount of additional shared memory to allocate when launching the kernel.

## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which uses hcc on AMD and nvcc on NVIDIA.

- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./dynamic_shared
Device name Navi 14 [Radeon Pro W5500]
dynamic_shared PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
- [HIP Kernel Language](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md)

@@ -49,8 +49,17 @@ and while kernel launch, we make the following changes in 5th parameter to hipLa
Here we replaced the 4th parameter with the amount of additional shared memory to allocate when launching the kernel.

## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which uses hcc on AMD and nvcc on NVIDIA.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./stream
stream PASSED!
```

## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)

@@ -0,0 +1,12 @@
# peer2peer
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./peer2peer
Peer2Peer application requires atleast 2 gpu devices
```
@@ -16,7 +16,7 @@ Programmers familiar with CUDA, OpenCL will be able to quickly learn and start c

## Simple Matrix Transpose

For this tutorial we will be using an example which sums up each row of a 2D matrix and writes the result to a 1D array.

In this tutorial, we'll use `#pragma unroll`. In the same source code we used for gpuMatrixRowSum, we'll add it just before the for loop as follows:

@@ -31,9 +31,18 @@ Specifying the optional parameter, #pragma unroll value, directs the unroller to
Specifying #pragma nounroll indicates that the loop should not be unrolled. #pragma unroll 1 has the same effect.

## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which uses hcc on AMD and nvcc on NVIDIA.

- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./unroll
Device name
PASSED
```
## Requirement for NVIDIA
Please make sure you have a device with compute capability 3.0 or higher in order to use warp shfl operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag in the Makefile when using this application.

@@ -40,3 +40,5 @@ Note that if you want debug version, add "-DCMAKE_BUILD_TYPE=Debug" in cmake cmd
cmake ../samples

make package_samples

## Note: sample 2_Cookbook/22_cmake_hip_lang is currently not included in the top-level cmake. To build this sample from the top-level cmake, uncomment Line 43 inside samples/2_Cookbook/CMakeLists.txt.