SWDEV-436821 Update hip samples Readme files

Change-Id: I6bf3a72eac4a4242cb2dbf4e6eee73e0e1bef2ef


[ROCm/hip-tests commit: 76dd8ea569]
This commit is contained in:
Rahul Manocha
2023-12-11 03:43:30 +00:00
parent 91c9372c33
commit b299b79a77
29 changed files with 471 additions and 237 deletions
+23
@@ -4,3 +4,26 @@ Show an application written directly in HIP which uses platform-specific check o
an instruction that only exists on the AMD platform.
See related [blog](http://gpuopen.com/platform-aware-coding-inside-hip/) demonstrating platform specialization.
- Steps to build this sample:
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./bit_extract
pch size: 11743288
__hipGetPCH succeeded!
info: running on device #0
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
info: copy Host2Device
info: launch 'bit_extract_kernel'
info: copy Device2Host
info: check result
PASSED!
```
+19
@@ -0,0 +1,19 @@
# module_api
- Steps to build this sample
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute Code
```
$ ./launchKernelHcc.hip.out
PASSED!
$ ./runKernel.hip.out
PASSED!
$ ./defaultDriver.hip.out
PASSED!
```
+17
@@ -0,0 +1,17 @@
# module_api_global
- Steps to build this sample
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute Code
```
$ ./runKernel1.hip.out
PASSED!
Shared Size Bytes = 0
Num Regs = 3
PASSED!
```
-3
@@ -3,19 +3,16 @@
The simple test below shows how to use hipify-perl to port CUDA code to HIP:
- Add hip/bin path to the PATH
```
$ export PATH=$PATH:[MYHIP]/bin
```
- Define environment variable
```
$ export HIP_PATH=[MYHIP]
```
- Build executable file
```
$ cd ~/hip/samples/0_Intro/square
mkdir -p build && cd build
+26
@@ -0,0 +1,26 @@
# hipDispatchLatency.cpp
- Steps to build this sample
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute Code
```
$ ./hipDispatchEnqueueRateMT 1 0
Thread ID : 0 , hipModuleLaunchKernel enqueue rate: 0.8 us, std: 0.1 us
$ ./hipDispatchEnqueueRateMT 1 1
Thread ID : 0 , hipLaunchKernelGGL enqueue rate: 1.0 us, std: 0.1 us
$ ./hipDispatchLatency
hipModuleLaunchKernel enqueue rate: 0.8 us, std: 0.1 us
hipLaunchKernelGGL enqueue rate: 1.0 us, std: 0.1 us
Timing around single dispatch latency: 8.1 us, std: 4.7 us
Batch dispatch latency: 1.4 us, std: 0.0 us
```
+79
@@ -4,3 +4,82 @@ Simple tool that prints properties for each device (from hipGetDeviceProperties)
The properties include all of the architectural feature flags for each device.
It also demonstrates how to use a platform-specific compilation path (testing `__HIP_PLATFORM_AMD__` or `__HIP_PLATFORM_NVIDIA__`).
- Steps to build this sample
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute Code
```
$ ./hipInfo
--------------------------------------------------------------------------------
device# 0
Name:
pciBusID: 103
pciDeviceID: 0
pciDomainID: 0
multiProcessorCount: 64
maxThreadsPerMultiProcessor: 2560
isMultiGpuBoard: 0
clockRate: 1800 Mhz
memoryClockRate: 1000 Mhz
memoryBusWidth: 4096
totalGlobalMem: 31.98 GB
totalConstMem: 2147483647
sharedMemPerBlock: 64.00 KB
canMapHostMemory: 1
regsPerBlock: 65536
warpSize: 64
l2CacheSize: 8388608
computeMode: 0
maxThreadsPerBlock: 1024
maxThreadsDim.x: 1024
maxThreadsDim.y: 1024
maxThreadsDim.z: 1024
maxGridSize.x: 2147483647
maxGridSize.y: 65536
maxGridSize.z: 65536
major: 9
minor: 0
concurrentKernels: 1
cooperativeLaunch: 1
cooperativeMultiDeviceLaunch: 1
isIntegrated: 0
maxTexture1D: 16384
maxTexture2D.width: 16384
maxTexture2D.height: 16384
maxTexture3D.width: 16384
maxTexture3D.height: 16384
maxTexture3D.depth: 8192
hostNativeAtomicSupported: 1
isLargeBar: 1
asicRevision: 1
maxSharedMemoryPerMultiProcessor: 64.00 KB
clockInstructionRate: 1000.00 Mhz
arch.hasGlobalInt32Atomics: 1
arch.hasGlobalFloatAtomicExch: 1
arch.hasSharedInt32Atomics: 1
arch.hasSharedFloatAtomicExch: 1
arch.hasFloatAtomicAdd: 1
arch.hasGlobalInt64Atomics: 1
arch.hasSharedInt64Atomics: 1
arch.hasDoubles: 1
arch.hasWarpVote: 1
arch.hasWarpBallot: 1
arch.hasWarpShuffle: 1
arch.hasFunnelShift: 0
arch.hasThreadFenceSystem: 1
arch.hasSyncThreadsExt: 0
arch.hasSurfaceFuncs: 0
arch.has3dGrid: 1
arch.hasDynamicParallelism: 0
gcnArchName: gfx906:sramecc+:xnack-
peers:
non-peers: device#0
memInfo.total: 31.98 GB
memInfo.free: 31.96 GB (100%)
```
+13 -2
@@ -87,8 +87,19 @@ After, copying the data from device to memory, we will verify it with the one we
Finally, we will free the memory allocated earlier by using free() for host while for devices we will use `hipFree`.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./MatrixTranspose
Device name Navi 14 [Radeon Pro W5500]
PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
+15 -3
@@ -45,9 +45,21 @@ Index for the respective operand in the ordered fashion is provided by `%` follo
Output constraints are specified by an `"="` prefix, as shown above (`"=v"`). This indicates that the assembly will write to this operand, and the operand will then be made available as a return value of the asm expression. Input constraints do not have a prefix, just the constraint code. The constraint string `"0"` says to use the register assigned to the output as an input as well (the output being the 0th constraint).
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./inline_asm
Device name
hipMemcpyHostToDevice time taken = 1.057ms
kernel Execution time = 0.509ms
hipMemcpyDeviceToHost time taken = 1.254ms
PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
+21
@@ -0,0 +1,21 @@
# texture_driver
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./texture2dDrv
tex2dKernelChar test PASSED ...
tex2dKernelShort test PASSED ...
tex2dKernelInt test PASSED ...
tex2dKernelFloat test PASSED ...
tex2dKernelChar4 test PASSED ...
tex2dKernelShort4 test PASSED ...
tex2dKernelInt4 test PASSED ...
tex2dKernelFloat4 test PASSED ...
texture2dDrv PASSED ...
```
+13 -12
@@ -32,20 +32,21 @@ The macro supports specifying CLANG-specific, NVCC-specific compiler options usi
Common options targeting both compilers can be specified after the `HIPCC_OPTIONS` keyword.
## How to build and run:
Use the following commands to build and execute the sample
- Build sample using cmake
```
$ mkdir build; cd build
# For shared lib of hip rt,
$ cmake ..
# Or for static lib of hip rt,
$ cmake -DCMAKE_PREFIX_PATH="/opt/rocm/llvm/lib/cmake" ..
$ make
```
mkdir build
cd build
For shared lib of hip rt,
cmake ..
Or for static lib of hip rt,
cmake -DCMAKE_PREFIX_PATH="/opt/rocm/llvm/lib/cmake" ..
Then,
make
./MatrixTranspose
- Execute the sample
```
$ ./MatrixTranspose
Device name
PASSED!
```
## More Info:
+24
@@ -0,0 +1,24 @@
# occupancy
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./occupancy
Manual Configuration with block size 32
kernel Execution time = 0.433ms
Theoretical Occupancy is 40%
Automatic Configuation based on hipOccupancyMaxPotentialBlockSize
Suggested blocksize is 1024, Minimum gridsize is 128
kernel Execution time = 0.037ms
Theoretical Occupancy is 80%
Manual Test PASSED!
Automatic Test PASSED!
```
+15
@@ -0,0 +1,15 @@
# gpu_arch
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./gpuarch
success
```
## Note: This sample works on architectures gfx908 and above
+60 -158
@@ -1,179 +1,81 @@
# Emitting Static Library
This sample shows how to generate a static library for a simple HIP application. We will evaluate two types of static libraries. The first type exports host functions in a static library generated with `--emit-static-lib` and is compatible with host linkers. The second type exports device functions in a static library made with the system `ar`.
Please refer to the hip_programming_guide for limitations.
## Static libraries with host functions
### Source files
The static library source files may contain host functions and kernel `__global__` and `__device__` functions. Here is an example (please refer to the directory host_functions).
hipOptLibrary.cpp:
```
#include <hip/hip_runtime.h>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <iostream>
// Evaluate the HIP call exactly once, so it still runs when NDEBUG disables assert().
#define HIP_ASSERT(status) do { hipError_t err_ = (status); assert(err_ == hipSuccess); (void)err_; } while (0)
#define LEN 512
__global__ void copy(uint32_t* A, uint32_t* B) {
  size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
  B[tid] = A[tid];
}
void run_test1() {
  uint32_t *A_h, *B_h, *A_d, *B_d;
  size_t valbytes = LEN * sizeof(uint32_t);
  A_h = (uint32_t*)malloc(valbytes);
  B_h = (uint32_t*)malloc(valbytes);
  for (uint32_t i = 0; i < LEN; i++) {
    A_h[i] = i;
    B_h[i] = 0;
  }
  HIP_ASSERT(hipMalloc((void**)&A_d, valbytes));
  HIP_ASSERT(hipMalloc((void**)&B_d, valbytes));
  HIP_ASSERT(hipMemcpy(A_d, A_h, valbytes, hipMemcpyHostToDevice));
  // LEN / 64 blocks of 64 threads: one thread per element.
  hipLaunchKernelGGL(copy, dim3(LEN / 64), dim3(64), 0, 0, A_d, B_d);
  HIP_ASSERT(hipMemcpy(B_h, B_d, valbytes, hipMemcpyDeviceToHost));
  for (uint32_t i = 0; i < LEN; i++) {
    assert(A_h[i] == B_h[i]);
  }
  HIP_ASSERT(hipFree(A_d));
  HIP_ASSERT(hipFree(B_d));
  free(A_h);
  free(B_h);
  std::cout << "Test Passed!\n";
}
```
The above source file can be compiled into a static library, libHipOptLibrary.a, using the `--emit-static-lib` flag, like so:
```
hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a
```
### Main source files
The main() program source file may link with the above static library using either hipcc or a host compiler (such as g++). Here is a simple source file that calls the host function inside libHipOptLibrary.a:
hipMain1.cpp:
```
extern void run_test1();
int main() {
  run_test1();
}
```
To link to the static library:
Using hipcc:
```
hipcc hipMain1.cpp -L. -lHipOptLibrary -o test_emit_static_hipcc_linker.out
```
Using g++:
```
# ROCM_PATH is the path where ROCm is installed; the default path is /opt/rocm.
g++ hipMain1.cpp -L. -lHipOptLibrary -L<ROCM_PATH>/hip/lib -lamdhip64 -o test_emit_static_host_linker.out
```
## Static libraries with device functions
### Source files
The static library source files that contain only `__device__` functions need to be archived using ar. Here is an example (please refer to the directory device_functions).
hipDevice.cpp:
```
#include <hip/hip_runtime.h>
__device__ int square_me(int A) {
  return A * A;
}
```
The above source file may be compiled into a static library, libHipDevice.a, by first compiling it into a relocatable object and then placing that object in an archive using ar:
```
hipcc hipDevice.cpp -c -fgpu-rdc -fPIC -o hipDevice.o
ar rcsD libHipDevice.a hipDevice.o
```
### Main source files
The main() program source file can link with the static library using hipcc. Here is a simple source file that calls the device function inside libHipDevice.a:
hipMain2.cpp:
```
#include <hip/hip_runtime.h>
#include <hip/hip_runtime_api.h>
#include <cassert>
#include <iostream>
// Evaluate the HIP call exactly once, so it still runs when NDEBUG disables assert().
#define HIP_ASSERT(status) do { hipError_t err_ = (status); assert(err_ == hipSuccess); (void)err_; } while (0)
#define LEN 512
extern __device__ int square_me(int);
__global__ void square_and_save(int* A, int* B) {
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  B[tid] = square_me(A[tid]);
}
void run_test2() {
  int *A_h, *B_h, *A_d, *B_d;
  A_h = new int[LEN];
  B_h = new int[LEN];
  for (unsigned i = 0; i < LEN; i++) {
    A_h[i] = i;
    B_h[i] = 0;
  }
  size_t valbytes = LEN * sizeof(int);
  HIP_ASSERT(hipMalloc((void**)&A_d, valbytes));
  HIP_ASSERT(hipMalloc((void**)&B_d, valbytes));
  HIP_ASSERT(hipMemcpy(A_d, A_h, valbytes, hipMemcpyHostToDevice));
  hipLaunchKernelGGL(square_and_save, dim3(LEN / 64), dim3(64),
                     0, 0, A_d, B_d);
  HIP_ASSERT(hipMemcpy(B_h, B_d, valbytes, hipMemcpyDeviceToHost));
  for (unsigned i = 0; i < LEN; i++) {
    assert(A_h[i] * A_h[i] == B_h[i]);
  }
  HIP_ASSERT(hipFree(A_d));
  HIP_ASSERT(hipFree(B_d));
  // Memory allocated with new[] must be released with delete[], not free().
  delete[] A_h;
  delete[] B_h;
  std::cout << "Test Passed!\n";
}
int main() {
  // Run test that generates static lib with ar
  run_test2();
}
```
To link to the static library:
```
hipcc libHipDevice.a hipMain2.cpp -fgpu-rdc -o test_device_static_hipcc.out
```
## How to build and run this sample:
Use the make command to build the static libraries, link with them, and execute the result.
- Change directory to either the host or device functions folder.
- To build the static library and link the main executable, use `make all`.
- To execute, run the generated executable `./test_*.out`.
Alternatively, use these CMake commands.
```
cd device_functions
mkdir -p build
cd build
cmake ..
make
./test_*.out
```
It is recommended to use Visual Studio's command prompt for this sample, because the MS Librarian tool (LIB.exe) is required on the Windows platform.
Override CMAKE_C_COMPILER and CMAKE_CXX_COMPILER with hipcc, because Visual Studio would otherwise use cl.exe as the default compiler.
For example: cmake.exe -GNinja -DCMAKE_CXX_COMPILER_ID=ROCMClang -DCMAKE_C_COMPILER_ID=ROCMClang -DCMAKE_PREFIX_PATH=%HIP_PATH% -DCMAKE_C_COMPILER=%HIP_PATH%/bin/hipcc.bat -DCMAKE_CXX_COMPILER=%HIP_PATH%/bin/hipcc.bat ..
## For More Information, please refer to the HIP FAQ.
# Compile to assembly and create an executable from modified asm
This sample shows how to generate the assembly code for a simple HIP source application, then recompile it to produce a valid HIP executable.
This sample uses a previous HIP application sample, please see [0_Intro/square](https://github.com/ROCm-Developer-Tools/HIP/blob/master/samples/0_Intro/square).
## Compiling the HIP source into assembly
The HIP flags `-c -S` generate the host x86_64 assembly when paired with `--cuda-host-only`, and the device AMDGCN assembly when paired with `--cuda-device-only`. In this sample we use these commands:
```
<ROCM_PATH>/hip/bin/hipcc -c -S --cuda-host-only -target x86_64-linux-gnu -o square_host.s square.cpp
<ROCM_PATH>/hip/bin/hipcc -c -S --cuda-device-only --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx1010 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 square.cpp
```
The device assembly will be output into separate files, one per target architecture:
- square-hip-amdgcn-amd-amdhsa-gfx900.s
- square-hip-amdgcn-amd-amdhsa-gfx906.s
- square-hip-amdgcn-amd-amdhsa-gfx908.s
- square-hip-amdgcn-amd-amdhsa-gfx1010.s
- square-hip-amdgcn-amd-amdhsa-gfx1030.s
- square-hip-amdgcn-amd-amdhsa-gfx1100.s
- square-hip-amdgcn-amd-amdhsa-gfx1101.s
- square-hip-amdgcn-amd-amdhsa-gfx1102.s
- square-hip-amdgcn-amd-amdhsa-gfx1103.s

You may modify the `--offload-arch` flag to build other archs and choose to enable or disable xnack and sram-ecc.
**Note:** At this point, you may evaluate the assembly code and make modifications if you are familiar with the AMDGCN assembly language and architecture.
## Compiling the assembly into a valid HIP executable
If valid, the modified host and device assembly may be compiled into a HIP executable. The host assembly can be compiled into an object using this command:
```
<ROCM_PATH>/hip/bin/hipcc -c square_host.s -o square_host.o
```
However, the device assembly code requires a few extra steps. The device assemblies need to be compiled into device objects and then offload-bundled into a HIP fat binary using clang-offload-bundler. Then llvm-mc embeds the binary inside a host object using the MC directives provided in `hip_obj_gen.mcin`. The output is a host object with an embedded device object. Here are the steps for device-side compilation into an object:
```
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx900 square-hip-amdgcn-amd-amdhsa-gfx900.s -o square-hip-amdgcn-amd-amdhsa-gfx900.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx906 square-hip-amdgcn-amd-amdhsa-gfx906.s -o square-hip-amdgcn-amd-amdhsa-gfx906.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx908 square-hip-amdgcn-amd-amdhsa-gfx908.s -o square-hip-amdgcn-amd-amdhsa-gfx908.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1010 square-hip-amdgcn-amd-amdhsa-gfx1010.s -o square-hip-amdgcn-amd-amdhsa-gfx1010.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1030 square-hip-amdgcn-amd-amdhsa-gfx1030.s -o square-hip-amdgcn-amd-amdhsa-gfx1030.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1100 square-hip-amdgcn-amd-amdhsa-gfx1100.s -o square-hip-amdgcn-amd-amdhsa-gfx1100.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1101 square-hip-amdgcn-amd-amdhsa-gfx1101.s -o square-hip-amdgcn-amd-amdhsa-gfx1101.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1102 square-hip-amdgcn-amd-amdhsa-gfx1102.s -o square-hip-amdgcn-amd-amdhsa-gfx1102.o
<ROCM_PATH>/hip/../llvm/bin/clang -target amdgcn-amd-amdhsa -mcpu=gfx1103 square-hip-amdgcn-amd-amdhsa-gfx1103.s -o square-hip-amdgcn-amd-amdhsa-gfx1103.o
<ROCM_PATH>/llvm/bin/clang-offload-bundler -type=o -bundle-align=4096 -targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx900,hip-amdgcn-amd-amdhsa-gfx906,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx1010,hip-amdgcn-amd-amdhsa-gfx1030,hip-amdgcn-amd-amdhsa-gfx1100,hip-amdgcn-amd-amdhsa-gfx1101,hip-amdgcn-amd-amdhsa-gfx1102,hip-amdgcn-amd-amdhsa-gfx1103 -inputs=/dev/null,square-hip-amdgcn-amd-amdhsa-gfx900.o,square-hip-amdgcn-amd-amdhsa-gfx906.o,square-hip-amdgcn-amd-amdhsa-gfx908.o,square-hip-amdgcn-amd-amdhsa-gfx1010.o,square-hip-amdgcn-amd-amdhsa-gfx1030.o,square-hip-amdgcn-amd-amdhsa-gfx1100.o,square-hip-amdgcn-amd-amdhsa-gfx1101.o,square-hip-amdgcn-amd-amdhsa-gfx1102.o,square-hip-amdgcn-amd-amdhsa-gfx1103.o -outputs=offload_bundle.hipfb
<ROCM_PATH>/llvm/bin/llvm-mc -triple x86_64-unknown-linux-gnu hip_obj_gen.mcin -o square_device.o --filetype=obj
```
**Note:** The option `-bundle-align=4096` only works on ROCm 4.0 and newer compilers. Also, the architectures must match the ones used when compiling to assembly.
Finally, use the system linker, hipcc, or clang to link the host and device objects into an executable:
```
<ROCM_PATH>/hip/bin/hipcc square_host.o square_device.o -o square_asm.out
```
## How to build and run this sample:
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute sample
```
$ ./square_asm.out
info: running on device AMD Radeon Graphics
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
info: copy Host2Device
info: launch 'vector_square' kernel
info: copy Device2Host
info: check result
PASSED!
```
**Note:** Currently, the defined archs are `gfx900`, `gfx906`, `gfx908`, `gfx1010`, `gfx1030`, `gfx1100`, `gfx1101`, `gfx1102` and `gfx1103`. Any other arch can be set with the make argument `GPU_ARCHxx`.
## For More Information, please refer to the HIP FAQ.
+9 -5
@@ -56,12 +56,16 @@ Finally, using the system linker, hipcc, or clang, link the host and device obje
```
## How to build and run this sample:
Use these make commands to compile into assembly, compile assembly into executable, and execute it.
- To compile the HIP application into host and device assembly: `make src_to_asm`.
- To compile the assembly files into an executable: `make asm_to_exec`.
- To execute, run
```
./square_asm.out
```
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute sample
```
$ ./square_asm.out
info: running on device AMD Radeon Graphics
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
+9 -7
@@ -84,14 +84,16 @@ Finally, using the system linker, hipcc, or clang, link the host and device obje
If you haven't modified the GPU archs, this executable should run on the defined `gfx900`, `gfx906`, `gfx908`, `gfx1010`, `gfx1030`, `gfx1100`, `gfx1101`, `gfx1102` and `gfx1103`.
## How to build and run this sample:
Use these make commands to compile into LLVM IR, compile IR into executable, and execute it.
- To compile the HIP application into host and device LLVM IR: `make src_to_ir`.
- To disassemble the LLVM IR bitcode into human-readable LLVM IR: `make bc_to_ll`.
- To assemble the human-readable LLVM IR back into LLVM IR bitcode: `make ll_to_bc`.
- To compile the LLVM IR files into an executable: `make ir_to_exec`.
- To execute, run
```
./square_ir.out
```
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute sample
```
$ ./square_ir.out
info: running on device AMD Radeon Graphics
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
+4 -5
@@ -2,15 +2,14 @@
I. Build
```
mkdir -p build; cd build
rm -rf *;
CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
make
$ mkdir build; cd build
$ CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
$ make
```
Note: if the test fails to run, users may need to add AMDGPU support as a command line option, for example:
```
CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
$ CXX="$(hipconfig -l)"/clang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
```
II. Test
+10 -10
@@ -8,28 +8,28 @@ I. Prepare
II. Build
```
mkdir -p build; cd build
rm -rf *;
CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
cmake ..
make
$ mkdir -p build; cd build
$ rm -rf *;
$ CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
$ cmake ..
$ make
```
Note: if the test fails, users may need to add AMD GPU support, for example:
```
CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
$ CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
```
To enable compiler auto-detection of the GPU, users may need to add AMDGPU support as a command line option if the test fails to run, for example:
```
CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=native ..
$ CXX="$(hipconfig -l)"/clang++ FC=$(which gfortran) cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=native ..
```
III. Test
```
./test_fortran
$ ./test_fortran
Succeeded testing Fortran!
./test_cpp
$ ./test_cpp
Device name AMD Radeon Graphics
PASSED!
```
+15 -2
@@ -66,8 +66,21 @@ Here the first parameter will store the time taken value, second parameter is th
Since eventMs is a float variable, the elapsed time (in milliseconds) can be printed directly.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./hipEvent
Device name Navi 14 [Radeon Pro W5500]
hipMemcpyHostToDevice time taken = 0.981ms
kernel Execution time = 0.539ms
hipMemcpyDeviceToHost time taken = 1.220ms
PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
+5 -5
@@ -2,14 +2,14 @@
I. Build
```
mkdir -p build; cd build
rm -rf *;
CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
make
$ mkdir -p build; cd build
$ rm -rf *;
$ CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
$ make
```
To enable compiler auto-detection of the GPU, users may need to add AMDGPU support as a command line option if the test fails to run, for example:
```
CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
$ CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS="gfx1102" ..
```
II. Test
+4 -4
@@ -2,10 +2,10 @@
I. Build
```
mkdir -p build; cd build
rm -rf *;
cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
make
$ mkdir -p build; cd build
$ rm -rf *;
$ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
$ make
```
II. Test
+4 -4
@@ -2,10 +2,10 @@
I. Build
```
mkdir -p build; cd build
rm -rf *;
CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
make
$ mkdir -p build; cd build
$ rm -rf *;
$ CXX="$(hipconfig -l)"/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
$ make
```
II. Test
+12 -2
@@ -28,8 +28,18 @@ Be careful while using shared memory, since all threads within the block can acc
` __syncthreads();`
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./sharedMemory
Device name Navi 14 [Radeon Pro W5500]
PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
+12 -3
@@ -36,9 +36,18 @@ In this tutorial, we'll use `__shfl()` ops. In the same sourcecode, we used for
Be careful when using shfl operations: data can only be exchanged between threads of the same warp.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./shfl
Device name Navi 14 [Radeon Pro W5500]
PASSED!
```
## Requirement for NVIDIA
Please make sure you have a device of compute capability 3.0 or higher in order to use warp shfl operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag in the Makefile when using this application.
+12 -3
@@ -38,9 +38,18 @@ In the same sourcecode, we used for MatrixTranspose. We'll add the following:
With the help of this application, we can say that kernel code can be converted into multi-dimensional threads with ease.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./2dshfl
Device name Navi 14 [Radeon Pro W5500]
PASSED!
```
## Requirement for NVIDIA
Please make sure you have a device of compute capability 3.0 or higher in order to use warp shfl operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag in the Makefile when using this application.
+12 -3
@@ -38,9 +38,18 @@ The other important change is:
Here we replaced the 4th parameter with the amount of additional shared memory to allocate when launching the kernel.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./dynamic_shared
Device name Navi 14 [Radeon Pro W5500]
dynamic_shared PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
- [HIP Kernel Language](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md)
+11 -2
@@ -49,8 +49,17 @@ and while kernel launch, we make the following changes in 5th parameter to hipLa
Here we replaced the 4th parameter with the amount of additional shared memory to allocate when launching the kernel.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./stream
stream PASSED!
```
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
+12
@@ -0,0 +1,12 @@
# peer2peer
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./peer2peer
Peer2Peer application requires atleast 2 gpu devices
```
+13 -4
@@ -16,7 +16,7 @@ Programmers familiar with CUDA, OpenCL will be able to quickly learn and start c
## Simple Matrix Transpose
For this tutorial we will be using an example which sums up the row of a 2D matrix and writes it in a 1D array.
In this tutorial, we'll use `#pragma unroll`. In the same source code used for gpuMatrixRowSum, we add it just before the for loop as follows:
@@ -31,9 +31,18 @@ Specifying the optional parameter, #pragma unroll value, directs the unroller to
Specifying `#pragma nounroll` indicates that the loop should not be unrolled. `#pragma unroll 1` has the same effect.
## How to build and run:
Use the make command and execute it using ./exe
Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
- Build the sample using cmake
```
$ mkdir build; cd build
$ cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm
$ make
```
- Execute the sample
```
$ ./unroll
Device name
PASSED
```
## Requirement for NVIDIA
Please make sure you have a device of compute capability 3.0 or higher in order to use warp shfl operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag in the Makefile when using this application.
+2
@@ -40,3 +40,5 @@ Note that if you want debug version, add "-DCMAKE_BUILD_TYPE=Debug" in cmake cmd
cmake ../samples
make package_samples
## Note: sample 2_Cookbook/22_cmake_hip_lang is currently not included in the top-level cmake. To build this sample from the top-level cmake, uncomment Line 43 inside samples/2_Cookbook/CMakeLists.txt.