rocm-systems/samples/2_Cookbook/9_unroll/Readme.md at bc528b1e8bffe797ff9dba27f8a514b7adbfaee2


Nick Curtis 5257b54a39 Markdown fixes & Whitespace cleanup for samples (#1096 )

* Fix multiline code blocks in README's

* Whitespace cleanup

2019-05-12 19:27:44 +05:30


Using Pragma unroll

In this tutorial, we'll explain how to use #pragma unroll to improve kernel performance.

Introduction:

Loop unrolling optimization hints can be specified with #pragma unroll and #pragma nounroll. The pragma is placed immediately before a for loop. Specifying #pragma unroll without a parameter directs the loop unroller to attempt to fully unroll the loop if the trip count is known at compile time and attempt to partially unroll the loop if the trip count is not known at compile time.
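The difference between the two cases described above can be sketched in plain host C++. This is only an illustration, not code from the tutorial: the function names sum_fixed and sum_runtime are hypothetical, and it assumes a Clang-based compiler (hipcc is one), since other host compilers may ignore the pragma with a warning.

```cpp
#include <cstddef>

// Trip count is the compile-time constant 4, so the unroller
// may fully unroll this loop into four straight-line additions.
inline int sum_fixed(const int (&v)[4]) {
    int total = 0;
#pragma unroll
    for (int i = 0; i < 4; ++i) {
        total += v[i];
    }
    return total;
}

// Trip count n is only known at run time, so the unroller can
// at most partially unroll this loop.
inline int sum_runtime(const int* v, std::size_t n) {
    int total = 0;
#pragma unroll
    for (std::size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

In both functions the pragma is only a hint: the result is the same whether or not the compiler unrolls, and only the generated instructions differ.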

Requirement:

For hardware requirements and software installation, see Installation.

Prerequisite knowledge:

Programmers familiar with CUDA or OpenCL will be able to quickly learn and start coding with the HIP API. If you are not, don't worry: you chose to start with the best one. We'll explain everything assuming you are completely new to GPGPU programming.

Simple Matrix Transpose

For this tutorial we will use MatrixTranspose with the shfl operation, i.e., our 4_shfl tutorial, since it is the only example where we used loops inside the kernel.

We'll take the same source code we used for MatrixTranspose and add #pragma unroll just before the for loop, as follows:

#pragma unroll
	for(int i=0;i<width;i++)
	{
		for(int j=0;j<width;j++)
			out[i*width + j] = __shfl(val,j*width + i);
	}

Specifying the optional parameter, #pragma unroll value, directs the unroller to unroll the loop value times. Use it with care: a large unroll factor increases code size and register usage, which can reduce performance instead of improving it. Specifying #pragma nounroll indicates that the loop should not be unrolled. #pragma unroll 1 has the same effect.
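To make the unroll factor concrete, here is a minimal host C++ sketch. It is not part of the tutorial source. The function names are hypothetical, and the hand-written version only approximates what a compiler might generate for a factor of 4. As with any unroll hint, the compiler is free to ignore it, and non-Clang host compilers may skip these pragmas with a warning.

```cpp
#include <cstddef>

// Request an unroll factor of 4 (a hint to the loop unroller).
inline int scaled_sum_unroll4(const int* v, std::size_t n, int s) {
    int total = 0;
#pragma unroll 4
    for (std::size_t i = 0; i < n; ++i) {
        total += s * v[i];
    }
    return total;
}

// Roughly what unrolling by 4 looks like when written by hand:
// a main loop doing four iterations per pass, then a remainder
// loop for the last n % 4 elements.
inline int scaled_sum_by_hand(const int* v, std::size_t n, int s) {
    int total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += s * v[i];
        total += s * v[i + 1];
        total += s * v[i + 2];
        total += s * v[i + 3];
    }
    for (; i < n; ++i) {
        total += s * v[i];
    }
    return total;
}

// Ask the compiler to leave the loop as written.
inline int scaled_sum_nounroll(const int* v, std::size_t n, int s) {
    int total = 0;
#pragma nounroll
    for (std::size_t i = 0; i < n; ++i) {
        total += s * v[i];
    }
    return total;
}
```

All three functions return the same value for the same input. The hand-unrolled version shows where the extra code size comes from: the loop body is repeated four times and a remainder loop is needed for trip counts that are not a multiple of 4.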

How to build and run:

Use the make command to build the application and run it with ./exe. The build uses hipcc, which in turn uses hcc on AMD and nvcc on NVIDIA.

Requirement for NVIDIA

Make sure you have a device with compute capability 3.0 or higher in order to use warp shfl operations, and add the nvcc flag -gencode arch=compute_30,code=sm_30 to the Makefile when building this application.

More Info:

HIP FAQ
HIP Kernel Language
HIP Runtime API (Doxygen)
HIP Porting Guide
HIP Terminology (including Rosetta Stone of GPU computing terms across CUDA/HIP/HC/AMP/OpenCL)
clang-hipify
Developer/CONTRIBUTING Info
Release Notes
