ファイル
rocm-systems/samples/2_Cookbook/6_dynamic_shared
Vladislav Sytchenko 08c35854c0 Revert "Revert "Merge branch 'amd-master-next' into amd-npi-next""
This reverts commit 051c84bba3.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I92ceb171e31026ed1864704cef2fc1497b883ef9
2020-10-05 13:20:58 -04:00
..
2016-10-19 12:58:16 -04:00

Using Dynamic shared memory

Earlier we learned how to use static shared memory. In this tutorial, we'll explain how to use the dynamic version of shared memory to improve the performance.

Introduction:

As we mentioned earlier that Memory bottlenecks is the main problem why we are not able to get the highest performance, therefore minimizing the latency for memory access plays prominent role in application optimization. In this tutorial, we'll learn how to use dynamic shared memory.

Requirement:

For hardware requirement and software installation Installation

prerequiste knowledge:

Programmers familiar with CUDA, OpenCL will be able to quickly learn and start coding with the HIP API. In case you are not, don't worry. You choose to start with the best one. We'll be explaining everything assuming you are completely new to gpgpu programming.

Simple Matrix Transpose

We will be using the Simple Matrix Transpose application from the previous tutorial and modify it to learn how to use shared memory.

Shared Memory

Shared memory is way more faster than that of global and constant memory and accessible to all the threads in the block. For In the same sourcecode, we will use the HIP_DYNAMIC_SHARED keyword to declare dynamic shared memory as follows:

HIP_DYNAMIC_SHARED(float, sharedMem) here the first parameter is the data type while the second one is the variable name.

The other important change is:

    hipLaunchKernelGGL(matrixTranspose,
                  dim3(WIDTH/THREADS_PER_BLOCK_X, WIDTH/THREADS_PER_BLOCK_Y),
                  dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y),
                  sizeof(float)*WIDTH*WIDTH, 0,
                  gpuTransposeMatrix , gpuMatrix, WIDTH);

here we replaced 4th parameter with amount of additional shared memory to allocate when launching the kernel.

How to build and run:

Use the make command and execute it using ./exe Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.

More Info: