Remove HIP_MARKER left overs due to HIP PR 2032

Change-Id: Ieae68dd3b12c92b1d6830619ca4c6ae43c400225


[ROCm/hip commit: 819677825f]
This commit is contained in:
Rahul Garg
2020-05-05 22:58:40 +00:00
parent 94570bacc2
commit 4efc743802
5 files changed with 0 additions and 615 deletions
-17
View file
@@ -248,9 +248,6 @@ if ($HIP_PLATFORM eq "hcc" and $HIP_COMPILER eq "clang") {
$HCC_VERSION_MAJOR=$HCC_VERSION;
$HCC_VERSION_MAJOR=~s/\..*//;
$HIP_ATP_MARKER=$ENV{'HIP_ATP_MARKER'} // 1;
$marker_path = "$ROCM_PATH/profiler/CXLActivityLogger";
# HCC* may be used to compile src/hip_hcc.o (and also feed the HIPCXXFLAGS below)
$HCC = "$HCC_HOME/bin/hcc";
$HCCFLAGS = "-hc -D__HIPCC__ -isystem $HCC_HOME/include ";
@@ -298,20 +295,6 @@ if ($HIP_PLATFORM eq "hcc" and $HIP_COMPILER eq "clang") {
$HIPLDFLAGS .= " -L$HSA_PATH/lib -L$ROCM_PATH/lib -lhsa-runtime64 -lhc_am ";
# $HIPLDFLAGS .= " -L$HCC_HOME/compiler/lib -lLLVMAMDGPUDesc -lLLVMAMDGPUUtils -lLLVMMC -lLLVMCore -lLLVMSupport ";
# Add trace marker library:
# TODO - once we cleanly separate the HIP API headers from HIP library headers this logic should move to CMakebuild option - apps do not need to see the marker library.
if ($HIP_ATP_MARKER) {
$marker_inc_path = "$marker_path/include";
if (-e $marker_inc_path) {
$HIPCXXFLAGS .= " -isystem $marker_inc_path";
}
}
$marker_lib_path = "$marker_path/bin/x86_64";
if (-e $marker_lib_path) {
$HIPLDFLAGS .= " -L$marker_lib_path -lCXLActivityLogger -Wl,--rpath=$marker_lib_path";
}
if (not $isWindows) {
$HIPLDFLAGS .= " -lm";
}
-279
View file
@@ -1,279 +0,0 @@
# Profiling HIP Code
This section describes the profiling and debugging capabilities that HIP provides.
Profiling information can be viewed in the CodeXL visualization tool or printed directly to stderr as the application runs.
This document starts with some of the general capabilities of CodeXL and then describes some of the additional HIP marker and debug features.
<!-- toc -->
- [CodeXL Profiling](#codexl-profiling)
* [Collecting and Viewing Traces](#collecting-and-viewing-traces)
+ [Using rocm-profiler timestamp profiling](#using-rocm-profiler-timestamp-profiling)
+ [Using rocm-profiler performance counter collection:](#using-rocm-profiler-performance-counter-collection)
+ [Using CodeXL to view profiling results:](#using-codexl-to-view-profiling-results)
+ [More information on CodeXL](#more-information-on-codexl)
* [HIP Markers](#hip-markers)
+ [Profiling HIP APIs](#profiling-hip-apis)
+ [Adding markers to applications](#adding-markers-to-applications)
* [Additional HIP Profiling Features](#additional-hip-profiling-features)
+ [Demangling C++ Kernel Names](#demangling-c-kernel-names)
+ [Controlling when profiling starts and ends](#controlling-when-profiling-starts-and-ends)
+ [Reducing timeline trace output file size](#reducing-timeline-trace-output-file-size)
+ [How to enable profiling at HIP build time](#how-to-enable-profiling-at-hip-build-time)
- [Tracing and Debug](#tracing-and-debug)
* [Tracing HIP APIs](#tracing-hip-apis)
+ [Color](#color)
<!-- tocstop -->
## CodeXL Profiling
### Collecting and Viewing Traces
#### Using rocm-profiler timestamp profiling
rocm-profiler is a command-line tool for tracing any application that uses ROCr API, including HCC and HIP.
rocm-profiler's timeline trace will show the beginning and end for all kernel commands, data transfer commands, and HSA Runtime (ROCr) API calls. The trace results are saved into a file, which by convention uses the "atp" extension. Here is an example that shows how to run the command-line profiler:
```shell
$ /opt/rocm/bin/rocm-profiler -o <outputATPFileName> -A -T <applicationName> <applicationArguments>
```
#### Using rocm-profiler performance counter collection:
rocm-profiler can record performance counter information to provide greater insight inside a kernel, such as the memory bandwidth, ALU busy percentage, and cache statistics.
Collecting the common set of useful counters requires passing the counter configuration files for two passes:
```
$ /opt/rocm/bin/rocm-profiler -C -O --counterfile /opt/rocm/profiler/counterfiles/counters_HSA_Fiji_pass1 --counterfile /opt/rocm/profiler/counterfiles/counters_HSA_Fiji_pass2 <applicationName> <applicationArguments>
```
#### Using CodeXL to view profiling results:
The trace can be loaded and viewed in the CodeXL visualization tool:
- Open the CodeXL GUI, create a new project, and switch to "Profile Mode":
- $ CodeXL &
- [File->New Project, leave fields as is, just click "OK"]
- [Profile->Switch to Profile Mode]
- Load timestamp tracing results into a timeline view:
- Right click on the project in the CodeXL Explorer view
- Click "Import Session..."
- Select $HOME/apitrace.atp (or the appropriate .atp file if you used another file name)
- Load the performance counter results
- Right click on the project in the CodeXL Explorer view
- Click "Import Session..."
- Select $HOME/Session1.csv (or appropriate .csv file if you used another file name)
#### More information on CodeXL
rocm-profiler --help will show additional options and usage guidelines.
See this [blog](http://gpuopen.com/getting-up-to-speed-with-the-codexl-gpu-profiler-and-radeon-open-compute/) for more information on profiling ROCm apps (including HIP) with CodeXL.
The 2.2 version of Windows CodeXL does not correctly handle Linux line endings. If you are collecting a trace on Linux and then viewing it with the 2.2 Windows CodeXL, first convert the line endings in the .atp file to Windows-style line endings.
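The conversion itself is simple enough to sketch. The helper below is an illustrative assumption, not part of HIP or CodeXL (the name `toWindowsLineEndings` is hypothetical): it turns every bare `\n` into `\r\n` and leaves lines that already end in `\r\n` alone, so running it twice does no harm.

```cpp
#include <string>

// Hypothetical helper, not part of HIP or CodeXL.
// Converts bare "\n" to "\r\n"; existing "\r\n" pairs are left untouched.
std::string toWindowsLineEndings(const std::string& text) {
    std::string out;
    out.reserve(text.size() + text.size() / 16);  // rough guess at growth
    char prev = '\0';
    for (char c : text) {
        if (c == '\n' && prev != '\r') {
            out.push_back('\r');  // insert the missing carriage return
        }
        out.push_back(c);
        prev = c;
    }
    return out;
}
```

When applying this to an .atp file, read and write the file in binary mode so the platform's own newline translation does not interfere.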
### HIP Markers
#### Profiling HIP APIs
HIP can generate markers at function beginning and end which are displayed on the CodeXL timeline view.
HIP 1.0 compiles marker support by default, and you can enable it by setting the HIP_PROFILE_API environment variable and then running the rocm-profiler:
```shell
# Use profile to generate timeline view:
export HIP_PROFILE_API=1
$ /opt/rocm/bin/rocm-profiler -A -T <applicationName> <applicationArguments>
# Or pass the variable through the profiler instead:
$ /opt/rocm/bin/rocm-profiler -e HIP_PROFILE_API=1 -A -T <applicationName> <applicationArguments>
```
HIP_PROFILE_API supports two levels of information.
- HIP_PROFILE_API=1 : Short format. Print name of API but no arguments. For example:
`hipMemcpy`
- HIP_PROFILE_API=2 : Long format. Print name of API + values of all function arguments. For example:
`hipMemcpy (0x7f32154db010, 0x50446e000, 4000000, hipMemcpyDeviceToHost)`
#### Adding markers to applications
Markers can be used to define application-specific events that will be recorded in the ATP file and displayed in the CodeXL GUI.
This can be particularly useful for visualizing how the higher-level phases of application behavior relate to the lower level HIP APIs, kernel launches, and data transfers.
For example, an instrumented machine learning framework could show the beginning and ending of each layer in the network.
Markers have a specific begin and end time, and can be nested. Nested calls are displayed hierarchically in the CodeXL GUI, with each level of the hierarchy occupying a different row.
The HIP APIs are defined in "hip_profile.h":
```
#include <hip/hip_profile.h>
HIP_BEGIN_MARKER(const char *markerName, const char *groupName);
HIP_END_MARKER();
HIP_BEGIN_MARKER("Setup", "MyAppGroup");
// ...
// application code for setup
// ...
HIP_END_MARKER();
```
For C++ codes, HIP also provides a scoped marker which records the start time when constructed and the end time when the scoped marker is destructed at the end of the scope. This provides a convenient, single-line mechanism to record an event that neatly corresponds to a region of code.
```cxx
void FunctionFoo(...)
{
HIP_SCOPED_MARKER("FunctionFoo", "MyAppGroup"); // Marker starts recording here.
// ...
// Function implementation
// ...
// Marker destroyed here and records end time stamp.
};
```
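To make the constructor/destructor timing concrete, here is a minimal plain C++ sketch of the same RAII idea. It is an assumption for illustration only and is not how `HIP_SCOPED_MARKER` is implemented: the `ScopedMarker` class, the `MarkerRecord` type, the `demoNestedMarkers` function and the "Allocate" marker name are all hypothetical, and records go to an in-memory vector rather than an ATP file.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: one completed marker with its nesting depth.
struct MarkerRecord {
    std::string name;
    std::string group;
    int depth;         // 0 = outermost marker
    double elapsedMs;  // wall-clock duration of the scope
};

// Hypothetical RAII helper; NOT the HIP_SCOPED_MARKER implementation.
class ScopedMarker {
public:
    ScopedMarker(std::string name, std::string group, std::vector<MarkerRecord>& sink)
        : name_(std::move(name)), group_(std::move(group)), sink_(sink),
          depth_(currentDepth()++),  // remember our depth, then go one level deeper
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedMarker() {
        const auto end = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(end - start_).count();
        assert(currentDepth() > 0);
        --currentDepth();
        sink_.push_back({name_, group_, depth_, ms});  // end time recorded at scope exit
    }

    ScopedMarker(const ScopedMarker&) = delete;
    ScopedMarker& operator=(const ScopedMarker&) = delete;

private:
    // Per-thread nesting depth, so markers on different threads do not interfere.
    static int& currentDepth() {
        thread_local int depth = 0;
        return depth;
    }

    std::string name_;
    std::string group_;
    std::vector<MarkerRecord>& sink_;
    int depth_;
    std::chrono::steady_clock::time_point start_;
};

// Runs two nested scopes and returns the records in completion order.
std::vector<MarkerRecord> demoNestedMarkers() {
    std::vector<MarkerRecord> records;
    {
        ScopedMarker outer("Setup", "MyAppGroup", records);
        {
            ScopedMarker inner("Allocate", "MyAppGroup", records);
        }  // inner ends first
    }      // outer ends second
    return records;
}
```

The inner marker completes before the outer one, which mirrors how nested markers occupy separate rows in a timeline view.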
The HIP marker API is only supported on the ROCm platform. The marker macros are defined on CUDA platforms and will compile, but are silently ignored at runtime.
This [HIP sample](https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/2_Cookbook/2_Profiler) shows the profiler marker API used in a small application.
More information on the marker API can be found in the profiler header file and PDF in a ROCm installation:
- /opt/rocm/profiler/CXLActivityLogger/include/CXLActivityLogger.h
- /opt/rocm/profiler/CXLActivityLogger/doc/CXLActivityLogger.pdf
### Additional HIP Profiling Features
#### Demangling C++ Kernel Names
HIP includes the `hipdemangleatp` tool which can post-process an ATP file to "demangle" C++ names.
Mangled kernel names encode the C++ arguments and other information, and are guaranteed to be unique even for cases such as operator overloading. However, the mangled names can be quite verbose. For example:
`ZZ39gemm_NoTransA_MICRO_NBK_M_N_K_TS16XMTS4RN2hc16accelerator_viewEPKflS3_lPfliiiiiiffEN3_EC__719__cxxamp_trampolineElililiiiiiiS3_iS3_S4_ff`
`hipdemangleatp` will convert this into the more readable:
`gemm_NoTransA_MICRO_NBK_M_N_K_TS16XMTS4`
The `hipdemangleatp` tool operates on the ATP file "in-place" and thus replaces the input file with the demangled version.
```
$ hipdemangleatp myfile.atp
```
The kernel name is also shown in some of the summary HTML files (Top10 kernels). These can be regenerated from the demangled ATP file by re-running rocm-profiler:
```
$ rocm-profiler -T --atpfile myfile.atp
```
A future version of CodeXL may directly integrate demangle functionality.
#### Controlling when profiling starts and ends
hipProfilerStart() and hipProfilerStop() can be inserted into an application to control which phases of the application are profiled.
These APIs can be used to skip initialization code or to focus profiling on a desired region, and are particularly useful for large long-running applications.
See the API documentation for more information. These APIs work on both ROCm and CUDA paths.
On ROCm, the following environment variables can be used to control when profiling occurs:
```
HIP_DB_START_API : Comma-separated list of tid.api_seq_num for when to start debug and profiling.
HIP_DB_STOP_API : Comma-separated list of tid.api_seq_num for when to stop debug and profiling.
```
HIP/ROCm assigns a monotonically increasing sequence number to the APIs called from each thread. The thread and API sequence number can be used in the above environment variables to control when tracing starts and stops. These flags also control the HIP_DB messages (described below).
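As an illustration of the `tid.api_seq_num` notation, the sketch below parses a comma-separated list such as `1.6,2.10` into (thread, sequence number) pairs. It is a hypothetical helper, not HIP's own parser: the `ApiTrigger` and `parseTriggers` names are assumptions, and the exact syntax HIP accepts (whitespace, malformed entries) is not specified here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pair of short thread ID and per-thread API sequence number.
struct ApiTrigger {
    int tid;
    int seq;
};

// Hypothetical parser for a list like "1.6,2.10"; malformed items are skipped.
std::vector<ApiTrigger> parseTriggers(const std::string& list) {
    std::vector<ApiTrigger> out;
    std::istringstream items(list);
    std::string item;
    while (std::getline(items, item, ',')) {
        std::istringstream one(item);
        ApiTrigger t{};
        char dot = 0;
        // Expect "<int>.<int>"; anything else is ignored.
        if ((one >> t.tid >> dot >> t.seq) && dot == '.') {
            out.push_back(t);
        }
    }
    return out;
}
```

With this reading, `1.6` refers to the sixth API call made from the thread that HIP labelled as thread 1.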
When using these options, start the profiler with profiling disabled:
```
# ROCm:
$ rocm-profiler --startdisabled ...
# CUDA:
$ nvprof --profile-from-start-off ...
```
This feature is under development.
#### Reducing timeline trace output file size
If the application is already recording the HIP APIs, the HSA APIs are somewhat redundant and the ATP file size can be substantially reduced by not recording these APIs. HIP includes a text file that lists all of the HSA APIs and can assist in this filtering:
```
$ rocm-profiler -F hip/bin/hsa-api-filter-cxl.txt
```
This file can be copied and edited to provide more selective HSA event recording.
#### How to enable profiling at HIP build time
Pre-built packages of HIP are not built with profiling support enabled. You must enable marker support manually when compiling HIP.
1. Build HIP with ATP markers enabled
HIP pre-built packages are enabled with ATP marker support by default.
To enable ATP marker support when building HIP from source, use the option ```-DCOMPILE_HIP_ATP_MARKER=1``` during the cmake configure step. Build and install HIP.
```shell
$ mkdir build && cd build
$ cmake .. -DCOMPILE_HIP_ATP_MARKER=1
$ make install
```
2. Install ROCm-Profiler
Installing HIP from the [ROCm](http://gpuopen.com/getting-started-with-boltzmann-components-platforms-installation/) pre-built packages also installs ROCm-Profiler.
Alternatively, you can build ROCm-Profiler using the instructions [here](https://github.com/RadeonOpenCompute/ROCm-Profiler#building-the-rocm-profiler).
3. Recompile the target application
Then follow the steps above to collect a marker-enabled trace.
## Tracing and Debug
### Tracing HIP APIs
The HIP runtime can print the HIP function strings to stderr using the HIP_TRACE_API environment variable.
The trace prints two messages for each API - one at the beginning of the API call (line starts with "<<") and one at the end of the API call (line ends with ">>").
Here's an example for one API followed by a description for the sections of the trace:
```
<<hip-api tid:1.6 hipMemcpy (0x7f32154db010, 0x50446e000, 4000000, hipMemcpyDeviceToHost)
hip-api tid:1.6 hipMemcpy ret= 0 (hipSuccess)>>
```
- `<<hip-api` is the header used for all HIP API debug messages. The message is also shown in a specific color. This can be used to distinguish this API from other HIP or application messages.
- `tid:1.6` indicates that this API call came from thread #1 and is the 6th API call in that thread. When the first API in a new thread is called, HIP associates a short sequential ID with that thread. You can see the full thread ID (reported by C++) as 0x7f6183b097c0 in the example below.
- `hipMemcpy` is the name of the API.
- The first line then prints a comma-separated list of the arguments to the function. APIs which return values to the caller by writing to pointers will show the pointer addresses rather than the pointer contents. This behavior may change in the future.
- The second line shows the completion of the API, including the numeric return value (`ret= 0`) as well as a string representation of the error code (`hipSuccess`). If the returned error code is non-zero, then the second line is shown in red (unless HIP_TRACE_API_COLOR is "none" - see below).
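For post-processing a saved trace, the completion line can be picked apart with a few lines of C++. This is only a sketch under the assumption that the line layout matches the example above and that color codes are disabled; `ApiResult` and `parseCompletionLine` are hypothetical names, not part of HIP.

```cpp
#include <cstdio>
#include <string>

// Hypothetical summary of one completion line from the trace.
struct ApiResult {
    int tid;          // short thread ID
    int seq;          // API sequence number within that thread
    std::string api;  // API name, e.g. "hipMemcpy"
    int ret;          // numeric return value
};

// Parses a line like "hip-api tid:1.6 hipMemcpy ret= 0 (hipSuccess)>>".
// Returns false for lines that do not match, including the "<<hip-api" begin lines.
bool parseCompletionLine(const std::string& line, ApiResult& out) {
    char api[128] = {0};
    int tid = 0, seq = 0, ret = 0;
    // The leading space skips indentation; %127s stops at whitespace,
    // so it captures only the API name.
    const int matched = std::sscanf(line.c_str(), " hip-api tid:%d.%d %127s ret= %d",
                                    &tid, &seq, api, &ret);
    if (matched != 4) {
        return false;
    }
    out.tid = tid;
    out.seq = seq;
    out.api = api;
    out.ret = ret;
    return true;
}
```

A non-zero `ret` field is the same condition that makes the runtime print the completion line in red.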
Here's a specific example showing the output of the [square](https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/0_Intro/square) program running on HIP:
```
$ HIP_TRACE_API=1 ./square.hip.out
hip-api tid:1:HIP initialized short_tid#1 (maps to full_tid: 0x7f6183b097c0)
<<hip-api tid:1.1 hipGetDeviceProperties (0x7ffddb673e08, 0)
hip-api tid:1.1 hipGetDeviceProperties ret= 0 (hipSuccess)>>
info: running on device gfx803
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
<<hip-api tid:1.2 hipMalloc (0x7ffddb673fb8, 4000000)
hip-api tid:1.2 hipMalloc ret= 0 (hipSuccess)>>
<<hip-api tid:1.3 hipMalloc (0x7ffddb673fb0, 4000000)
hip-api tid:1.3 hipMalloc ret= 0 (hipSuccess)>>
info: copy Host2Device
<<hip-api tid:1.4 hipMemcpy (0x50409d000, 0x7f32158ac010, 4000000, hipMemcpyHostToDevice)
hip-api tid:1.4 hipMemcpy ret= 0 (hipSuccess)>>
info: launch 'vector_square' kernel
1.5 hipLaunchKernel 'HIP_KERNEL_NAME(vector_square)' gridDim:{512,1,1} groupDim:{256,1,1} sharedMem:+0 stream#0.0
info: copy Device2Host
<<hip-api tid:1.6 hipMemcpy (0x7f32154db010, 0x50446e000, 4000000, hipMemcpyDeviceToHost)
hip-api tid:1.6 hipMemcpy ret= 0 (hipSuccess)>>
info: check result
PASSED!
```
HIP_TRACE_API supports multiple levels of debug information:
- 0x1 = print all HIP APIs. This is the most verbose setting; the flags below allow selecting a subset.
- 0x2 = print HIP APIs which initiate GPU kernel commands. Includes hipLaunchKernel, hipModuleLaunchKernel.
- 0x4 = print HIP APIs which initiate GPU memory commands. Includes hipMemcpy*, hipMemset*.
- 0x8 = print HIP APIs which allocate or free memory. Includes hipMalloc, hipHostMalloc, hipFree, hipHostFree.
These can be combined. For example, HIP_TRACE_API=6 shows a concise view of the HIP commands (both kernel and memory) that are sent to the GPU.
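The flag arithmetic can be sketched as follows. The bit values come from the list above, but the enumerator names, `parseTraceMask` and `describeTraceMask` are hypothetical and this is not how the HIP runtime itself reads the variable. The value 6 is 0x2 | 0x4, so it selects kernel and memory commands.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Bit values as listed in the text above; the enumerator names are hypothetical.
enum TraceFlag : unsigned {
    kTraceAll    = 0x1,  // all HIP APIs
    kTraceKernel = 0x2,  // APIs that initiate GPU kernel commands
    kTraceMemcpy = 0x4,  // APIs that initiate GPU memory commands
    kTraceAlloc  = 0x8,  // APIs that allocate or free memory
};

// Hypothetical parser: accepts decimal ("6") or hex ("0x6"); returns 0 on bad input.
unsigned parseTraceMask(const std::string& text) {
    char* end = nullptr;
    const unsigned long value = std::strtoul(text.c_str(), &end, 0);
    if (end == text.c_str() || *end != '\0') {
        return 0;
    }
    return static_cast<unsigned>(value);
}

// Lists the categories selected by a mask, in bit order.
std::vector<std::string> describeTraceMask(unsigned mask) {
    std::vector<std::string> out;
    if (mask & kTraceAll)    out.push_back("all HIP APIs");
    if (mask & kTraceKernel) out.push_back("kernel launches");
    if (mask & kTraceMemcpy) out.push_back("memory copies and memsets");
    if (mask & kTraceAlloc)  out.push_back("memory allocation and free");
    return out;
}
```

For example, `describeTraceMask(parseTraceMask("6"))` yields the kernel and memory categories only.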
#### Color
Note this trace mode uses colors. "less -r" can handle raw control characters and will display the debug output in proper colors.
You can change the color used for the trace mode with the HIP_TRACE_API_COLOR environment variable. Possible values are None/Red/Green/Yellow/Blue/Magenta/Cyan/White.
None will disable use of color control codes for both the opening and closing and may be useful when saving the trace file or when a pure text trace is desired.
-53
View file
@@ -1,53 +0,0 @@
HIP_PATH?= $(wildcard /opt/rocm/hip)
HIPCC=$(HIP_PATH)/bin/hipcc
HIPPROFILER=/opt/rocm/bin/rocm-profiler
PROFILER_OPT=-A -o MT.atp -e HIP_PROFILE_API=1
HIPPROFILER_POST_CMD=$(HIP_PATH)/bin/hipdemangleatp MT.atp
TARGET=hcc
SOURCES = MatrixTranspose.cpp
OBJECTS = $(SOURCES:.cpp=.o)
EXECUTABLE=./MatrixTranspose
.PHONY: all profile profile_trigger run clean
all: $(EXECUTABLE) profile
OPT =-g
CXXFLAGS =$(OPT)
CXX=$(HIPCC)
$(EXECUTABLE): $(OBJECTS)
$(HIPCC) $(OBJECTS) -o $@
profile: $(EXECUTABLE)
$(HIPPROFILER) $(PROFILER_OPT) $(EXECUTABLE)
$(HIPPROFILER_POST_CMD)
# Pass option to control start and stop iterations for profiling - see MatrixTranspose.cpp for implementation:
# Note we start profiler in --startdisabled mode - no timing collected until app enabled it via hipProfilerStart()
profile_trigger: $(EXECUTABLE)
$(HIPPROFILER) $(PROFILER_OPT) --startdisabled $(EXECUTABLE) 3 6
$(HIPPROFILER_POST_CMD)
run: $(EXECUTABLE)
$(EXECUTABLE)
clean:
rm -f $(EXECUTABLE)
rm -f $(OBJECTS)
rm -f $(HIP_PATH)/src/*.o
@@ -1,219 +0,0 @@
/*
Copyright (c) 2015-present Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <iostream>
// hip header file
#include "hip/hip_runtime.h"
#include "hip/hip_profile.h"
#define WIDTH 1024
#define NUM (WIDTH * WIDTH)
#define THREADS_PER_BLOCK_X 4
#define THREADS_PER_BLOCK_Y 4
#define THREADS_PER_BLOCK_Z 1
#define ITERATIONS 10
// Cmdline parms to control start and stop triggers
int startTriggerIteration = -1;
int stopTriggerIteration = -1;
// Device (Kernel) function, it must be void
__global__ void matrixTranspose(float* out, float* in, const int width) {
int x = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;
int y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y;
out[y * width + x] = in[x * width + y];
}
// CPU implementation of matrix transpose
void matrixTransposeCPUReference(float* output, float* input, const unsigned int width) {
for (unsigned int j = 0; j < width; j++) {
for (unsigned int i = 0; i < width; i++) {
output[i * width + j] = input[j * width + i];
}
}
}
// Use a separate function to demonstrate how to use function name as part of scoped marker:
void runGPU(float* Matrix, float* TransposeMatrix, float* gpuMatrix, float* gpuTransposeMatrix) {
// __func__ is a standard C++ predefined identifier that holds the name of the function, in this case
// "runGPU"
HIP_SCOPED_MARKER(__func__, "MyGroup");
for (int i = 0; i < ITERATIONS; i++) {
if (i == startTriggerIteration) {
hipProfilerStart();
}
if (i == stopTriggerIteration) {
hipProfilerStop();
}
float eventMs = 0.0f;
hipEvent_t start, stop;
hipEventCreate(&start);
hipEventCreate(&stop);
// Record the start event
hipEventRecord(start, NULL);
// Memory transfer from host to device
hipMemcpy(gpuMatrix, Matrix, NUM * sizeof(float), hipMemcpyHostToDevice);
// Record the stop event
hipEventRecord(stop, NULL);
hipEventSynchronize(stop);
hipEventElapsedTime(&eventMs, start, stop);
printf("hipMemcpyHostToDevice time taken = %6.3fms\n", eventMs);
// Record the start event
hipEventRecord(start, NULL);
// Launching kernel from host
hipLaunchKernelGGL(matrixTranspose,
dim3(WIDTH / THREADS_PER_BLOCK_X, WIDTH / THREADS_PER_BLOCK_Y),
dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0, 0, gpuTransposeMatrix,
gpuMatrix, WIDTH);
// Record the stop event
hipEventRecord(stop, NULL);
hipEventSynchronize(stop);
hipEventElapsedTime(&eventMs, start, stop);
printf("kernel Execution time = %6.3fms\n", eventMs);
// Record the start event
hipEventRecord(start, NULL);
// Memory transfer from device to host
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost);
// Record the stop event
hipEventRecord(stop, NULL);
hipEventSynchronize(stop);
hipEventElapsedTime(&eventMs, start, stop);
printf("hipMemcpyDeviceToHost time taken = %6.3fms\n", eventMs);
}
};
int main(int argc, char* argv[]) {
if (argc >= 2) {
startTriggerIteration = atoi(argv[1]);
printf("info : will start tracing at iteration:%d\n", startTriggerIteration);
}
if (argc >= 3) {
stopTriggerIteration = atoi(argv[2]);
printf("info : will stop tracing at iteration:%d\n", stopTriggerIteration);
}
float* Matrix;
float* TransposeMatrix;
float* cpuTransposeMatrix;
float* gpuMatrix;
float* gpuTransposeMatrix;
hipDeviceProp_t devProp;
hipGetDeviceProperties(&devProp, 0);
std::cout << "Device name " << devProp.name << std::endl;
{
// Show example of how to create a "scoped marker".
// The scoped marker records the time spent inside the { scope } of the marker - the begin
// timestamp is at the beginning of the code scope, and the end is recorded when the SCOPE
// exits. This can be viewed in CodeXL timeline relative to other GPU and CPU events. This
// marker captures the time spent in setup including host allocation, initialization, and
// device memory allocation.
HIP_SCOPED_MARKER("Setup", "MyGroup");
Matrix = (float*)malloc(NUM * sizeof(float));
TransposeMatrix = (float*)malloc(NUM * sizeof(float));
cpuTransposeMatrix = (float*)malloc(NUM * sizeof(float));
// initialize the input data
for (int i = 0; i < NUM; i++) {
Matrix[i] = (float)i * 10.0f;
}
// allocate the memory on the device side
hipMalloc((void**)&gpuMatrix, NUM * sizeof(float));
hipMalloc((void**)&gpuTransposeMatrix, NUM * sizeof(float));
// FYI, the scoped-marker will be destroyed here when the scope exits, and will record its
// "end" timestamp.
}
runGPU(Matrix, TransposeMatrix, gpuMatrix, gpuTransposeMatrix);
// show how to use explicit begin/end markers:
// We begin the timed region with HIP_BEGIN_MARKER, passing in the markerName and group:
// The region will stop when HIP_END_MARKER is called
// This is another way to mark begin/end - as an alternative to scoped markers.
HIP_BEGIN_MARKER("Check&TearDown", "MyGroup");
int errors = 0;
// CPU MatrixTranspose computation
matrixTransposeCPUReference(cpuTransposeMatrix, Matrix, WIDTH);
// verify the results
double eps = 1.0E-6;
for (int i = 0; i < NUM; i++) {
if (std::abs(TransposeMatrix[i] - cpuTransposeMatrix[i]) > eps) {
errors++;
}
}
if (errors != 0) {
printf("FAILED: %d errors\n", errors);
} else {
printf("PASSED!\n");
}
// free the resources on device side
hipFree(gpuMatrix);
hipFree(gpuTransposeMatrix);
// free the resources on host side
free(Matrix);
free(TransposeMatrix);
free(cpuTransposeMatrix);
// This ends the last marker started in this thread, in this case "Check&TearDown"
HIP_END_MARKER();
return errors;
}
-47
View file
@@ -1,47 +0,0 @@
## Using hipEvents to measure performance ###
This tutorial follows the previous two tutorials. In the first, we wrote our first HIP program, which computes a matrix transpose; in the second, we added code to measure the time taken by memory transfers and kernel execution. In this tutorial, we explain how to use CodeXL and rocm-profiler for HIP timeline tracing. We also augment the source code with additional markers so we can see the high-level application flow alongside the information that CodeXL automatically collects.
## Introduction:
CodeXL and rocm-profiler are the tools used for profiling the application. Profiling is especially useful when optimizing an application, for example to find memory bottlenecks.
## Requirement:
[CodeXL Installation](http://gpuopen.com/compute-product/codexl/)
## Prerequisite knowledge:
Programmers familiar with CUDA or OpenCL will be able to quickly learn and start coding with the HIP API. If you are not, don't worry: you have chosen a good place to start. We explain everything assuming you are completely new to GPGPU programming.
## Simple Matrix Transpose
We will be using the Simple Matrix Transpose source code from the previous tutorial as it is.
## Using CodeXL markers for HIP Functions
HIP can generate markers at function begin/end which are displayed on the CodeXL timeline view. To do this, you need to install ROCm-Profiler and enable HIP to generate the markers:
1. Install ROCm-Profiler. Installing HIP from the ROCm pre-built packages also installs ROCm-Profiler. Alternatively, you can build ROCm-Profiler from source using the instructions in its repository.
2. Run with profiler enabled to generate ATP file.
(These steps are also captured in the Makefile)
The HIP_PROFILE_API environment variable enables display of the HIP APIs on the CodeXL timeline view.
`/opt/rocm/bin/rocm-profiler -o <outputATPFileName> -A -e HIP_PROFILE_API=1 <applicationName> <applicationArguments>`
## Using HIP_TRACE_API
You can also print the HIP function strings to stderr using the HIP_TRACE_API environment variable. This can also be combined with the more detailed debug information provided by the HIP_DB switch. For example:
`HIP_TRACE_API=1 HIP_DB=0x2 ./myHipApp`
Note this trace mode uses colors. "less -r" can handle raw control characters and will display the debug output in proper colors.
## More Info:
- [HIP FAQ](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md)
- [HIP Kernel Language](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md)
- [HIP Runtime API (Doxygen)](http://rocm-developer-tools.github.io/HIP)
- [HIP Porting Guide](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_porting_guide.md)
- [HIP Terminology](https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_terms.md) (including Rosetta Stone of GPU computing terms across CUDA/HIP/HC/AMP/OpenCL)
- [HIPIFY](https://github.com/ROCm-Developer-Tools/HIP/blob/master/hipify-clang/README.md)
- [Developer/CONTRIBUTING Info](https://github.com/ROCm-Developer-Tools/HIP/blob/master/CONTRIBUTING.md)
- [Release Notes](https://github.com/ROCm-Developer-Tools/HIP/blob/master/RELEASE.md)