Fixing Readme & File structure & Typos

Change-Id: I354b4e0e0448c3e8555d0b766b2c410c8049c2ff


[ROCm/rocprofiler commit: f308e36798]
Ammar ELWazir
2023-05-31 15:41:49 +00:00
committed by Ammar Elwazir
parent a305118408
commit f033633c04
12 changed files with 170 additions and 450 deletions
+155 -151
@@ -4,45 +4,49 @@ The information presented in this document is for informational purposes only an
© 2022 Advanced Micro Devices, Inc. All Rights Reserved.
## ROC Profiler library version 1.0
## ROC Profiler v1
## Introduction
Profiling with metrics and traces based on perfcounters (PMC) and traces (SPM).
Implementation is based on AqlProfile HSA extension.
Library supports GFX8/GFX9.
The last API library version for ROCProfiler v1 is 8.0.0
The library source tree:
- doc - Documentation
- include/rocprofiler/rocprofiler.h - Library public API
- include/rocprofiler/v2/rocprofiler.h - V2 Beta Library public API
- include/rocprofiler/v2/rocprofiler_plugins.h - V2 Beta Tool's Plugins Library public API
- src - Library sources
- core - Library API sources
- util - Library utils sources
- xml - XML parser
- test - Library test suite
- ctrl - Test controll
- util - Test utils
- simple_convolution - Simple convolution test kernel
## Build environment:
- doc - Documentation
- include/rocprofiler/rocprofiler.h - Library public API
- include/rocprofiler/v2/rocprofiler.h - V2 Beta Library public API
- include/rocprofiler/v2/rocprofiler_plugins.h - V2 Beta Tool's Plugins Library public API
- src - Library sources
- core - Library API sources
- util - Library utils sources
- xml - XML parser
- test - Library test suite
- ctrl - Test controll
- util - Test utils
- simple_convolution - Simple convolution test kernel
## Build environment
Roctracer & Rocprofiler need to be installed in the same directory.
```bash
$ export CMAKE_PREFIX_PATH=<path to hsa-runtime includes>:<path to hsa-runtime library>
$ export CMAKE_BUILD_TYPE=<debug|release> # release by default
$ export CMAKE_DEBUG_TRACE=1 # 1 to enable debug tracing
export CMAKE_PREFIX_PATH=<path to hsa-runtime includes>:<path to hsa-runtime library>
export CMAKE_BUILD_TYPE=<debug|release> # release by default
export CMAKE_DEBUG_TRACE=1 # 1 to enable debug tracing
```
To build with the current installed ROCM:
```bash
$ cd .../rocprofiler
$ export CMAKE_PREFIX_PATH=/opt/rocm/include/hsa:/opt/rocm
$ mkdir build
$ cd build
$ cmake ..
$ make
cd .../rocprofiler
./build.sh ## (for clean build use `-cb`)
```
To run the test:
```bash
$ cd .../rocprofiler/build
$ export LD_LIBRARY_PATH=.:<other paths> # paths to ROC profiler and oher libraries
@@ -66,18 +70,20 @@ To enable verbose tracing:
$ export ROCPROFILER_TRACE=1
```
## ROC Profiler library version 9.0
## ROCProfiler v2
## Introduction
ROCProfilerV2 is a newly developed design for AMDs tooling infrastructure that provides a hardware specific low level performance analysis interface for profiling of GPU compute applications.
The first API library version for ROCProfiler v2 is 9.0.0
#### Note: ROCProfilerV2 is currently considered a beta version and is subject to change in future releases
### ROCProfilerV2 Modules
- Counters
- Hardware
- Buffer Pool
- Generic Buffer
- Session
- Filter
- Tools
@@ -89,11 +95,10 @@ ROCProfilerV2 is a newly developed design for AMDs tooling infrastructure tha
### Requirements
- Makecache
- makecache
- Gtest Development Package (Ubuntu: libgtest-dev)
- Cppheaderparser Python3 Package
- Lxml Python3 Package
- Barectf Python3 Package (has to be installed using pip not OS artifactory)
- libsystemd-dev, libelf-dev, libnuma-dev on ubuntu or their corresponding packages on any other OS
- Cppheaderparser, websockets, matplotlib, lxml, barectf Python3 Packages
### Build
@@ -110,13 +115,6 @@ The user has two options for building:
./build.sh --clean-build OR ./build.sh -cb
```
- Optionally, For testing, run the following
```bash
cd build && ./rocprofv2 -t
```
For more information on tests, please see the Tests section
- Option 2 (Where ROCM_PATH envronment need to be set with the current installation directory of rocm), run the following:
```bash
@@ -124,16 +122,23 @@ The user has two options for building:
mkdir build && cd build
# Configuring the rocprofv2 build
cmake -DCMAKE_MODULE_PATH=$ROCM_PATH/hip/cmake <CMAKE_OPTIONS> ..
cmake -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_MODULE_PATH=$ROCM_PATH/hip/cmake <CMAKE_OPTIONS> ..
# Building the main runtime of the rocprofv2 project
cmake --build . -- runtime
cmake --build . -- -j
# Optionally, for building API documentation
cmake --build . -- doc
cmake --build . -- -j doc
# Optionally, for building ROCProfiler V2 samples
cmake --build . -- -j samples
# Optionally, for building packages (DEB, RPM, TGZ)
cmake --build . -- package
cmake --build . -- -j tests
# Optionally, for building packages (DEB, RPM, TGZ)
# Note: Requires rpm package on ubuntu
cmake --build . -- -j package
```
### Install
@@ -150,24 +155,14 @@ The user has two options for building:
```bash
cd build
# Install rocprofv2 in the ROCM_PATH path
cmake --build . -- install
cmake --build . -- -j install
```
### Test
- Optionally, for tests: run the following:
```bash
cmake --build . -- check
```
For more information on tests, please see the Tests section
## Features & Usage
### Tool:
### Tool
- rocsys: This is a frontend command line utility to launch/start/stop/exit a session with the required application to be traced or profiled in rocprofv2 context. Usage:
- rocsys: This is a frontend command line utility to launch/start/stop/exit a session with the required application to be traced or profiled in rocprofv2 context. Usage:
```bash
# Launch the application with the required profiling and tracing options with giving a session identifier to be used later
@@ -182,140 +177,136 @@ The user has two options for building:
# Exit a session with a given identifier created at launch
rocsys –session session_name exit
```
- rocprofv2:
- Counters and Metric Collection: HW counters and derived metrics can be collected using following option:
- rocprofv2:
- Counters and Metric Collection: HW counters and derived metrics can be collected using following option:
```bash
rocprofv2 -i samples/input.txt <app_relative_path>
input.txt
```
input.txt content:
input.txt content Example (Details of what is needed inside input.txt will be mentioned with every feature):
```bash
pmc: SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE SQ_INSTS_VALU
```
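As a minimal sketch of preparing such an input file, the counter line can be generated from a list of counter names. The helper name `write_pmc_input` is hypothetical and not part of rocprofv2. The sketch assumes only what the example above shows: a single `pmc:` line with space-separated counter names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rocprofv2): writes an input file
# containing one "pmc:" line, mirroring the README example above.
write_pmc_input() {
  local out_file="$1"
  shift
  # "$*" joins the remaining arguments (counter names) with single spaces.
  printf 'pmc: %s\n' "$*" > "$out_file"
}

# Recreate the example file and show its content.
write_pmc_input input.txt SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE SQ_INSTS_VALU
cat input.txt
```

The resulting file can then be passed to the tool with `rocprofv2 -i input.txt <app_relative_path>`, as in the command shown earlier.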
- Application Trace Support: Differnt trace options are available while profiling an app:
- Application Trace Support: Differnt trace options are available while profiling an app:
```bash
# HIP API & asynchronous activity tracing
rocprofv2 --hip-api <app_relative_path>
rocprofv2 --hip-activity <app_relative_path>
rocprofv2 --hip-api <app_relative_path> ## For synchronous HIP API Activity tracing
rocprofv2 --hip-activity <app_relative_path> ## For both Synchronous & ASynchronous HIP API Activity tracing
rocprofv2 --hip-trace <app_relative_path> ## Same as --hip-activity, added for backward compatibility
# HSA API & asynchronous activity tracing
rocprofv2 --hsa-api <app_relative_path>
rocprofv2 --hsa-activity <app_relative_path>
rocprofv2 --hsa-api <app_relative_path> ## For synchronous HSA API Activity tracing
rocprofv2 --hsa-activity <app_relative_path> ## For both Synchronous & ASynchronous HSA API Activity tracing
rocprofv2 --hsa-trace <app_relative_path> ## Same as --hsa-activity, added for backward compatibility
# Kernel dispatches tracing
rocprofv2 --kernel-trace <app_relative_path>
rocprofv2 --kernel-trace <app_relative_path> ## Kernel Dispatch Tracing
# HIP & HSA API and asynchronous activity and kernel dispatches tracing
rocprofv2 --sys-trace <app_relative_path>
rocprofv2 --sys-trace <app_relative_path> ## Same as combining --hip-trace & --hsa-trace & --kernel-trace
```
For complete usage options, please run rocprofv2 help
```bash
rocprofv2 --help
```
- (ATT) Advanced Thread Trace: It can collect kernel running time, granular hardware metrics per kernel dispatch and provide hotspot analysis at source code level via hardware tracing.
- (ATT) Advanced Thread Trace: It can collect kernel running time, granular hardware metrics per kernel dispatch and provide hotspot analysis at source code level via hardware tracing.
```bash
# (Optional) setup ROCPROFV2_ATT_LIB_PATH environment variable for AQL-ATT.
# The default location is:
export ROCPROFV2_ATT_LIB_PATH="/usr/lib/hsa-amd-aqlprofile/librocprofv2_att.so"
# ATT(Advanced Thread Trace) needs few proeconditions before running.
#1. Make sure to generate the assembly file for application
# ATT(Advanced Thread Trace) needs few preconditions before running.
# 1. Make sure to generate the assembly file for application by executing the following before compiling your HIP Application
export HIPCC_COMPILE_FLAGS_APPEND="--save-temps -g"
#2. Install plugin package
# 2. Install plugin package
see Plugin Support section for installation
#3. Additionally you might need to install few python packages.e.g:
pip3 install websockets
pip3 install matplotlib
# Run the following to view the trace
rocprofv2 --plugin att <app_relative_path_assembly_file> -i input.txt <app_relative_path>
# 3. Run the following to view the trace
rocprofv2 --plugin att <app_relative_path_assembly_file> -i input.txt --mode <network, file, off> <app_relative_path>
# app_assembly_file_relative_path is the assembly file with .s extension generated in 1st step
# app_relative_path is the path for the application binary
# Mode:
# - Network: opens the server with the browser UI.
# att needs 2 ports opened (8000, 18000), In case the browser is running on a different machine.
# - File: dumps the json files to disk, it can be used to quickly verify if there is anything wrong with the data.
# - Off runs collection but not analysis/parsing. So it can be later used on another system to be viewed.
# input.txt gives flexibility to to target the compute unit and provide filters.
# input.txt contents: att: TARGET_CU=0
# att needs 2 ports opened (8000, 18000), In case the browser is running on a different machine.
# input.txt contents:
# TARGET_CU=1 // or some other CU [0,15]
# SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables all 24 because SE_MASK code is recent.
# SIMD_MASK=0xF // bitmask of SIMDs, there are four in GFX9.
# samples/att.txt is having an example on how to right input file for ATT
```
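The `SE_MASK` and `SIMD_MASK` values above are bitmasks. A minimal sketch of the arithmetic follows, assuming the usual convention that bit *i* selects unit *i*. The function name `bitmask_from_indices` is hypothetical, and the sketch says nothing about how rocprofv2 itself parses the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a hexadecimal bitmask with one bit set
# for every index passed as an argument.
bitmask_from_indices() {
  local mask=0 idx
  for idx in "$@"; do
    # Set bit number $idx in the running mask.
    mask=$(( mask | (1 << idx) ))
  done
  printf '0x%X\n' "$mask"
}

bitmask_from_indices 0 1 2 3   # four SIMDs, matching SIMD_MASK=0xF
bitmask_from_indices 0         # one shader engine, matching SE_MASK=0x1
```

Under this convention, selecting fewer shader engines simply means passing fewer indices, which is in line with the note that fewer engines are easier on the hardware.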
- Plugin Support: We have a template for adding new plugins. New plugins can be written on top of rocprofv2 to support the desired output format. These plugins are modular in nature and can easily be decoupled from the code based on need. E.g.
- file plugin: outputs the data in txt files.
- Perfetto plugin: outputs the data in protobuf format.
- Adavced thread tracer plugin: advanced hardware traces data in binary format.
- CTF plugin: Outputs the data in ctf format(a binary trace format)
- Plugin Support: We have a template for adding new plugins. New plugins can be written on top of rocprofv2 to support the desired output format using include/rocprofiler/v2/rocprofiler_plugins.h header file. These plugins are modular in nature and can easily be decoupled from the code based on need. E.g.
- file plugin: outputs the data in txt files.
- Perfetto plugin: outputs the data in protobuf format.
- Protobuf files can be viewed using ui.perfetto.dev or using trace_processor
- ATT (Advanced thread tracer) plugin: advanced hardware traces data in binary format. Please refer ATT section.
- CTF plugin: Outputs the data in ctf format(a binary trace format)
- CTF binary output can be viewed using TraceCompass or babeltrace.
installation:
installtion:
```bash
rocprofiler-plugins_9.0.0-local_amd64.deb
rocprofiler-plugins-9.0.0-local.x86_64.rpm
```
usage:
```bash
# plugin_name can be file, perfetto , ctf
./rocprofv2 --plugin plugin_name -i samples/input.txt <app_relative_path>
./rocprofv2 --plugin plugin_name -i samples/input.txt -d output_dir <app_relative_path> # -d is optional, but can be used to define the directory output for output results
```
- Profile Replay Modes: Different replay modes are provided for flexibility to support kernel profiling. The API provides functionality for profiling GPU applications in kernel and application and user mode and also with no replay mode at all and it provides the records pool support with an easy sequence of calls, so the user can be able to profile and trace in easy small steps. Currently, Kernel replay mode is the only supported mode.
- Device Profiling: A device profiling session allows the user to profile the GPU device for counters irrespective of the running applications on the GPU. This is different from application profiling. device profiling session doesn't care about the host running processes and threads. It directly provides low level profiling information.
- Session Support: A session is a unique identifier for a profiling/tracing/pc-sampling task. A ROCProfilerV2 Session has enough information about what needs to be collected or traced and it allows the user to start/stop profiling/tracing whenever required. A simple session API usage:
- Session Support: A session is a unique identifier for a profiling/tracing/pc-sampling task. A ROCProfilerV2 Session has enough information about what needs to be collected or traced and it allows the user to start/stop profiling/tracing whenever required. More details on the API can be found in the API specification documentation that can be installed using rocprofiler-doc package. Samples also can be found for how to use the API in samples directory.
```c++
// Initialize the tools
rocprofiler_initialize();
## Tests
// Creating the session with given replay mode
rocprofiler_session_id_t session_id;
rocprofiler_create_session(rocprofiler_KERNEL_REPLAY_MODE, &session_id);
// Start Session
rocprofiler_start_session(session_id);
// profile a kernel -kernelA
hipLaunchKernelGGL(kernelA, dim3(1), dim3(1), 0, 0);
// Deactivating session
rocprofiler_terminate_session(session_id);
// Destroy sessions
rocprofiler_destroy_session(session_id);
// Destroy all profiling related objects
rocprofiler_finalize();
```
## Tests:
We make use of the GoogleTest (Gtest) framework to automatically find and add test cases to the CMAKE testing environment. ROCProfilerV2 testing is categorized as following:
- unittests (Gtest Based) : These includes tests for core classes. Any newly added functionality should have a unit test written to it.
- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in evry run.
- unittests (Gtest Based) : These includes tests for core classes. Any newly added functionality should have a unit test written to it.
- memorytests (standalone): This includes running address sanitizer for memory leaks, corruptions.
- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in evry run.
- memorytests (standalone): This includes running address sanitizer for memory leaks, corruptions.
installation:
```bash
rocprofiler-tests_9.0.0-local_amd64.deb
rocprofiler-tests-9.0.0-local.x86_64.rpm
```
usage:
From build directory:
```bash
./run_tests.sh OR make -j check
```
## Documentation:
We make use of doxygen to autmatically generate API documentation. Generated document can be found in the following path:
- Optionally, for tests: run the following:
- Option 1, using rocprofv2 script:
```bash
cd build && ./rocprofv2 -t
```
- Option 2, using cmake directly:
```bash
cd build && cmake --build . -- -j check
```
## Documentation
We make use of doxygen to automatically generate API documentation. Generated document can be found in the following path:
```bash
# ROCM_PATH by default is /opt/rocm
@@ -323,54 +314,67 @@ We make use of doxygen to autmatically generate API documentation. Generated doc
<ROCM_PATH>/share/doc/rocprofv2
```
installtion:
installation:
```bash
rocprofiler-docs_9.0.0-local_amd64.deb
rocprofiler-docs-9.0.0-local.x86_64.rpm
```
## Samples
- Profiling: Profiling Samples depending on replay mode
- Tracing: Tracing Samples
insalltion:
installation:
```bash
rocprofiler-samples_9.0.0-local_amd64.deb
rocprofiler-samples-9.0.0-local.x86_64.rpm
```
usage:
samples can be run as independent executables once installed
## Project Structure
- Doc: Documentation settings for doxygen
- Plugins
- File Plugin
- Perfetto Plugin
- Adavced thread tracer Plugin
- CTF Plugin
- Samples: Samples of how to use the API
- Script: Scripts needed for tracing
- Src: Source files of the project
- API: API implementation for rocprofv2
- Core: Core source files needed for the API
- Counters: Basic and Derived Counters
- Hardware: Hardware support
- HSA: Provides support for profiler and tracer to communicate with HSA
- Queues: Intercepting HSA Queues
- Packets: Packets Preparation for profiling
- Memory: Memory Pool used in buffers that saves the output data
- Session: Session Logic
- Filter: Type of profiling or tracing and its properties
- Tracer: Tracing support of the session
- Profiler: Profiling support of the session
- Tools: Tools needed to run profiling and tracing
- rocsys: Controling Session from another CLI
- rocprofv2: Binary version of rocprofv2 script (Not yet supported at the moment)
- Utils: Utilities needed by the project
- Tests: Tests folder
- bin: ROCProf scripts along with V1 post processing scripts
- doc: Documentation settings for doxygen, V1 API Specifications pdf document.
- include:
- rocprofiler.h: V1 API Header File
- v2:
- rocprofiler.h: V2 API Header File
- rocprofiler_plugin.h: V2 Tool Plugins API
- plugin
- file: File Plugin
- perfetto: Perfetto Plugin
- att: Adavced thread tracer Plugin
- ctf: CTF Plugin
- samples: Samples of how to use the API, and also input.txt input file samples for counter collection and ATT.
- script: Scripts needed for tracing
- src: Source files of the project
- api: API implementation for rocprofv2
- core: Core source files needed for the V1/V2 API
- counters: Basic and Derived Counters
- hardware: Hardware support
- hsa: Provides support for profiler and tracer to communicate with HSA
- queues: Intercepting HSA Queues
- packets: Packets Preparation for profiling
- memory: Memory Pool used in buffers that saves the output data
- session: Session Logic
- filter: Type of profiling or tracing and its properties
- tracer: Tracing support of the session
- profiler: Profiling support of the session
- spm: SPM support of the session
- att: ATT support of the session
- tools: Tools needed to run profiling and tracing
- rocsys: Controlling Session from another CLI
- utils: Utilities needed by the project
- tests: Tests folder
- CMakeLists.txt: Handles cmake list for the whole project
- build.sh: To easily build and compile rocprofiler
- CHANGELOG.md: Changes that are happening per release
## Support
+2 -2
@@ -68,7 +68,7 @@ while [ 1 ] ; do
elif [[ "$1" = "-t" || "$1" = "--test" ]] ; then
if [ $RUN_FROM_BUILD == 1 ]; then
export ROCPROFILER_METRICS_PATH=$ROCM_DIR/build/counters/derived_counters.xml
TO_CLEAN=no $ROCM_DIR/build.sh
RUN_TEST=yes TO_CLEAN=no $ROCM_DIR/build.sh
if [ "$CURRENT_DIR/build" -ef "./build" ] ; then
./run_tests.sh
else
@@ -98,7 +98,7 @@ while [ 1 ] ; do
if [ $RUN_FROM_BUILD == 1 ]; then
TO_CLEAN=no $ROCM_DIR/build.sh
pushd build
make install
make -j install
exit 1
fi
elif [[ "$1" = "--clean-install" ]] ; then
+6 -3
@@ -69,6 +69,7 @@ if [ -z "$PREFIX_PATH" ] ; then PREFIX_PATH=$PACKAGE_ROOT; fi
if [ -z "$HIP_VDI" ] ; then HIP_VDI=0; fi
if [ -n "$ROCM_RPATH" ] ; then LD_RUNPATH_FLAG=" -Wl,--enable-new-dtags -Wl,--rpath,${ROCM_RPATH}"; fi
if [ -z "$TO_CLEAN" ] ; then TO_CLEAN=yes; fi
if [ -z "$RUN_TEST" ] ; then RUN_TEST=no; fi
if [ -z "$ASAN" ] ; then ASAN=False; fi
if [ -z "$GPU_LIST" ] ; then GPU_LIST='gfx900 gfx906 gfx908 gfx90a gfx1030'; fi
@@ -97,10 +98,12 @@ popd
MAKE_OPTS="-j -C $ROCPROFILER_ROOT/$BUILD_DIR"
cmake --build "$BUILD_DIR" -- $MAKE_OPTS
cmake --build "$BUILD_DIR" -- $MAKE_OPTS doc
cmake --build "$BUILD_DIR" -- $MAKE_OPTS samples
cmake --build "$BUILD_DIR" -- $MAKE_OPTS mytest
cmake --build "$BUILD_DIR" -- $MAKE_OPTS tests
cmake --build "$BUILD_DIR" -- $MAKE_OPTS package
if [ "$RUN_TEST" = "no" ] ; then
cmake --build "$BUILD_DIR" -- $MAKE_OPTS doc
cmake --build "$BUILD_DIR" -- $MAKE_OPTS samples
cmake --build "$BUILD_DIR" -- $MAKE_OPTS package
fi
exit 0
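The effect of the new `RUN_TEST` switch can be sketched in isolation. The function below is a hypothetical stand-in that only prints which targets would be built, in the order the hunk above lists them. It does not invoke cmake, `planned_targets` is not part of build.sh, and the label `default` stands for the target-less `cmake --build` call.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the target selection in build.sh.
planned_targets() {
  # Fall back to "no" when RUN_TEST is unset or empty, as the script does.
  local run_test="${RUN_TEST:-no}"
  # Targets that are built regardless of RUN_TEST.
  local targets="default mytest tests"
  if [ "$run_test" = "no" ]; then
    # doc, samples and package are only built outside of test runs.
    targets="$targets doc samples package"
  fi
  echo "$targets"
}

# Regular build: RUN_TEST is cleared inside a subshell.
( unset RUN_TEST; planned_targets )
# Test run, as triggered by the rocprofv2 script via RUN_TEST=yes.
( RUN_TEST=yes; planned_targets )
```

Skipping the documentation, samples and packaging targets when only the tests are needed plausibly shortens the `rocprofv2 -t` path, which is what the `RUN_TEST=yes TO_CLEAN=no` invocation in the earlier hunk relies on.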
@@ -73,7 +73,7 @@ if ( "${ROCM_ROOT_DIR}" STREQUAL "" )
message ( FATAL_ERROR "ROCM_ROOT_DIR is not found." )
endif ()
find_library ( FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_INSTALL_PREFIX} PATHS ${ROCM_ROOT_DIR})
find_library ( FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_ROOT_DIR} PATH_SUFFIXES lib)
if ( NOT FIND_AQL_PROFILE_LIB )
message ( FATAL_ERROR "AQL_PROFILE not installed. Please install AQL_PROFILE" )
endif()
@@ -40,7 +40,7 @@ install(TARGETS rocprofiler_tool LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/rocprofiler
COMPONENT asan)
add_subdirectory(amdsys)
add_subdirectory(rocsys)
add_subdirectory(rocprofv2)
add_executable(ctrl ctrl.cpp)
@@ -7,19 +7,19 @@ set(CMAKE_BINARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR})
file(GLOB ROCPROFILER_AMDSYS_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
# Compiling/Installing ROCProfiler API
add_executable(rocprofiler_amdsys_fe ${ROCPROFILER_AMDSYS_SRC_FILES})
add_executable(rocprofiler_rocsys_fe ${ROCPROFILER_AMDSYS_SRC_FILES})
set_target_properties(rocprofiler_amdsys_fe PROPERTIES
set_target_properties(rocprofiler_rocsys_fe PROPERTIES
OUTPUT_NAME "rocsys")
target_include_directories(rocprofiler_amdsys_fe
target_include_directories(rocprofiler_rocsys_fe
PRIVATE
${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/inc)
target_link_libraries(rocprofiler_amdsys_fe dl rt stdc++fs)
target_link_libraries(rocprofiler_rocsys_fe dl rt stdc++fs)
install(TARGETS rocprofiler_amdsys_fe RUNTIME
install(TARGETS rocprofiler_rocsys_fe RUNTIME
PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
DESTINATION ${CMAKE_INSTALL_BINDIR}
COMPONENT runtime)
@@ -42,16 +42,11 @@ __global__ void helloworld(char* in, char* out) {
int main(int argc, char* argv[]) {
hipDeviceProp_t devProp;
HIP_RC(hipGetDeviceProperties(&devProp, 0));
std::cout << " System minor " << devProp.minor << std::endl;
std::cout << " System major " << devProp.major << std::endl;
std::cout << " agent prop name " << devProp.name << std::endl;
/* Initial input,output for the host and create memory objects for the
* kernel*/
const char* input = "GdkknVnqkc";
size_t strlength = strlen(input);
std::cout << "input string:" << std::endl;
std::cout << input << std::endl;
char* output = reinterpret_cast<char*>(malloc(strlength + 1));
char* inputBuffer;
@@ -68,13 +63,7 @@ int main(int argc, char* argv[]) {
HIP_RC(hipFree(inputBuffer));
HIP_RC(hipFree(outputBuffer));
output[strlength] = '\0'; // Add the terminal character to the end of output.
std::cout << "\noutput string:" << std::endl;
std::cout << output << std::endl;
free(output);
std::cout << "Passed!\n";
return SUCCESS;
}
@@ -60,11 +60,6 @@ int main() {
hipDeviceProp_t devProp;
HIP_RC(hipGetDeviceProperties(&devProp, 0));
std::cout << " System minor " << devProp.minor << std::endl;
std::cout << " System major " << devProp.major << std::endl;
std::cout << " agent prop name " << devProp.name << std::endl;
std::cout << "hip Device prop succeeded " << std::endl;
int i;
int errors;
@@ -102,8 +97,6 @@ int main() {
}
if (errors != 0) {
printf("FAILED: %d errors\n", errors);
} else {
printf("PASSED!\n");
}
HIP_RC(hipFree(deviceA));
@@ -821,7 +821,6 @@ void __attribute__((constructor)) globalsetting() {
std::string app_path = GetRunningPath(running_path);
std::stringstream gfx_path;
gfx_path << app_path << metrics_path;
std::cout << gfx_path.str() << std::endl;
setenv("ROCPROFILER_METRICS_PATH", gfx_path.str().c_str(), true);
}
@@ -41,9 +41,6 @@ std::string GetRunningPath(std::string string_to_erase) {
path.clear(); // reset path
path.append(real_path);
//std::cout << path << std::endl;
size_t pos = path.find(to_erase);
if (pos != std::string::npos) path.erase(pos, to_erase.length());
} else {
@@ -1,265 +0,0 @@
// TODO(aelwazir): To be checked
#include "hip/hip_runtime.h"
#include <cstdio>
#include <unistd.h>
#include <hip/hip_profile.h>
#include <iostream>
#define N 2560
//change here to run this app longer
#define num_iters 1
template<int n, int m>
__global__ void kernel(double* x) {
for (int idx = threadIdx.x + blockIdx.x * blockDim.x; idx < N; idx += gridDim.x * blockDim.x)
{
#pragma unroll
for (int i = 0; i < n; ++i)
x[idx] += i * m;
}
}
void cpuWork() {
// Do some CPU "work".
usleep(1000);
}
inline void hip_assert(hipError_t err, const char *file, int line)
{
if (err != hipSuccess)
{
fprintf(stderr,"HIP error: %s %s %d\n", hipGetErrorString(err), file, line);
exit(-1);
}
}
#define hipErrorCheck(f) { hip_assert((f), __FILE__, __LINE__); }
#define kernelErrorCheck() { hipErrorCheck(hipPeekAtLastError()); }
int main() {
double* x;
double* x_h;
size_t sz = N * sizeof(double);
std::cout << "running app....." << std::endl;
hipErrorCheck(hipHostMalloc(&x_h, sz));
memset(x_h, 0, sz);
hipErrorCheck(hipMallocManaged(&x, sz));
hipErrorCheck(hipMemset(x, 0, sz));
hipStream_t stream;
hipErrorCheck(hipStreamCreate(&stream));
hipFuncAttributes attr;
int blocks = 80;
int threads = 32;
int fact = 100;
for (int j = 0; j < num_iters; ++j) {
for (int n = 0; n < 25*fact; ++n) {
hipErrorCheck(hipMemcpyAsync(x, x_h, sz, hipMemcpyHostToDevice));
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,2>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,3>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,4>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,5>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,6>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,7>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,8>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,9>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,10>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,11>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,12>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,13>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,14>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,15>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,16>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,17>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,18>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,19>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,20>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,20>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,21>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,22>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,23>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,24>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,25>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,26>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,27>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,28>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,29>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,30>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,30>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,31>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,32>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,33>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,34>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,35>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,36>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,37>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,38>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,39>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,40>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipErrorCheck(hipMemcpyAsync(x_h, x, sz, hipMemcpyDeviceToHost));
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 200*fact; ++n) {
hipErrorCheck(hipFuncGetAttributes(&attr, reinterpret_cast<const void*>(kernel<10,1>)));
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<10,1>), dim3(blocks), dim3(threads), 0, stream, x);
kernelErrorCheck();
hipErrorCheck(hipStreamSynchronize(stream));
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 30*fact; ++n) {
for (int k = 0; k < 7; ++k) {
hipErrorCheck(hipFuncGetAttributes(&attr, reinterpret_cast<const void*>(kernel<8,1>)));
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<8,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
}
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 100*fact; ++n) {
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,2>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,3>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,4>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,5>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 100*fact; ++n) {
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,2>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,3>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,4>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,5>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 50*fact; ++n) {
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,2>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,3>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,4>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,5>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,6>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,7>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,8>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 50*fact; ++n) {
int val;
hipErrorCheck(hipDeviceGetAttribute(&val, hipDeviceAttributeMaxThreadsPerBlock, 0));
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<4000,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
for (int n = 0; n < 50*fact; ++n) {
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<5000,1>), dim3(blocks), dim3(threads), 0, 0, x);
kernelErrorCheck();
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipMemset(x, 0, sz));
cpuWork();
hipErrorCheck(hipDeviceSynchronize());
}
hipErrorCheck(hipHostFree(x_h));
hipErrorCheck(hipFree(x));
hipErrorCheck(hipStreamDestroy(stream));
}