diff --git a/projects/rocprofiler/README.md b/projects/rocprofiler/README.md index 2720748301..8ca82ab496 100644 --- a/projects/rocprofiler/README.md +++ b/projects/rocprofiler/README.md @@ -4,45 +4,49 @@ The information presented in this document is for informational purposes only an © 2022 Advanced Micro Devices, Inc. All Rights Reserved. -## ROC Profiler library version 1.0 +## ROC Profiler v1 ## Introduction + Profiling with metrics and traces based on perfcounters (PMC) and traces (SPM). Implementation is based on AqlProfile HSA extension. Library supports GFX8/GFX9. +The last API library version for ROCProfiler v1 is 8.0.0 The library source tree: - - doc - Documentation - - include/rocprofiler/rocprofiler.h - Library public API - - include/rocprofiler/v2/rocprofiler.h - V2 Beta Library public API - - include/rocprofiler/v2/rocprofiler_plugins.h - V2 Beta Tool's Plugins Library public API - - src - Library sources - - core - Library API sources - - util - Library utils sources - - xml - XML parser - - test - Library test suite - - ctrl - Test controll - - util - Test utils - - simple_convolution - Simple convolution test kernel -## Build environment: +- doc - Documentation +- include/rocprofiler/rocprofiler.h - Library public API +- include/rocprofiler/v2/rocprofiler.h - V2 Beta Library public API +- include/rocprofiler/v2/rocprofiler_plugins.h - V2 Beta Tool's Plugins Library public API +- src - Library sources + - core - Library API sources + - util - Library utils sources + - xml - XML parser +- test - Library test suite + - ctrl - Test controll + - util - Test utils + - simple_convolution - Simple convolution test kernel + +## Build environment Roctracer & Rocprofiler need to be installed in the same directory. 
+ ```bash -$ export CMAKE_PREFIX_PATH=: -$ export CMAKE_BUILD_TYPE= # release by default -$ export CMAKE_DEBUG_TRACE=1 # 1 to enable debug tracing +export CMAKE_PREFIX_PATH=: +export CMAKE_BUILD_TYPE= # release by default +export CMAKE_DEBUG_TRACE=1 # 1 to enable debug tracing ``` + To build with the current installed ROCM: + ```bash -$ cd .../rocprofiler -$ export CMAKE_PREFIX_PATH=/opt/rocm/include/hsa:/opt/rocm -$ mkdir build -$ cd build -$ cmake .. -$ make +cd .../rocprofiler +./build.sh ## (for clean build use `-cb`) ``` + To run the test: + ```bash $ cd .../rocprofiler/build $ export LD_LIBRARY_PATH=.: # paths to ROC profiler and oher libraries @@ -66,18 +70,20 @@ To enable verbose tracing: $ export ROCPROFILER_TRACE=1 ``` -## ROC Profiler library version 9.0 +## ROCProfiler v2 ## Introduction ROCProfilerV2 is a newly developed design for AMD’s tooling infrastructure that provides a hardware specific low level performance analysis interface for profiling of GPU compute applications. 
+The first API library version for ROCProfiler v2 is 9.0.0 +#### Note: ROCProfilerV2 is currently considered a beta version and is subject to change in future releases ### ROCProfilerV2 Modules - Counters - Hardware -- Buffer Pool +- Generic Buffer - Session - Filter - Tools @@ -89,11 +95,10 @@ ROCProfilerV2 is a newly developed design for AMD’s tooling infrastructure tha ### Requirements -- Makecache +- makecache - Gtest Development Package (Ubuntu: libgtest-dev) -- Cppheaderparser Python3 Package -- Lxml Python3 Package -- Barectf Python3 Package (has to be installed using pip not OS artifactory) +- libsystemd-dev, libelf-dev, libnuma-dev on ubuntu or their corresponding packages on any other OS +- Cppheaderparser, websockets, matplotlib, lxml, barectf Python3 Packages ### Build @@ -110,13 +115,6 @@ The user has two options for building: ./build.sh --clean-build OR ./build.sh -cb ``` - - Optionally, For testing, run the following - - ```bash - cd build && ./rocprofv2 -t - ``` - For more information on tests, please see the Tests section - - Option 2 (Where ROCM_PATH envronment need to be set with the current installation directory of rocm), run the following: ```bash @@ -124,16 +122,23 @@ The user has two options for building: mkdir build && cd build # Configuring the rocprofv2 build - cmake -DCMAKE_MODULE_PATH=$ROCM_PATH/hip/cmake .. + cmake -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_MODULE_PATH=$ROCM_PATH/hip/cmake .. # Building the main runtime of the rocprofv2 project - cmake --build . -- runtime + cmake --build . -- -j # Optionally, for building API documentation - cmake --build . -- doc + cmake --build . -- -j doc + + # Optionally, for building ROCProfiler V2 samples + cmake --build . -- -j samples # Optionally, for building packages (DEB, RPM, TGZ) - cmake --build . -- package + cmake --build . -- -j tests + + # Optionally, for building packages (DEB, RPM, TGZ) + # Note: Requires rpm package on ubuntu + cmake --build . 
-- -j package ``` ### Install @@ -150,24 +155,14 @@ The user has two options for building: ```bash cd build # Install rocprofv2 in the ROCM_PATH path - cmake --build . -- install + cmake --build . -- -j install ``` -### Test - -- Optionally, for tests: run the following: - - ```bash - cmake --build . -- check - ``` - For more information on tests, please see the Tests section - - ## Features & Usage -### Tool: +### Tool - - rocsys: This is a frontend command line utility to launch/start/stop/exit a session with the required application to be traced or profiled in rocprofv2 context. Usage: +- rocsys: This is a frontend command line utility to launch/start/stop/exit a session with the required application to be traced or profiled in rocprofv2 context. Usage: ```bash # Launch the application with the required profiling and tracing options with giving a session identifier to be used later @@ -182,140 +177,136 @@ The user has two options for building: # Exit a session with a given identifier created at launch rocsys –session session_name exit ``` - - rocprofv2: - - Counters and Metric Collection: HW counters and derived metrics can be collected using following option: +- rocprofv2: + + - Counters and Metric Collection: HW counters and derived metrics can be collected using following option: ```bash rocprofv2 -i samples/input.txt input.txt ``` - input.txt content: + input.txt content Example (Details of what is needed inside input.txt will be mentioned with every feature): ```bash pmc: SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE SQ_INSTS_VALU ``` - - Application Trace Support: Differnt trace options are available while profiling an app: + - Application Trace Support: Differnt trace options are available while profiling an app: ```bash # HIP API & asynchronous activity tracing - rocprofv2 --hip-api - rocprofv2 --hip-activity + rocprofv2 --hip-api ## For synchronous HIP API Activity tracing + rocprofv2 --hip-activity ## For both Synchronous & ASynchronous HIP API Activity tracing 
+ rocprofv2 --hip-trace ## Same as --hip-activity, added for backward compatibility # HSA API & asynchronous activity tracing - rocprofv2 --hsa-api - rocprofv2 --hsa-activity + rocprofv2 --hsa-api ## For synchronous HSA API Activity tracing + rocprofv2 --hsa-activity ## For both Synchronous & ASynchronous HSA API Activity tracing + rocprofv2 --hsa-trace ## Same as --hsa-activity, added for backward compatibility # Kernel dispatches tracing - rocprofv2 --kernel-trace + rocprofv2 --kernel-trace ## Kernel Dispatch Tracing # HIP & HSA API and asynchronous activity and kernel dispatches tracing - rocprofv2 --sys-trace + rocprofv2 --sys-trace ## Same as combining --hip-trace & --hsa-trace & --kernel-trace ``` For complete usage options, please run rocprofv2 help + ```bash rocprofv2 --help ``` - - (ATT) Advanced Thread Trace: It can collect kernel running time, granular hardware metrics per kernel dispatch and provide hotspot analysis at source code level via hardware tracing. + + - (ATT) Advanced Thread Trace: It can collect kernel running time, granular hardware metrics per kernel dispatch and provide hotspot analysis at source code level via hardware tracing. ```bash - # (Optional) setup ROCPROFV2_ATT_LIB_PATH environment variable for AQL-ATT. - # The default location is: - export ROCPROFV2_ATT_LIB_PATH="/usr/lib/hsa-amd-aqlprofile/librocprofv2_att.so" - - # ATT(Advanced Thread Trace) needs few proeconditions before running. - #1. Make sure to generate the assembly file for application + # ATT(Advanced Thread Trace) needs few preconditions before running. + # 1. Make sure to generate the assembly file for application by executing the following before compiling your HIP Application export HIPCC_COMPILE_FLAGS_APPEND="--save-temps -g" - #2. Install plugin package + # 2. Install plugin package see Plugin Support section for installation - #3. 
Additionally you might need to install few python packages.e.g: - pip3 install websockets - pip3 install matplotlib - - # Run the following to view the trace - rocprofv2 --plugin att -i input.txt + # 3. Run the following to view the trace + rocprofv2 --plugin att -i input.txt --mode # app_assembly_file_relative_path is the assembly file with .s extension generated in 1st step # app_relative_path is the path for the application binary + # Mode: + # - Network: opens the server with the browser UI. + # att needs 2 ports opened (8000, 18000), In case the browser is running on a different machine. + # - File: dumps the json files to disk, it can be used to quickly verify if there is anything wrong with the data. + # - Off runs collection but not analysis/parsing. So it can be later used on another system to be viewed. # input.txt gives flexibility to to target the compute unit and provide filters. - # input.txt contents: att: TARGET_CU=0 - # att needs 2 ports opened (8000, 18000), In case the browser is running on a different machine. + # input.txt contents: + # TARGET_CU=1 // or some other CU [0,15] + # SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables all 24 because SE_MASK code is recent. + # SIMD_MASK=0xF // bitmask of SIMDs, there are four in GFX9. + # samples/att.txt is having an example on how to right input file for ATT ``` - - Plugin Support: We have a template for adding new plugins. New plugins can be written on top of rocprofv2 to support the desired output format. These plugins are modular in nature and can easily be decoupled from the code based on need. E.g. - - file plugin: outputs the data in txt files. - - Perfetto plugin: outputs the data in protobuf format. - - Adavced thread tracer plugin: advanced hardware traces data in binary format. - - CTF plugin: Outputs the data in ctf format(a binary trace format) + - Plugin Support: We have a template for adding new plugins. 
New plugins can be written on top of rocprofv2 to support the desired output format using include/rocprofiler/v2/rocprofiler_plugins.h header file. These plugins are modular in nature and can easily be decoupled from the code based on need. E.g. + - file plugin: outputs the data in txt files. + - Perfetto plugin: outputs the data in protobuf format. + - Protobuf files can be viewed using ui.perfetto.dev or using trace_processor + - ATT (Advanced thread tracer) plugin: advanced hardware traces data in binary format. Please refer ATT section. + - CTF plugin: Outputs the data in ctf format(a binary trace format) + - CTF binary output can be viewed using TraceCompass or babeltrace. + + installation: - installtion: ```bash rocprofiler-plugins_9.0.0-local_amd64.deb rocprofiler-plugins-9.0.0-local.x86_64.rpm ``` + usage: ```bash # plugin_name can be file, perfetto , ctf - ./rocprofv2 --plugin plugin_name -i samples/input.txt + ./rocprofv2 --plugin plugin_name -i samples/input.txt -d output_dir # -d is optional, but can be used to define the directory output for output results ``` -- Profile Replay Modes: Different replay modes are provided for flexibility to support kernel profiling. The API provides functionality for profiling GPU applications in kernel and application and user mode and also with no replay mode at all and it provides the records pool support with an easy sequence of calls, so the user can be able to profile and trace in easy small steps. Currently, Kernel replay mode is the only supported mode. - - Device Profiling: A device profiling session allows the user to profile the GPU device for counters irrespective of the running applications on the GPU. This is different from application profiling. device profiling session doesn't care about the host running processes and threads. It directly provides low level profiling information. -- Session Support: A session is a unique identifier for a profiling/tracing/pc-sampling task. 
A ROCProfilerV2 Session has enough information about what needs to be collected or traced and it allows the user to start/stop profiling/tracing whenever required. A simple session API usage: +- Session Support: A session is a unique identifier for a profiling/tracing/pc-sampling task. A ROCProfilerV2 Session has enough information about what needs to be collected or traced and it allows the user to start/stop profiling/tracing whenever required. More details on the API can be found in the API specification documentation that can be installed using rocprofiler-doc package. Samples also can be found for how to use the API in samples directory. - ```c++ - // Initialize the tools - rocprofiler_initialize(); +## Tests - // Creating the session with given replay mode - rocprofiler_session_id_t session_id; - rocprofiler_create_session(rocprofiler_KERNEL_REPLAY_MODE, &session_id); - - // Start Session - rocprofiler_start_session(session_id); - - // profile a kernel -kernelA - hipLaunchKernelGGL(kernelA, dim3(1), dim3(1), 0, 0); - - // Deactivating session - rocprofiler_terminate_session(session_id); - - // Destroy sessions - rocprofiler_destroy_session(session_id); - - // Destroy all profiling related objects - rocprofiler_finalize(); - ``` - - ## Tests: We make use of the GoogleTest (Gtest) framework to automatically find and add test cases to the CMAKE testing environment. ROCProfilerV2 testing is categorized as following: - - unittests (Gtest Based) : These includes tests for core classes. Any newly added functionality should have a unit test written to it. - - featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in evry run. +- unittests (Gtest Based) : These includes tests for core classes. Any newly added functionality should have a unit test written to it. 
- - memorytests (standalone): This includes running address sanitizer for memory leaks, corruptions. +- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in evry run. + +- memorytests (standalone): This includes running address sanitizer for memory leaks, corruptions. installation: + ```bash rocprofiler-tests_9.0.0-local_amd64.deb rocprofiler-tests-9.0.0-local.x86_64.rpm ``` - usage: - From build directory: - ```bash - ./run_tests.sh OR make -j check - ``` -## Documentation: -We make use of doxygen to autmatically generate API documentation. Generated document can be found in the following path: +- Optionally, for tests: run the following: + +- Option 1, using rocprofv2 script: + + ```bash + cd build && ./rocprofv2 -t + ``` + +- Option 2, using cmake directly: + + ```bash + cd build && cmake --build . -- -j check + ``` + +## Documentation + +We make use of doxygen to automatically generate API documentation. Generated document can be found in the following path: ```bash # ROCM_PATH by default is /opt/rocm @@ -323,54 +314,67 @@ We make use of doxygen to autmatically generate API documentation. 
Generated doc /share/doc/rocprofv2 ``` - installtion: + installation: + ```bash rocprofiler-docs_9.0.0-local_amd64.deb rocprofiler-docs-9.0.0-local.x86_64.rpm ``` + ## Samples - Profiling: Profiling Samples depending on replay mode - Tracing: Tracing Samples -insalltion: +installation: + ```bash rocprofiler-samples_9.0.0-local_amd64.deb rocprofiler-samples-9.0.0-local.x86_64.rpm ``` + usage: samples can be run as independent executables once installed ## Project Structure -- Doc: Documentation settings for doxygen -- Plugins - - File Plugin - - Perfetto Plugin - - Adavced thread tracer Plugin - - CTF Plugin -- Samples: Samples of how to use the API -- Script: Scripts needed for tracing -- Src: Source files of the project - - API: API implementation for rocprofv2 - - Core: Core source files needed for the API - - Counters: Basic and Derived Counters - - Hardware: Hardware support - - HSA: Provides support for profiler and tracer to communicate with HSA - - Queues: Intercepting HSA Queues - - Packets: Packets Preparation for profiling - - Memory: Memory Pool used in buffers that saves the output data - - Session: Session Logic - - Filter: Type of profiling or tracing and its properties - - Tracer: Tracing support of the session - - Profiler: Profiling support of the session - - Tools: Tools needed to run profiling and tracing - - rocsys: Controling Session from another CLI - - rocprofv2: Binary version of rocprofv2 script (Not yet supported at the moment) - - Utils: Utilities needed by the project -- Tests: Tests folder +- bin: ROCProf scripts along with V1 post processing scripts +- doc: Documentation settings for doxygen, V1 API Specifications pdf document. 
+- include: + - rocprofiler.h: V1 API Header File + - v2: + - rocprofiler.h: V2 API Header File + - rocprofiler_plugin.h: V2 Tool Plugins API +- plugin + - file: File Plugin + - perfetto: Perfetto Plugin + - att: Adavced thread tracer Plugin + - ctf: CTF Plugin +- samples: Samples of how to use the API, and also input.txt input file samples for counter collection and ATT. +- script: Scripts needed for tracing +- src: Source files of the project + - api: API implementation for rocprofv2 + - core: Core source files needed for the V1/V2 API + - counters: Basic and Derived Counters + - hardware: Hardware support + - hsa: Provides support for profiler and tracer to communicate with HSA + - queues: Intercepting HSA Queues + - packets: Packets Preparation for profiling + - memory: Memory Pool used in buffers that saves the output data + - session: Session Logic + - filter: Type of profiling or tracing and its properties + - tracer: Tracing support of the session + - profiler: Profiling support of the session + - spm: SPM support of the session + - att: ATT support of the session + - tools: Tools needed to run profiling and tracing + - rocsys: Controlling Session from another CLI + - utils: Utilities needed by the project +- tests: Tests folder - CMakeLists.txt: Handles cmake list for the whole project +- build.sh: To easily build and compile rocprofiler +- CHANGELOG.md: Changes that are happening per release ## Support diff --git a/projects/rocprofiler/bin/rocprofv2 b/projects/rocprofiler/bin/rocprofv2 index 5d08c11942..524b47d590 100755 --- a/projects/rocprofiler/bin/rocprofv2 +++ b/projects/rocprofiler/bin/rocprofv2 @@ -68,7 +68,7 @@ while [ 1 ] ; do elif [[ "$1" = "-t" || "$1" = "--test" ]] ; then if [ $RUN_FROM_BUILD == 1 ]; then export ROCPROFILER_METRICS_PATH=$ROCM_DIR/build/counters/derived_counters.xml - TO_CLEAN=no $ROCM_DIR/build.sh + RUN_TEST=yes TO_CLEAN=no $ROCM_DIR/build.sh if [ "$CURRENT_DIR/build" -ef "./build" ] ; then ./run_tests.sh else @@ -98,7 +98,7 
@@ while [ 1 ] ; do
 if [ $RUN_FROM_BUILD == 1 ]; then
 TO_CLEAN=no $ROCM_DIR/build.sh
 pushd build
- make install
+ make -j install
 exit 1
 fi
 elif [[ "$1" = "--clean-install" ]] ; then
diff --git a/projects/rocprofiler/build.sh b/projects/rocprofiler/build.sh
index ba6d5d403c..be2b905134 100755
--- a/projects/rocprofiler/build.sh
+++ b/projects/rocprofiler/build.sh
@@ -69,6 +69,7 @@ if [ -z "$PREFIX_PATH" ] ; then PREFIX_PATH=$PACKAGE_ROOT; fi
 if [ -z "$HIP_VDI" ] ; then HIP_VDI=0; fi
 if [ -n "$ROCM_RPATH" ] ; then LD_RUNPATH_FLAG=" -Wl,--enable-new-dtags -Wl,--rpath,${ROCM_RPATH}"; fi
 if [ -z "$TO_CLEAN" ] ; then TO_CLEAN=yes; fi
+if [ -z "$RUN_TEST" ] ; then RUN_TEST=no; fi
 if [ -z "$ASAN" ] ; then ASAN=False; fi
 if [ -z "$GPU_LIST" ] ; then GPU_LIST='gfx900 gfx906 gfx908 gfx90a gfx1030'; fi
@@ -97,10 +98,12 @@ popd
 MAKE_OPTS="-j -C $ROCPROFILER_ROOT/$BUILD_DIR"
 cmake --build "$BUILD_DIR" -- $MAKE_OPTS
-cmake --build "$BUILD_DIR" -- $MAKE_OPTS doc
-cmake --build "$BUILD_DIR" -- $MAKE_OPTS samples
 cmake --build "$BUILD_DIR" -- $MAKE_OPTS mytest
 cmake --build "$BUILD_DIR" -- $MAKE_OPTS tests
-cmake --build "$BUILD_DIR" -- $MAKE_OPTS package
+if [ "$RUN_TEST" = "no" ] ; then
+ cmake --build "$BUILD_DIR" -- $MAKE_OPTS doc
+ cmake --build "$BUILD_DIR" -- $MAKE_OPTS samples
+ cmake --build "$BUILD_DIR" -- $MAKE_OPTS package
+fi
 exit 0
diff --git a/projects/rocprofiler/cmake_modules/env.cmake b/projects/rocprofiler/cmake_modules/env.cmake
index 278bb9ec85..68c0652020 100644
--- a/projects/rocprofiler/cmake_modules/env.cmake
+++ b/projects/rocprofiler/cmake_modules/env.cmake
@@ -73,7 +73,7 @@ if ( "${ROCM_ROOT_DIR}" STREQUAL "" )
 message ( FATAL_ERROR "ROCM_ROOT_DIR is not found." )
 endif ()
-find_library ( FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_INSTALL_PREFIX} PATHS ${ROCM_ROOT_DIR})
+find_library ( FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_ROOT_DIR} PATH_SUFFIXES lib)
 if ( NOT FIND_AQL_PROFILE_LIB )
 message ( FATAL_ERROR "AQL_PROFILE not installed. Please install AQL_PROFILE" )
 endif()
diff --git a/projects/rocprofiler/src/tools/CMakeLists.txt b/projects/rocprofiler/src/tools/CMakeLists.txt
index 0e636f5c49..414aee350d 100644
--- a/projects/rocprofiler/src/tools/CMakeLists.txt
+++ b/projects/rocprofiler/src/tools/CMakeLists.txt
@@ -40,7 +40,7 @@ install(TARGETS rocprofiler_tool LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/rocprofiler COMPONENT asan)
-add_subdirectory(amdsys)
+add_subdirectory(rocsys)
 add_subdirectory(rocprofv2)
 add_executable(ctrl ctrl.cpp)
diff --git a/projects/rocprofiler/src/tools/amdsys/CMakeLists.txt b/projects/rocprofiler/src/tools/rocsys/CMakeLists.txt
similarity index 76%
rename from projects/rocprofiler/src/tools/amdsys/CMakeLists.txt
rename to projects/rocprofiler/src/tools/rocsys/CMakeLists.txt
index 6770fc53e2..e1dc487572 100644
--- a/projects/rocprofiler/src/tools/amdsys/CMakeLists.txt
+++ b/projects/rocprofiler/src/tools/rocsys/CMakeLists.txt
@@ -7,19 +7,19 @@ set(CMAKE_BINARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR})
 file(GLOB ROCPROFILER_AMDSYS_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
 # Compiling/Installing ROCProfiler API
-add_executable(rocprofiler_amdsys_fe ${ROCPROFILER_AMDSYS_SRC_FILES})
+add_executable(rocprofiler_rocsys_fe ${ROCPROFILER_AMDSYS_SRC_FILES})
-set_target_properties(rocprofiler_amdsys_fe PROPERTIES
+set_target_properties(rocprofiler_rocsys_fe PROPERTIES
 OUTPUT_NAME "rocsys")
-target_include_directories(rocprofiler_amdsys_fe
+target_include_directories(rocprofiler_rocsys_fe
 PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc)
-target_link_libraries(rocprofiler_amdsys_fe dl rt stdc++fs)
+target_link_libraries(rocprofiler_rocsys_fe dl rt stdc++fs)
-install(TARGETS rocprofiler_amdsys_fe RUNTIME
+install(TARGETS rocprofiler_rocsys_fe RUNTIME
 PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT runtime)
diff --git a/projects/rocprofiler/src/tools/amdsys/amdsys.cpp b/projects/rocprofiler/src/tools/rocsys/rocsys.cpp
similarity index 100%
rename from projects/rocprofiler/src/tools/amdsys/amdsys.cpp
rename to projects/rocprofiler/src/tools/rocsys/rocsys.cpp
diff --git a/projects/rocprofiler/tests/featuretests/profiler/apps/hello_world_hip.cpp b/projects/rocprofiler/tests/featuretests/profiler/apps/hello_world_hip.cpp
index 8aa6fd0751..7caa41d6ca 100755
--- a/projects/rocprofiler/tests/featuretests/profiler/apps/hello_world_hip.cpp
+++ b/projects/rocprofiler/tests/featuretests/profiler/apps/hello_world_hip.cpp
@@ -42,16 +42,11 @@ __global__ void helloworld(char* in, char* out) {
 int main(int argc, char* argv[]) {
 hipDeviceProp_t devProp;
 HIP_RC(hipGetDeviceProperties(&devProp, 0));
- std::cout << " System minor " << devProp.minor << std::endl;
- std::cout << " System major " << devProp.major << std::endl;
- std::cout << " agent prop name " << devProp.name << std::endl;
 /* Initial input,output for the host and create memory objects for the
 * kernel*/
 const char* input = "GdkknVnqkc";
 size_t strlength = strlen(input);
- std::cout << "input string:" << std::endl;
- std::cout << input << std::endl;
 char* output = reinterpret_cast(malloc(strlength + 1));
 char* inputBuffer;
@@ -68,13 +63,7 @@ int main(int argc, char* argv[]) {
 HIP_RC(hipFree(inputBuffer));
 HIP_RC(hipFree(outputBuffer));
- output[strlength] = '\0'; // Add the terminal character to the end of output.
- std::cout << "\noutput string:" << std::endl;
- std::cout << output << std::endl;
-
 free(output);
- std::cout << "Passed!\n";
-
 return SUCCESS;
 }
diff --git a/projects/rocprofiler/tests/featuretests/profiler/apps/vector_add_hip.cpp b/projects/rocprofiler/tests/featuretests/profiler/apps/vector_add_hip.cpp
index c43223a16b..dc95dc91d1 100755
--- a/projects/rocprofiler/tests/featuretests/profiler/apps/vector_add_hip.cpp
+++ b/projects/rocprofiler/tests/featuretests/profiler/apps/vector_add_hip.cpp
@@ -60,11 +60,6 @@ int main() {
 hipDeviceProp_t devProp;
 HIP_RC(hipGetDeviceProperties(&devProp, 0));
- std::cout << " System minor " << devProp.minor << std::endl;
- std::cout << " System major " << devProp.major << std::endl;
- std::cout << " agent prop name " << devProp.name << std::endl;
-
- std::cout << "hip Device prop succeeded " << std::endl;
 int i;
 int errors;
@@ -102,8 +97,6 @@ int main() {
 }
 if (errors != 0) {
 printf("FAILED: %d errors\n", errors);
- } else {
- printf("PASSED!\n");
 }
 HIP_RC(hipFree(deviceA));
diff --git a/projects/rocprofiler/tests/featuretests/profiler/profiler_gtest.cpp b/projects/rocprofiler/tests/featuretests/profiler/profiler_gtest.cpp
index e0037074c1..444ea0a724 100644
--- a/projects/rocprofiler/tests/featuretests/profiler/profiler_gtest.cpp
+++ b/projects/rocprofiler/tests/featuretests/profiler/profiler_gtest.cpp
@@ -821,7 +821,6 @@ void __attribute__((constructor)) globalsetting() {
 std::string app_path = GetRunningPath(running_path);
 std::stringstream gfx_path;
 gfx_path << app_path << metrics_path;
- std::cout << gfx_path.str() << std::endl;
 setenv("ROCPROFILER_METRICS_PATH", gfx_path.str().c_str(), true);
 }
diff --git a/projects/rocprofiler/tests/featuretests/utils/test_utils.cpp b/projects/rocprofiler/tests/featuretests/utils/test_utils.cpp
index 117e0d38c3..61811a5b20 100644
--- a/projects/rocprofiler/tests/featuretests/utils/test_utils.cpp
+++ b/projects/rocprofiler/tests/featuretests/utils/test_utils.cpp
@@ -41,9 +41,6 @@
std::string GetRunningPath(std::string string_to_erase) { path.clear(); // reset path path.append(real_path); - //std::cout << path << std::endl; - - size_t pos = path.find(to_erase); if (pos != std::string::npos) path.erase(pos, to_erase.length()); } else { diff --git a/projects/rocprofiler/tests/unittests/amdsys.cpp b/projects/rocprofiler/tests/unittests/amdsys.cpp deleted file mode 100644 index ca9abf51c2..0000000000 --- a/projects/rocprofiler/tests/unittests/amdsys.cpp +++ /dev/null @@ -1,265 +0,0 @@ -// TODO(aelwazir): To be checked - -#include "hip/hip_runtime.h" - -#include -#include -#include -#include - - -#define N 2560 -//change here to run this app longer -#define num_iters 1 - - -template -__global__ void kernel(double* x) { - for (int idx = threadIdx.x + blockIdx.x * blockDim.x; idx < N; idx += gridDim.x * blockDim.x) - { - #pragma unroll - for (int i = 0; i < n; ++i) - x[idx] += i * m; - } -} - -void cpuWork() { - // Do some CPU "work". - usleep(1000); -} - -inline void hip_assert(hipError_t err, const char *file, int line) -{ - if (err != hipSuccess) - { - fprintf(stderr,"HIP error: %s %s %d\n", hipGetErrorString(err), file, line); - exit(-1); - } -} - -#define hipErrorCheck(f) { hip_assert((f), __FILE__, __LINE__); } -#define kernelErrorCheck() { hipErrorCheck(hipPeekAtLastError()); } - -int main() { - - double* x; - double* x_h; - - size_t sz = N * sizeof(double); - std::cout << "running app....." 
<< std::endl; - hipErrorCheck(hipHostMalloc(&x_h, sz)); - - memset(x_h, 0, sz); - hipErrorCheck(hipMallocManaged(&x, sz)); - hipErrorCheck(hipMemset(x, 0, sz)); - - hipStream_t stream; - hipErrorCheck(hipStreamCreate(&stream)); - - hipFuncAttributes attr; - - int blocks = 80; - int threads = 32; - int fact = 100; - for (int j = 0; j < num_iters; ++j) { - for (int n = 0; n < 25*fact; ++n) { - hipErrorCheck(hipMemcpyAsync(x, x_h, sz, hipMemcpyHostToDevice)); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,1>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,2>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,3>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,4>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,5>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,6>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,7>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,8>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,9>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,10>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,11>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,12>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,13>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,14>), dim3(blocks), 
dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,15>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,16>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,17>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,18>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,19>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,20>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,20>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,21>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,22>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,23>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,24>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,25>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,26>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,27>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,28>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,29>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,30>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - 
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,30>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,31>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,32>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,33>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,34>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,35>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,36>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,37>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,38>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,39>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,40>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipErrorCheck(hipMemcpyAsync(x_h, x, sz, hipMemcpyDeviceToHost)); - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 200*fact; ++n) { - hipErrorCheck(hipFuncGetAttributes(&attr, reinterpret_cast<const void*>(kernel<10,1>))); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<10,1>), dim3(blocks), dim3(threads), 0, stream, x); - kernelErrorCheck(); - hipErrorCheck(hipStreamSynchronize(stream)); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 30*fact; ++n) { - for (int k = 0; k < 7; ++k) { - hipErrorCheck(hipFuncGetAttributes(&attr, reinterpret_cast<const void*>(kernel<8,1>))); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<8,1>), dim3(blocks), dim3(threads), 0, 0, x); - 
kernelErrorCheck(); - } - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 100*fact; ++n) { - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,1>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,2>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,3>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,4>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,5>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 100*fact; ++n) { - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,1>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,2>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,3>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,4>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,5>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 50*fact; ++n) { - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,1>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,2>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,3>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,4>), dim3(blocks), dim3(threads), 0, 0, x); - 
kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,5>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,6>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,7>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,8>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 50*fact; ++n) { - int val; - hipErrorCheck(hipDeviceGetAttribute(&val, hipDeviceAttributeMaxThreadsPerBlock, 0)); - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<4000,1>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - for (int n = 0; n < 50*fact; ++n) { - hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<5000,1>), dim3(blocks), dim3(threads), 0, 0, x); - kernelErrorCheck(); - hipErrorCheck(hipDeviceSynchronize()); - } - - hipErrorCheck(hipMemset(x, 0, sz)); - cpuWork(); - - hipErrorCheck(hipDeviceSynchronize()); - - } - - hipErrorCheck(hipHostFree(x_h)); - hipErrorCheck(hipFree(x)); - hipErrorCheck(hipStreamDestroy(stream)); - -} \ No newline at end of file