SDK: OMPT Support (#22)
* Ability to select alternative compiler per file
Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.
Misc updates
Update OpenMP target sample
- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files
Rework OpenMP implementation
Minor OpenMP implementation cleanup
Rename samples/openmp_target CMake targets
Add tests/bin/openmp
- OpenMP target test app in tests/bin/openmp/target
Format samples/openmp_target CMakeLists.txt
Misc lib/rocprofiler-sdk/openmp cleanup
- fix includes
- convert_arg
Update openmp.def.cpp
- tweak includes
- remove lots of temporary variables
Update samples
- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample
Fix code object operation names
- add "CODE_OBJECT_" prefix
Update include/rocprofiler-sdk/openmp/api_id.h
- remove spurious comment
Miscellaneous openmp updates
- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events
[SWDEV-484495] Fix int truncation in CSV output (#1098)
CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.
Co-authored-by: Benjamin Welton <ben@amd.com>
Update limit for max counter records in rocprof-tool (#1073)
A fixed sized std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Upping the limit to 512 to be less likely to reach this limit again.
adding proxy ompt_data_t * arguments
fixes for proxy pointers
- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers
formatting
SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)
Fixing some accumulate metrics (#1089)
* Fixing some accumulate metrics
* Fixing some more accumulate metrics
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
updating rocprofv3 help options (#1113)
* updating rocprofv3 help options
* updating CHANGELOG
Fixing installed pacakge tests in CI (#1119)
* Fixing installed pacakge tests in CI
* Formatted rocprofv3.py with black formatter
SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)
* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.
* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Adding backlog for codeobj changes
* Formatting
* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
---------
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
SWDEV-487621: Fixes for metric definitions (#1118)
* Fixes for metric definitions
* Removing gfx8
* Update changelog
* Fixing unit tests
* Small fixes
* Fix for write size
Fix PSDB change (#1120)
Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit c77e4d3b80
clang-18 build fix for RCCL (#1123)
Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).
mem copy direction field update (#1124)
Adding Node-id for debugging with log level trace (#1090)
fix botched rebase
Per Jonathan to remove -rdynamic warning so CI will continue
pedantic formatting
Correct the package name of rocprofiler-sdk (#1126)
* Correct the package name of rocprofiler-sdk
ROCM VERSION(for ex: 60300) was missing in the package name.
Added the same
* Use cmake cache string while setting the variable for ROCm Version
* correct the cmake-format
---------
Co-authored-by: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>
Fixing kokkosp tool library packaging (#1121)
* Fixing kokkosp tool library packaging
* Update source/lib/rocprofiler-sdk-tool/kokkosp/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update CMakeLists.txt
* Update CMakeLists.txt
* Component Requirement in CPack
* Adding package dependency
* Update CMakeLists.txt
* Update rocprofiler_config_packaging.cmake
* Fix rocprofiler-sdk-tool-kokkosp BUILD/INSTALL RPATH
- CMAKE_INSTALL_LIBDIR doesn't help
* Add BUILD/INSTALL RPATH to rocprofv3-trigger-list-metrics
- fixes packaging issues
* Update packaging
- core depends on rocprofiler-sdk-roctx
- add CPACK_DEBIAN_PACKAGE_SHLIBDEPS_PRIVATE_DIRS to resolve inter-package dependencies
* Fix package depends version format
* Improve tests/rocprofv3/summary/validate logging
* Update CI workflow
- prioritize roctx package in Install Packages step
* Remove setting <package-name>_VERSION in config.cmake.in
- this is automatically handled by existence of <package-name>-config-version.cmake
* Update rocprofiler-sdk-config.cmake
- relax find_package versioning requirements to same major and minor version
* Update rocprofiler-sdk-config.cmake
- relax find_package versioning requirements (remove EXACT, specify range)
* Tweak CI workflow
* Update perfetto_reader.py
- better handle failure to load trace processor
* Misc cleanup for config packaging
* Update config packaging
* Update config packaging
* Revert perfetto for core-rpm packages
* Revert perfetto for core-rpm packages
- perfetto < 0.9.0
* Tweak tests/rocprofv3/summary/validate.py
- reorder some checks
---------
Co-authored-by: Ammar Elwazir <aelwazir@useocpm2m-387-013.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Clang Warning Fixes (#1131)
Builds prevented on clang-18
Adding start and end timestamp columns in csv (#1128)
* Adding start and end timestamp columns in csv
* Adding assert check for the counter timestamps
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
rocprofv3: docs and help menu updates (#1129)
* doc updates
* Correcting ROCtx information
* Making ROCTx string consistent
* missing occurence
Renamed agent profiling service to device counting service (#1132)
* Renamed agent profiling service to device counting service
Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:
find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
* Converted dispatch profile to dispatch counting service
* Debug for functioal counters test
* Minor changes for CI
* Minor fix
* More fixes for CI
* Update evaluate_ast.cpp
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Testing updated RPM dockers (#1136)
* Testing updated RPM dockers
* Trying to fix PSDB for test package dependency
Agent Profiling Fixes for Broken/Improper API Usage (#1122)
Prevent's multiple setups of agent profiling on the same agent.
Fixes agent read context to only read agents that were setup.
Prevent copy of agent profiling internal data struct and reset
hsa_signal on move to prevent inadvertant delete.
Simplifying PR template (#1139)
Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.
Fixing installed pacakge tests in CI (#1119)
* Fixing installed pacakge tests in CI
* Formatted rocprofv3.py with black formatter
Fix PSDB change (#1120)
Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit c77e4d3b80
delete unused files
added arguments to some OMPT buffter records
* Fix cmake issues
Remove rocprofiler_ompt_finalize_tool
- a public API function is not necessary: should just finalize rocprofiler-sdk
Fix duplicate ROCPROFILER_{BUFFER,CALLBACK}_TRACING_KIND_STRING
Add lib/rocprofiler-sdk/ompt.hpp
- declares rocprofiler::sdk::finalize_ompt
Remove change to tests/rocprofv3/summary/conftest.py
Add set_fini_status(1) back to registration.cpp
Deleted uneeded files
Incoporate OpenMP code and sample
Fix merge issues with amd-staging
Add push_correlation_id for OpenMP tasking; improve debugability
fixup bad merge
* Suppress OpenMP data race
* Fix openmp_target sample
* Enum and struct name changes + source code reorg
- remove mix of ompt and openmp
- opted for ompt
- changes made for consistency
- ompt_api -> ompt
- openmp_api -> ompt
- OPENMP -> OMPT
* Update tests and more renaming
- dest_device_num -> dst_device_num
- src_addr -> src_address
- dest_addr -> dst_address
- remove info_type::begin
- require OMP_TARGET_OFFLOAD
* Update openmp-target test/sample env and labels
* Formatting
* Tweaks to cmake for openmp target
- Disable for thread sanitizers due to preloading issue
* OpenMP target cmake updates
- remove gfx1010 (fails on mi300)
- OPENMP_GPU_TARGETS
* Remove device_unload and target_map_emi support
- these are never supported by AMD OpenMP compilers
* Update CI workflow
- exclude openmp-target tests from navi3 and vega20
---------
Co-authored-by: Larry Meadows <Lawrence.Meadows@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 00c46fd5e5]
This commit is contained in:
@@ -23,8 +23,14 @@ env:
|
||||
ROCM_PATH: "/opt/rocm"
|
||||
GPU_TARGETS: "gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942 gfx1030 gfx1100 gfx1101 gfx1102"
|
||||
PATH: "/usr/bin:$PATH"
|
||||
TEMP_EXCLUDED_TESTS: "test-page-migration|app-abort"
|
||||
EXCLUDED_TESTS: ".*pc-sampling.*|.*pc_sampling.*"
|
||||
navi3_EXCLUDE_TESTS_REGEX: "^(test-page-migration-(execute|validate)|rocprofv3-test-(execute|validate)-app-abort)$"
|
||||
vega20_EXCLUDE_TESTS_REGEX: "^(test-page-migration-(execute|validate)|rocprofv3-test-(execute|validate)-app-abort)$"
|
||||
mi200_EXCLUDE_TESTS_REGEX: "^(test-page-migration-(execute|validate)|rocprofv3-test-(execute|validate)-app-abort)$"
|
||||
mi300_EXCLUDE_TESTS_REGEX: "^(test-page-migration-(execute|validate)|rocprofv3-test-(execute|validate)-app-abort)$"
|
||||
navi3_EXCLUDE_LABEL_REGEX: "^(pc-sampling|openmp-target)$"
|
||||
vega20_EXCLUDE_LABEL_REGEX: "^(pc-sampling|openmp-target)$"
|
||||
mi200_EXCLUDE_LABEL_REGEX: "^(openmp-target)$"
|
||||
mi300_EXCLUDE_LABEL_REGEX: "^(pc-sampling)$"
|
||||
|
||||
jobs:
|
||||
core-deb:
|
||||
@@ -32,12 +38,12 @@ jobs:
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
runner: ['vega20-emu', 'mi300-emu']
|
||||
runner: ['vega20', 'mi300']
|
||||
os: ['ubuntu-22.04']
|
||||
build-type: ['RelWithDebInfo']
|
||||
ci-flags: ['--linter clang-tidy']
|
||||
|
||||
runs-on: ${{ matrix.runner }}-runner-set
|
||||
runs-on: ${{ matrix.runner }}-emu-runner-set
|
||||
|
||||
# define this for containers
|
||||
env:
|
||||
@@ -72,7 +78,6 @@ jobs:
|
||||
if: ${{ contains(matrix.runner, 'mi200') }}
|
||||
shell: bash
|
||||
run: |
|
||||
echo "EXCLUDED_TESTS=''" >> $GITHUB_ENV
|
||||
echo 'ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1' >> $GITHUB_ENV
|
||||
|
||||
- name: Configure, Build, and Test
|
||||
@@ -94,8 +99,8 @@ jobs:
|
||||
-DCPACK_PACKAGING_INSTALL_PREFIX="$(realpath /opt/rocm)"
|
||||
-DPython3_EXECUTABLE=$(which python3)
|
||||
--
|
||||
-LE "${EXCLUDED_TESTS}"
|
||||
-E "${{ env.TEMP_EXCLUDED_TESTS }}"
|
||||
-LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}"
|
||||
-E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}"
|
||||
|
||||
- name: Install
|
||||
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
|
||||
@@ -119,8 +124,8 @@ jobs:
|
||||
export LD_LIBRARY_PATH=/opt/rocprofiler-sdk/lib:${LD_LIBRARY_PATH}
|
||||
cmake --build build-samples --target all --parallel 16
|
||||
cmake --build build-tests --target all --parallel 16
|
||||
ctest --test-dir build-samples -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-tests -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-samples -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
ctest --test-dir build-tests -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
|
||||
- name: Install Packages
|
||||
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
|
||||
@@ -142,8 +147,8 @@ jobs:
|
||||
CMAKE_PREFIX_PATH=/opt/rocm cmake -B build-tests-deb /opt/rocm/share/rocprofiler-sdk/tests
|
||||
cmake --build build-samples-deb --target all --parallel 16
|
||||
cmake --build build-tests-deb --target all --parallel 16
|
||||
ctest --test-dir build-samples-deb -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-tests-deb -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-samples-deb -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
ctest --test-dir build-tests-deb -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
|
||||
- name: Archive production artifacts
|
||||
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
|
||||
@@ -160,7 +165,7 @@ jobs:
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
runner: ['mi300-emu']
|
||||
runner: ['mi300']
|
||||
os: ['rhel-emu', 'sles-emu']
|
||||
build-type: ['RelWithDebInfo']
|
||||
ci-flags: ['--linter clang-tidy']
|
||||
@@ -196,7 +201,6 @@ jobs:
|
||||
if: ${{ contains(matrix.runner, 'mi200') }}
|
||||
shell: bash
|
||||
run: |
|
||||
echo "EXCLUDED_TESTS=''" >> $GITHUB_ENV
|
||||
echo 'ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1' >> $GITHUB_ENV
|
||||
|
||||
- name: Configure, Build, and Test
|
||||
@@ -218,8 +222,8 @@ jobs:
|
||||
-DCPACK_PACKAGING_INSTALL_PREFIX="$(realpath /opt/rocm)"
|
||||
-DPython3_EXECUTABLE=$(which python3)
|
||||
--
|
||||
-LE "${EXCLUDED_TESTS}"
|
||||
-E "${{ env.TEMP_EXCLUDED_TESTS }}"
|
||||
-LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}"
|
||||
-E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}"
|
||||
|
||||
- name: Install
|
||||
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
|
||||
@@ -243,8 +247,8 @@ jobs:
|
||||
export LD_LIBRARY_PATH=/opt/rocprofiler-sdk/lib:${LD_LIBRARY_PATH}
|
||||
cmake --build build-samples --target all --parallel 16
|
||||
cmake --build build-tests --target all --parallel 16
|
||||
ctest --test-dir build-samples -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-tests -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-samples -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
ctest --test-dir build-tests -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
|
||||
- name: Install Packages
|
||||
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
|
||||
@@ -266,8 +270,8 @@ jobs:
|
||||
CMAKE_PREFIX_PATH=/opt/rocm cmake -B build-tests-deb /opt/rocm/share/rocprofiler-sdk/tests
|
||||
cmake --build build-samples-deb --target all --parallel 16
|
||||
cmake --build build-tests-deb --target all --parallel 16
|
||||
ctest --test-dir build-samples-deb -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-tests-deb -LE "${EXCLUDED_TESTS}" -E "${{ env.TEMP_EXCLUDED_TESTS }}" --output-on-failure
|
||||
ctest --test-dir build-samples-deb -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
ctest --test-dir build-tests-deb -LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}" -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" --output-on-failure
|
||||
|
||||
- name: Archive production artifacts
|
||||
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
|
||||
@@ -284,11 +288,11 @@ jobs:
|
||||
strategy:
|
||||
# fail-fast: false
|
||||
matrix:
|
||||
runner: ['mi200-emu']
|
||||
runner: ['mi200']
|
||||
os: ['ubuntu-22.04']
|
||||
build-type: ['Release']
|
||||
|
||||
runs-on: ${{ matrix.runner }}-runner-set
|
||||
runs-on: ${{ matrix.runner }}-emu-runner-set
|
||||
|
||||
# define this for containers
|
||||
env:
|
||||
@@ -361,7 +365,6 @@ jobs:
|
||||
if: ${{ contains(matrix.runner, 'mi200') }}
|
||||
shell: bash
|
||||
run: |
|
||||
echo "EXCLUDED_TESTS=''" >> $GITHUB_ENV
|
||||
echo 'ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1' >> $GITHUB_ENV
|
||||
|
||||
- name: Configure, Build, and Test (Total Code Coverage)
|
||||
@@ -379,8 +382,8 @@ jobs:
|
||||
-DCMAKE_BUILD_TYPE=${{ matrix.build-type }}
|
||||
-DPython3_EXECUTABLE=$(which python3)
|
||||
--
|
||||
-LE "${EXCLUDED_TESTS}"
|
||||
-E "${{ env.TEMP_EXCLUDED_TESTS }}"
|
||||
-LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}"
|
||||
-E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}"
|
||||
|
||||
- name: Configure, Build, and Test (Tests Code Coverage)
|
||||
timeout-minutes: 30
|
||||
@@ -398,8 +401,8 @@ jobs:
|
||||
-DCMAKE_BUILD_TYPE=${{ matrix.build-type }}
|
||||
-DPython3_EXECUTABLE=$(which python3)
|
||||
--
|
||||
-LE "${EXCLUDED_TESTS}"
|
||||
-E "${{ env.TEMP_EXCLUDED_TESTS }}"
|
||||
-LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}"
|
||||
-E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}"
|
||||
|
||||
- name: Configure, Build, and Test (Samples Code Coverage)
|
||||
timeout-minutes: 30
|
||||
@@ -417,8 +420,8 @@ jobs:
|
||||
-DCMAKE_BUILD_TYPE=${{ matrix.build-type }}
|
||||
-DPython3_EXECUTABLE=$(which python3)
|
||||
--
|
||||
-LE "${EXCLUDED_TESTS}"
|
||||
-E "${{ env.TEMP_EXCLUDED_TESTS }}"
|
||||
-LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}"
|
||||
-E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}"
|
||||
|
||||
- name: Save XML Code Coverage
|
||||
id: save-coverage
|
||||
@@ -536,7 +539,7 @@ jobs:
|
||||
# - unittests
|
||||
# - integration-tests
|
||||
#
|
||||
ctest -N -LE 'samples|tests' -E "${{ env.TEMP_EXCLUDED_TESTS }}" -O ctest.mislabeled.log
|
||||
ctest -N -LE 'samples|tests' -E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}" -O ctest.mislabeled.log
|
||||
grep 'Total Tests: 0' ctest.mislabeled.log
|
||||
#
|
||||
# if following fails, then there is overlap between the labels.
|
||||
@@ -561,22 +564,22 @@ jobs:
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
runner: ['vega20-emu', 'navi3-emu', 'mi300-emu']
|
||||
runner: ['vega20', 'navi3', 'mi300']
|
||||
sanitizer: ['AddressSanitizer', 'ThreadSanitizer', 'LeakSanitizer', 'UndefinedBehaviorSanitizer']
|
||||
os: ['ubuntu-22.04']
|
||||
build-type: ['RelWithDebInfo']
|
||||
exclude:
|
||||
- { runner: 'vega20-emu', sanitizer: 'ThreadSanitizer' }
|
||||
- { runner: 'vega20-emu', sanitizer: 'AddressSanitizer' }
|
||||
- { runner: 'vega20-emu', sanitizer: 'UndefinedBehaviorSanitizer' }
|
||||
- { runner: 'mi300-emu', sanitizer: 'AddressSanitizer' }
|
||||
- { runner: 'mi300-emu', sanitizer: 'LeakSanitizer' }
|
||||
- { runner: 'navi3-emu', sanitizer: 'LeakSanitizer' }
|
||||
- { runner: 'navi3-emu', sanitizer: 'ThreadSanitizer' }
|
||||
- { runner: 'navi3-emu', sanitizer: 'UndefinedBehaviorSanitizer' }
|
||||
- { runner: 'vega20', sanitizer: 'ThreadSanitizer' }
|
||||
- { runner: 'vega20', sanitizer: 'AddressSanitizer' }
|
||||
- { runner: 'vega20', sanitizer: 'UndefinedBehaviorSanitizer' }
|
||||
- { runner: 'mi300', sanitizer: 'AddressSanitizer' }
|
||||
- { runner: 'mi300', sanitizer: 'LeakSanitizer' }
|
||||
- { runner: 'navi3', sanitizer: 'LeakSanitizer' }
|
||||
- { runner: 'navi3', sanitizer: 'ThreadSanitizer' }
|
||||
- { runner: 'navi3', sanitizer: 'UndefinedBehaviorSanitizer' }
|
||||
|
||||
if: ${{ contains(github.event_name, 'pull_request') }}
|
||||
runs-on: ${{ matrix.runner }}-runner-set
|
||||
runs-on: ${{ matrix.runner }}-emu-runner-set
|
||||
|
||||
# define this for containers
|
||||
env:
|
||||
@@ -611,7 +614,6 @@ jobs:
|
||||
if: ${{ contains(matrix.runner, 'mi200') }}
|
||||
shell: bash
|
||||
run: |
|
||||
echo "EXCLUDED_TESTS=''" >> $GITHUB_ENV
|
||||
echo 'ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1' >> $GITHUB_ENV
|
||||
|
||||
- name: Configure, Build, and Test
|
||||
@@ -630,5 +632,5 @@ jobs:
|
||||
-DCMAKE_INSTALL_PREFIX="${{ env.ROCM_PATH }}"
|
||||
-DPython3_EXECUTABLE=$(which python3)
|
||||
--
|
||||
-LE "${EXCLUDED_TESTS}"
|
||||
-E "${{ env.TEMP_EXCLUDED_TESTS }}"
|
||||
-LE "${${{ matrix.runner }}_EXCLUDE_LABEL_REGEX}"
|
||||
-E "${${{ matrix.runner }}_EXCLUDE_TESTS_REGEX}"
|
||||
|
||||
@@ -60,7 +60,6 @@ set(${PACKAGE_NAME}_BUILD_TREE
|
||||
set(PROJECT_BUILD_TREE_TARGETS
|
||||
${SDK_PACKAGE_NAME}::${PACKAGE_NAME}-shared-library
|
||||
${SDK_PACKAGE_NAME}::${SDK_PACKAGE_NAME}-headers
|
||||
${SDK_PACKAGE_NAME}::${SDK_PACKAGE_NAME}-build-flags
|
||||
${SDK_PACKAGE_NAME}::${SDK_PACKAGE_NAME}-stack-protector)
|
||||
|
||||
configure_file(
|
||||
|
||||
@@ -33,3 +33,4 @@ add_subdirectory(code_object_isa_decode)
|
||||
add_subdirectory(advanced_thread_trace)
|
||||
add_subdirectory(external_correlation_id_request)
|
||||
add_subdirectory(pc_sampling)
|
||||
add_subdirectory(openmp_target)
|
||||
|
||||
@@ -189,7 +189,7 @@ tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data)
|
||||
|
||||
call_stack_v->emplace_back(source_location{__FUNCTION__, __FILE__, __LINE__, ""});
|
||||
|
||||
callback_name_info name_info = common::get_callback_id_names();
|
||||
callback_name_info name_info = common::get_callback_tracing_names();
|
||||
|
||||
for(const auto& itr : name_info)
|
||||
{
|
||||
|
||||
@@ -48,7 +48,7 @@ get_buffer_tracing_names()
|
||||
}
|
||||
|
||||
inline auto
|
||||
get_callback_id_names()
|
||||
get_callback_tracing_names()
|
||||
{
|
||||
return rocprofiler::sdk::get_callback_tracing_names();
|
||||
}
|
||||
|
||||
@@ -0,0 +1,95 @@
|
||||
#
|
||||
#
|
||||
#
|
||||
cmake_minimum_required(VERSION 3.21.0 FATAL_ERROR)
|
||||
|
||||
if(NOT OMP_TARGET_COMPILER)
|
||||
find_program(
|
||||
amdclangpp_EXECUTABLE
|
||||
NAMES amdclang++
|
||||
HINTS ${ROCM_PATH} ENV ROCM_PATH /opt/rocm
|
||||
PATHS ${ROCM_PATH} ENV ROCM_PATH /opt/rocm
|
||||
PATH_SUFFIXES bin llvm/bin NO_CACHE)
|
||||
mark_as_advanced(amdclangpp_EXECUTABLE)
|
||||
|
||||
if(amdclangpp_EXECUTABLE)
|
||||
set(OMP_TARGET_COMPILER
|
||||
"${amdclangpp_EXECUTABLE}"
|
||||
CACHE FILEPATH "")
|
||||
endif()
|
||||
endif()
|
||||
|
||||
project(rocprofiler-sdk-samples-openmp-target LANGUAGES CXX)
|
||||
|
||||
find_package(rocprofiler-sdk REQUIRED)
|
||||
|
||||
add_library(openmp-target-sample-client SHARED)
|
||||
target_sources(openmp-target-sample-client PRIVATE client.cpp client.hpp)
|
||||
target_link_libraries(
|
||||
openmp-target-sample-client
|
||||
PRIVATE rocprofiler-sdk::rocprofiler-sdk rocprofiler-sdk::samples-build-flags
|
||||
rocprofiler-sdk::samples-common-library)
|
||||
|
||||
set(DEFAULT_GPU_TARGETS
|
||||
"gfx906"
|
||||
"gfx908"
|
||||
"gfx90a"
|
||||
"gfx940"
|
||||
"gfx941"
|
||||
"gfx942"
|
||||
"gfx1100"
|
||||
"gfx1101"
|
||||
"gfx1102")
|
||||
|
||||
set(OPENMP_GPU_TARGETS
|
||||
"${DEFAULT_GPU_TARGETS}"
|
||||
CACHE STRING "GPU targets to compile for")
|
||||
|
||||
if(ROCPROFILER_MEMCHECK STREQUAL "ThreadSanitizer")
|
||||
set(IS_THREAD_SANITIZER ON)
|
||||
else()
|
||||
set(IS_THREAD_SANITIZER OFF)
|
||||
endif()
|
||||
|
||||
find_package(Threads REQUIRED)
|
||||
find_package(rocprofiler-sdk-roctx REQUIRED)
|
||||
|
||||
add_executable(openmp-target-sample)
|
||||
target_sources(openmp-target-sample PRIVATE main.cpp)
|
||||
target_link_libraries(
|
||||
openmp-target-sample PRIVATE Threads::Threads
|
||||
rocprofiler-sdk-roctx::rocprofiler-sdk-roctx)
|
||||
target_compile_options(openmp-target-sample PRIVATE -fopenmp)
|
||||
target_link_options(openmp-target-sample PRIVATE -fopenmp)
|
||||
|
||||
foreach(_TARGET ${OPENMP_GPU_TARGETS})
|
||||
target_compile_options(openmp-target-sample PRIVATE --offload-arch=${_TARGET})
|
||||
target_link_options(openmp-target-sample PRIVATE --offload-arch=${_TARGET})
|
||||
endforeach()
|
||||
|
||||
include(rocprofiler-sdk-custom-compilation)
|
||||
rocprofiler_sdk_custom_compilation(TARGET openmp-target-sample
|
||||
COMPILER ${OMP_TARGET_COMPILER})
|
||||
|
||||
rocprofiler_samples_get_preload_env(PRELOAD_ENV openmp-target-sample-client)
|
||||
rocprofiler_samples_get_ld_library_path_env(
|
||||
LIBRARY_PATH_ENV rocprofiler-sdk-roctx::rocprofiler-sdk-roctx-shared-library)
|
||||
|
||||
set(openmp-target-sample-env
|
||||
${PRELOAD_ENV} ${LIBRARY_PATH_ENV} "OMP_NUM_THREADS=2" "OMP_TARGET_OFFLOAD=mandatory"
|
||||
"OMP_DISPLAY_ENV=1" "ROCR_VISIBLE_DEVICES=0")
|
||||
|
||||
add_test(NAME openmp-target-sample COMMAND $<TARGET_FILE:openmp-target-sample>)
|
||||
|
||||
set_tests_properties(
|
||||
openmp-target-sample
|
||||
PROPERTIES TIMEOUT
|
||||
45
|
||||
LABELS
|
||||
"samples;openmp-target"
|
||||
ENVIRONMENT
|
||||
"${openmp-target-sample-env}"
|
||||
FAIL_REGULAR_EXPRESSION
|
||||
"${ROCPROFILER_DEFAULT_FAIL_REGEX}"
|
||||
DISABLED
|
||||
"${IS_THREAD_SANITIZER}")
|
||||
@@ -0,0 +1,660 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
// undefine NDEBUG so asserts are implemented
|
||||
#ifdef NDEBUG
|
||||
# undef NDEBUG
|
||||
#endif
|
||||
|
||||
/**
|
||||
* @file samples/ompt/client.cpp
|
||||
*
|
||||
* @brief Example rocprofiler client (tool)
|
||||
*/
|
||||
|
||||
#include "client.hpp"
|
||||
|
||||
#include <rocprofiler-sdk/callback_tracing.h>
|
||||
#include <rocprofiler-sdk/context.h>
|
||||
#include <rocprofiler-sdk/fwd.h>
|
||||
#include <rocprofiler-sdk/marker/api_id.h>
|
||||
#include <rocprofiler-sdk/ompt.h>
|
||||
#include <rocprofiler-sdk/registration.h>
|
||||
#include <rocprofiler-sdk/rocprofiler.h>
|
||||
|
||||
#include "common/call_stack.hpp"
|
||||
#include "common/defines.hpp"
|
||||
#include "common/filesystem.hpp"
|
||||
#include "common/name_info.hpp"
|
||||
|
||||
#include <cassert>
|
||||
#include <chrono>
|
||||
#include <cstddef>
|
||||
#include <cstdint>
|
||||
#include <cstdio>
|
||||
#include <cstdlib>
|
||||
#include <fstream>
|
||||
#include <functional>
|
||||
#include <iomanip>
|
||||
#include <iostream>
|
||||
#include <map>
|
||||
#include <mutex>
|
||||
#include <ratio>
|
||||
#include <string>
|
||||
#include <string_view>
|
||||
#include <unordered_set>
|
||||
#include <vector>
|
||||
namespace client
|
||||
{
|
||||
namespace
|
||||
{
|
||||
using common::call_stack_t;
|
||||
using common::callback_name_info;
|
||||
using common::source_location;
|
||||
using kernel_symbol_data_t = rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t;
|
||||
using kernel_symbol_map_t = std::unordered_map<rocprofiler_kernel_id_t, kernel_symbol_data_t>;
|
||||
|
||||
rocprofiler_client_id_t* client_id = nullptr;
|
||||
rocprofiler_client_finalize_t client_fini_func = nullptr;
|
||||
auto client_ctx = rocprofiler_context_id_t{};
|
||||
auto cb_name_info = common::get_callback_tracing_names();
|
||||
auto bf_name_info = common::get_buffer_tracing_names();
|
||||
auto client_buffer = rocprofiler_buffer_id_t{};
|
||||
auto client_kernels = kernel_symbol_map_t{};
|
||||
auto call_stack_mtx = std::mutex{};
|
||||
|
||||
auto
|
||||
get_call_stack_lock()
|
||||
{
|
||||
return std::unique_lock<std::mutex>{call_stack_mtx};
|
||||
}
|
||||
|
||||
void
|
||||
print_call_stack(const call_stack_t& _call_stack)
|
||||
{
|
||||
common::print_call_stack("openmp_target_trace.log", _call_stack);
|
||||
}
|
||||
|
||||
void
|
||||
tool_tracing_ctrl_callback(rocprofiler_callback_tracing_record_t record,
|
||||
rocprofiler_user_data_t*,
|
||||
void* client_data)
|
||||
{
|
||||
auto* ctx = static_cast<rocprofiler_context_id_t*>(client_data);
|
||||
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER &&
|
||||
record.kind == ROCPROFILER_CALLBACK_TRACING_MARKER_CONTROL_API &&
|
||||
record.operation == ROCPROFILER_MARKER_CONTROL_API_ID_roctxProfilerPause)
|
||||
{
|
||||
ROCPROFILER_CALL(rocprofiler_stop_context(*ctx), "pausing client context");
|
||||
}
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT &&
|
||||
record.kind == ROCPROFILER_CALLBACK_TRACING_MARKER_CONTROL_API &&
|
||||
record.operation == ROCPROFILER_MARKER_CONTROL_API_ID_roctxProfilerResume)
|
||||
{
|
||||
ROCPROFILER_CALL(rocprofiler_start_context(*ctx), "resuming client context");
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
tool_callback_tracing_callback(rocprofiler_callback_tracing_record_t record,
|
||||
rocprofiler_user_data_t* user_data,
|
||||
void* callback_data)
|
||||
{
|
||||
assert(callback_data != nullptr);
|
||||
|
||||
if(record.kind == ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API)
|
||||
{
|
||||
if(record.operation == ROCPROFILER_HSA_CORE_API_ID_hsa_queue_destroy)
|
||||
{
|
||||
// skip hsa_queue_destroy for now, it tries to print the queue after it is destroyed
|
||||
return;
|
||||
}
|
||||
}
|
||||
else if(record.kind == ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT &&
|
||||
record.operation == ROCPROFILER_CODE_OBJECT_LOAD)
|
||||
{
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_UNLOAD)
|
||||
{
|
||||
// flush the buffer to ensure that any lookups for the client kernel names for the code
|
||||
// object are completed
|
||||
auto flush_status = rocprofiler_flush_buffer(client_buffer);
|
||||
if(flush_status != ROCPROFILER_STATUS_ERROR_BUFFER_BUSY)
|
||||
ROCPROFILER_CALL(flush_status, "buffer flush");
|
||||
}
|
||||
}
|
||||
else if(record.kind == ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT &&
|
||||
record.operation == ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER)
|
||||
{
|
||||
auto* data = static_cast<kernel_symbol_data_t*>(record.payload);
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_LOAD)
|
||||
{
|
||||
client_kernels.emplace(data->kernel_id, *data);
|
||||
}
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_UNLOAD)
|
||||
{
|
||||
client_kernels.erase(data->kernel_id);
|
||||
}
|
||||
}
|
||||
else if(record.kind == ROCPROFILER_CALLBACK_TRACING_OMPT)
|
||||
{
|
||||
// demonstrate the use of the ompt_data_t* fields from OMPT
|
||||
// The client has its own version of those fields as well as an interface to the
|
||||
// ompt API entry points.
|
||||
auto* data = static_cast<rocprofiler_callback_tracing_ompt_data_t*>(record.payload);
|
||||
|
||||
if(record.operation == ROCPROFILER_OMPT_ID_parallel_begin)
|
||||
{
|
||||
// set the parallel_data value
|
||||
auto& args = data->args.parallel_begin;
|
||||
args.parallel_data->value = record.correlation_id.internal;
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_parallel_end)
|
||||
{
|
||||
// set the parallel_data value
|
||||
auto& args = data->args.parallel_end;
|
||||
args.parallel_data->value = 0;
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_thread_begin)
|
||||
{
|
||||
// set the thread_data value
|
||||
auto& args = data->args.thread_begin;
|
||||
args.thread_data->value = record.thread_id;
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_thread_end)
|
||||
{
|
||||
// set the thread_data value
|
||||
auto& args = data->args.thread_end;
|
||||
args.thread_data->value = 0;
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_implicit_task)
|
||||
{
|
||||
auto& args = data->args.implicit_task;
|
||||
// set the task_data value
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
|
||||
args.task_data->value = record.correlation_id.internal;
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
args.task_data->value = 0;
|
||||
else
|
||||
assert(0);
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_target_emi)
|
||||
{
|
||||
auto& args = data->args.target_emi;
|
||||
// set the target_data value
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
|
||||
args.target_data->value = record.correlation_id.internal;
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
args.target_data->value = 0;
|
||||
else
|
||||
assert(0);
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_target_data_op_emi)
|
||||
{
|
||||
auto& args = data->args.target_data_op_emi;
|
||||
// set the host_op_id value
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
|
||||
args.host_op_id->value = record.correlation_id.internal;
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
args.host_op_id->value = 0;
|
||||
else
|
||||
assert(0);
|
||||
}
|
||||
else if(record.operation == ROCPROFILER_OMPT_ID_target_submit_emi)
|
||||
{
|
||||
// set the host_op_id value
|
||||
auto& args = data->args.target_submit_emi;
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
|
||||
args.host_op_id->value = record.correlation_id.internal;
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
args.host_op_id->value = 0;
|
||||
else
|
||||
assert(0);
|
||||
}
|
||||
}
|
||||
|
||||
auto now = std::chrono::steady_clock::now().time_since_epoch().count();
|
||||
uint64_t dt = 0;
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
|
||||
user_data->value = now;
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
dt = (now - user_data->value);
|
||||
|
||||
const char* name = nullptr;
|
||||
rocprofiler_query_callback_tracing_kind_operation_name(
|
||||
record.kind, record.operation, &name, nullptr);
|
||||
|
||||
auto info = std::stringstream{};
|
||||
info << std::left << "tid=" << record.thread_id << ", cid=" << std::setw(3)
|
||||
<< record.correlation_id.internal << ", kind=" << std::setw(2) << record.kind
|
||||
<< ", operation=" << std::setw(3) << record.operation << ", phase=" << record.phase
|
||||
<< ", dt_nsec=" << std::setw(8) << dt << ", name=" << name;
|
||||
|
||||
auto info_data_cb = [](rocprofiler_callback_tracing_kind_t,
|
||||
int,
|
||||
uint32_t arg_num,
|
||||
const void* const arg_value_addr,
|
||||
int32_t indirection_count,
|
||||
const char* arg_type,
|
||||
const char* arg_name,
|
||||
const char* arg_value_str,
|
||||
int32_t dereference_count,
|
||||
void* cb_data) -> int {
|
||||
auto& dss = *static_cast<std::stringstream*>(cb_data);
|
||||
dss << ((arg_num == 0) ? "(" : ", ");
|
||||
dss << arg_num << ": " << arg_name << "=" << arg_value_str;
|
||||
(void) arg_value_addr;
|
||||
(void) arg_type;
|
||||
(void) indirection_count;
|
||||
(void) dereference_count;
|
||||
return 0;
|
||||
};
|
||||
|
||||
int32_t max_deref = 1;
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
// not for PHASE_NONE
|
||||
max_deref = 2;
|
||||
auto info_data = std::stringstream{};
|
||||
if(record.kind != ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT)
|
||||
{
|
||||
ROCPROFILER_CALL(rocprofiler_iterate_callback_tracing_kind_operation_args(
|
||||
record, info_data_cb, max_deref, static_cast<void*>(&info_data)),
|
||||
"Failure iterating trace operation args");
|
||||
}
|
||||
|
||||
auto info_data_str = info_data.str();
|
||||
if(!info_data_str.empty()) info << " " << info_data_str << ")";
|
||||
|
||||
auto* call_stack_v = static_cast<call_stack_t*>(callback_data);
|
||||
auto _lk = get_call_stack_lock();
|
||||
call_stack_v->emplace_back(source_location{__FUNCTION__, __FILE__, __LINE__, info.str()});
|
||||
}
|
||||
|
||||
void
|
||||
tool_buffered_tracing_callback(rocprofiler_context_id_t context,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler_record_header_t** headers,
|
||||
size_t num_headers,
|
||||
void* user_data,
|
||||
uint64_t drop_count)
|
||||
{
|
||||
assert(drop_count == 0 && "drop count should be zero for lossless policy");
|
||||
|
||||
auto* call_stack_v = static_cast<call_stack_t*>(user_data);
|
||||
|
||||
for(size_t i = 0; i < num_headers; ++i)
|
||||
{
|
||||
auto* header = headers[i];
|
||||
|
||||
if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
|
||||
header->kind == ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_kernel_dispatch_record_t*>(header->payload);
|
||||
|
||||
auto info = std::stringstream{};
|
||||
|
||||
auto dt = (record->end_timestamp - record->start_timestamp);
|
||||
info << std::left << "tid=" << record->thread_id << ", cid=" << std::setw(3)
|
||||
<< record->correlation_id.internal << ", kind=" << std::setw(2) << record->kind
|
||||
<< ", operation=" << std::setw(3) << record->operation << ", phase= "
|
||||
<< ", dt_nsec=" << std::setw(8) << dt
|
||||
<< ", agent_id=" << record->dispatch_info.agent_id.handle
|
||||
<< ", queue_id=" << record->dispatch_info.queue_id.handle
|
||||
<< ", kernel_id=" << record->dispatch_info.kernel_id
|
||||
<< ", kernel=" << client_kernels.at(record->dispatch_info.kernel_id).kernel_name
|
||||
<< ", start=" << record->start_timestamp << ", stop=" << record->end_timestamp
|
||||
<< ", private_segment_size=" << record->dispatch_info.private_segment_size
|
||||
<< ", group_segment_size=" << record->dispatch_info.group_segment_size
|
||||
<< ", workgroup_size=(" << record->dispatch_info.workgroup_size.x << ","
|
||||
<< record->dispatch_info.workgroup_size.y << ","
|
||||
<< record->dispatch_info.workgroup_size.z << "), grid_size=("
|
||||
<< record->dispatch_info.grid_size.x << "," << record->dispatch_info.grid_size.y
|
||||
<< "," << record->dispatch_info.grid_size.z << ")";
|
||||
|
||||
if(record->start_timestamp > record->end_timestamp)
|
||||
throw std::runtime_error("kernel dispatch: start > end");
|
||||
|
||||
auto _lk = get_call_stack_lock();
|
||||
call_stack_v->emplace_back(
|
||||
source_location{__FUNCTION__, __FILE__, __LINE__, info.str()});
|
||||
}
|
||||
else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
|
||||
header->kind == ROCPROFILER_BUFFER_TRACING_MEMORY_COPY)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_memory_copy_record_t*>(header->payload);
|
||||
|
||||
auto info = std::stringstream{};
|
||||
|
||||
auto dt = (record->end_timestamp - record->start_timestamp);
|
||||
info << std::left << "tid=" << record->thread_id << ", cid=" << std::setw(3)
|
||||
<< record->correlation_id.internal << ", kind=" << std::setw(2) << record->kind
|
||||
<< ", operation=" << std::setw(3) << record->operation << ", phase= "
|
||||
<< ", dt_nsec=" << std::setw(8) << dt
|
||||
<< ", src_agent_id=" << record->src_agent_id.handle
|
||||
<< ", dst_agent_id=" << record->dst_agent_id.handle
|
||||
<< ", direction=" << record->operation << ", start=" << record->start_timestamp
|
||||
<< ", stop=" << record->end_timestamp
|
||||
<< ", name=" << bf_name_info.at(record->kind, record->operation);
|
||||
|
||||
if(record->start_timestamp > record->end_timestamp)
|
||||
throw std::runtime_error("memory copy: start > end");
|
||||
|
||||
auto _lk = get_call_stack_lock();
|
||||
call_stack_v->emplace_back(
|
||||
source_location{__FUNCTION__, __FILE__, __LINE__, info.str()});
|
||||
}
|
||||
else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
|
||||
header->kind == ROCPROFILER_BUFFER_TRACING_SCRATCH_MEMORY)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_scratch_memory_record_t*>(header->payload);
|
||||
|
||||
auto info = std::stringstream{};
|
||||
|
||||
auto _elapsed =
|
||||
std::chrono::duration_cast<std::chrono::duration<double, std::micro>>(
|
||||
std::chrono::nanoseconds{record->end_timestamp - record->start_timestamp})
|
||||
.count();
|
||||
|
||||
auto dt = (record->end_timestamp - record->start_timestamp);
|
||||
info << std::left << "tid=" << record->thread_id << ", cid=" << std::setw(3)
|
||||
<< record->correlation_id.internal << ", kind=" << std::setw(2) << record->kind
|
||||
<< ", operation=" << std::setw(3) << record->operation << ", phase= "
|
||||
<< ", dt_nsec=" << std::setw(8) << dt << ", agent_id=" << record->agent_id.handle
|
||||
<< ", queue_id=" << record->queue_id.handle << ", thread_id=" << record->thread_id
|
||||
<< ", elapsed=" << std::setprecision(3) << std::fixed << _elapsed
|
||||
<< " usec, flags=" << record->flags
|
||||
<< ", name=" << bf_name_info.at(record->kind, record->operation);
|
||||
|
||||
auto _lk = get_call_stack_lock();
|
||||
call_stack_v->emplace_back(
|
||||
source_location{__FUNCTION__, __FILE__, __LINE__, info.str()});
|
||||
}
|
||||
else
|
||||
{
|
||||
auto _msg = std::stringstream{};
|
||||
_msg << "unexpected rocprofiler_record_header_t category + kind: (" << header->category
|
||||
<< " + " << header->kind << ")";
|
||||
throw std::runtime_error{_msg.str()};
|
||||
}
|
||||
}
|
||||
|
||||
(void) context;
|
||||
(void) buffer_id;
|
||||
}
|
||||
|
||||
void
|
||||
tool_control_init(rocprofiler_context_id_t& primary_ctx)
|
||||
{
|
||||
// Create a specialized (throw-away) context for handling ROCTx profiler pause and resume.
|
||||
// A separate context is used because if the context that is associated with roctxProfilerPause
|
||||
// disabled that same context, a call to roctxProfilerResume would be ignored because the
|
||||
// context that enables the callback for that API call is disabled.
|
||||
auto cntrl_ctx = rocprofiler_context_id_t{};
|
||||
ROCPROFILER_CALL(rocprofiler_create_context(&cntrl_ctx), "control context creation failed");
|
||||
|
||||
// enable callback marker tracing with only the pause/resume operations
|
||||
ROCPROFILER_CALL(rocprofiler_configure_callback_tracing_service(
|
||||
cntrl_ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_MARKER_CONTROL_API,
|
||||
nullptr,
|
||||
0,
|
||||
tool_tracing_ctrl_callback,
|
||||
&primary_ctx),
|
||||
"callback tracing service failed to configure");
|
||||
|
||||
// start the context so that it is always active
|
||||
ROCPROFILER_CALL(rocprofiler_start_context(cntrl_ctx), "start of control context");
|
||||
}
|
||||
|
||||
int
|
||||
tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data)
|
||||
{
|
||||
assert(tool_data != nullptr);
|
||||
|
||||
auto* call_stack_v = static_cast<call_stack_t*>(tool_data);
|
||||
|
||||
call_stack_v->emplace_back(source_location{__FUNCTION__, __FILE__, __LINE__, ""});
|
||||
|
||||
for(const auto& itr : cb_name_info)
|
||||
{
|
||||
auto name_idx = std::stringstream{};
|
||||
name_idx << " [" << std::setw(3) << itr.value << "]";
|
||||
call_stack_v->emplace_back(
|
||||
source_location{"rocprofiler_callback_tracing_kind_names " + name_idx.str(),
|
||||
__FILE__,
|
||||
__LINE__,
|
||||
std::string{itr.name}});
|
||||
|
||||
for(auto [didx, ditr] : itr.items())
|
||||
{
|
||||
auto operation_idx = std::stringstream{};
|
||||
operation_idx << " [" << std::setw(3) << didx << "]";
|
||||
call_stack_v->emplace_back(source_location{
|
||||
"rocprofiler_callback_tracing_kind_operation_names" + operation_idx.str(),
|
||||
__FILE__,
|
||||
__LINE__,
|
||||
std::string{"- "} + std::string{*ditr}});
|
||||
}
|
||||
}
|
||||
|
||||
for(const auto& itr : bf_name_info)
|
||||
{
|
||||
auto name_idx = std::stringstream{};
|
||||
name_idx << " [" << std::setw(3) << itr.value << "]";
|
||||
call_stack_v->emplace_back(
|
||||
source_location{"rocprofiler_buffer_tracing_kind_names " + name_idx.str(),
|
||||
__FILE__,
|
||||
__LINE__,
|
||||
std::string{itr.name}});
|
||||
|
||||
for(auto [didx, ditr] : itr.items())
|
||||
{
|
||||
auto operation_idx = std::stringstream{};
|
||||
operation_idx << " [" << std::setw(3) << didx << "]";
|
||||
call_stack_v->emplace_back(source_location{
|
||||
"rocprofiler_buffer_tracing_kind_operation_names" + operation_idx.str(),
|
||||
__FILE__,
|
||||
__LINE__,
|
||||
std::string{"- "} + std::string{*ditr}});
|
||||
}
|
||||
}
|
||||
|
||||
client_fini_func = fini_func;
|
||||
|
||||
ROCPROFILER_CALL(rocprofiler_create_context(&client_ctx), "context creation failed");
|
||||
|
||||
// enable the control
|
||||
tool_control_init(client_ctx);
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_callback_tracing_service(client_ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT,
|
||||
nullptr,
|
||||
0,
|
||||
tool_callback_tracing_callback,
|
||||
tool_data),
|
||||
"callback tracing service failed to configure");
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_callback_tracing_service(client_ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_OMPT,
|
||||
nullptr,
|
||||
0,
|
||||
tool_callback_tracing_callback,
|
||||
tool_data),
|
||||
"callback tracing service failed to configure")
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_callback_tracing_service(client_ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_MARKER_CORE_API,
|
||||
nullptr,
|
||||
0,
|
||||
tool_callback_tracing_callback,
|
||||
tool_data),
|
||||
"callback tracing service failed to configure");
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_callback_tracing_service(client_ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_MARKER_NAME_API,
|
||||
nullptr,
|
||||
0,
|
||||
tool_callback_tracing_callback,
|
||||
tool_data),
|
||||
"callback tracing service failed to configure");
|
||||
|
||||
constexpr auto buffer_size_bytes = 4096;
|
||||
constexpr auto buffer_watermark_bytes = buffer_size_bytes - (buffer_size_bytes / 8);
|
||||
|
||||
ROCPROFILER_CALL(rocprofiler_create_buffer(client_ctx,
|
||||
buffer_size_bytes,
|
||||
buffer_watermark_bytes,
|
||||
ROCPROFILER_BUFFER_POLICY_LOSSLESS,
|
||||
tool_buffered_tracing_callback,
|
||||
tool_data,
|
||||
&client_buffer),
|
||||
"buffer creation");
|
||||
|
||||
for(auto itr : {ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH,
|
||||
ROCPROFILER_BUFFER_TRACING_MEMORY_COPY,
|
||||
ROCPROFILER_BUFFER_TRACING_SCRATCH_MEMORY})
|
||||
{
|
||||
ROCPROFILER_CALL(rocprofiler_configure_buffer_tracing_service(
|
||||
client_ctx, itr, nullptr, 0, client_buffer),
|
||||
"buffer tracing service configure");
|
||||
}
|
||||
|
||||
int valid_ctx = 0;
|
||||
ROCPROFILER_CALL(rocprofiler_context_is_valid(client_ctx, &valid_ctx),
|
||||
"failure checking context validity");
|
||||
if(valid_ctx == 0)
|
||||
{
|
||||
// notify rocprofiler that initialization failed
|
||||
// and all the contexts, buffers, etc. created
|
||||
// should be ignored
|
||||
return -1;
|
||||
}
|
||||
ROCPROFILER_CALL(rocprofiler_start_context(client_ctx), "rocprofiler context start failed");
|
||||
|
||||
// no errors
|
||||
return 0;
|
||||
}
|
||||
|
||||
void
|
||||
tool_fini(void* tool_data)
|
||||
{
|
||||
assert(tool_data != nullptr);
|
||||
|
||||
auto* _call_stack = static_cast<call_stack_t*>(tool_data);
|
||||
auto _lk = get_call_stack_lock();
|
||||
_call_stack->emplace_back(source_location{__FUNCTION__, __FILE__, __LINE__, ""});
|
||||
|
||||
print_call_stack(*_call_stack);
|
||||
|
||||
delete _call_stack;
|
||||
}
|
||||
} // namespace
|
||||
|
||||
void
|
||||
setup()
|
||||
{}
|
||||
|
||||
void
|
||||
shutdown()
|
||||
{
|
||||
if(client_id) client_fini_func(*client_id);
|
||||
}
|
||||
|
||||
void
|
||||
start()
|
||||
{
|
||||
ROCPROFILER_CALL(rocprofiler_start_context(client_ctx), "rocprofiler context start failed");
|
||||
}
|
||||
|
||||
void
|
||||
stop()
|
||||
{
|
||||
int status = 0;
|
||||
ROCPROFILER_CALL(rocprofiler_is_initialized(&status), "failed to retrieve init status");
|
||||
if(status != 0)
|
||||
{
|
||||
ROCPROFILER_CALL(rocprofiler_stop_context(client_ctx), "rocprofiler context stop failed");
|
||||
}
|
||||
}
|
||||
} // namespace client
|
||||
|
||||
extern "C" rocprofiler_tool_configure_result_t*
|
||||
rocprofiler_configure(uint32_t version,
|
||||
const char* runtime_version,
|
||||
uint32_t priority,
|
||||
rocprofiler_client_id_t* id)
|
||||
{
|
||||
// set the client name
|
||||
id->name = "ExampleTool";
|
||||
|
||||
// store client info
|
||||
client::client_id = id;
|
||||
|
||||
// compute major/minor/patch version info
|
||||
uint32_t major = version / 10000;
|
||||
uint32_t minor = (version % 10000) / 100;
|
||||
uint32_t patch = version % 100;
|
||||
|
||||
// generate info string
|
||||
auto info = std::stringstream{};
|
||||
info << id->name << " (priority=" << priority << ") is using rocprofiler-sdk v" << major << "."
|
||||
<< minor << "." << patch << " (" << runtime_version << ")";
|
||||
|
||||
std::clog << info.str() << std::endl;
|
||||
|
||||
// demonstration of alternative way to get the version info
|
||||
{
|
||||
auto version_info = std::array<uint32_t, 3>{};
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_get_version(&version_info.at(0), &version_info.at(1), &version_info.at(2)),
|
||||
"failed to get version info");
|
||||
|
||||
if(std::array<uint32_t, 3>{major, minor, patch} != version_info)
|
||||
{
|
||||
throw std::runtime_error{"version info mismatch"};
|
||||
}
|
||||
}
|
||||
|
||||
// data passed around all the callbacks
|
||||
auto* client_tool_data = new std::vector<client::source_location>{};
|
||||
|
||||
// add first entry
|
||||
client_tool_data->emplace_back(
|
||||
client::source_location{__FUNCTION__, __FILE__, __LINE__, info.str()});
|
||||
|
||||
// create configure data
|
||||
static auto cfg =
|
||||
rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t),
|
||||
&client::tool_init,
|
||||
&client::tool_fini,
|
||||
static_cast<void*>(client_tool_data)};
|
||||
|
||||
// return pointer to configure data
|
||||
return &cfg;
|
||||
}
|
||||
@@ -0,0 +1,44 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#ifdef openmp_target_sample_client_EXPORTS
|
||||
# define CLIENT_API __attribute__((visibility("default")))
|
||||
#else
|
||||
# define CLIENT_API
|
||||
#endif
|
||||
|
||||
namespace client
|
||||
{
|
||||
void
|
||||
setup() CLIENT_API;
|
||||
|
||||
void
|
||||
shutdown() CLIENT_API;
|
||||
|
||||
void
|
||||
start() CLIENT_API;
|
||||
|
||||
void
|
||||
stop() CLIENT_API;
|
||||
} // namespace client
|
||||
@@ -0,0 +1,161 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
// #include "client.hpp"
|
||||
|
||||
#include <rocprofiler-sdk-roctx/roctx.h>
|
||||
|
||||
#include <math.h>
|
||||
#include <stdio.h>
|
||||
|
||||
constexpr float EPS_FLOAT = 1.0e-7f;
|
||||
constexpr double EPS_DOUBLE = 1.0e-15;
|
||||
|
||||
#pragma omp declare target
|
||||
template <typename T>
|
||||
T
|
||||
mul(T a, T b)
|
||||
{
|
||||
T c;
|
||||
c = a * b;
|
||||
return c;
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
template <typename T>
|
||||
void
|
||||
vmul(T* a, T* b, T* c, int N)
|
||||
{
|
||||
#pragma omp target map(to : a [0:N], b [0:N]) map(from : c [0:N])
|
||||
#pragma omp teams distribute parallel for
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
c[i] = mul(a[i], b[i]);
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
main()
|
||||
{
|
||||
// client::setup();
|
||||
auto range_id = roctxRangeStart("main");
|
||||
|
||||
constexpr int N = 100000;
|
||||
int a_i[N], b_i[N], c_i[N], validate_i[N];
|
||||
float a_f[N], b_f[N], c_f[N], validate_f[N];
|
||||
double a_d[N], b_d[N], c_d[N], validate_d[N];
|
||||
int N_errors = 0;
|
||||
bool flag = false;
|
||||
|
||||
roctxMark("initialization");
|
||||
|
||||
#pragma omp parallel for
|
||||
for(int i = 0; i < N; ++i)
|
||||
{
|
||||
a_f[i] = a_i[i] = i + 1;
|
||||
b_f[i] = b_i[i] = i + 2;
|
||||
a_d[i] = a_i[i];
|
||||
b_d[i] = b_i[i];
|
||||
validate_i[i] = a_i[i] * b_i[i];
|
||||
validate_f[i] = a_f[i] * b_f[i];
|
||||
validate_d[i] = a_d[i] * b_d[i];
|
||||
}
|
||||
|
||||
vmul(a_i, b_i, c_i, N);
|
||||
vmul(a_f, b_f, c_f, N);
|
||||
|
||||
auto tid = roctx_thread_id_t{};
|
||||
// get the thread id recognized by rocprofiler-sdk from roctx
|
||||
roctxGetThreadId(&tid);
|
||||
// pause API tracing
|
||||
roctxProfilerPause(tid);
|
||||
|
||||
// we don't expect to see the third vmul
|
||||
vmul(a_d, b_d, c_d, N);
|
||||
|
||||
// resume API tracing
|
||||
roctxProfilerResume(tid);
|
||||
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
if(c_i[i] != validate_i[i])
|
||||
{
|
||||
++N_errors;
|
||||
// print 1st bad index
|
||||
if(!flag)
|
||||
{
|
||||
printf(
|
||||
"First fail: c_i[%d](%d) != validate_i[%d](%d)\n", i, c_i[i], i, validate_i[i]);
|
||||
flag = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
flag = false;
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
if(fabs(c_f[i] - validate_f[i]) > EPS_FLOAT)
|
||||
{
|
||||
++N_errors;
|
||||
// print 1st bad index
|
||||
if(!flag)
|
||||
{
|
||||
printf("First fail: c_f[%d](%f) != validate_f[%d](%f)\n",
|
||||
i,
|
||||
static_cast<double>(c_f[i]),
|
||||
i,
|
||||
static_cast<double>(validate_f[i]));
|
||||
flag = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
flag = false;
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
if(fabs(c_d[i] - validate_d[i]) > EPS_DOUBLE)
|
||||
{
|
||||
++N_errors;
|
||||
// print 1st bad index
|
||||
if(!flag)
|
||||
{
|
||||
printf(
|
||||
"First fail: c_d[%d](%f) != validate_d[%d](%f)\n", i, c_d[i], i, validate_d[i]);
|
||||
flag = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
if(N_errors == 0)
|
||||
{
|
||||
printf("Success\n");
|
||||
return 0;
|
||||
}
|
||||
else
|
||||
{
|
||||
printf("Total %d failures\n", N_errors);
|
||||
printf("Fail\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
roctxRangeStop(range_id);
|
||||
|
||||
// client::stop();
|
||||
// client::shutdown();
|
||||
}
|
||||
@@ -27,7 +27,6 @@ set(ROCPROFILER_HEADER_FILES
|
||||
internal_threading.h
|
||||
marker.h
|
||||
ompt.h
|
||||
openmp.h
|
||||
pc_sampling.h
|
||||
profile_config.h
|
||||
registration.h
|
||||
@@ -43,7 +42,7 @@ install(
|
||||
add_subdirectory(hip)
|
||||
add_subdirectory(hsa)
|
||||
add_subdirectory(marker)
|
||||
add_subdirectory(openmp)
|
||||
add_subdirectory(ompt)
|
||||
add_subdirectory(rccl)
|
||||
add_subdirectory(cxx)
|
||||
add_subdirectory(kfd)
|
||||
|
||||
@@ -84,7 +84,7 @@ typedef struct
|
||||
} rocprofiler_buffer_tracing_hip_api_record_t;
|
||||
|
||||
/**
|
||||
* @brief Additional trace data for OpenMP target routines
|
||||
* @brief Additional trace data for OMPT target routines
|
||||
*/
|
||||
|
||||
typedef struct rocprofiler_buffer_tracing_ompt_target_t
|
||||
@@ -98,13 +98,13 @@ typedef struct rocprofiler_buffer_tracing_ompt_target_t
|
||||
|
||||
typedef struct rocprofiler_buffer_tracing_ompt_target_data_op_t
|
||||
{
|
||||
uint64_t host_op_id; // from the host_op_id argument to the OMPT callback
|
||||
int32_t optype; // ompt_target_data_op_t kind of operation
|
||||
int32_t src_device_num; // ompt device number for data source
|
||||
int32_t dest_device_num; // ompt device number for data destination
|
||||
int32_t reserved; // for padding
|
||||
uint64_t bytes; // size in bytes of the operation
|
||||
const void* codeptr_ra; // pointer to the callsite of the target_data_op
|
||||
uint64_t host_op_id; // from the host_op_id argument to the OMPT callback
|
||||
int32_t optype; // ompt_target_data_op_t kind of operation
|
||||
int32_t src_device_num; // ompt device number for data source
|
||||
int32_t dst_device_num; // ompt device number for data destination
|
||||
int32_t reserved; // for padding
|
||||
uint64_t bytes; // size in bytes of the operation
|
||||
const void* codeptr_ra; // pointer to the callsite of the target_data_op
|
||||
} rocprofiler_buffer_tracing_ompt_target_data_op_t;
|
||||
|
||||
typedef struct rocprofiler_buffer_tracing_ompt_target_kernel_t
|
||||
@@ -115,9 +115,9 @@ typedef struct rocprofiler_buffer_tracing_ompt_target_kernel_t
|
||||
} rocprofiler_buffer_tracing_ompt_target_kernel_t;
|
||||
|
||||
/**
|
||||
* @brief ROCProfiler Buffer OPENMP API Tracer Record.
|
||||
* @brief ROCProfiler Buffer OMPT Tracer Record.
|
||||
*/
|
||||
typedef struct rocprofiler_buffer_tracing_ompt_api_record_t
|
||||
typedef struct rocprofiler_buffer_tracing_ompt_record_t
|
||||
{
|
||||
uint64_t size; ///< size of this struct
|
||||
rocprofiler_buffer_tracing_kind_t kind;
|
||||
@@ -129,16 +129,16 @@ typedef struct rocprofiler_buffer_tracing_ompt_api_record_t
|
||||
union
|
||||
{
|
||||
rocprofiler_buffer_tracing_ompt_target_t target;
|
||||
rocprofiler_buffer_tracing_ompt_target_data_op_t target_data;
|
||||
rocprofiler_buffer_tracing_ompt_target_kernel_t kernel;
|
||||
rocprofiler_buffer_tracing_ompt_target_data_op_t target_data_op;
|
||||
rocprofiler_buffer_tracing_ompt_target_kernel_t target_kernel;
|
||||
uint64_t reserved[5];
|
||||
};
|
||||
|
||||
/// @var kind
|
||||
/// @brief ::ROCPROFILER_CALLBACK_TRACING_OPENMP
|
||||
/// @brief ::ROCPROFILER_BUFFER_TRACING_OMPT
|
||||
/// @var operation
|
||||
/// @brief Specification of the API function,::rocprofiler_ompt_operation_t
|
||||
} rocprofiler_buffer_tracing_ompt_api_record_t;
|
||||
/// @brief Specification of the ::rocprofiler_ompt_operation_t
|
||||
} rocprofiler_buffer_tracing_ompt_record_t;
|
||||
|
||||
/**
|
||||
* @brief ROCProfiler Buffer Marker Tracer Record.
|
||||
|
||||
@@ -27,7 +27,7 @@
|
||||
#include <rocprofiler-sdk/hip.h>
|
||||
#include <rocprofiler-sdk/hsa.h>
|
||||
#include <rocprofiler-sdk/marker.h>
|
||||
#include <rocprofiler-sdk/openmp.h>
|
||||
#include <rocprofiler-sdk/ompt.h>
|
||||
#include <rocprofiler-sdk/rccl.h>
|
||||
|
||||
#include <hsa/hsa.h>
|
||||
@@ -80,13 +80,13 @@ typedef struct
|
||||
} rocprofiler_callback_tracing_hip_api_data_t;
|
||||
|
||||
/**
|
||||
* @brief ROCProfiler OPENMP Callback Data
|
||||
* @brief ROCProfiler OMPT Callback Data
|
||||
*/
|
||||
typedef struct
|
||||
{
|
||||
uint64_t size; ///< size of this struct
|
||||
rocprofiler_ompt_api_args_t args;
|
||||
} rocprofiler_callback_tracing_ompt_api_data_t;
|
||||
uint64_t size; ///< size of this struct
|
||||
rocprofiler_ompt_args_t args;
|
||||
} rocprofiler_callback_tracing_ompt_data_t;
|
||||
|
||||
/**
|
||||
* @brief ROCProfiler Marker Tracer Callback Data.
|
||||
|
||||
@@ -78,6 +78,7 @@ ROCPROFILER_DEFINE_CATEGORY(category, hsa_api, "HSA API function")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, hip_api, "HIP API function")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, marker_api, "Marker API region")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, rccl_api, "RCCL API function")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, openmp, "OpenMP")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, kernel_dispatch, "GPU kernel dispatch")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, memory_copy, "Async memory copy")
|
||||
ROCPROFILER_DEFINE_CATEGORY(category, memory_allocation, "Memory Allocation")
|
||||
@@ -87,6 +88,7 @@ ROCPROFILER_DEFINE_CATEGORY(category, memory_allocation, "Memory Allocation")
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::hip_api), \
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::marker_api), \
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::rccl_api), \
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::openmp), \
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::kernel_dispatch), \
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::memory_copy), \
|
||||
ROCPROFILER_PERFETTO_CATEGORY(category::memory_allocation)
|
||||
|
||||
@@ -358,6 +358,14 @@ save(ArchiveT& ar, rocprofiler_callback_tracing_rccl_api_data_t data)
|
||||
ROCP_SDK_SAVE_DATA_FIELD(retval);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_callback_tracing_ompt_data_t data)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_FIELD(size);
|
||||
// ROCP_SDK_SAVE_DATA_FIELD(args);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_dispatch_counting_service_data_t data)
|
||||
@@ -443,6 +451,56 @@ save(ArchiveT& ar, rocprofiler_buffer_tracing_rccl_api_record_t data)
|
||||
save_buffer_tracing_api_record(ar, data);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_buffer_tracing_ompt_target_t data)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_VALUE("kind", kind);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("device", device_num);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("task_id", task_id);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("target_id", target_id);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_buffer_tracing_ompt_target_data_op_t data)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_VALUE("host_op_id", host_op_id);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("optype", optype);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("src_device_num", src_device_num);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("dst_device_num", dst_device_num);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("bytes", bytes);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_buffer_tracing_ompt_target_kernel_t data)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_VALUE("host_op_id", host_op_id);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("device_num", device_num);
|
||||
ROCP_SDK_SAVE_DATA_VALUE("requested_num_teams", requested_num_teams);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_buffer_tracing_ompt_record_t data)
|
||||
{
|
||||
save_buffer_tracing_api_record(ar, data);
|
||||
|
||||
if(data.operation == ROCPROFILER_OMPT_ID_target_emi)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_FIELD(target);
|
||||
}
|
||||
else if(data.operation == ROCPROFILER_OMPT_ID_target_data_op_emi)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_FIELD(target_data_op);
|
||||
}
|
||||
else if(data.operation == ROCPROFILER_OMPT_ID_target_submit_emi)
|
||||
{
|
||||
ROCP_SDK_SAVE_DATA_FIELD(target_kernel);
|
||||
}
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
void
|
||||
save(ArchiveT& ar, rocprofiler_buffer_tracing_kernel_dispatch_record_t data)
|
||||
|
||||
@@ -67,7 +67,7 @@ typedef enum // NOLINT(performance-enum-size)
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_KERNEL_DISPATCH, ///<
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_SCRATCH_MEMORY, ///<
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_RCCL_API, ///<
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_OPENMP, ///<
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_OMPT, ///<
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_MEMORY_ALLOCATION, ///<
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_LAST,
|
||||
} rocprofiler_external_correlation_id_request_kind_t;
|
||||
|
||||
@@ -170,7 +170,7 @@ typedef enum // NOLINT(performance-enum-size)
|
||||
ROCPROFILER_CALLBACK_TRACING_KERNEL_DISPATCH, ///< Callbacks for kernel dispatches
|
||||
ROCPROFILER_CALLBACK_TRACING_MEMORY_COPY, ///< @see ::rocprofiler_memory_copy_operation_t
|
||||
ROCPROFILER_CALLBACK_TRACING_RCCL_API, ///< @RCCL tracing
|
||||
ROCPROFILER_CALLBACK_TRACING_OPENMP, ///< @see ::rocprofiler_ompt_operation_t
|
||||
ROCPROFILER_CALLBACK_TRACING_OMPT, ///< @see ::rocprofiler_ompt_operation_t
|
||||
ROCPROFILER_CALLBACK_TRACING_MEMORY_ALLOCATION, ///< @see
|
||||
///< ::rocprofiler_memory_allocation_operation_t
|
||||
ROCPROFILER_CALLBACK_TRACING_RUNTIME_INITIALIZATION, ///< Callback notifying that a runtime
|
||||
@@ -200,7 +200,7 @@ typedef enum // NOLINT(performance-enum-size)
|
||||
ROCPROFILER_BUFFER_TRACING_SCRATCH_MEMORY, ///< Buffer scratch memory reclaimation info
|
||||
ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT, ///< Correlation ID in no longer in use
|
||||
ROCPROFILER_BUFFER_TRACING_RCCL_API, ///< RCCL tracing
|
||||
ROCPROFILER_BUFFER_TRACING_OPENMP, ///< @see ::rocprofiler_ompt_operation_t
|
||||
ROCPROFILER_BUFFER_TRACING_OMPT, ///< @see ::rocprofiler_ompt_operation_t
|
||||
ROCPROFILER_BUFFER_TRACING_MEMORY_ALLOCATION, ///< @see
|
||||
///< ::rocprofiler_memory_allocation_operation_t
|
||||
ROCPROFILER_BUFFER_TRACING_RUNTIME_INITIALIZATION, ///< Record indicating a runtime library has
|
||||
|
||||
@@ -24,13 +24,14 @@
|
||||
|
||||
#include <rocprofiler-sdk/defines.h>
|
||||
#include <rocprofiler-sdk/fwd.h>
|
||||
#include <rocprofiler-sdk/openmp/omp-tools.h>
|
||||
#include <rocprofiler-sdk/registration.h>
|
||||
#include <rocprofiler-sdk/ompt/api_args.h>
|
||||
#include <rocprofiler-sdk/ompt/api_id.h>
|
||||
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
||||
|
||||
/**
|
||||
* @defgroup OMPT_REGISTRATION Tool registration for OpenMP Tools
|
||||
*
|
||||
* Functions for enabling OpenMP support in tools which provide their own ompt_start_tool symbol but
|
||||
* Functions for enabling OMPT support in tools which provide their own ompt_start_tool symbol but
|
||||
* want to defer to rocprofiler-sdk for OMPT.
|
||||
*
|
||||
* @{
|
||||
@@ -38,9 +39,6 @@
|
||||
|
||||
ROCPROFILER_EXTERN_C_INIT
|
||||
|
||||
void
|
||||
rocprofiler_ompt_finalize_tool() ROCPROFILER_API;
|
||||
|
||||
rocprofiler_status_t
|
||||
rocprofiler_ompt_is_initialized(int* status) ROCPROFILER_API ROCPROFILER_NONNULL(1);
|
||||
|
||||
|
||||
@@ -0,0 +1,11 @@
|
||||
#
|
||||
#
|
||||
# Installation of public OMPT headers
|
||||
#
|
||||
#
|
||||
set(ROCPROFILER_OMPT_HEADER_FILES api_args.h api_id.h omp-tools.h)
|
||||
|
||||
install(
|
||||
FILES ${ROCPROFILER_OMPT_HEADER_FILES}
|
||||
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/rocprofiler-sdk/ompt
|
||||
COMPONENT development)
|
||||
+20
-20
@@ -25,7 +25,7 @@
|
||||
#include <rocprofiler-sdk/defines.h>
|
||||
#include <rocprofiler-sdk/version.h>
|
||||
|
||||
#include <rocprofiler-sdk/openmp/omp-tools.h>
|
||||
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
||||
|
||||
#include <stdint.h>
|
||||
|
||||
@@ -60,7 +60,7 @@ typedef struct rocprofiler_ompt_no_args
|
||||
char empty;
|
||||
} rocprofiler_ompt_no_args;
|
||||
|
||||
typedef union rocprofiler_ompt_api_args_t
|
||||
typedef union rocprofiler_ompt_args_t
|
||||
{
|
||||
// The ompt_data_t* values passed to the client tool are proxies.
|
||||
// This allows the client tool to use them as it would in their own
|
||||
@@ -150,11 +150,11 @@ typedef union rocprofiler_ompt_api_args_t
|
||||
uint64_t module_id;
|
||||
} device_load;
|
||||
|
||||
struct
|
||||
{
|
||||
int device_num;
|
||||
uint64_t module_id;
|
||||
} device_unload;
|
||||
// struct
|
||||
// {
|
||||
// int device_num;
|
||||
// uint64_t module_id;
|
||||
// } device_unload;
|
||||
|
||||
struct
|
||||
{
|
||||
@@ -299,10 +299,10 @@ typedef union rocprofiler_ompt_api_args_t
|
||||
ompt_data_t* target_data;
|
||||
ompt_data_t* host_op_id;
|
||||
ompt_target_data_op_t optype;
|
||||
void* src_addr;
|
||||
void* src_address;
|
||||
int src_device_num;
|
||||
void* dest_addr;
|
||||
int dest_device_num;
|
||||
void* dst_address;
|
||||
int dst_device_num;
|
||||
size_t bytes;
|
||||
const void* codeptr_ra;
|
||||
} target_data_op_emi;
|
||||
@@ -315,15 +315,15 @@ typedef union rocprofiler_ompt_api_args_t
|
||||
unsigned int requested_num_teams;
|
||||
} target_submit_emi;
|
||||
|
||||
struct
|
||||
{
|
||||
unsigned int nitems;
|
||||
void** host_addr;
|
||||
void** device_addr;
|
||||
size_t* bytes;
|
||||
unsigned int* mapping_flags;
|
||||
const void* codeptr_ra;
|
||||
} target_map_emi;
|
||||
// struct
|
||||
// {
|
||||
// unsigned int nitems;
|
||||
// void** host_addr;
|
||||
// void** device_addr;
|
||||
// size_t* bytes;
|
||||
// unsigned int* mapping_flags;
|
||||
// const void* codeptr_ra;
|
||||
// } target_map_emi;
|
||||
|
||||
struct
|
||||
{
|
||||
@@ -335,6 +335,6 @@ typedef union rocprofiler_ompt_api_args_t
|
||||
|
||||
rocprofiler_ompt_callback_functions_t callback_functions;
|
||||
|
||||
} rocprofiler_ompt_api_args_t;
|
||||
} rocprofiler_ompt_args_t;
|
||||
|
||||
ROCPROFILER_EXTERN_C_FINI
|
||||
@@ -0,0 +1,67 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
/**
|
||||
* @brief ROCProfiler enumeration of OMPT (OpenMP tools) tracing operations
|
||||
* NOTE: These are callbacks into the ROCProfiler SDK from the vendor-provided OMPT implementation
|
||||
*/
|
||||
typedef enum // NOLINT(performance-enum-size)
|
||||
{
|
||||
ROCPROFILER_OMPT_ID_NONE = -1,
|
||||
ROCPROFILER_OMPT_ID_thread_begin = 0,
|
||||
ROCPROFILER_OMPT_ID_thread_end,
|
||||
ROCPROFILER_OMPT_ID_parallel_begin,
|
||||
ROCPROFILER_OMPT_ID_parallel_end,
|
||||
ROCPROFILER_OMPT_ID_task_create,
|
||||
ROCPROFILER_OMPT_ID_task_schedule,
|
||||
ROCPROFILER_OMPT_ID_implicit_task,
|
||||
ROCPROFILER_OMPT_ID_device_initialize,
|
||||
ROCPROFILER_OMPT_ID_device_finalize,
|
||||
ROCPROFILER_OMPT_ID_device_load,
|
||||
// ROCPROFILER_OMPT_ID_device_unload,
|
||||
ROCPROFILER_OMPT_ID_sync_region_wait,
|
||||
ROCPROFILER_OMPT_ID_mutex_released,
|
||||
ROCPROFILER_OMPT_ID_dependences,
|
||||
ROCPROFILER_OMPT_ID_task_dependence,
|
||||
ROCPROFILER_OMPT_ID_work,
|
||||
ROCPROFILER_OMPT_ID_masked,
|
||||
ROCPROFILER_OMPT_ID_sync_region,
|
||||
ROCPROFILER_OMPT_ID_lock_init,
|
||||
ROCPROFILER_OMPT_ID_lock_destroy,
|
||||
ROCPROFILER_OMPT_ID_mutex_acquire,
|
||||
ROCPROFILER_OMPT_ID_mutex_acquired,
|
||||
ROCPROFILER_OMPT_ID_nest_lock,
|
||||
ROCPROFILER_OMPT_ID_flush,
|
||||
ROCPROFILER_OMPT_ID_cancel,
|
||||
ROCPROFILER_OMPT_ID_reduction,
|
||||
ROCPROFILER_OMPT_ID_dispatch,
|
||||
ROCPROFILER_OMPT_ID_target_emi,
|
||||
ROCPROFILER_OMPT_ID_target_data_op_emi,
|
||||
ROCPROFILER_OMPT_ID_target_submit_emi,
|
||||
// ROCPROFILER_OMPT_ID_target_map_emi,
|
||||
ROCPROFILER_OMPT_ID_error,
|
||||
ROCPROFILER_OMPT_ID_callback_functions, // fake to return struct of ompt callback function
|
||||
// pointers
|
||||
ROCPROFILER_OMPT_ID_LAST
|
||||
} rocprofiler_ompt_operation_t;
|
||||
@@ -1,11 +0,0 @@
|
||||
#
|
||||
#
|
||||
# Installation of public OpenMP headers
|
||||
#
|
||||
#
|
||||
set(ROCPROFILER_OPENMP_HEADER_FILES api_args.h api_id.h omp-tools.h)
|
||||
|
||||
install(
|
||||
FILES ${ROCPROFILER_OPENMP_HEADER_FILES}
|
||||
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/rocprofiler-sdk/openmp
|
||||
COMPONENT development)
|
||||
@@ -1,67 +0,0 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
/**
|
||||
* @brief ROCProfiler enumeration of OPENMP (OpenMP tools) tracing operations
|
||||
* NOTE: These are callbacks into the ROCProfiler SDK from the vendor-provided OMPT implementation
|
||||
*/
|
||||
typedef enum // NOLINT(performance-enum-size)
|
||||
{
|
||||
ROCPROFILER_OPENMP_ID_NONE = -1,
|
||||
ROCPROFILER_OPENMP_ID_thread_begin = 0,
|
||||
ROCPROFILER_OPENMP_ID_thread_end,
|
||||
ROCPROFILER_OPENMP_ID_parallel_begin,
|
||||
ROCPROFILER_OPENMP_ID_parallel_end,
|
||||
ROCPROFILER_OPENMP_ID_task_create,
|
||||
ROCPROFILER_OPENMP_ID_task_schedule,
|
||||
ROCPROFILER_OPENMP_ID_implicit_task,
|
||||
ROCPROFILER_OPENMP_ID_device_initialize,
|
||||
ROCPROFILER_OPENMP_ID_device_finalize,
|
||||
ROCPROFILER_OPENMP_ID_device_load,
|
||||
ROCPROFILER_OPENMP_ID_device_unload,
|
||||
ROCPROFILER_OPENMP_ID_sync_region_wait,
|
||||
ROCPROFILER_OPENMP_ID_mutex_released,
|
||||
ROCPROFILER_OPENMP_ID_dependences,
|
||||
ROCPROFILER_OPENMP_ID_task_dependence,
|
||||
ROCPROFILER_OPENMP_ID_work,
|
||||
ROCPROFILER_OPENMP_ID_masked,
|
||||
ROCPROFILER_OPENMP_ID_sync_region,
|
||||
ROCPROFILER_OPENMP_ID_lock_init,
|
||||
ROCPROFILER_OPENMP_ID_lock_destroy,
|
||||
ROCPROFILER_OPENMP_ID_mutex_acquire,
|
||||
ROCPROFILER_OPENMP_ID_mutex_acquired,
|
||||
ROCPROFILER_OPENMP_ID_nest_lock,
|
||||
ROCPROFILER_OPENMP_ID_flush,
|
||||
ROCPROFILER_OPENMP_ID_cancel,
|
||||
ROCPROFILER_OPENMP_ID_reduction,
|
||||
ROCPROFILER_OPENMP_ID_dispatch,
|
||||
ROCPROFILER_OPENMP_ID_target_emi,
|
||||
ROCPROFILER_OPENMP_ID_target_data_op_emi,
|
||||
ROCPROFILER_OPENMP_ID_target_submit_emi,
|
||||
ROCPROFILER_OPENMP_ID_target_map_emi,
|
||||
ROCPROFILER_OPENMP_ID_error,
|
||||
ROCPROFILER_OPENMP_ID_callback_functions, // fake to return struct of ompt callback function
|
||||
// pointers
|
||||
ROCPROFILER_OPENMP_ID_LAST
|
||||
} rocprofiler_ompt_operation_t;
|
||||
@@ -5,7 +5,7 @@ rocprofiler_activate_clang_tidy()
|
||||
|
||||
set(ROCPROFILER_LIB_HEADERS
|
||||
agent.hpp buffer.hpp external_correlation.hpp intercept_table.hpp
|
||||
internal_threading.hpp registration.hpp runtime_initialization.hpp)
|
||||
internal_threading.hpp ompt.hpp registration.hpp runtime_initialization.hpp)
|
||||
set(ROCPROFILER_LIB_SOURCES
|
||||
agent.cpp
|
||||
buffer.cpp
|
||||
@@ -18,10 +18,11 @@ set(ROCPROFILER_LIB_SOURCES
|
||||
external_correlation.cpp
|
||||
intercept_table.cpp
|
||||
internal_threading.cpp
|
||||
ompt.cpp
|
||||
pc_sampling.cpp
|
||||
profile_config.cpp
|
||||
registration.cpp
|
||||
rocprofiler.cpp
|
||||
registration.cpp
|
||||
runtime_initialization.cpp)
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
@@ -52,6 +53,7 @@ add_subdirectory(kernel_dispatch)
|
||||
add_subdirectory(page_migration)
|
||||
add_subdirectory(rccl)
|
||||
add_subdirectory(details)
|
||||
add_subdirectory(ompt)
|
||||
|
||||
target_link_libraries(
|
||||
rocprofiler-sdk-object-library
|
||||
|
||||
@@ -30,6 +30,7 @@
|
||||
#include "lib/rocprofiler-sdk/hsa/scratch_memory.hpp"
|
||||
#include "lib/rocprofiler-sdk/kernel_dispatch/kernel_dispatch.hpp"
|
||||
#include "lib/rocprofiler-sdk/marker/marker.hpp"
|
||||
#include "lib/rocprofiler-sdk/ompt/ompt.hpp"
|
||||
#include "lib/rocprofiler-sdk/page_migration/page_migration.hpp"
|
||||
#include "lib/rocprofiler-sdk/rccl/rccl.hpp"
|
||||
#include "lib/rocprofiler-sdk/registration.hpp"
|
||||
@@ -88,7 +89,7 @@ ROCPROFILER_BUFFER_TRACING_KIND_STRING(PAGE_MIGRATION)
|
||||
ROCPROFILER_BUFFER_TRACING_KIND_STRING(SCRATCH_MEMORY)
|
||||
ROCPROFILER_BUFFER_TRACING_KIND_STRING(CORRELATION_ID_RETIREMENT)
|
||||
ROCPROFILER_BUFFER_TRACING_KIND_STRING(RCCL_API)
|
||||
ROCPROFILER_BUFFER_TRACING_KIND_STRING(OPENMP)
|
||||
ROCPROFILER_BUFFER_TRACING_KIND_STRING(OMPT)
|
||||
ROCPROFILER_BUFFER_TRACING_KIND_STRING(RUNTIME_INITIALIZATION)
|
||||
|
||||
template <size_t Idx, size_t... Tail>
|
||||
@@ -273,6 +274,11 @@ rocprofiler_query_buffer_tracing_kind_operation_name(rocprofiler_buffer_tracing_
|
||||
val = rocprofiler::page_migration::name_by_id(operation);
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_BUFFER_TRACING_OMPT:
|
||||
{
|
||||
val = rocprofiler::ompt::name_by_id(operation);
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_BUFFER_TRACING_RUNTIME_INITIALIZATION:
|
||||
{
|
||||
val = rocprofiler::runtime_init::name_by_id(operation);
|
||||
@@ -282,10 +288,6 @@ rocprofiler_query_buffer_tracing_kind_operation_name(rocprofiler_buffer_tracing_
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
}
|
||||
case ROCPROFILER_BUFFER_TRACING_OPENMP:
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
}
|
||||
};
|
||||
|
||||
if(!val)
|
||||
@@ -403,6 +405,11 @@ rocprofiler_iterate_buffer_tracing_kind_operations(
|
||||
ops = rocprofiler::page_migration::get_ids();
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_BUFFER_TRACING_OMPT:
|
||||
{
|
||||
ops = rocprofiler::ompt::get_ids();
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_BUFFER_TRACING_RUNTIME_INITIALIZATION:
|
||||
{
|
||||
ops = rocprofiler::runtime_init::get_ids();
|
||||
@@ -412,10 +419,6 @@ rocprofiler_iterate_buffer_tracing_kind_operations(
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
}
|
||||
case ROCPROFILER_BUFFER_TRACING_OPENMP:
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
}
|
||||
}
|
||||
|
||||
for(const auto& itr : ops)
|
||||
|
||||
@@ -30,6 +30,7 @@
|
||||
#include "lib/rocprofiler-sdk/hsa/scratch_memory.hpp"
|
||||
#include "lib/rocprofiler-sdk/kernel_dispatch/kernel_dispatch.hpp"
|
||||
#include "lib/rocprofiler-sdk/marker/marker.hpp"
|
||||
#include "lib/rocprofiler-sdk/ompt/ompt.hpp"
|
||||
#include "lib/rocprofiler-sdk/rccl/rccl.hpp"
|
||||
#include "lib/rocprofiler-sdk/registration.hpp"
|
||||
#include "lib/rocprofiler-sdk/runtime_initialization.hpp"
|
||||
@@ -85,7 +86,7 @@ ROCPROFILER_CALLBACK_TRACING_KIND_STRING(KERNEL_DISPATCH)
|
||||
ROCPROFILER_CALLBACK_TRACING_KIND_STRING(MEMORY_COPY)
|
||||
ROCPROFILER_CALLBACK_TRACING_KIND_STRING(MEMORY_ALLOCATION)
|
||||
ROCPROFILER_CALLBACK_TRACING_KIND_STRING(RCCL_API)
|
||||
ROCPROFILER_CALLBACK_TRACING_KIND_STRING(OPENMP)
|
||||
ROCPROFILER_CALLBACK_TRACING_KIND_STRING(OMPT)
|
||||
ROCPROFILER_CALLBACK_TRACING_KIND_STRING(RUNTIME_INITIALIZATION)
|
||||
|
||||
template <size_t Idx, size_t... Tail>
|
||||
@@ -255,9 +256,9 @@ rocprofiler_query_callback_tracing_kind_operation_name(rocprofiler_callback_trac
|
||||
val = rocprofiler::hsa::async_copy::name_by_id(operation);
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_CALLBACK_TRACING_OPENMP:
|
||||
case ROCPROFILER_CALLBACK_TRACING_OMPT:
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
val = rocprofiler::ompt::name_by_id(operation);
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_CALLBACK_TRACING_MEMORY_ALLOCATION:
|
||||
@@ -383,9 +384,9 @@ rocprofiler_iterate_callback_tracing_kind_operations(
|
||||
ops = rocprofiler::hsa::async_copy::get_ids();
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_CALLBACK_TRACING_OPENMP:
|
||||
case ROCPROFILER_CALLBACK_TRACING_OMPT:
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
ops = rocprofiler::ompt::get_ids();
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_CALLBACK_TRACING_MEMORY_ALLOCATION:
|
||||
@@ -522,13 +523,22 @@ rocprofiler_iterate_callback_tracing_kind_operation_args(
|
||||
user_data);
|
||||
return ROCPROFILER_STATUS_SUCCESS;
|
||||
}
|
||||
case ROCPROFILER_CALLBACK_TRACING_OMPT:
|
||||
{
|
||||
rocprofiler::ompt::iterate_args(
|
||||
record.operation,
|
||||
*static_cast<rocprofiler_callback_tracing_ompt_data_t*>(record.payload),
|
||||
callback,
|
||||
max_deref,
|
||||
user_data);
|
||||
return ROCPROFILER_STATUS_SUCCESS;
|
||||
}
|
||||
case ROCPROFILER_CALLBACK_TRACING_SCRATCH_MEMORY:
|
||||
case ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT:
|
||||
case ROCPROFILER_CALLBACK_TRACING_KERNEL_DISPATCH:
|
||||
case ROCPROFILER_CALLBACK_TRACING_MEMORY_COPY:
|
||||
case ROCPROFILER_CALLBACK_TRACING_MEMORY_ALLOCATION:
|
||||
case ROCPROFILER_CALLBACK_TRACING_RCCL_API:
|
||||
case ROCPROFILER_CALLBACK_TRACING_OPENMP:
|
||||
case ROCPROFILER_CALLBACK_TRACING_RUNTIME_INITIALIZATION:
|
||||
{
|
||||
return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED;
|
||||
|
||||
@@ -66,7 +66,7 @@ struct code_object_info;
|
||||
struct code_object_info<ROCPROFILER_CODE_OBJECT_##OPERATION> \
|
||||
{ \
|
||||
static constexpr auto operation_idx = ROCPROFILER_CODE_OBJECT_##OPERATION; \
|
||||
static constexpr auto name = #OPERATION; \
|
||||
static constexpr auto name = "CODE_OBJECT_" #OPERATION; \
|
||||
};
|
||||
|
||||
SPECIALIZE_CODE_OBJECT_INFO(NONE)
|
||||
|
||||
@@ -152,20 +152,53 @@ pop_latest_correlation_id(correlation_id* val)
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
if(get_latest_correlation_id_impl().empty())
|
||||
auto& stack = get_latest_correlation_id_impl();
|
||||
if(stack.empty())
|
||||
{
|
||||
ROCP_ERROR << "empty thread-local correlation id stack";
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
ROCP_ERROR_IF(get_latest_correlation_id_impl().back() != val)
|
||||
<< "pop_latest_correlation_id is happening out of order for " << val->internal
|
||||
<< ". top of stack is " << get_latest_correlation_id_impl().back()->internal;
|
||||
if(stack.back() != val)
|
||||
{
|
||||
ROCP_ERROR << "pop_latest_correlation_id is happening out of order for " << val->internal
|
||||
<< ". top of stack is " << stack.back()->internal;
|
||||
}
|
||||
|
||||
get_latest_correlation_id_impl().pop_back();
|
||||
stack.pop_back();
|
||||
|
||||
return (get_latest_correlation_id_impl().empty()) ? nullptr
|
||||
: get_latest_correlation_id_impl().back();
|
||||
return (stack.empty()) ? nullptr : stack.back();
|
||||
}
|
||||
|
||||
correlation_id*
|
||||
push_correlation_id(correlation_id* val)
|
||||
{
|
||||
if(!val)
|
||||
{
|
||||
ROCP_ERROR << "passed nullptr to correlation id";
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
val->thread_idx = common::get_tid();
|
||||
get_latest_correlation_id_impl().emplace_back(val);
|
||||
|
||||
return val;
|
||||
}
|
||||
|
||||
void
|
||||
dump_correlation_stack(const char* s)
|
||||
{
|
||||
auto& stack = get_latest_correlation_id_impl();
|
||||
auto info = std::stringstream{};
|
||||
info << s << ": tid: " << common::get_tid() << " :";
|
||||
for(const auto* itr : stack)
|
||||
{
|
||||
info << " " << itr->internal;
|
||||
;
|
||||
}
|
||||
info << "\n";
|
||||
printf("%s", info.str().c_str());
|
||||
}
|
||||
|
||||
} // namespace context
|
||||
} // namespace rocprofiler
|
||||
|
||||
@@ -86,6 +86,14 @@ get_latest_correlation_id();
|
||||
const correlation_id*
|
||||
pop_latest_correlation_id(correlation_id*);
|
||||
|
||||
// push correlation id
|
||||
correlation_id*
|
||||
push_correlation_id(correlation_id*);
|
||||
|
||||
// dump the cid stack for debugging
|
||||
void
|
||||
dump_correlation_stack(const char*);
|
||||
|
||||
/// permits tools opportunity to modify the correlation id based on the domain, op, and
|
||||
/// the rocprofiler generated correlation id
|
||||
struct correlation_tracing_service
|
||||
|
||||
@@ -0,0 +1,251 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#include "lib/rocprofiler-sdk/ompt/ompt.hpp"
|
||||
#include "lib/common/logging.hpp"
|
||||
#include "lib/common/static_object.hpp"
|
||||
#include "lib/rocprofiler-sdk/ompt/details/format.hpp" // NOLINT(unused-includes)
|
||||
#include "lib/rocprofiler-sdk/registration.hpp"
|
||||
|
||||
#include <rocprofiler-sdk/fwd.h>
|
||||
#include <rocprofiler-sdk/ompt.h>
|
||||
#include <rocprofiler-sdk/ompt/api_args.h>
|
||||
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
||||
|
||||
#include <fmt/core.h>
|
||||
#include <cstdint>
|
||||
#include <iostream>
|
||||
|
||||
namespace rocprofiler
|
||||
{
|
||||
namespace ompt
|
||||
{
|
||||
namespace
|
||||
{
|
||||
ompt_start_tool_result_t*
|
||||
get_start_tool_result()
|
||||
{
|
||||
static auto*& obj = common::static_object<ompt_start_tool_result_t>::construct();
|
||||
return obj;
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
auto init_status = std::atomic<int>{0};
|
||||
auto fini_status = std::atomic<int>{0};
|
||||
ompt_finalize_tool_t tool_finalize = nullptr;
|
||||
ompt_set_callback_t set_callback = nullptr;
|
||||
rocprofiler_ompt_callback_functions_t ompt_callback_table = {};
|
||||
ompt_get_thread_data_t real_get_thread_data = nullptr;
|
||||
ompt_get_parallel_info_t real_get_parallel_info = nullptr;
|
||||
ompt_get_task_info_t real_get_task_info = nullptr;
|
||||
ompt_get_target_info_t real_get_target_info = nullptr;
|
||||
} // namespace
|
||||
|
||||
void
|
||||
set_ompt_callbacks()
|
||||
{
|
||||
// set all the ompt callbacks that might be used
|
||||
|
||||
auto cb = [](const char* cbname, ompt_callback_t* cbf, int cbnum) {
|
||||
ompt_set_result_t result = set_callback(static_cast<ompt_callbacks_t>(cbnum), *cbf);
|
||||
ROCP_WARNING_IF(result != ompt_set_always)
|
||||
<< "rocprofiler-sdk OpenMP tools set_callback returned " << result
|
||||
<< fmt::format(" (set result = {})", result) << " for " << cbname << " (id=" << cbnum
|
||||
<< ")";
|
||||
};
|
||||
ompt::update_table(cb);
|
||||
ompt::update_callback(ompt_callback_table);
|
||||
}
|
||||
|
||||
// proxies for some entry points that use ompt_data_t *
|
||||
ompt_data_t*
|
||||
proxy_get_thread_data()
|
||||
{
|
||||
ompt_data_t* real = real_get_thread_data();
|
||||
auto* ret = ompt::proxy_data_ptr(real);
|
||||
return ret;
|
||||
}
|
||||
|
||||
int
|
||||
proxy_get_parallel_info(int ancestor_level, ompt_data_t** parallel_data, int* team_size)
|
||||
{
|
||||
ompt_data_t* tdata;
|
||||
int tteam_size;
|
||||
int ret = real_get_parallel_info(ancestor_level, &tdata, &tteam_size);
|
||||
if(ret != 2) return ret;
|
||||
if(team_size != nullptr) *team_size = tteam_size;
|
||||
if(parallel_data != nullptr) *parallel_data = ompt::proxy_data_ptr(tdata);
|
||||
return ret;
|
||||
}
|
||||
|
||||
int
|
||||
proxy_get_task_info(int ancestor_level,
|
||||
int* flags,
|
||||
ompt_data_t** task_data,
|
||||
ompt_frame_t** task_frame,
|
||||
ompt_data_t** parallel_data,
|
||||
int* thread_num)
|
||||
{
|
||||
int tflags, tthread_num;
|
||||
ompt_data_t * ttask_data, *tparallel_data;
|
||||
ompt_frame_t* ttask_frame;
|
||||
int ret = real_get_task_info(
|
||||
ancestor_level, &tflags, &ttask_data, &ttask_frame, &tparallel_data, &tthread_num);
|
||||
if(ret != 2) return ret;
|
||||
if(flags != nullptr) *flags = tflags;
|
||||
if(task_data != nullptr) *task_data = ompt::proxy_data_ptr(ttask_data);
|
||||
if(task_frame != nullptr) *task_frame = ttask_frame;
|
||||
if(parallel_data != nullptr) *parallel_data = ompt::proxy_data_ptr(tparallel_data);
|
||||
if(thread_num != nullptr) *thread_num = tthread_num;
|
||||
return ret;
|
||||
}
|
||||
|
||||
int
|
||||
proxy_get_target_info(uint64_t* device_num, ompt_id_t* target_id, ompt_id_t* host_op_id)
|
||||
{
|
||||
uint64_t tdevice_num;
|
||||
ompt_id_t ttarget_id, thost_op_id;
|
||||
int ret = real_get_target_info(&tdevice_num, &ttarget_id, &thost_op_id);
|
||||
if(ret != 1) return ret;
|
||||
if(device_num != nullptr) *device_num = tdevice_num;
|
||||
if(target_id != nullptr) *target_id = ttarget_id;
|
||||
if(host_op_id != nullptr) *host_op_id = thost_op_id;
|
||||
return ret;
|
||||
}
|
||||
|
||||
#define SETCB(name) \
|
||||
ompt_callback_table.ompt_##name = reinterpret_cast<ompt_##name##_t>(lookup("ompt_" #name)); \
|
||||
LOG_IF(FATAL, ompt_callback_table.ompt_##name == nullptr) \
|
||||
<< "rocprofiler-sdk OMPT cannot find " \
|
||||
"ompt_" #name
|
||||
|
||||
#define PROXYCB(name) \
|
||||
SETCB(name); \
|
||||
real_##name = ompt_callback_table.ompt_##name; \
|
||||
ompt_callback_table.ompt_##name = proxy_##name
|
||||
|
||||
int
|
||||
initialize(ompt_function_lookup_t lookup, int /*initial_device_num*/, ompt_data_t* /*tool_data*/)
|
||||
{
|
||||
init_status.store(-1);
|
||||
|
||||
tool_finalize = static_cast<ompt_finalize_tool_t>(lookup("ompt_finalize_tool"));
|
||||
LOG_IF(FATAL, tool_finalize == nullptr)
|
||||
<< "rocprofiler-sdk OMPT cannot find ompt_finalize_tool";
|
||||
|
||||
set_callback = reinterpret_cast<ompt_set_callback_t>(lookup("ompt_set_callback"));
|
||||
LOG_IF(FATAL, set_callback == nullptr) << "rocprofiler-sdk OMPT cannot find ompt_set_callback";
|
||||
|
||||
SETCB(enumerate_states);
|
||||
SETCB(enumerate_mutex_impls);
|
||||
PROXYCB(get_thread_data); // <- need proxy
|
||||
SETCB(get_num_places);
|
||||
SETCB(get_place_proc_ids);
|
||||
SETCB(get_place_num);
|
||||
SETCB(get_partition_place_nums);
|
||||
SETCB(get_proc_id);
|
||||
SETCB(get_state);
|
||||
PROXYCB(get_parallel_info); // <= need proxy
|
||||
PROXYCB(get_task_info); // <= need proxy
|
||||
SETCB(get_task_memory);
|
||||
SETCB(get_num_devices);
|
||||
SETCB(get_num_procs);
|
||||
PROXYCB(get_target_info); // <= need proxy
|
||||
SETCB(get_unique_id);
|
||||
|
||||
set_ompt_callbacks();
|
||||
init_status.store(1);
|
||||
|
||||
return 1; // bizarre abberation in the OMPT spec, not 0
|
||||
}
|
||||
#undef SETCB
|
||||
|
||||
void
|
||||
finalize(ompt_data_t* /*tool_data*/)
|
||||
{
|
||||
fini_status.store(-1);
|
||||
|
||||
// do whatever for finalization
|
||||
|
||||
fini_status.store(1);
|
||||
}
|
||||
} // namespace
|
||||
|
||||
void
|
||||
finalize_ompt()
|
||||
{
|
||||
if(rocprofiler::ompt::tool_finalize) rocprofiler::ompt::tool_finalize();
|
||||
}
|
||||
} // namespace ompt
|
||||
} // namespace rocprofiler
|
||||
|
||||
extern "C" {
|
||||
|
||||
rocprofiler_status_t
|
||||
rocprofiler_ompt_is_initialized(int* status)
|
||||
{
|
||||
*status = ::rocprofiler::ompt::init_status.load();
|
||||
return ROCPROFILER_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
rocprofiler_status_t
|
||||
rocprofiler_ompt_is_finalized(int* status)
|
||||
{
|
||||
*status = ::rocprofiler::ompt::fini_status.load();
|
||||
return ROCPROFILER_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
ompt_start_tool_result_t*
|
||||
rocprofiler_ompt_start_tool(unsigned int /*omp_version*/, const char* /*runtime_version*/)
|
||||
{
|
||||
// log to clog since logging probably won't be initialized here
|
||||
auto _init_status = ::rocprofiler::ompt::init_status.load();
|
||||
if(_init_status != 0)
|
||||
{
|
||||
std::clog << "ERROR: rocprofiler-sdk OMPT backend has already been initialized: "
|
||||
<< _init_status << '\n';
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
// don't check contexts here, client tool may not be initialized
|
||||
auto* _result = ::rocprofiler::ompt::get_start_tool_result();
|
||||
|
||||
if(_result)
|
||||
{
|
||||
_result->initialize = ::rocprofiler::ompt::initialize;
|
||||
_result->finalize = ::rocprofiler::ompt::finalize;
|
||||
}
|
||||
|
||||
return _result;
|
||||
}
|
||||
|
||||
ompt_start_tool_result_t*
|
||||
ompt_start_tool(unsigned int omp_version, const char* runtime_version) ROCPROFILER_PUBLIC_API;
|
||||
|
||||
ompt_start_tool_result_t*
|
||||
ompt_start_tool(unsigned int omp_version, const char* runtime_version)
|
||||
{
|
||||
::rocprofiler::registration::initialize();
|
||||
return rocprofiler_ompt_start_tool(omp_version, runtime_version);
|
||||
}
|
||||
}
|
||||
+8
-2
@@ -22,5 +22,11 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <rocprofiler-sdk/openmp/api_args.h>
|
||||
#include <rocprofiler-sdk/openmp/api_id.h>
|
||||
namespace rocprofiler
|
||||
{
|
||||
namespace ompt
|
||||
{
|
||||
void
|
||||
finalize_ompt();
|
||||
} // namespace ompt
|
||||
} // namespace rocprofiler
|
||||
@@ -0,0 +1,8 @@
|
||||
#
|
||||
#
|
||||
set(ROCPROFILER_LIB_OMPT_SOURCES ompt.cpp)
|
||||
set(ROCPROFILER_LIB_OMPT_HEADERS defines.hpp ompt.hpp utils.hpp)
|
||||
|
||||
target_sources(rocprofiler-sdk-object-library PRIVATE ${ROCPROFILER_LIB_OMPT_SOURCES})
|
||||
|
||||
add_subdirectory(details)
|
||||
@@ -0,0 +1,155 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "lib/common/defines.hpp"
|
||||
|
||||
#define OMPT_INFO_DEFINITION_0( \
|
||||
OMPT_NAME, BACKEND_OMPT_ID, OMPT_ID, OMPT_FUNC_TYPE, OMPT_ARG, OMPT_FUNC_PTR) \
|
||||
namespace rocprofiler \
|
||||
{ \
|
||||
namespace ompt \
|
||||
{ \
|
||||
template <> \
|
||||
struct ompt_info<OMPT_ID> : ompt_domain_info \
|
||||
{ \
|
||||
static constexpr auto ompt_idx = BACKEND_OMPT_ID; \
|
||||
static constexpr auto operation_idx = OMPT_ID; \
|
||||
static constexpr auto name = OMPT_NAME; \
|
||||
static constexpr auto unsupported = false; \
|
||||
\
|
||||
using domain_type = ompt_domain_info; \
|
||||
using this_type = ompt_info<operation_idx>; \
|
||||
\
|
||||
static constexpr auto callback_domain_idx = domain_type::callback_domain_idx; \
|
||||
static constexpr auto buffered_domain_idx = domain_type::buffered_domain_idx; \
|
||||
\
|
||||
using domain_type::args_type; \
|
||||
using domain_type::retval_type; \
|
||||
using domain_type::callback_data_type; \
|
||||
\
|
||||
static constexpr auto offset() \
|
||||
{ \
|
||||
return offsetof(ompt_table_lookup::type, OMPT_FUNC_PTR); \
|
||||
} \
|
||||
\
|
||||
static auto& get_table() { return ompt_table_lookup{}(); } \
|
||||
\
|
||||
template <typename TableT> \
|
||||
static auto& get_table_func(TableT& _table) \
|
||||
{ \
|
||||
if constexpr(std::is_pointer<TableT>::value) \
|
||||
{ \
|
||||
return CHECK_NOTNULL(_table)->OMPT_FUNC_PTR; \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
return _table.OMPT_FUNC_PTR; \
|
||||
} \
|
||||
} \
|
||||
\
|
||||
static auto& get_table_func() { return get_table_func(get_table()); } \
|
||||
\
|
||||
template <typename DataT> \
|
||||
static auto& get_api_data_args(DataT& _data) \
|
||||
{ \
|
||||
return _data.OMPT_ARG; \
|
||||
} \
|
||||
\
|
||||
static std::vector<void*> as_arg_addr(callback_data_type) { return std::vector<void*>{}; } \
|
||||
\
|
||||
static std::vector<common::stringified_argument> as_arg_list(callback_data_type, int32_t) \
|
||||
{ \
|
||||
return {}; \
|
||||
} \
|
||||
}; \
|
||||
} \
|
||||
}
|
||||
|
||||
#define OMPT_INFO_DEFINITION_V( \
|
||||
OMPT_NAME, BACKEND_OMPT_ID, OMPT_ID, OMPT_FUNC_TYPE, OMPT_ARG, OMPT_FUNC_PTR, ...) \
|
||||
namespace rocprofiler \
|
||||
{ \
|
||||
namespace ompt \
|
||||
{ \
|
||||
template <> \
|
||||
struct ompt_info<OMPT_ID> : ompt_domain_info \
|
||||
{ \
|
||||
static constexpr auto ompt_idx = BACKEND_OMPT_ID; \
|
||||
static constexpr auto operation_idx = OMPT_ID; \
|
||||
static constexpr auto name = OMPT_NAME; \
|
||||
static constexpr auto unsupported = false; \
|
||||
\
|
||||
using domain_type = ompt_domain_info; \
|
||||
using this_type = ompt_info<operation_idx>; \
|
||||
\
|
||||
static constexpr auto callback_domain_idx = domain_type::callback_domain_idx; \
|
||||
static constexpr auto buffered_domain_idx = domain_type::buffered_domain_idx; \
|
||||
\
|
||||
using domain_type::args_type; \
|
||||
using domain_type::retval_type; \
|
||||
using domain_type::callback_data_type; \
|
||||
\
|
||||
static constexpr auto offset() \
|
||||
{ \
|
||||
return offsetof(ompt_table_lookup::type, OMPT_FUNC_PTR); \
|
||||
} \
|
||||
\
|
||||
static auto& get_table() { return ompt_table_lookup{}(); } \
|
||||
\
|
||||
template <typename TableT> \
|
||||
static auto& get_table_func(TableT& _table) \
|
||||
{ \
|
||||
if constexpr(std::is_pointer<TableT>::value) \
|
||||
{ \
|
||||
return CHECK_NOTNULL(_table)->OMPT_FUNC_PTR; \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
return _table.OMPT_FUNC_PTR; \
|
||||
} \
|
||||
} \
|
||||
\
|
||||
static auto& get_table_func() { return get_table_func(get_table()); } \
|
||||
\
|
||||
template <typename DataT> \
|
||||
static auto& get_api_data_args(DataT& _data) \
|
||||
{ \
|
||||
return _data.OMPT_ARG; \
|
||||
} \
|
||||
\
|
||||
static std::vector<void*> as_arg_addr(callback_data_type trace_data) \
|
||||
{ \
|
||||
return std::vector<void*>{ \
|
||||
GET_ADDR_MEMBER_FIELDS(get_api_data_args(trace_data.args), __VA_ARGS__)}; \
|
||||
} \
|
||||
\
|
||||
static auto as_arg_list(callback_data_type trace_data, int32_t max_deref) \
|
||||
{ \
|
||||
return utils::stringize( \
|
||||
max_deref, \
|
||||
GET_NAMED_MEMBER_FIELDS(get_api_data_args(trace_data.args), __VA_ARGS__)); \
|
||||
} \
|
||||
}; \
|
||||
} \
|
||||
}
|
||||
@@ -0,0 +1,9 @@
|
||||
#
|
||||
#
|
||||
#
|
||||
set(ROCPROFILER_LIB_OMPT_DETAILS_SOURCES)
|
||||
set(ROCPROFILER_LIB_OMPT_DETAILS_HEADERS format.hpp)
|
||||
|
||||
target_sources(
|
||||
rocprofiler-sdk-object-library PRIVATE ${ROCPROFILER_LIB_OMPT_DETAILS_SOURCES}
|
||||
${ROCPROFILER_LIB_OMPT_DETAILS_HEADERS})
|
||||
@@ -0,0 +1,279 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
||||
|
||||
#include "fmt/core.h"
|
||||
|
||||
#define ROCP_SDK_OMPT_FORMATTER(TYPE, ...) \
|
||||
template <> \
|
||||
struct formatter<TYPE> : rocprofiler::ompt::details::base_formatter \
|
||||
{ \
|
||||
template <typename Ctx> \
|
||||
auto format(const TYPE& v, Ctx& ctx) const \
|
||||
{ \
|
||||
return fmt::format_to(ctx.out(), __VA_ARGS__); \
|
||||
} \
|
||||
};
|
||||
|
||||
#define ROCP_SDK_OMPT_FORMAT_CASE_STMT(PREFIX, SUFFIX) \
|
||||
case PREFIX##_##SUFFIX: return fmt::format_to(ctx.out(), #SUFFIX)
|
||||
|
||||
namespace rocprofiler
|
||||
{
|
||||
namespace ompt
|
||||
{
|
||||
namespace details
|
||||
{
|
||||
struct base_formatter
|
||||
{
|
||||
template <typename ParseContext>
|
||||
constexpr auto parse(ParseContext& ctx)
|
||||
{
|
||||
return ctx.begin();
|
||||
}
|
||||
};
|
||||
} // namespace details
|
||||
} // namespace ompt
|
||||
} // namespace rocprofiler
|
||||
|
||||
namespace fmt
|
||||
{
|
||||
template <>
|
||||
struct formatter<ompt_set_result_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_set_result_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_set, error);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_set, never);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_set, impossible);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_set, sometimes);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_set, sometimes_paired);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_set, always);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_thread_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_thread_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_thread, initial);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_thread, worker);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_thread, other);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt_thread, unknown);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_scope_endpoint_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_scope_endpoint_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, scope_begin);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, scope_end);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, scope_beginend);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_dispatch_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_dispatch_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, dispatch_iteration);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, dispatch_section);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, dispatch_ws_loop_chunk);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, dispatch_taskloop_chunk);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, dispatch_distribute_chunk);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_sync_region_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_sync_region_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier_implicit);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier_explicit);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier_implementation);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_taskwait);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_taskgroup);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_reduction);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier_implicit_workshare);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier_implicit_parallel);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, sync_region_barrier_teams);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_target_data_op_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_target_data_op_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_alloc);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_transfer_to_device);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_transfer_from_device);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_delete);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_associate);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_disassociate);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_alloc_async);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_transfer_to_device_async);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_transfer_from_device_async);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, target_data_delete_async);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_data_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_data_t& v, Ctx& ctx) const
|
||||
{
|
||||
return fmt::format_to(ctx.out(), "{}", v.value);
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_work_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_work_t& v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_loop);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_sections);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_single_executor);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_single_other);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_workshare);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_distribute);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_taskloop);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_scope);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_loop_static);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_loop_dynamic);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_loop_guided);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, work_loop_other);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_task_status_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_task_status_t& v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_complete);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_yield);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_cancel);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_detach);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_early_fulfill);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_late_fulfill);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, task_switch);
|
||||
ROCP_SDK_OMPT_FORMAT_CASE_STMT(ompt, taskwait_complete);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_frame_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_frame_t& v, Ctx& ctx) const
|
||||
{
|
||||
return fmt::format_to(
|
||||
ctx.out(),
|
||||
"{}exit_frame={}, enter_frame={}, exit_frame_flags={}, enter_frame_flags={}{}",
|
||||
'{',
|
||||
v.exit_frame,
|
||||
v.enter_frame,
|
||||
v.exit_frame_flags,
|
||||
v.enter_frame_flags,
|
||||
'}');
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_dependence_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_dependence_t& v, Ctx& ctx) const
|
||||
{
|
||||
// stub
|
||||
return fmt::format_to(ctx.out(), "(dependence)");
|
||||
(void) v;
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_dispatch_chunk_t> : rocprofiler::ompt::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_dispatch_chunk_t& v, Ctx& ctx) const
|
||||
{
|
||||
return fmt::format_to(
|
||||
ctx.out(), "{}start={}, iterations={}{}", '{', v.start, v.iterations, '}');
|
||||
}
|
||||
};
|
||||
} // namespace fmt
|
||||
|
||||
#undef ROCP_SDK_OMPT_FORMATTER
|
||||
#undef ROCP_SDK_OMPT_OSTREAM_FORMATTER
|
||||
#undef ROCP_SDK_OMPT_FORMAT_CASE_STMT
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,73 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#if defined(ROCPROFILER_LIB_ROCPROFILER_OMPT_OMPT_CPP_IMPL) && \
|
||||
ROCPROFILER_LIB_ROCPROFILER_OMPT_OMPT_CPP_IMPL == 1
|
||||
|
||||
# include "lib/common/mpl.hpp"
|
||||
# include "lib/rocprofiler-sdk/ompt/defines.hpp"
|
||||
# include "lib/rocprofiler-sdk/ompt/ompt.hpp"
|
||||
# include "lib/rocprofiler-sdk/ompt/utils.hpp"
|
||||
|
||||
# include <rocprofiler-sdk/external_correlation.h>
|
||||
# include <rocprofiler-sdk/ompt.h>
|
||||
|
||||
# include <type_traits>
|
||||
|
||||
// clang-format off
|
||||
OMPT_INFO_DEFINITION_V("omp_thread_begin", ompt_callback_thread_begin, ROCPROFILER_OMPT_ID_thread_begin, ompt_callback_thread_begin_t, thread_begin, ompt_thread_begin_fn, thread_type, thread_data)
|
||||
OMPT_INFO_DEFINITION_V("omp_thread_end", ompt_callback_thread_end, ROCPROFILER_OMPT_ID_thread_end, ompt_callback_thread_end_t, thread_end, ompt_thread_end_fn, thread_data)
|
||||
OMPT_INFO_DEFINITION_V("omp_parallel_begin", ompt_callback_parallel_begin, ROCPROFILER_OMPT_ID_parallel_begin, ompt_callback_parallel_begin_t, parallel_begin, ompt_parallel_begin_fn, encountering_task_data, encountering_task_frame, parallel_data, requested_parallelism, flags, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_parallel_end", ompt_callback_parallel_end, ROCPROFILER_OMPT_ID_parallel_end, ompt_callback_parallel_end_t, parallel_end, ompt_parallel_end_fn, parallel_data, encountering_task_data, flags, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_task_create", ompt_callback_task_create, ROCPROFILER_OMPT_ID_task_create, ompt_callback_task_create_t, task_create, ompt_task_create_fn, encountering_task_data, encountering_task_frame, new_task_data, flags, has_dependences, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_task_schedule", ompt_callback_task_schedule, ROCPROFILER_OMPT_ID_task_schedule, ompt_callback_task_schedule_t, task_schedule, ompt_task_schedule_fn, prior_task_data, prior_task_status, next_task_data)
|
||||
OMPT_INFO_DEFINITION_V("omp_implicit_task", ompt_callback_implicit_task, ROCPROFILER_OMPT_ID_implicit_task, ompt_callback_implicit_task_t, implicit_task, ompt_implicit_task_fn, endpoint, parallel_data, task_data, actual_parallelism, index, flags)
|
||||
OMPT_INFO_DEFINITION_V("omp_device_initialize", ompt_callback_device_initialize, ROCPROFILER_OMPT_ID_device_initialize, ompt_callback_device_initialize_t, device_initialize, ompt_device_initialize_fn, device_num, type, device, lookup, documentation)
|
||||
OMPT_INFO_DEFINITION_V("omp_device_finalize", ompt_callback_device_finalize, ROCPROFILER_OMPT_ID_device_finalize, ompt_callback_device_finalize_t, device_finalize, ompt_device_finalize_fn, device_num)
|
||||
OMPT_INFO_DEFINITION_V("omp_device_load", ompt_callback_device_load, ROCPROFILER_OMPT_ID_device_load, ompt_callback_device_load_t, device_load, ompt_device_load_fn, device_num, filename, offset_in_file, vma_in_file, bytes, host_addr, device_addr, module_id)
|
||||
// OMPT_INFO_DEFINITION_V("omp_device_unload", ompt_callback_device_unload, ROCPROFILER_OMPT_ID_device_unload, ompt_callback_device_unload_t, device_unload, ompt_device_unload_fn, device_num, module_id)
|
||||
OMPT_INFO_DEFINITION_V("omp_sync_region_wait", ompt_callback_sync_region_wait, ROCPROFILER_OMPT_ID_sync_region_wait, ompt_callback_sync_region_wait_t, sync_region_wait, ompt_sync_region_wait_fn, kind, endpoint, parallel_data, task_data, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_mutex_released", ompt_callback_mutex_released, ROCPROFILER_OMPT_ID_mutex_released, ompt_callback_mutex_released_t, mutex_released, ompt_mutex_released_fn, kind, wait_id, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_dependences", ompt_callback_dependences, ROCPROFILER_OMPT_ID_dependences, ompt_callback_dependences_t, dependences, ompt_dependences_fn, task_data, deps, ndeps)
|
||||
OMPT_INFO_DEFINITION_V("omp_task_dependence", ompt_callback_task_dependence, ROCPROFILER_OMPT_ID_task_dependence, ompt_callback_task_dependence_t, task_dependence, ompt_task_dependence_fn, src_task_data, sink_task_data)
|
||||
OMPT_INFO_DEFINITION_V("omp_work", ompt_callback_work, ROCPROFILER_OMPT_ID_work, ompt_callback_work_t, work, ompt_work_fn, work_type, endpoint, parallel_data, task_data, count, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_masked", ompt_callback_masked, ROCPROFILER_OMPT_ID_masked, ompt_callback_masked_t, masked, ompt_masked_fn, endpoint, parallel_data, task_data, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_sync_region", ompt_callback_sync_region, ROCPROFILER_OMPT_ID_sync_region, ompt_callback_sync_region_t, sync_region, ompt_sync_region_fn, kind, endpoint, parallel_data, task_data, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_lock_init", ompt_callback_lock_init, ROCPROFILER_OMPT_ID_lock_init, ompt_callback_lock_init_t, lock_init, ompt_lock_init_fn, kind, hint, impl, wait_id, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_lock_destroy", ompt_callback_lock_destroy, ROCPROFILER_OMPT_ID_lock_destroy, ompt_callback_lock_destroy_t, lock_destroy, ompt_lock_destroy_fn, kind, wait_id, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_mutex_acquire", ompt_callback_mutex_acquire, ROCPROFILER_OMPT_ID_mutex_acquire, ompt_callback_mutex_acquire_t, mutex_acquire, ompt_mutex_acquire_fn, kind, hint, impl, wait_id, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_mutex_acquired", ompt_callback_mutex_acquired, ROCPROFILER_OMPT_ID_mutex_acquired, ompt_callback_mutex_acquired_t, mutex_acquired, ompt_mutex_acquired_fn, kind, wait_id, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_nest_lock", ompt_callback_nest_lock, ROCPROFILER_OMPT_ID_nest_lock, ompt_callback_nest_lock_t, nest_lock, ompt_nest_lock_fn, endpoint, wait_id, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_flush", ompt_callback_flush, ROCPROFILER_OMPT_ID_flush, ompt_callback_flush_t, flush, ompt_flush_fn, thread_data, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_cancel", ompt_callback_cancel, ROCPROFILER_OMPT_ID_cancel, ompt_callback_cancel_t, cancel, ompt_cancel_fn, task_data, flags, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_reduction", ompt_callback_reduction, ROCPROFILER_OMPT_ID_reduction, ompt_callback_reduction_t, reduction, ompt_reduction_fn, kind, endpoint, parallel_data, task_data, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_dispatch", ompt_callback_dispatch, ROCPROFILER_OMPT_ID_dispatch, ompt_callback_dispatch_t, dispatch, ompt_dispatch_fn, parallel_data, task_data, kind, instance)
|
||||
OMPT_INFO_DEFINITION_V("omp_target_emi", ompt_callback_target_emi, ROCPROFILER_OMPT_ID_target_emi, ompt_callback_target_emi_t, target_emi, ompt_target_emi_fn, kind, endpoint, device_num, task_data, target_task_data, target_data, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_target_data_op_emi", ompt_callback_target_data_op_emi, ROCPROFILER_OMPT_ID_target_data_op_emi, ompt_callback_target_data_op_emi_t, target_data_op_emi, ompt_target_data_op_emi_fn, endpoint, target_task_data, target_data, host_op_id, optype, src_address, src_device_num, dst_address, dst_device_num, bytes, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_target_submit_emi", ompt_callback_target_submit_emi, ROCPROFILER_OMPT_ID_target_submit_emi, ompt_callback_target_submit_emi_t, target_submit_emi, ompt_target_submit_emi_fn, endpoint, target_data, host_op_id, requested_num_teams)
|
||||
// OMPT_INFO_DEFINITION_V("omp_target_map_emi", ompt_callback_target_map_emi, ROCPROFILER_OMPT_ID_target_map_emi, ompt_callback_target_map_emi_t, target_map_emi, ompt_target_map_emi_fn, nitems, host_addr, device_addr, bytes, mapping_flags, codeptr_ra)
|
||||
OMPT_INFO_DEFINITION_V("omp_error", ompt_callback_error, ROCPROFILER_OMPT_ID_error, ompt_callback_error_t, error, ompt_error_fn, severity, message, length, codeptr_ra)
|
||||
// clang-format on
|
||||
|
||||
#else
|
||||
# error "Do not compile this file directly. It is included by lib/rocprofiler-sdk/ompt/ompt.cpp"
|
||||
#endif
|
||||
@@ -0,0 +1,200 @@
|
||||
// mit license
|
||||
//
|
||||
// copyright (c) 2023 advanced micro devices, inc. all rights reserved.
|
||||
//
|
||||
// permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "software"), to deal
|
||||
// in the software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the software, and to permit persons to whom the software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// the above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the software.
|
||||
//
|
||||
// the software is provided "as is", without warranty of any kind, express or
|
||||
// implied, including but not limited to the warranties of merchantability,
|
||||
// fitness for a particular purpose and noninfringement. in no event shall the
|
||||
// authors or copyright holders be liable for any claim, damages or other
|
||||
// liability, whether in an action of contract, tort or otherwise, arising from,
|
||||
// out of or in connection with the software or the use or other dealings in
|
||||
// the software.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "lib/rocprofiler-sdk/context/correlation_id.hpp"
|
||||
#include "lib/rocprofiler-sdk/tracing/fwd.hpp"
|
||||
|
||||
#include <rocprofiler-sdk/buffer_tracing.h>
|
||||
#include <rocprofiler-sdk/callback_tracing.h>
|
||||
#include <rocprofiler-sdk/external_correlation.h>
|
||||
#include <rocprofiler-sdk/fwd.h>
|
||||
#include <rocprofiler-sdk/ompt.h>
|
||||
#include <rocprofiler-sdk/ompt/api_id.h>
|
||||
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
||||
|
||||
#include <cstdint>
|
||||
#include <deque>
|
||||
#include <vector>
|
||||
|
||||
namespace rocprofiler
|
||||
{
|
||||
namespace ompt
|
||||
{
|
||||
struct ompt_table
|
||||
{
|
||||
ompt_callback_thread_begin_t ompt_thread_begin_fn = nullptr;
|
||||
ompt_callback_thread_end_t ompt_thread_end_fn = nullptr;
|
||||
ompt_callback_parallel_begin_t ompt_parallel_begin_fn = nullptr;
|
||||
ompt_callback_parallel_end_t ompt_parallel_end_fn = nullptr;
|
||||
ompt_callback_task_create_t ompt_task_create_fn = nullptr;
|
||||
ompt_callback_task_schedule_t ompt_task_schedule_fn = nullptr;
|
||||
ompt_callback_implicit_task_t ompt_implicit_task_fn = nullptr;
|
||||
ompt_callback_device_initialize_t ompt_device_initialize_fn = nullptr;
|
||||
ompt_callback_device_finalize_t ompt_device_finalize_fn = nullptr;
|
||||
ompt_callback_device_load_t ompt_device_load_fn = nullptr;
|
||||
// ompt_callback_device_unload_t ompt_device_unload_fn = nullptr;
|
||||
ompt_callback_sync_region_t ompt_sync_region_wait_fn = nullptr;
|
||||
ompt_callback_mutex_t ompt_mutex_released_fn = nullptr;
|
||||
ompt_callback_dependences_t ompt_dependences_fn = nullptr;
|
||||
ompt_callback_task_dependence_t ompt_task_dependence_fn = nullptr;
|
||||
ompt_callback_work_t ompt_work_fn = nullptr;
|
||||
ompt_callback_masked_t ompt_masked_fn = nullptr;
|
||||
ompt_callback_target_map_t ompt_target_map_fn = nullptr;
|
||||
ompt_callback_sync_region_t ompt_sync_region_fn = nullptr;
|
||||
ompt_callback_mutex_acquire_t ompt_lock_init_fn = nullptr;
|
||||
ompt_callback_mutex_t ompt_lock_destroy_fn = nullptr;
|
||||
ompt_callback_mutex_acquire_t ompt_mutex_acquire_fn = nullptr;
|
||||
ompt_callback_mutex_t ompt_mutex_acquired_fn = nullptr;
|
||||
ompt_callback_nest_lock_t ompt_nest_lock_fn = nullptr;
|
||||
ompt_callback_flush_t ompt_flush_fn = nullptr;
|
||||
ompt_callback_cancel_t ompt_cancel_fn = nullptr;
|
||||
ompt_callback_sync_region_t ompt_reduction_fn = nullptr;
|
||||
ompt_callback_dispatch_t ompt_dispatch_fn = nullptr;
|
||||
ompt_callback_target_emi_t ompt_target_emi_fn = nullptr;
|
||||
ompt_callback_target_data_op_emi_t ompt_target_data_op_emi_fn = nullptr;
|
||||
ompt_callback_target_submit_emi_t ompt_target_submit_emi_fn = nullptr;
|
||||
// ompt_callback_target_map_emi_t ompt_target_map_emi_fn = nullptr;
|
||||
ompt_callback_error_t ompt_error_fn = nullptr;
|
||||
};
|
||||
|
||||
struct ompt_domain_info
|
||||
{
|
||||
using args_type = rocprofiler_ompt_args_t;
|
||||
using retval_type = void;
|
||||
using callback_data_type = rocprofiler_callback_tracing_ompt_data_t;
|
||||
using buffer_data_type = rocprofiler_buffer_tracing_ompt_record_t;
|
||||
using enum_type = rocprofiler_ompt_operation_t;
|
||||
static constexpr auto callback_domain_idx = ROCPROFILER_CALLBACK_TRACING_OMPT;
|
||||
static constexpr auto buffered_domain_idx = ROCPROFILER_BUFFER_TRACING_OMPT;
|
||||
static constexpr auto none = ROCPROFILER_OMPT_ID_NONE;
|
||||
static constexpr auto last = ROCPROFILER_OMPT_ID_LAST;
|
||||
static constexpr auto ompt_last = ROCPROFILER_OMPT_ID_callback_functions;
|
||||
static constexpr auto external_correlation_id_domain_idx =
|
||||
ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_OMPT;
|
||||
};
|
||||
|
||||
using buffer_ompt_record_t = rocprofiler_buffer_tracing_ompt_record_t;
|
||||
using callback_ompt_data_t = rocprofiler_callback_tracing_ompt_data_t;
|
||||
|
||||
// save state for ompt between callbacks
|
||||
struct ompt_save_state
|
||||
{
|
||||
uint64_t thr_id; // thread this was created on
|
||||
uint64_t start_timestamp; // timestamp when it was created
|
||||
rocprofiler_ompt_operation_t operation_idx; // for error checking
|
||||
context::correlation_id* corr_id; // correlation id
|
||||
tracing::external_correlation_id_map_t external_corr_ids;
|
||||
tracing::callback_context_data_vec_t callback_contexts;
|
||||
tracing::buffered_context_data_vec_t buffered_contexts;
|
||||
};
|
||||
|
||||
// proxy for ompt_data_t pointers received from OMPT callbacks
|
||||
// SDK gets index 0, client get index 1
|
||||
class ompt_data_proxy
|
||||
{
|
||||
public:
|
||||
ompt_data_t* get_client_ptr(ompt_data_t* ompt_ptr) { return get<1>(ompt_ptr); }
|
||||
ompt_data_t* get_internal_ptr(ompt_data_t* ompt_ptr) { return get<0>(ompt_ptr); }
|
||||
|
||||
private:
|
||||
struct proxy_ptrs
|
||||
{
|
||||
std::array<ompt_data_t, 2> v;
|
||||
};
|
||||
|
||||
// get the proxy pointer for idx. If ompt_ptr->ptr is 0,
|
||||
// allocate a new proxy struct and assign it to ompt_ptr->ptr
|
||||
// return the address of the requested proxy element,
|
||||
// or null if ompt_ptr is null
|
||||
template <int idx>
|
||||
ompt_data_t* get(ompt_data_t* ompt_ptr)
|
||||
{
|
||||
static constexpr proxy_ptrs nulval{};
|
||||
if(ompt_ptr == nullptr) return nullptr;
|
||||
if(ompt_ptr->ptr == nullptr)
|
||||
{
|
||||
std::lock_guard<std::mutex> lk(m);
|
||||
ompt_ptr->ptr = &proxies.emplace_back(nulval);
|
||||
}
|
||||
auto* ptr = static_cast<proxy_ptrs*>(ompt_ptr->ptr);
|
||||
return &(ptr->v[idx]);
|
||||
}
|
||||
|
||||
std::deque<proxy_ptrs> proxies;
|
||||
std::mutex m;
|
||||
};
|
||||
|
||||
// function to return client pointer for use outside
|
||||
ompt_data_t*
|
||||
proxy_data_ptr(ompt_data_t* real_ptr);
|
||||
|
||||
struct ompt_task_save_state
|
||||
{
|
||||
context::correlation_id* corr_id;
|
||||
int task_flags;
|
||||
};
|
||||
|
||||
template <size_t opidx>
|
||||
struct ompt_impl
|
||||
{
|
||||
template <typename DataArgst, typename... Args>
|
||||
static void set_data_args(DataArgst&, Args... args);
|
||||
|
||||
template <typename... Args>
|
||||
static void begin(ompt_data_t* data, Args... args);
|
||||
|
||||
template <typename... Args>
|
||||
static void end(ompt_data_t* data, Args... args);
|
||||
|
||||
template <typename... Args>
|
||||
static context::correlation_id* event_common(Args... args);
|
||||
|
||||
template <typename... Args>
|
||||
static void event(Args&&... args);
|
||||
};
|
||||
|
||||
template <size_t OpIdx>
|
||||
struct ompt_info;
|
||||
|
||||
const char*
|
||||
name_by_id(uint32_t id);
|
||||
|
||||
std::vector<uint32_t>
|
||||
get_ids();
|
||||
|
||||
void
|
||||
iterate_args(uint32_t id,
|
||||
const rocprofiler_callback_tracing_ompt_data_t& data,
|
||||
rocprofiler_callback_tracing_operation_args_cb_t callback,
|
||||
int32_t max_deref,
|
||||
void* user_data);
|
||||
|
||||
using ompt_update_func = void (*)(const char* cbname, ompt_callback_t* cbf, int cbnum);
|
||||
void update_table(ompt_update_func);
|
||||
|
||||
void
|
||||
update_callback(rocprofiler_ompt_callback_functions_t& cb_functions);
|
||||
|
||||
} // namespace ompt
|
||||
} // namespace rocprofiler
|
||||
@@ -0,0 +1,72 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "lib/common/stringize_arg.hpp"
|
||||
#include "lib/rocprofiler-sdk/ompt/details/format.hpp"
|
||||
|
||||
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
||||
#include <rocprofiler-sdk/version.h>
|
||||
|
||||
#include <fmt/core.h>
|
||||
#include <fmt/ranges.h>
|
||||
|
||||
#include <sstream>
|
||||
#include <type_traits>
|
||||
|
||||
namespace rocprofiler
|
||||
{
|
||||
namespace ompt
|
||||
{
|
||||
namespace utils
|
||||
{
|
||||
template <typename Tp>
|
||||
auto
|
||||
stringize_impl(const Tp& _v)
|
||||
{
|
||||
using value_type = std::decay_t<Tp>;
|
||||
|
||||
if constexpr(fmt::is_formattable<value_type>::value && !std::is_pointer<value_type>::value)
|
||||
{
|
||||
return fmt::format("{}", _v);
|
||||
}
|
||||
else
|
||||
{
|
||||
auto _ss = std::stringstream{};
|
||||
_ss << _v;
|
||||
return _ss.str();
|
||||
}
|
||||
}
|
||||
|
||||
template <typename... Args>
|
||||
auto
|
||||
stringize(int32_t max_deref, Args... args)
|
||||
{
|
||||
using array_type = common::stringified_argument_array_t<sizeof...(Args)>;
|
||||
(void) max_deref;
|
||||
return array_type{common::stringize_arg(
|
||||
max_deref, args, [](const auto& _v) { return stringize_impl(_v); })...};
|
||||
}
|
||||
} // namespace utils
|
||||
} // namespace ompt
|
||||
} // namespace rocprofiler
|
||||
@@ -1,8 +0,0 @@
|
||||
#
|
||||
#
|
||||
set(ROCPROFILER_LIB_OPENMP_SOURCES openmp.cpp)
|
||||
set(ROCPROFILER_LIB_OPENMP_HEADERS defines.hpp openmp.hpp utils.hpp)
|
||||
|
||||
target_sources(rocprofiler-object-library PRIVATE ${ROCPROFILER_LIB_OPENMP_SOURCES})
|
||||
|
||||
add_subdirectory(details)
|
||||
@@ -1,9 +0,0 @@
|
||||
#
|
||||
#
|
||||
#
|
||||
set(ROCPROFILER_LIB_OPENMP_DETAILS_SOURCES)
|
||||
set(ROCPROFILER_LIB_OPENMP_DETAILS_HEADERS format.hpp)
|
||||
|
||||
target_sources(
|
||||
rocprofiler-object-library PRIVATE ${ROCPROFILER_LIB_OPENMP_DETAILS_SOURCES}
|
||||
${ROCPROFILER_LIB_OPENMP_DETAILS_HEADERS})
|
||||
@@ -1,279 +0,0 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <rocprofiler-sdk/openmp/omp-tools.h>
|
||||
|
||||
#include "fmt/core.h"
|
||||
|
||||
#define ROCP_SDK_OPENMP_FORMATTER(TYPE, ...) \
|
||||
template <> \
|
||||
struct formatter<TYPE> : rocprofiler::openmp::details::base_formatter \
|
||||
{ \
|
||||
template <typename Ctx> \
|
||||
auto format(const TYPE& v, Ctx& ctx) const \
|
||||
{ \
|
||||
return fmt::format_to(ctx.out(), __VA_ARGS__); \
|
||||
} \
|
||||
};
|
||||
|
||||
#define ROCP_SDK_OPENMP_FORMAT_CASE_STMT(PREFIX, SUFFIX) \
|
||||
case PREFIX##_##SUFFIX: return fmt::format_to(ctx.out(), #SUFFIX)
|
||||
|
||||
namespace rocprofiler
|
||||
{
|
||||
namespace openmp
|
||||
{
|
||||
namespace details
|
||||
{
|
||||
struct base_formatter
|
||||
{
|
||||
template <typename ParseContext>
|
||||
constexpr auto parse(ParseContext& ctx)
|
||||
{
|
||||
return ctx.begin();
|
||||
}
|
||||
};
|
||||
} // namespace details
|
||||
} // namespace openmp
|
||||
} // namespace rocprofiler
|
||||
|
||||
namespace fmt
|
||||
{
|
||||
template <>
|
||||
struct formatter<ompt_set_result_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_set_result_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_set, error);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_set, never);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_set, impossible);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_set, sometimes);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_set, sometimes_paired);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_set, always);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_thread_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_thread_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_thread, initial);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_thread, worker);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_thread, other);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt_thread, unknown);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_scope_endpoint_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_scope_endpoint_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, scope_begin);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, scope_end);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, scope_beginend);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_dispatch_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_dispatch_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, dispatch_iteration);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, dispatch_section);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, dispatch_ws_loop_chunk);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, dispatch_taskloop_chunk);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, dispatch_distribute_chunk);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_sync_region_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_sync_region_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier_implicit);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier_explicit);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier_implementation);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_taskwait);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_taskgroup);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_reduction);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier_implicit_workshare);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier_implicit_parallel);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, sync_region_barrier_teams);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_target_data_op_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(ompt_target_data_op_t v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_alloc);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_transfer_to_device);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_transfer_from_device);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_delete);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_associate);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_disassociate);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_alloc_async);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_transfer_to_device_async);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_transfer_from_device_async);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, target_data_delete_async);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_data_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_data_t& v, Ctx& ctx) const
|
||||
{
|
||||
return fmt::format_to(ctx.out(), "{}", v.value);
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_work_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_work_t& v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_loop);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_sections);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_single_executor);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_single_other);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_workshare);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_distribute);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_taskloop);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_scope);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_loop_static);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_loop_dynamic);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_loop_guided);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, work_loop_other);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_task_status_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_task_status_t& v, Ctx& ctx) const
|
||||
{
|
||||
switch(v)
|
||||
{
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_complete);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_yield);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_cancel);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_detach);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_early_fulfill);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_late_fulfill);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, task_switch);
|
||||
ROCP_SDK_OPENMP_FORMAT_CASE_STMT(ompt, taskwait_complete);
|
||||
}
|
||||
return fmt::format_to(ctx.out(), "Unknown");
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_frame_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_frame_t& v, Ctx& ctx) const
|
||||
{
|
||||
return fmt::format_to(
|
||||
ctx.out(),
|
||||
"{}exit_frame={}, enter_frame={}, exit_frame_flags={}, enter_frame_flags={}{}",
|
||||
'{',
|
||||
v.exit_frame,
|
||||
v.enter_frame,
|
||||
v.exit_frame_flags,
|
||||
v.enter_frame_flags,
|
||||
'}');
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_dependence_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_dependence_t& v, Ctx& ctx) const
|
||||
{
|
||||
// stub
|
||||
return fmt::format_to(ctx.out(), "(dependence)");
|
||||
(void) v;
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct formatter<ompt_dispatch_chunk_t> : rocprofiler::openmp::details::base_formatter
|
||||
{
|
||||
template <typename Ctx>
|
||||
auto format(const ompt_dispatch_chunk_t& v, Ctx& ctx) const
|
||||
{
|
||||
return fmt::format_to(
|
||||
ctx.out(), "{}start={}, iterations={}{}", '{', v.start, v.iterations, '}');
|
||||
}
|
||||
};
|
||||
} // namespace fmt
|
||||
|
||||
#undef ROCP_SDK_OPENMP_FORMATTER
|
||||
#undef ROCP_SDK_OPENMP_OSTREAM_FORMATTER
|
||||
#undef ROCP_SDK_OPENMP_FORMAT_CASE_STMT
|
||||
@@ -41,6 +41,7 @@
|
||||
#include "lib/rocprofiler-sdk/intercept_table.hpp"
|
||||
#include "lib/rocprofiler-sdk/internal_threading.hpp"
|
||||
#include "lib/rocprofiler-sdk/marker/marker.hpp"
|
||||
#include "lib/rocprofiler-sdk/ompt.hpp"
|
||||
#include "lib/rocprofiler-sdk/page_migration/page_migration.hpp"
|
||||
#include "lib/rocprofiler-sdk/pc_sampling/code_object.hpp"
|
||||
#include "lib/rocprofiler-sdk/pc_sampling/service.hpp"
|
||||
@@ -51,6 +52,7 @@
|
||||
#include <rocprofiler-sdk/fwd.h>
|
||||
#include <rocprofiler-sdk/hsa.h>
|
||||
#include <rocprofiler-sdk/marker.h>
|
||||
#include <rocprofiler-sdk/ompt.h>
|
||||
#include <rocprofiler-sdk/version.h>
|
||||
|
||||
#include <hsa/hsa_api_trace.h>
|
||||
@@ -639,6 +641,7 @@ finalize()
|
||||
counters::device_counting_service_finalize();
|
||||
hsa::queue_controller_fini();
|
||||
thread_trace::finalize();
|
||||
ompt::finalize_ompt();
|
||||
page_migration::finalize();
|
||||
#if ROCPROFILER_SDK_HSA_PC_SAMPLING > 0
|
||||
// WARNING: this must precede `code_object::finalize()`
|
||||
|
||||
@@ -11,6 +11,10 @@ race:libamdhip64.so
|
||||
# data race arising from hsa runtime
|
||||
race:libhsa-runtime64.so
|
||||
|
||||
# data race(s) arising from OpenMP
|
||||
race:__kmp_resume_template
|
||||
race:__kmp_suspend_initialize_thread
|
||||
|
||||
# unlock of an unlocked mutex (or by a wrong thread)
|
||||
mutex:libhsa-runtime64.so
|
||||
|
||||
|
||||
@@ -61,7 +61,8 @@ add_subdirectory(page-migration)
|
||||
add_subdirectory(pc_sampling)
|
||||
add_subdirectory(thread-trace)
|
||||
add_subdirectory(hip-graph-tracing)
|
||||
add_subdirectory(counter-collection)
|
||||
add_subdirectory(openmp-tools)
|
||||
|
||||
# rocprofv3 validation tests
|
||||
add_subdirectory(rocprofv3)
|
||||
add_subdirectory(counter-collection)
|
||||
|
||||
@@ -12,6 +12,7 @@ set(CMAKE_BUILD_RPATH
|
||||
# applications used by integration tests which DO link to rocprofiler-sdk-roctx
|
||||
add_subdirectory(reproducible-runtime)
|
||||
add_subdirectory(transpose)
|
||||
add_subdirectory(openmp)
|
||||
|
||||
set(CMAKE_BUILD_RPATH "\$ORIGIN:\$ORIGIN/../lib")
|
||||
|
||||
|
||||
@@ -0,0 +1,41 @@
|
||||
#
|
||||
#
|
||||
#
|
||||
cmake_minimum_required(VERSION 3.21.0 FATAL_ERROR)
|
||||
|
||||
if(NOT OMP_TARGET_COMPILER)
|
||||
find_program(
|
||||
amdclangpp_EXECUTABLE
|
||||
NAMES amdclang++
|
||||
HINTS ${ROCM_PATH} ENV ROCM_PATH /opt/rocm
|
||||
PATHS ${ROCM_PATH} ENV ROCM_PATH /opt/rocm
|
||||
PATH_SUFFIXES bin llvm/bin NO_CACHE)
|
||||
mark_as_advanced(amdclangpp_EXECUTABLE)
|
||||
|
||||
if(amdclangpp_EXECUTABLE)
|
||||
set(OMP_TARGET_COMPILER
|
||||
"${amdclangpp_EXECUTABLE}"
|
||||
CACHE FILEPATH "")
|
||||
endif()
|
||||
endif()
|
||||
|
||||
project(rocprofiler-tests-bin-openmp LANGUAGES CXX)
|
||||
|
||||
find_package(rocprofiler-sdk REQUIRED)
|
||||
|
||||
set(DEFAULT_GPU_TARGETS
|
||||
"gfx906"
|
||||
"gfx908"
|
||||
"gfx90a"
|
||||
"gfx940"
|
||||
"gfx941"
|
||||
"gfx942"
|
||||
"gfx1100"
|
||||
"gfx1101"
|
||||
"gfx1102")
|
||||
|
||||
set(OPENMP_GPU_TARGETS
|
||||
"${DEFAULT_GPU_TARGETS}"
|
||||
CACHE STRING "GPU targets to compile for")
|
||||
|
||||
add_subdirectory(target)
|
||||
@@ -0,0 +1,23 @@
|
||||
#
|
||||
#
|
||||
#
|
||||
|
||||
set(CMAKE_BUILD_TYPE "RelWithDebInfo")
|
||||
|
||||
find_package(Threads REQUIRED)
|
||||
find_package(rocprofiler-sdk-roctx REQUIRED)
|
||||
|
||||
add_executable(openmp-target)
|
||||
target_sources(openmp-target PRIVATE openmp-target.cpp)
|
||||
target_link_libraries(openmp-target PRIVATE Threads::Threads
|
||||
rocprofiler-sdk-roctx::rocprofiler-sdk-roctx)
|
||||
target_compile_options(openmp-target PRIVATE -fopenmp)
|
||||
target_link_options(openmp-target PRIVATE -fopenmp)
|
||||
|
||||
foreach(_TARGET ${OPENMP_GPU_TARGETS})
|
||||
target_compile_options(openmp-target PRIVATE --offload-arch=${_TARGET})
|
||||
target_link_options(openmp-target PRIVATE --offload-arch=${_TARGET})
|
||||
endforeach()
|
||||
|
||||
include(rocprofiler-sdk-custom-compilation)
|
||||
rocprofiler_sdk_custom_compilation(TARGET openmp-target COMPILER ${OMP_TARGET_COMPILER})
|
||||
@@ -0,0 +1,155 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include <rocprofiler-sdk-roctx/roctx.h>
|
||||
|
||||
#include <math.h>
|
||||
#include <stdio.h>
|
||||
|
||||
constexpr float EPS_FLOAT = 1.0e-7f;
|
||||
constexpr double EPS_DOUBLE = 1.0e-15;
|
||||
|
||||
#pragma omp declare target
|
||||
template <typename T>
|
||||
T
|
||||
mul(T a, T b)
|
||||
{
|
||||
volatile T c = a * b;
|
||||
return c;
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
template <typename T>
|
||||
void
|
||||
vmul(T* a, T* b, T* c, int N)
|
||||
{
|
||||
#pragma omp target map(to : a [0:N], b [0:N]) map(from : c [0:N])
|
||||
#pragma omp teams distribute parallel for
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
for(int j = 0; j < 100000; ++j)
|
||||
c[i] = mul(a[i], b[i]);
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
main()
|
||||
{
|
||||
auto range_id = roctxRangeStart("main");
|
||||
|
||||
constexpr int N = 100000;
|
||||
int a_i[N], b_i[N], c_i[N], validate_i[N];
|
||||
float a_f[N], b_f[N], c_f[N], validate_f[N];
|
||||
double a_d[N], b_d[N], c_d[N], validate_d[N];
|
||||
int N_errors = 0;
|
||||
bool flag = false;
|
||||
|
||||
roctxMark("initialization");
|
||||
|
||||
#pragma omp parallel for
|
||||
for(int i = 0; i < N; ++i)
|
||||
{
|
||||
a_f[i] = a_i[i] = i + 1;
|
||||
b_f[i] = b_i[i] = i + 2;
|
||||
a_d[i] = a_i[i];
|
||||
b_d[i] = b_i[i];
|
||||
validate_i[i] = a_i[i] * b_i[i];
|
||||
validate_f[i] = a_f[i] * b_f[i];
|
||||
validate_d[i] = a_d[i] * b_d[i];
|
||||
}
|
||||
|
||||
vmul(a_i, b_i, c_i, N);
|
||||
vmul(a_f, b_f, c_f, N);
|
||||
|
||||
auto tid = roctx_thread_id_t{};
|
||||
// get the thread id recognized by rocprofiler-sdk from roctx
|
||||
roctxGetThreadId(&tid);
|
||||
// pause API tracing
|
||||
roctxProfilerPause(tid);
|
||||
|
||||
// we don't expect to see the third vmul
|
||||
vmul(a_d, b_d, c_d, N);
|
||||
|
||||
// resume API tracing
|
||||
roctxProfilerResume(tid);
|
||||
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
if(c_i[i] != validate_i[i])
|
||||
{
|
||||
++N_errors;
|
||||
// print 1st bad index
|
||||
if(!flag)
|
||||
{
|
||||
printf(
|
||||
"First fail: c_i[%d](%d) != validate_i[%d](%d)\n", i, c_i[i], i, validate_i[i]);
|
||||
flag = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
flag = false;
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
if(fabs(c_f[i] - validate_f[i]) > EPS_FLOAT)
|
||||
{
|
||||
++N_errors;
|
||||
// print 1st bad index
|
||||
if(!flag)
|
||||
{
|
||||
printf("First fail: c_f[%d](%f) != validate_f[%d](%f)\n",
|
||||
i,
|
||||
static_cast<double>(c_f[i]),
|
||||
i,
|
||||
static_cast<double>(validate_f[i]));
|
||||
flag = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
flag = false;
|
||||
for(int i = 0; i < N; i++)
|
||||
{
|
||||
if(fabs(c_d[i] - validate_d[i]) > EPS_DOUBLE)
|
||||
{
|
||||
++N_errors;
|
||||
// print 1st bad index
|
||||
if(!flag)
|
||||
{
|
||||
printf(
|
||||
"First fail: c_d[%d](%f) != validate_d[%d](%f)\n", i, c_d[i], i, validate_d[i]);
|
||||
flag = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
if(N_errors == 0)
|
||||
{
|
||||
printf("Success\n");
|
||||
return 0;
|
||||
}
|
||||
else
|
||||
{
|
||||
printf("Total %d failures\n", N_errors);
|
||||
printf("Fail\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
roctxRangeStop(range_id);
|
||||
}
|
||||
@@ -0,0 +1,75 @@
|
||||
#
|
||||
#
|
||||
#
|
||||
cmake_minimum_required(VERSION 3.21.0 FATAL_ERROR)
|
||||
|
||||
project(
|
||||
rocprofiler-tests-openmp-tools
|
||||
LANGUAGES CXX
|
||||
VERSION 0.0.0)
|
||||
|
||||
find_package(rocprofiler-sdk REQUIRED)
|
||||
|
||||
set(PYTEST_ARGS)
|
||||
if(ROCPROFILER_MEMCHECK MATCHES "(Address|Thread)Sanitizer" OR ROCPROFILER_BUILD_CODECOV)
|
||||
set(PYTEST_ARGS -k "not test_total_runtime")
|
||||
endif()
|
||||
|
||||
if(ROCPROFILER_MEMCHECK_PRELOAD_ENV)
|
||||
set(PRELOAD_ENV
|
||||
"${ROCPROFILER_MEMCHECK_PRELOAD_ENV}:$<TARGET_FILE:rocprofiler-sdk-json-tool>")
|
||||
else()
|
||||
set(PRELOAD_ENV "LD_PRELOAD=$<TARGET_FILE:rocprofiler-sdk-json-tool>")
|
||||
endif()
|
||||
|
||||
if(ROCPROFILER_MEMCHECK STREQUAL "ThreadSanitizer")
|
||||
set(IS_THREAD_SANITIZER ON)
|
||||
else()
|
||||
set(IS_THREAD_SANITIZER OFF)
|
||||
endif()
|
||||
|
||||
add_test(NAME test-openmp-tools-execute COMMAND $<TARGET_FILE:openmp-target>)
|
||||
|
||||
set(openmp-tools-env
|
||||
"${PRELOAD_ENV}"
|
||||
"OMP_NUM_THREADS=2"
|
||||
"OMP_DISPLAY_ENV=1"
|
||||
"OMP_TARGET_OFFLOAD=mandatory"
|
||||
"ROCR_VISIBLE_DEVICES=0"
|
||||
"ROCPROFILER_TOOL_OUTPUT_FILE=openmp-tools-test.json"
|
||||
"LD_LIBRARY_PATH=$<TARGET_FILE_DIR:rocprofiler-sdk::rocprofiler-sdk-shared-library>:$ENV{LD_LIBRARY_PATH}"
|
||||
)
|
||||
|
||||
set_tests_properties(
|
||||
test-openmp-tools-execute
|
||||
PROPERTIES TIMEOUT
|
||||
100
|
||||
LABELS
|
||||
"integration-tests;openmp-target"
|
||||
ENVIRONMENT
|
||||
"${openmp-tools-env}"
|
||||
FAIL_REGULAR_EXPRESSION
|
||||
"${ROCPROFILER_DEFAULT_FAIL_REGEX}"
|
||||
DISABLED
|
||||
"${IS_THREAD_SANITIZER}")
|
||||
|
||||
# copy to binary directory
|
||||
rocprofiler_configure_pytest_files(COPY validate.py conftest.py CONFIG pytest.ini)
|
||||
|
||||
add_test(
|
||||
NAME test-openmp-tools-validate
|
||||
COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/validate.py ${PYTEST_ARGS}
|
||||
--input ${CMAKE_CURRENT_BINARY_DIR}/openmp-tools-test.json)
|
||||
|
||||
set_tests_properties(
|
||||
test-openmp-tools-validate
|
||||
PROPERTIES TIMEOUT
|
||||
45
|
||||
LABELS
|
||||
"integration-tests;openmp-target"
|
||||
DEPENDS
|
||||
test-openmp-tools-execute
|
||||
FAIL_REGULAR_EXPRESSION
|
||||
"${ROCPROFILER_DEFAULT_FAIL_REGEX}"
|
||||
DISABLED
|
||||
"${IS_THREAD_SANITIZER}")
|
||||
@@ -0,0 +1,22 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import json
|
||||
import pytest
|
||||
|
||||
from rocprofiler_sdk.pytest_utils.dotdict import dotdict
|
||||
|
||||
|
||||
def pytest_addoption(parser):
|
||||
parser.addoption(
|
||||
"--input",
|
||||
action="store",
|
||||
default="openmp-tools-test.json",
|
||||
help="Input JSON",
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def input_data(request):
|
||||
filename = request.config.getoption("--input")
|
||||
with open(filename, "r") as inp:
|
||||
return dotdict(json.load(inp))
|
||||
@@ -0,0 +1,5 @@
|
||||
|
||||
[pytest]
|
||||
addopts = --durations=20 -rA -s -vv
|
||||
testpaths = validate.py
|
||||
pythonpath = @ROCPROFILER_SDK_TESTS_BINARY_DIR@/pytest-packages
|
||||
@@ -0,0 +1,322 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
import pytest
|
||||
|
||||
|
||||
# helper function
|
||||
def node_exists(name, data, min_len=1):
|
||||
assert name in data
|
||||
assert data[name] is not None
|
||||
if isinstance(data[name], (list, tuple, dict, set)):
|
||||
assert len(data[name]) >= min_len
|
||||
|
||||
|
||||
def test_data_structure(input_data):
|
||||
"""verify minimum amount of expected data is present"""
|
||||
data = input_data
|
||||
|
||||
node_exists("rocprofiler-sdk-json-tool", data)
|
||||
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
node_exists("metadata", sdk_data)
|
||||
node_exists("pid", sdk_data["metadata"])
|
||||
node_exists("main_tid", sdk_data["metadata"])
|
||||
node_exists("init_time", sdk_data["metadata"])
|
||||
node_exists("fini_time", sdk_data["metadata"])
|
||||
|
||||
node_exists("agents", sdk_data)
|
||||
node_exists("call_stack", sdk_data)
|
||||
node_exists("callback_records", sdk_data)
|
||||
node_exists("buffer_records", sdk_data)
|
||||
|
||||
node_exists("names", sdk_data.callback_records)
|
||||
node_exists("code_objects", sdk_data.callback_records)
|
||||
node_exists("kernel_symbols", sdk_data.callback_records)
|
||||
node_exists("hsa_api_traces", sdk_data.callback_records)
|
||||
node_exists("hip_api_traces", sdk_data.callback_records, 0)
|
||||
node_exists("marker_api_traces", sdk_data.callback_records)
|
||||
node_exists("rccl_api_traces", sdk_data.callback_records, 0)
|
||||
node_exists("ompt_traces", sdk_data.callback_records)
|
||||
node_exists("kernel_dispatch", sdk_data.callback_records)
|
||||
|
||||
node_exists("names", sdk_data.buffer_records)
|
||||
node_exists("kernel_dispatch", sdk_data.buffer_records)
|
||||
node_exists("memory_copies", sdk_data.buffer_records, 0)
|
||||
node_exists("hsa_api_traces", sdk_data.buffer_records)
|
||||
node_exists("hip_api_traces", sdk_data.buffer_records, 0)
|
||||
node_exists("marker_api_traces", sdk_data.buffer_records)
|
||||
node_exists("rccl_api_traces", sdk_data.buffer_records, 0)
|
||||
node_exists("ompt_traces", sdk_data.buffer_records)
|
||||
node_exists("retired_correlation_ids", sdk_data.buffer_records)
|
||||
|
||||
|
||||
def test_size_entries(input_data):
|
||||
# check that size fields are > 0 but account for function arguments
|
||||
# which are named "size"
|
||||
def check_size(data, bt):
|
||||
if "size" in data.keys():
|
||||
if isinstance(data.size, str) and bt.endswith('["args"]'):
|
||||
pass
|
||||
else:
|
||||
assert data.size > 0, f"origin: {bt}"
|
||||
|
||||
# recursively check the entire data structure
|
||||
def iterate_data(data, bt):
|
||||
if isinstance(data, (list, tuple)):
|
||||
for i, itr in enumerate(data):
|
||||
if isinstance(itr, dict):
|
||||
check_size(itr, f"{bt}[{i}]")
|
||||
iterate_data(itr, f"{bt}[{i}]")
|
||||
elif isinstance(data, dict):
|
||||
check_size(data, f"{bt}")
|
||||
for key, itr in data.items():
|
||||
iterate_data(itr, f'{bt}["{key}"]')
|
||||
|
||||
# start recursive check over entire JSON dict
|
||||
iterate_data(input_data, "input_data")
|
||||
|
||||
|
||||
def test_timestamps(input_data):
|
||||
data = input_data
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
cb_start = {}
|
||||
cb_end = {}
|
||||
for titr in [
|
||||
"hsa_api_traces",
|
||||
"marker_api_traces",
|
||||
"hip_api_traces",
|
||||
"rccl_api_traces",
|
||||
"ompt_traces",
|
||||
]:
|
||||
for itr in sdk_data.callback_records[titr]:
|
||||
cid = itr.correlation_id.internal
|
||||
phase = itr.phase
|
||||
if phase == 1:
|
||||
cb_start[cid] = itr.timestamp
|
||||
elif phase == 2:
|
||||
cb_end[cid] = itr.timestamp
|
||||
assert cb_start[cid] <= itr.timestamp
|
||||
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
assert itr.start_timestamp <= itr.end_timestamp
|
||||
|
||||
for titr in ["kernel_dispatch", "memory_copies"]:
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
assert itr.start_timestamp < itr.end_timestamp, f"[{titr}] {itr}"
|
||||
assert itr.correlation_id.internal > 0, f"[{titr}] {itr}"
|
||||
assert itr.correlation_id.external > 0, f"[{titr}] {itr}"
|
||||
assert sdk_data.metadata.init_time < itr.start_timestamp, f"[{titr}] {itr}"
|
||||
assert sdk_data.metadata.init_time < itr.end_timestamp, f"[{titr}] {itr}"
|
||||
assert sdk_data.metadata.fini_time > itr.start_timestamp, f"[{titr}] {itr}"
|
||||
assert sdk_data.metadata.fini_time > itr.end_timestamp, f"[{titr}] {itr}"
|
||||
|
||||
api_start = cb_start[itr.correlation_id.internal]
|
||||
# api_end = cb_end[itr.correlation_id.internal]
|
||||
assert api_start < itr.start_timestamp, f"[{titr}] {itr}"
|
||||
# assert api_end <= itr.end_timestamp, f"[{titr}] {itr}"
|
||||
|
||||
|
||||
def test_total_runtime(input_data):
|
||||
sdk_data = input_data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
runtime_data = []
|
||||
for itr in sdk_data.buffer_records.kernel_dispatch:
|
||||
elapsed = itr.end_timestamp - itr.start_timestamp
|
||||
runtime_data.append(elapsed) # in nanoseconds
|
||||
|
||||
expected_runtime = 1.0e-6 # one millisecond
|
||||
|
||||
assert sum(runtime_data) >= expected_runtime
|
||||
|
||||
|
||||
def test_internal_correlation_ids(input_data):
|
||||
data = input_data
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
api_corr_ids = []
|
||||
for titr in [
|
||||
"hsa_api_traces",
|
||||
"marker_api_traces",
|
||||
"hip_api_traces",
|
||||
"rccl_api_traces",
|
||||
"ompt_traces",
|
||||
]:
|
||||
for itr in sdk_data.callback_records[titr]:
|
||||
api_corr_ids.append(itr.correlation_id.internal)
|
||||
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
api_corr_ids.append(itr.correlation_id.internal)
|
||||
|
||||
api_corr_ids_sorted = sorted(api_corr_ids)
|
||||
api_corr_ids_unique = list(set(api_corr_ids))
|
||||
|
||||
for itr in sdk_data.buffer_records.kernel_dispatch:
|
||||
assert itr.correlation_id.internal in api_corr_ids_unique
|
||||
|
||||
for itr in sdk_data.buffer_records.memory_copies:
|
||||
assert itr.correlation_id.internal in api_corr_ids_unique
|
||||
|
||||
len_corr_id_unq = len(api_corr_ids_unique)
|
||||
assert len(api_corr_ids) != len_corr_id_unq
|
||||
assert max(api_corr_ids_sorted) == len_corr_id_unq
|
||||
|
||||
|
||||
def test_retired_correlation_ids(input_data):
|
||||
data = input_data
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
def _sort_dict(inp):
|
||||
return dict(sorted(inp.items()))
|
||||
|
||||
api_corr_ids = {}
|
||||
for titr in [
|
||||
"hsa_api_traces",
|
||||
"marker_api_traces",
|
||||
"hip_api_traces",
|
||||
"rccl_api_traces",
|
||||
"ompt_traces",
|
||||
]:
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
corr_id = itr.correlation_id.internal
|
||||
assert corr_id not in api_corr_ids.keys()
|
||||
api_corr_ids[corr_id] = itr
|
||||
|
||||
async_corr_ids = {}
|
||||
for titr in ["kernel_dispatch", "memory_copies"]:
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
corr_id = itr.correlation_id.internal
|
||||
assert corr_id not in async_corr_ids.keys()
|
||||
async_corr_ids[corr_id] = itr
|
||||
|
||||
retired_corr_ids = {}
|
||||
for itr in sdk_data.buffer_records.retired_correlation_ids:
|
||||
corr_id = itr.internal_correlation_id
|
||||
assert corr_id not in retired_corr_ids.keys()
|
||||
retired_corr_ids[corr_id] = itr
|
||||
|
||||
api_corr_ids = _sort_dict(api_corr_ids)
|
||||
async_corr_ids = _sort_dict(async_corr_ids)
|
||||
retired_corr_ids = _sort_dict(retired_corr_ids)
|
||||
|
||||
for cid, itr in async_corr_ids.items():
|
||||
assert cid in retired_corr_ids.keys()
|
||||
retired_ts = retired_corr_ids[cid].timestamp
|
||||
end_ts = itr.end_timestamp
|
||||
assert (retired_ts - end_ts) > 0, f"correlation-id: {cid}, data: {itr}"
|
||||
|
||||
for cid, itr in api_corr_ids.items():
|
||||
assert cid in retired_corr_ids.keys()
|
||||
retired_ts = retired_corr_ids[cid].timestamp
|
||||
end_ts = itr.end_timestamp
|
||||
assert (retired_ts - end_ts) > 0, f"correlation-id: {cid}, data: {itr}"
|
||||
|
||||
assert len(api_corr_ids.keys()) == (len(retired_corr_ids.keys()))
|
||||
|
||||
|
||||
def test_external_correlation_ids(input_data):
|
||||
data = input_data
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
extern_corr_ids = []
|
||||
for titr in [
|
||||
"hsa_api_traces",
|
||||
"marker_api_traces",
|
||||
"hip_api_traces",
|
||||
"rccl_api_traces",
|
||||
"ompt_traces",
|
||||
]:
|
||||
for itr in sdk_data.callback_records[titr]:
|
||||
assert itr.correlation_id.external > 0
|
||||
assert itr.thread_id == itr.correlation_id.external
|
||||
extern_corr_ids.append(itr.correlation_id.external)
|
||||
|
||||
extern_corr_ids = list(set(sorted(extern_corr_ids)))
|
||||
for titr in [
|
||||
"hsa_api_traces",
|
||||
"marker_api_traces",
|
||||
"hip_api_traces",
|
||||
"rccl_api_traces",
|
||||
"ompt_traces",
|
||||
]:
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
assert itr.correlation_id.external > 0, f"[{titr}] {itr}"
|
||||
assert itr.thread_id == itr.correlation_id.external, f"[{titr}] {itr}"
|
||||
assert itr.thread_id in extern_corr_ids, f"[{titr}] {itr}"
|
||||
assert itr.correlation_id.external in extern_corr_ids, f"[{titr}] {itr}"
|
||||
|
||||
for titr in ["kernel_dispatch", "memory_copies"]:
|
||||
for itr in sdk_data.buffer_records[titr]:
|
||||
assert itr.correlation_id.external > 0, f"[{titr}] {itr}"
|
||||
assert itr.correlation_id.external in extern_corr_ids, f"[{titr}] {itr}"
|
||||
|
||||
|
||||
def test_kernel_ids(input_data):
|
||||
data = input_data
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
symbol_info = {}
|
||||
for itr in sdk_data.callback_records.kernel_symbols:
|
||||
phase = itr.phase
|
||||
payload = itr.payload
|
||||
kern_id = payload.kernel_id
|
||||
|
||||
assert phase == 1 or phase == 2
|
||||
assert kern_id > 0
|
||||
if phase == 1:
|
||||
assert len(payload.kernel_name) > 0
|
||||
symbol_info[kern_id] = payload
|
||||
elif phase == 2:
|
||||
assert payload.kernel_id in symbol_info.keys()
|
||||
assert payload.kernel_name == symbol_info[kern_id].kernel_name
|
||||
|
||||
for itr in sdk_data.buffer_records.kernel_dispatch:
|
||||
assert itr.dispatch_info.kernel_id in symbol_info.keys()
|
||||
|
||||
for itr in sdk_data.callback_records.kernel_dispatch:
|
||||
assert itr.payload.dispatch_info.kernel_id in symbol_info.keys()
|
||||
|
||||
|
||||
def test_kernel_dispatch_ids(input_data):
|
||||
data = input_data
|
||||
sdk_data = data["rocprofiler-sdk-json-tool"]
|
||||
|
||||
num_dispatches = len(sdk_data.buffer_records.kernel_dispatch)
|
||||
num_cb_dispatches = len(sdk_data.callback_records.kernel_dispatch)
|
||||
|
||||
assert num_cb_dispatches == (3 * num_dispatches)
|
||||
|
||||
bf_seq_ids = []
|
||||
for itr in sdk_data.buffer_records.kernel_dispatch:
|
||||
bf_seq_ids.append(itr.dispatch_info.dispatch_id)
|
||||
|
||||
cb_seq_ids = []
|
||||
for itr in sdk_data.callback_records.kernel_dispatch:
|
||||
cb_seq_ids.append(itr.payload.dispatch_info.dispatch_id)
|
||||
|
||||
bf_seq_ids = sorted(bf_seq_ids)
|
||||
cb_seq_ids = sorted(cb_seq_ids)
|
||||
|
||||
assert (3 * len(bf_seq_ids)) == len(cb_seq_ids)
|
||||
|
||||
assert bf_seq_ids[0] == cb_seq_ids[0]
|
||||
assert bf_seq_ids[-1] == cb_seq_ids[-1]
|
||||
|
||||
def get_uniq(data):
|
||||
return list(set(data))
|
||||
|
||||
bf_seq_ids_uniq = get_uniq(bf_seq_ids)
|
||||
cb_seq_ids_uniq = get_uniq(cb_seq_ids)
|
||||
|
||||
assert bf_seq_ids == bf_seq_ids_uniq
|
||||
assert len(cb_seq_ids) == (3 * len(cb_seq_ids_uniq))
|
||||
assert len(bf_seq_ids) == num_dispatches
|
||||
assert len(bf_seq_ids_uniq) == num_dispatches
|
||||
assert len(cb_seq_ids_uniq) == num_dispatches
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
exit_code = pytest.main(["-x", __file__] + sys.argv[1:])
|
||||
sys.exit(exit_code)
|
||||
@@ -378,6 +378,23 @@ struct rccl_api_callback_record_t
|
||||
}
|
||||
};
|
||||
|
||||
struct ompt_callback_record_t
|
||||
{
|
||||
uint64_t timestamp = 0;
|
||||
rocprofiler_callback_tracing_record_t record = {};
|
||||
rocprofiler_callback_tracing_ompt_data_t payload = {};
|
||||
callback_arg_array_t args = {};
|
||||
|
||||
template <typename ArchiveT>
|
||||
void save(ArchiveT& ar) const
|
||||
{
|
||||
ar(cereal::make_nvp("timestamp", timestamp));
|
||||
cereal::save(ar, record);
|
||||
ar(cereal::make_nvp("payload", payload));
|
||||
serialize_args(ar, args);
|
||||
}
|
||||
};
|
||||
|
||||
struct kernel_dispatch_callback_record_t
|
||||
{
|
||||
uint64_t timestamp = 0;
|
||||
@@ -518,6 +535,7 @@ auto kernel_dispatch_cb_records = std::deque<kernel_dispatch_callback_record_
|
||||
auto memory_copy_cb_records = std::deque<memory_copy_callback_record_t>{};
|
||||
auto memory_allocation_cb_records = std::deque<memory_allocation_callback_record_t>{};
|
||||
auto rccl_api_cb_records = std::deque<rccl_api_callback_record_t>{};
|
||||
auto ompt_cb_records = std::deque<ompt_callback_record_t>{};
|
||||
|
||||
int
|
||||
set_external_correlation_id(rocprofiler_thread_id_t thr_id,
|
||||
@@ -753,6 +771,18 @@ tool_tracing_callback(rocprofiler_callback_tracing_record_t record,
|
||||
rccl_api_cb_records.emplace_back(
|
||||
rccl_api_callback_record_t{ts, record, *data, std::move(args)});
|
||||
}
|
||||
else if(record.kind == ROCPROFILER_CALLBACK_TRACING_OMPT)
|
||||
{
|
||||
auto* data = static_cast<rocprofiler_callback_tracing_ompt_data_t*>(record.payload);
|
||||
auto args = callback_arg_array_t{};
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
rocprofiler_iterate_callback_tracing_kind_operation_args(
|
||||
record, save_args, record.phase, &args);
|
||||
|
||||
static auto _mutex = std::mutex{};
|
||||
auto _lk = std::unique_lock<std::mutex>{_mutex};
|
||||
ompt_cb_records.emplace_back(ompt_callback_record_t{ts, record, *data, std::move(args)});
|
||||
}
|
||||
else if(record.kind == ROCPROFILER_CALLBACK_TRACING_RUNTIME_INITIALIZATION)
|
||||
{
|
||||
auto* data = static_cast<rocprofiler_callback_tracing_runtime_initialization_data_t*>(
|
||||
@@ -787,6 +817,7 @@ auto page_migration_records = std::deque<rocprofiler_buffer_tracing_page_migrati
|
||||
auto corr_id_retire_records =
|
||||
std::deque<rocprofiler_buffer_tracing_correlation_id_retirement_record_t>{};
|
||||
auto rccl_api_bf_records = std::deque<rocprofiler_buffer_tracing_rccl_api_record_t>{};
|
||||
auto ompt_bf_records = std::deque<rocprofiler_buffer_tracing_ompt_record_t>{};
|
||||
|
||||
void
|
||||
tool_tracing_buffered(rocprofiler_context_id_t /*context*/,
|
||||
@@ -898,6 +929,13 @@ tool_tracing_buffered(rocprofiler_context_id_t /*context*/,
|
||||
|
||||
rccl_api_bf_records.emplace_back(*record);
|
||||
}
|
||||
else if(header->kind == ROCPROFILER_BUFFER_TRACING_OMPT)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_ompt_record_t*>(header->payload);
|
||||
|
||||
ompt_bf_records.emplace_back(*record);
|
||||
}
|
||||
else if(header->kind == ROCPROFILER_BUFFER_TRACING_RUNTIME_INITIALIZATION)
|
||||
{
|
||||
auto* record =
|
||||
@@ -986,6 +1024,7 @@ rocprofiler_context_id_t hip_api_callback_ctx = {0};
|
||||
rocprofiler_context_id_t marker_api_callback_ctx = {0};
|
||||
rocprofiler_context_id_t code_object_ctx = {0};
|
||||
rocprofiler_context_id_t rccl_api_callback_ctx = {0};
|
||||
rocprofiler_context_id_t ompt_callback_ctx = {0};
|
||||
rocprofiler_context_id_t hsa_api_buffered_ctx = {0};
|
||||
rocprofiler_context_id_t hip_api_buffered_ctx = {0};
|
||||
rocprofiler_context_id_t marker_api_buffered_ctx = {0};
|
||||
@@ -994,6 +1033,7 @@ rocprofiler_context_id_t memory_copy_buffered_ctx = {0};
|
||||
rocprofiler_context_id_t memory_allocation_callback_ctx = {0};
|
||||
rocprofiler_context_id_t memory_allocation_buffered_ctx = {0};
|
||||
rocprofiler_context_id_t rccl_api_buffered_ctx = {0};
|
||||
rocprofiler_context_id_t ompt_buffered_ctx = {0};
|
||||
rocprofiler_context_id_t counter_collection_ctx = {0};
|
||||
rocprofiler_context_id_t scratch_memory_ctx = {0};
|
||||
rocprofiler_context_id_t corr_id_retire_ctx = {0};
|
||||
@@ -1015,6 +1055,7 @@ rocprofiler_buffer_id_t counter_collection_buffer = {};
|
||||
rocprofiler_buffer_id_t scratch_memory_buffer = {};
|
||||
rocprofiler_buffer_id_t corr_id_retire_buffer = {};
|
||||
rocprofiler_buffer_id_t rccl_api_buffered_buffer = {};
|
||||
rocprofiler_buffer_id_t ompt_buffered_buffer = {};
|
||||
|
||||
auto contexts = std::unordered_map<std::string_view, rocprofiler_context_id_t*>{
|
||||
{"RUNTIME_INIT_CALLBACK", &runtime_init_callback_ctx},
|
||||
@@ -1026,6 +1067,7 @@ auto contexts = std::unordered_map<std::string_view, rocprofiler_context_id_t*>{
|
||||
{"MEMORY_COPY_CALLBACK", &memory_copy_callback_ctx},
|
||||
{"MEMORY_ALLOCATION_CALLBACK", &memory_allocation_callback_ctx},
|
||||
{"RCCL_API_CALLBACK", &rccl_api_callback_ctx},
|
||||
{"OMPT_CALLBACK", &ompt_callback_ctx},
|
||||
{"RUNTIME_INIT_BUFFERED", &runtime_init_buffered_ctx},
|
||||
{"HSA_API_BUFFERED", &hsa_api_buffered_ctx},
|
||||
{"HIP_API_BUFFERED", &hip_api_buffered_ctx},
|
||||
@@ -1038,9 +1080,10 @@ auto contexts = std::unordered_map<std::string_view, rocprofiler_context_id_t*>{
|
||||
{"SCRATCH_MEMORY", &scratch_memory_ctx},
|
||||
{"CORRELATION_ID_RETIREMENT", &corr_id_retire_ctx},
|
||||
{"RCCL_API_BUFFERED", &rccl_api_buffered_ctx},
|
||||
{"OMPT_BUFFERED", &ompt_buffered_ctx},
|
||||
};
|
||||
|
||||
auto buffers = std::array<rocprofiler_buffer_id_t*, 12>{&runtime_init_buffered_buffer,
|
||||
auto buffers = std::array<rocprofiler_buffer_id_t*, 13>{&runtime_init_buffered_buffer,
|
||||
&hsa_api_buffered_buffer,
|
||||
&hip_api_buffered_buffer,
|
||||
&marker_api_buffered_buffer,
|
||||
@@ -1051,7 +1094,8 @@ auto buffers = std::array<rocprofiler_buffer_id_t*, 12>{&runtime_init_buffered_b
|
||||
&page_migration_buffer,
|
||||
&counter_collection_buffer,
|
||||
&corr_id_retire_buffer,
|
||||
&rccl_api_buffered_buffer};
|
||||
&rccl_api_buffered_buffer,
|
||||
&ompt_buffered_buffer};
|
||||
|
||||
auto agents = std::vector<rocprofiler_agent_t>{};
|
||||
auto agents_map = std::unordered_map<rocprofiler_agent_id_t, rocprofiler_agent_t>{};
|
||||
@@ -1217,6 +1261,15 @@ tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data)
|
||||
nullptr),
|
||||
"rccl api callback tracing service configure");
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_callback_tracing_service(ompt_callback_ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_OMPT,
|
||||
nullptr,
|
||||
0,
|
||||
tool_tracing_callback,
|
||||
nullptr),
|
||||
"ompt callback tracing service configure");
|
||||
|
||||
constexpr auto buffer_size = 8192;
|
||||
constexpr auto watermark = 7936;
|
||||
|
||||
@@ -1328,6 +1381,15 @@ tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data)
|
||||
&rccl_api_buffered_buffer),
|
||||
"buffer creation");
|
||||
|
||||
ROCPROFILER_CALL(rocprofiler_create_buffer(ompt_buffered_ctx,
|
||||
buffer_size,
|
||||
watermark,
|
||||
ROCPROFILER_BUFFER_POLICY_LOSSLESS,
|
||||
tool_tracing_buffered,
|
||||
tool_data,
|
||||
&ompt_buffered_buffer),
|
||||
"buffer creation");
|
||||
|
||||
ROCPROFILER_CALL(rocprofiler_configure_buffer_tracing_service(
|
||||
runtime_init_buffered_ctx,
|
||||
ROCPROFILER_BUFFER_TRACING_RUNTIME_INITIALIZATION,
|
||||
@@ -1443,6 +1505,11 @@ tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data)
|
||||
rccl_api_buffered_buffer),
|
||||
"buffer tracing service for rccl api configure");
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_buffer_tracing_service(
|
||||
ompt_buffered_ctx, ROCPROFILER_BUFFER_TRACING_OMPT, nullptr, 0, ompt_buffered_buffer),
|
||||
"buffer tracing service for ompt configure");
|
||||
|
||||
ROCPROFILER_CALL(
|
||||
rocprofiler_configure_buffered_dispatch_counting_service(
|
||||
counter_collection_ctx, counter_collection_buffer, dispatch_callback, nullptr),
|
||||
@@ -1592,6 +1659,7 @@ tool_fini(void* tool_data)
|
||||
<< ", memory_copy_callback_records=" << memory_copy_cb_records.size()
|
||||
<< ", memory_allocation_callback_records=" << memory_allocation_cb_records.size()
|
||||
<< ", rccl_api_callback_records=" << rccl_api_cb_records.size()
|
||||
<< ", ompt_callback_records=" << ompt_cb_records.size()
|
||||
<< ", kernel_dispatch_bf_records=" << kernel_dispatch_bf_records.size()
|
||||
<< ", memory_copy_bf_records=" << memory_copy_bf_records.size()
|
||||
<< ", memory_allocation_bf_records=" << memory_allocation_bf_records.size()
|
||||
@@ -1603,6 +1671,7 @@ tool_fini(void* tool_data)
|
||||
<< ", marker_api_bf_records=" << marker_api_bf_records.size()
|
||||
<< ", corr_id_retire_records=" << corr_id_retire_records.size()
|
||||
<< ", rccl_api_bf_records=" << rccl_api_bf_records.size()
|
||||
<< ", ompt_bf_records=" << ompt_bf_records.size()
|
||||
<< ", counter_collection_value_records=" << counter_collection_bf_records.size()
|
||||
<< "...\n"
|
||||
<< std::flush;
|
||||
@@ -1694,6 +1763,7 @@ write_json(call_stack_t* _call_stack)
|
||||
json_ar(cereal::make_nvp("hip_api_traces", hip_api_cb_records));
|
||||
json_ar(cereal::make_nvp("marker_api_traces", marker_api_cb_records));
|
||||
json_ar(cereal::make_nvp("rccl_api_traces", rccl_api_cb_records));
|
||||
json_ar(cereal::make_nvp("ompt_traces", ompt_cb_records));
|
||||
json_ar(cereal::make_nvp("scratch_memory_traces", scratch_memory_cb_records));
|
||||
json_ar(cereal::make_nvp("kernel_dispatch", kernel_dispatch_cb_records));
|
||||
json_ar(cereal::make_nvp("memory_copies", memory_copy_cb_records));
|
||||
@@ -1721,6 +1791,7 @@ write_json(call_stack_t* _call_stack)
|
||||
json_ar(cereal::make_nvp("hip_api_traces", hip_api_bf_records));
|
||||
json_ar(cereal::make_nvp("marker_api_traces", marker_api_bf_records));
|
||||
json_ar(cereal::make_nvp("rccl_api_traces", rccl_api_bf_records));
|
||||
json_ar(cereal::make_nvp("ompt_traces", ompt_bf_records));
|
||||
json_ar(cereal::make_nvp("retired_correlation_ids", corr_id_retire_records));
|
||||
json_ar(cereal::make_nvp("counter_collection", counter_collection_bf_records));
|
||||
} catch(std::exception& e)
|
||||
@@ -1791,6 +1862,8 @@ write_perfetto()
|
||||
tids.emplace(itr.thread_id);
|
||||
for(auto itr : rccl_api_bf_records)
|
||||
tids.emplace(itr.thread_id);
|
||||
for(auto itr : ompt_bf_records)
|
||||
tids.emplace(itr.thread_id);
|
||||
|
||||
for(auto itr : memory_copy_bf_records)
|
||||
{
|
||||
@@ -2014,6 +2087,45 @@ write_perfetto()
|
||||
itr.end_timestamp);
|
||||
}
|
||||
|
||||
for(auto itr : ompt_bf_records)
|
||||
{
|
||||
auto name = buffer_names.at(itr.kind, itr.operation);
|
||||
auto& track = thread_tracks.at(itr.thread_id);
|
||||
|
||||
auto _args = callback_arg_array_t{};
|
||||
auto ritr = std::find_if(
|
||||
ompt_cb_records.begin(), ompt_cb_records.end(), [&itr](const auto& citr) {
|
||||
return (citr.record.correlation_id.internal == itr.correlation_id.internal &&
|
||||
!citr.args.empty());
|
||||
});
|
||||
if(ritr != ompt_cb_records.end()) _args = ritr->args;
|
||||
|
||||
TRACE_EVENT_BEGIN(sdk::perfetto_category<sdk::category::openmp>::name,
|
||||
::perfetto::StaticString(name.data()),
|
||||
track,
|
||||
itr.start_timestamp,
|
||||
::perfetto::Flow::ProcessScoped(itr.correlation_id.internal),
|
||||
"begin_ns",
|
||||
itr.start_timestamp,
|
||||
"tid",
|
||||
itr.thread_id,
|
||||
"kind",
|
||||
itr.kind,
|
||||
"operation",
|
||||
itr.operation,
|
||||
"corr_id",
|
||||
itr.correlation_id.internal,
|
||||
[&](::perfetto::EventContext ctx) {
|
||||
for(const auto& aitr : _args)
|
||||
sdk::add_perfetto_annotation(ctx, aitr.first, aitr.second);
|
||||
});
|
||||
TRACE_EVENT_END(sdk::perfetto_category<sdk::category::openmp>::name,
|
||||
track,
|
||||
itr.end_timestamp,
|
||||
"end_ns",
|
||||
itr.end_timestamp);
|
||||
}
|
||||
|
||||
for(auto itr : memory_copy_bf_records)
|
||||
{
|
||||
auto name = buffer_names.at(itr.kind, itr.operation);
|
||||
|
||||
Reference in New Issue
Block a user