Sampling support + testing + omnitrace namespace (#19)

* omnitrace namespace

* Kokkos + Lulesh example/tests

* Sampling support + more

- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options

* Release workflow updates

* Updated settings printing

* Fixed defaults in README

* Tweak setting defaults in README

* CMake fixes

* cmake-format

* clang-format

* LULESH_USE_MPI OFF

* LULESH_USE_MPI fix

* timemory add_secondary fix

* timemory ambiguous internal namespace fix

* Update timemory submodule

* Handle output path/prefix in omnitrace

- updated timemory
- updated test environment

* sampling + papi fix

* Fix to sampling without PAPI

* Fix for using too many processors in CI

* formatting

* Updated CI

- minor cmake tweaks
- updated timemory submodule

* Updated CI

* Updated CI

* CI + timemory updates

- data race fixes

* CI updates + debug for sampling

* Sampling updates

- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality

* Minimum OMNITRACE_THREAD_COUNT of 128

* Handle multiple dims in sampler data

* Configure libunwind support for timemory

* Improved safeguards for sampling

- updated CI
- lulesh runtime-instrument test tweak

* formatting

* CI updates + sampler updates + misc

- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default

* Updated timemory submodule

- hidden visibility for timemory
- storage finalizers do not capture this

* Update timemory submodule

- component visibility updates

* Reworked header includes

- use <...> for timemory headers
- always include <library/defines.hpp>

* Rename some config options

* Update PTL submodule

* Update kokkos submodule

* Updated sampling

* Updated CI

* Reworked instrumentation exe

- lowered min-address-range threshold to 256
- extended whole function exclude

* CI fix + timemory submodule update

- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead

* Sampling flags + transpose update + CI update

- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads

* CI update

- removed ubuntu-focal-external-debug
- reduced data artifacts upload

* CI timeouts

- updated timemory submodule
- minor tweaks to omnitrace exe logging

* LICENSE updates (partial)

* CI Test stage timeout extension

* Docker and Packaging updates

* Miscellaneous fixes/tweaks

- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix

* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate

* cmake format

* Sampler deadlock fixes

* Removed debug messages from sampler

* Fix for MPI detection + test tweaks + misc

* Sampler deadlock fixes + misc

- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is set up
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
    - no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY

* omnitrace-avail + libunwind update + restructure

- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace

* Fix remaining reorganization issues

- removed some duplicate code
- fixed some trait specializations after implicit instantiation
- formatting

* ensure_storage fix + avail improvements

- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail

* Delay settings initialization

- slight tweak to tests w/ MPI

* Disable OpenMPI testing w/ ubuntu-bionic

- MPI testing is hanging because of a network interface issue on the system:

> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
>   Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
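
The quoted Open MPI note names the MCA parameter `btl_base_warn_component_unused`. As a minimal sketch: Open MPI reads MCA parameters from environment variables prefixed with `OMPI_MCA_`, so the warning can be silenced as shown below. This only suppresses the message. Whether it has any effect on the hang described above is not established by this commit. The application name `./my_app` in the comment is a placeholder.

```shell
#!/usr/bin/env bash
# Silence the "unable to find any relevant network interfaces" warning.
# Open MPI picks up MCA parameters from OMPI_MCA_<parameter-name>.
export OMPI_MCA_btl_base_warn_component_unused=0

# Equivalent per-invocation form (./my_app is a hypothetical binary):
#   mpirun --mca btl_base_warn_component_unused 0 -np 2 ./my_app
```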

[ROCm/rocprofiler-systems commit: 778af2a760]
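
The "Minimum OMNITRACE_THREAD_COUNT of 128" item corresponds to the CMakeLists.txt hunk in this commit, where the default for `OMNITRACE_MAX_THREADS` becomes 16 times the processor count with a floor of 128. A rough shell sketch of that arithmetic follows. The function name `default_max_threads` is hypothetical and exists only for illustration, since the real computation is done by CMake at configure time.

```shell
#!/usr/bin/env bash
# Mirror of the CMake default: max(128, 16 * processor_count).
# default_max_threads is an illustrative helper, not part of omnitrace.
default_max_threads() {
    local nproc_count=$1
    local thread_count=$((16 * nproc_count))
    if [ "$thread_count" -lt 128 ]; then
        thread_count=128
    fi
    echo "$thread_count"
}

default_max_threads 4    # prints 128 (16 * 4 = 64 is below the floor)
default_max_threads 16   # prints 256
```

The value can still be overridden at configure time, as the ROCm CI job in this commit does with `-DOMNITRACE_MAX_THREADS=256`.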
This commit is contained in:
Jonathan R. Madsen
2022-01-24 20:49:17 -06:00
committed by GitHub
parent 9d5ebf9c3b
commit f17ff12a66
78 changed files with 14296 additions and 1071 deletions
+11
@@ -18,6 +18,17 @@ parse:
kwargs:
VARIABLES: '*'
CONDITION: '*'
omnitrace_add_test:
kwargs:
NAME: '*'
TARGET: '*'
MPI: '*'
NUM_PROCS: '*'
REWRITE_ARGS: '*'
RUNTIME_ARGS: '*'
RUN_ARGS: '*'
ENVIRONMENT: '*'
LABELS: '*'
override_spec: {}
vartags: []
proptags: []
+189 -43
@@ -24,9 +24,9 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip libtbb-dev libboost-{atomic,system,thread,date-time,filesystem,timer}-dev ${{ matrix.compiler }} ${{ matrix.mpi }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip libtbb-dev libboost-{atomic,system,thread,date-time,filesystem,timer}-dev ${{ matrix.compiler }} ${{ matrix.mpi }} &&
python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'
- name: Configure Env
run:
@@ -44,15 +44,17 @@ jobs:
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_BUILD_DYNINST=ON
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_HIP=OFF
-DDYNINST_BUILD_ELFUTILS=ON
-DDYNINST_BUILD_LIBIBERTY=ON
-DDYNINST_BUILD_SHARED_LIBS=ON
-DDYNINST_BUILD_STATIC_LIBS=OFF
- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -61,31 +63,40 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2
- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal.log
- name: Test Install
timeout-minutes: 10
run:
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls
- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal.log
${{ github.workspace }}/build/*.log
- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt
ubuntu-bionic:
runs-on: ubuntu-18.04
strategy:
matrix:
compiler: ['g++-7', 'g++-8']
mpi: [ '', 'libmpich-dev mpich', 'libopenmpi-dev openmpi-bin libfabric-dev' ]
mpi: [ '', 'libmpich-dev mpich' ]
steps:
- uses: actions/checkout@v2
@@ -93,9 +104,9 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip ${{ matrix.compiler }} ${{ matrix.mpi }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip ${{ matrix.compiler }} ${{ matrix.mpi }} &&
python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'
- name: Configure Env
run:
@@ -113,15 +124,17 @@ jobs:
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_BUILD_DYNINST=ON
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_HIP=OFF
-DDYNINST_BUILD_TBB=ON
-DDYNINST_BUILD_BOOST=ON
-DDYNINST_BUILD_ELFUTILS=ON
-DDYNINST_BUILD_LIBIBERTY=ON
- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -130,24 +143,33 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2
- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-bionic.log
- name: Test Install
timeout-minutes: 10
run:
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls
- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-bionic.log
${{ github.workspace }}/build/*.log
- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt
ubuntu-focal-external:
runs-on: ubuntu-20.04
@@ -161,15 +183,15 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip libboost-{atomic,system,thread,date-time,filesystem,timer}-dev libtbb-dev libiberty-dev ${{ matrix.compiler }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip libboost-{atomic,system,thread,date-time,filesystem,timer}-dev libtbb-dev libiberty-dev ${{ matrix.compiler }} &&
sudo python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'
- name: Configure Env
run:
echo "CC=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')" >> $GITHUB_ENV &&
echo "CXX=${{ matrix.compiler }}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/opt/dyninst:/opt/elfutils:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/dyninst:/opt/elfutils:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "/opt/omnitrace/bin:/opt/dyninst/bin:/opt/elfutils/bin:${HOME}/.local/bin" >> $GITHUB_PATH &&
echo "LD_LIBRARY_PATH=/opt/omnitrace/lib:/opt/dyninst/lib:/opt/elfutils/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
@@ -193,7 +215,7 @@ jobs:
cmake -B build
-DCMAKE_C_COMPILER=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=/opt/dyninst &&
cmake --build build --target all --parallel 2 &&
cmake --build build --target install --parallel 2 &&
@@ -205,12 +227,14 @@ jobs:
cmake -B ${{ github.workspace }}/build
-DCMAKE_C_COMPILER=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_BUILD_TYPE=RelWithDebInfo
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_USE_MPI=OFF
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_USE_HIP=OFF
- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -219,6 +243,7 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2
- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ldd ./omnitrace &&
@@ -226,20 +251,28 @@ jobs:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-external.log
- name: Test Install
timeout-minutes: 10
run:
ldd $(which omnitrace) &&
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls
- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-external.log
${{ github.workspace }}/build/*.log
- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt
ubuntu-focal-dyninst-package:
runs-on: ubuntu-20.04
@@ -253,15 +286,15 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip ${{ matrix.compiler }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip ${{ matrix.compiler }} &&
sudo python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'
- name: Configure Env
run:
echo "CC=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')" >> $GITHUB_ENV &&
echo "CXX=${{ matrix.compiler }}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/opt/dyninst:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/dyninst:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "/opt/omnitrace/bin:/opt/dyninst/bin:${HOME}/.local/bin" >> $GITHUB_PATH &&
echo "LD_LIBRARY_PATH=/opt/omnitrace/lib:/opt/dyninst/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
@@ -292,10 +325,12 @@ jobs:
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_USE_MPI=OFF
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_USE_HIP=OFF
- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -304,6 +339,7 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2
- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ldd ./omnitrace &&
@@ -311,17 +347,127 @@ jobs:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-dyninst-package.log
- name: Test Install
timeout-minutes: 10
run:
ldd $(which omnitrace) &&
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls
- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-dyninst-package.log
${{ github.workspace }}/build/*.log
- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt
ubuntu-focal-external-rocm:
runs-on: ubuntu-20.04
strategy:
matrix:
compiler: ['g++']
rocm_version: ['4.3', '4.3.1', '4.5']
mpi: [ 'libmpich-dev mpich', 'libopenmpi-dev openmpi-bin libfabric-dev' ]
steps:
- uses: actions/checkout@v2
- name: Install Packages
run:
echo '1' | sudo tee /proc/sys/kernel/perf_event_paranoid &&
sudo apt-get update &&
sudo apt-get install -y software-properties-common wget gnupg2 &&
sudo wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add - &&
echo "deb [arch=amd64] https://repo.radeon.com/rocm/apt/${{ matrix.rocm_version }}/ ubuntu main" | sudo tee /etc/apt/sources.list.d/rocm.list &&
sudo apt-get update &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip libboost-{atomic,system,thread,date-time,filesystem,timer}-dev libtbb-dev libiberty-dev ${{ matrix.compiler }} libnuma-dev rocm-dev rocm-utils roctracer-dev rocprofiler-dev hip-base hsa-amd-aqlprofile hsa-rocr-dev hsakmt-roct-dev ${{ matrix.mpi }} libpapi-dev &&
sudo python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.16.3'
- name: Configure Env
run:
echo "CC=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')" >> $GITHUB_ENV &&
echo "CXX=${{ matrix.compiler }}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/dyninst:/opt/elfutils:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "/opt/omnitrace/bin:/opt/dyninst/bin:/opt/elfutils/bin:${HOME}/.local/bin" >> $GITHUB_PATH &&
echo "LD_LIBRARY_PATH=/opt/omnitrace/lib:/opt/dyninst/lib:/opt/elfutils/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
- name: Install ElfUtils
run:
pushd external &&
wget https://sourceware.org/elfutils/ftp/${ELFUTILS_DOWNLOAD_VERSION}/elfutils-${ELFUTILS_DOWNLOAD_VERSION}.tar.bz2 &&
tar xjf elfutils-${ELFUTILS_DOWNLOAD_VERSION}.tar.bz2 &&
pushd elfutils-${ELFUTILS_DOWNLOAD_VERSION} &&
CFLAGS="-O3" ./configure --enable-install-elfh --prefix=/opt/elfutils --disable-libdebuginfod --disable-debuginfod &&
make -j2 &&
make install -j2 &&
popd &&
rm -rf elfutils*
- name: Install Dyninst
run:
cmake --version &&
git submodule update --init external/dyninst &&
cd external/dyninst &&
cmake -B build
-DCMAKE_C_COMPILER=${CC}
-DCMAKE_CXX_COMPILER=${CXX}
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=/opt/dyninst &&
cmake --build build --target all --parallel 2 &&
cmake --build build --target install --parallel 2 &&
rm -rf build
- name: Configure CMake
run:
cmake --version &&
cmake -B ${{ github.workspace }}/build
-DCMAKE_C_COMPILER=${CC}
-DCMAKE_CXX_COMPILER=${CXX}
-DCMAKE_BUILD_TYPE=RelWithDebInfo
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_BUILD_TESTING=OFF
-DOMNITRACE_BUILD_DEVELOPER=ON
-DOMNITRACE_BUILD_EXTRA_OPTIMIZATIONS=OFF
-DOMNITRACE_BUILD_LTO=OFF
-DOMNITRACE_USE_MPI=OFF
-DOMNITRACE_USE_MPI_HEADERS=ON
-DOMNITRACE_USE_HIP=ON
-DOMNITRACE_MAX_THREADS=256
-DOMNITRACE_USE_SANITIZER=OFF
-DTIMEMORY_USE_PAPI=ON
- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
- name: Install
run:
cmake --build ${{ github.workspace }}/build --target install --parallel 2
- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ldd ./omnitrace &&
./omnitrace --help
- name: Test Install
timeout-minutes: 10
run:
ldd $(which omnitrace) &&
omnitrace --help &&
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls
+3
@@ -13,3 +13,6 @@
[submodule "external/PTL"]
path = external/PTL
url = https://github.com/jrmadsen/PTL.git
[submodule "external/kokkos"]
path = examples/lulesh/external/kokkos
url = https://github.com/kokkos/kokkos.git
+101 -52
@@ -58,21 +58,37 @@ set(CMAKE_CXX_STANDARD
17
CACHE STRING "CXX language standard")
omnitrace_add_feature(CMAKE_CXX_STANDARD "CXX language standard")
omnitrace_add_feature(CMAKE_BUILD_TYPE "Build optimization level")
omnitrace_add_option(CMAKE_CXX_STANDARD_REQUIRED "Require C++ language standard" ON)
omnitrace_add_option(CMAKE_CXX_EXTENSIONS "Compiler specific language extensions" OFF)
omnitrace_add_option(CMAKE_INSTALL_RPATH_USE_LINK_PATH "Enable rpath to linked libraries"
ON)
omnitrace_add_option(OMNITRACE_USE_CLANG_TIDY "Enable clang-tidy" OFF)
omnitrace_add_option(OMNITRACE_USE_MPI "Enable MPI support" OFF)
omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF)
omnitrace_add_option(OMNITRACE_USE_ROCTRACER "Enable roctracer support" ON)
omnitrace_add_option(OMNITRACE_BUILD_DYNINST "Build dyninst from submodule" OFF)
omnitrace_add_option(OMNITRACE_USE_HIP "Enable HIP support" ON)
omnitrace_add_option(OMNITRACE_USE_ROCTRACER "Enable roctracer support"
${OMNITRACE_USE_HIP})
omnitrace_add_option(OMNITRACE_USE_MPI_HEADERS
"Enable wrapping MPI functions w/o enabling MPI dependency" OFF)
omnitrace_add_option(OMNITRACE_BUILD_DYNINST "Build dyninst from submodule" OFF)
omnitrace_add_option(OMNITRACE_BUILD_TESTING "Enable building the testing suite" OFF)
omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF)
omnitrace_add_option(OMNITRACE_BUILD_HIDDEN_VISIBILITY
"Build with hidden visibility (disable for Debug builds)" ON)
if(NOT OMNITRACE_USE_HIP)
set(OMNITRACE_USE_ROCTRACER
OFF
CACHE BOOL "Disabled via OMNITRACE_USE_HIP=OFF" FORCE)
endif()
include(ProcessorCount)
processorcount(OMNITRACE_PROCESSOR_COUNT)
math(EXPR OMNITRACE_THREAD_COUNT "8 * ${OMNITRACE_PROCESSOR_COUNT}")
math(EXPR OMNITRACE_THREAD_COUNT "16 * ${OMNITRACE_PROCESSOR_COUNT}")
if(OMNITRACE_THREAD_COUNT LESS 128)
set(OMNITRACE_THREAD_COUNT 128)
endif()
set(OMNITRACE_MAX_THREADS
"${OMNITRACE_THREAD_COUNT}"
CACHE
@@ -81,24 +97,25 @@ set(OMNITRACE_MAX_THREADS
)
omnitrace_add_feature(
OMNITRACE_MAX_THREADS
"Maximum number of total threads supported in the host application (default: 8 * nproc)"
"Maximum number of total threads supported in the host application (default: max of 128 or 16 * nproc)"
)
# ensure synced
set(TIMEMORY_USE_MPI
${OMNITRACE_USE_MPI}
CACHE BOOL "Enable MPI support" FORCE)
# default visibility settings
set(CMAKE_C_VISIBILITY_PRESET "default")
set(CMAKE_CXX_VISIBILITY_PRESET "default")
set(CMAKE_VISIBILITY_INLINES_HIDDEN OFF)
set(CMAKE_C_VISIBILITY_PRESET
"default"
CACHE STRING "Visibility preset for non-inline C functions")
set(CMAKE_CXX_VISIBILITY_PRESET
"default"
CACHE STRING "Visibility preset for non-inline C++ functions/objects")
set(CMAKE_VISIBILITY_INLINES_HIDDEN
OFF
CACHE BOOL "Visibility preset for inline functions")
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
include(Formatting) # format target
include(Packages) # finds third-party libraries
if(OMNITRACE_USE_ROCTRACER)
if(OMNITRACE_USE_HIP OR OMNITRACE_USE_ROCTRACER)
find_package(HIP QUIET)
if(HIP_VERSION_MAJOR GREATER_EQUAL 4 AND HIP_VERSION_MINOR GREATER 3)
set(roctracer_kfdwrapper_LIBRARY)
@@ -116,9 +133,11 @@ configure_file(${PROJECT_SOURCE_DIR}/include/library/defines.hpp.in
omnitrace_activate_clang_tidy()
# custom visibility settings
set(CMAKE_C_VISIBILITY_PRESET "hidden")
set(CMAKE_CXX_VISIBILITY_PRESET "hidden")
set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
if(OMNITRACE_BUILD_HIDDEN_VISIBILITY)
set(CMAKE_C_VISIBILITY_PRESET "hidden")
set(CMAKE_CXX_VISIBILITY_PRESET "hidden")
set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
endif()
if(OMNITRACE_BUILD_LTO)
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
@@ -134,13 +153,17 @@ set(library_sources
${CMAKE_CURRENT_LIST_DIR}/src/library.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/config.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/critical_trace.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/fork_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/omnitrace_component.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/mpi_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/gpu.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/perfetto.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/ptl.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/sampling.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/thread_data.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/timemory.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/backtrace.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/fork_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/mpi_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/omnitrace.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/pthread_gotcha.cpp
${perfetto_DIR}/sdk/perfetto.cc)
set(library_headers
@@ -150,49 +173,53 @@ set(library_headers
${CMAKE_CURRENT_LIST_DIR}/include/library/common.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/critical_trace.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/debug.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/fork_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/omnitrace_component.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/mpi_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/gpu.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/perfetto.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/ptl.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/sampling.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/state.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/thread_data.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/timemory.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/fwd.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/backtrace.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/fork_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/mpi_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/omnitrace.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/pthread_gotcha.hpp
${perfetto_DIR}/sdk/perfetto.h)
if(NOT TIMEMORY_USE_PERFETTO)
endif()
add_library(omnitrace-library SHARED ${library_sources} ${library_headers})
if(OMNITRACE_USE_ROCTRACER)
target_sources(
omnitrace-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/include/library/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/src/library/roctracer.cpp
${CMAKE_CURRENT_LIST_DIR}/include/library/roctracer_callbacks.hpp
${CMAKE_CURRENT_LIST_DIR}/src/library/roctracer_callbacks.cpp)
PRIVATE
${CMAKE_CURRENT_LIST_DIR}/src/library/components/roctracer.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/roctracer_callbacks.cpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/roctracer_callbacks.hpp)
endif()
target_include_directories(omnitrace-library SYSTEM PRIVATE ${perfetto_DIR}/sdk)
target_compile_definitions(
omnitrace-library
PRIVATE $<IF:$<BOOL:${OMNITRACE_CUSTOM_DATA_SOURCE}>,CUSTOM_DATA_SOURCE,>)
PRIVATE OMNITRACE_MAX_THREADS=${OMNITRACE_MAX_THREADS}
$<IF:$<BOOL:${OMNITRACE_CUSTOM_DATA_SOURCE}>,CUSTOM_DATA_SOURCE,>)
target_link_libraries(
omnitrace-library
PRIVATE omnitrace::omnitrace-headers
omnitrace::omnitrace-threading
omnitrace::omnitrace-compile-options
omnitrace::omnitrace-roctracer
omnitrace::omnitrace-mpi
omnitrace::omnitrace-ptl
$<BUILD_INTERFACE:timemory::timemory-headers>
$<BUILD_INTERFACE:timemory::timemory-gotcha>
$<BUILD_INTERFACE:timemory::timemory-cxx-shared>
$<IF:$<BOOL:${OMNITRACE_USE_SANITIZER}>,omnitrace::omnitrace-sanitizer,>)
PUBLIC $<BUILD_INTERFACE:omnitrace::omnitrace-headers>
$<BUILD_INTERFACE:omnitrace::omnitrace-threading>
$<BUILD_INTERFACE:omnitrace::omnitrace-compile-options>
$<BUILD_INTERFACE:omnitrace::omnitrace-hip>
$<BUILD_INTERFACE:omnitrace::omnitrace-roctracer>
$<BUILD_INTERFACE:omnitrace::omnitrace-mpi>
$<BUILD_INTERFACE:omnitrace::omnitrace-ptl>
$<BUILD_INTERFACE:timemory::timemory-headers>
$<BUILD_INTERFACE:timemory::timemory-gotcha>
$<BUILD_INTERFACE:timemory::timemory-cxx-shared>
$<IF:$<BOOL:${OMNITRACE_USE_SANITIZER}>,omnitrace::omnitrace-sanitizer,>)
if(OMNITRACE_DYNINST_API_RT)
get_filename_component(OMNITRACE_DYNINST_API_RT_DIR "${OMNITRACE_DYNINST_API_RT}"
@@ -200,14 +227,35 @@ if(OMNITRACE_DYNINST_API_RT)
endif()
set_target_properties(
omnitrace-library PROPERTIES OUTPUT_NAME omnitrace
INSTALL_RPATH "\$ORIGIN:\$ORIGIN/dyninst-tpls/libs")
omnitrace-library
PROPERTIES OUTPUT_NAME omnitrace
INSTALL_RPATH
"\$ORIGIN:\$ORIGIN/timemory/libunwind:\$ORIGIN/dyninst-tpls/libs")
install(
TARGETS omnitrace-library
DESTINATION ${CMAKE_INSTALL_LIBDIR}
OPTIONAL)
# ------------------------------------------------------------------------------#
#
# omnitrace-avail target
#
# ------------------------------------------------------------------------------#
add_executable(omnitrace-avail ${CMAKE_CURRENT_LIST_DIR}/src/avail.cpp
${CMAKE_CURRENT_LIST_DIR}/include/avail.hpp)
target_include_directories(omnitrace-avail PRIVATE ${CMAKE_CURRENT_LIST_DIR}/include)
target_compile_definitions(omnitrace-avail PRIVATE OMNITRACE_EXTERN_COMPONENTS=0)
target_link_libraries(omnitrace-avail PRIVATE omnitrace-library)
set_target_properties(omnitrace-avail PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
install(
TARGETS omnitrace-avail
DESTINATION bin
OPTIONAL)
# ------------------------------------------------------------------------------#
#
# omnitrace-exe target
@@ -234,7 +282,7 @@ set_target_properties(
OUTPUT_NAME omnitrace
INSTALL_RPATH_USE_LINK_PATH ON
INSTALL_RPATH
"\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}/dyninst-tpls/lib"
"\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}/timemory/libunwind:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}/dyninst-tpls/lib"
)
install(
@@ -242,9 +290,6 @@ install(
DESTINATION ${CMAKE_INSTALL_BINDIR}
OPTIONAL)
# build the timemory-avail exe
add_dependencies(omnitrace-exe timemory-avail)
# ------------------------------------------------------------------------------#
#
# miscellaneous installs
@@ -275,7 +320,9 @@ install(
#
# ------------------------------------------------------------------------------#
add_subdirectory(examples)
if(OMNITRACE_BUILD_TESTING)
add_subdirectory(examples)
endif()
# ------------------------------------------------------------------------------#
#
@@ -283,10 +330,12 @@ add_subdirectory(examples)
#
# ------------------------------------------------------------------------------#
include(CTest)
enable_testing()
if(OMNITRACE_BUILD_TESTING)
include(CTest)
enable_testing()
add_subdirectory(tests)
add_subdirectory(tests)
endif()
# ------------------------------------------------------------------------------#
#
+11 -17
@@ -1,27 +1,21 @@
Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
MIT License
Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
with the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimers.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimers in the
documentation and/or other materials provided with the distribution.
* Neither the names of Advanced Micro Devices, Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this Software without specific prior written permission.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
THE SOFTWARE.
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+48 -14
@@ -53,19 +53,53 @@ omnitrace <omnitrace-options> -- <exe-or-library> <exe-options>
## Omnitrace Library Environment Settings
| Environment Variable | Default Value | Description |
|-----------------------------|-------------------------------|----------------------------------------------------------------------------------|
| `OMNITRACE_DEBUG` | `false` | Enable debugging statements |
| `OMNITRACE_USE_PERFETTO` | `true` | Collect profiling data via perfetto |
| `OMNITRACE_USE_TIMEMORY` | `false` | Collection profiling data via timemory |
| `OMNITRACE_SAMPLE_RATE` | `1` | Invoke perfetto and/or timemory once every N function calls |
| `OMNITRACE_USE_MPI` | `true` | Label perfetto output files via rank instead of PID |
| `OMNITRACE_OUTPUT_FILE` | `perfetto-trace.%rank%.proto` | Output file for perfetto (may use `%pid`) |
| `OMNITRACE_BACKEND` | `"inprocess"` | Configure perfetto to use either "inprocess" data management, "system", or "all" |
| `OMNITRACE_COMPONENTS` | `"wall_clock"` | Timemory components to activate when enabled |
| `OMNITRACE_SHMEM_SIZE_HINT` | `40960` | Hint for perfetto shared memory buffer |
| `OMNITRACE_BUFFER_SIZE_KB` | `1024000` | Maximum amount of memory perfetto will use to collect data in-process |
| `TIMEMORY_TIME_OUTPUT` | `true` | Create unique output subdirectory with date and launch time |
| Environment Variable | Default Value | Description |
|--------------------------------------------|--------------------------|------------------------------------------------------------------------------------------------------------------|
| `OMNITRACE_USE_PERFETTO` | `false` | Enable perfetto backend |
| `OMNITRACE_USE_PID` | `true` | Enable tagging filenames with process identifier (either MPI rank or pid) |
| `OMNITRACE_USE_ROCTRACER` | `true` | Enable ROCM tracing |
| `OMNITRACE_USE_SAMPLING` | `true` | Enable statistical sampling of call-stack |
| `OMNITRACE_USE_TIMEMORY` | `false` | Enable timemory backend |
| `OMNITRACE_BACKEND` | `inprocess` | Specify the perfetto backend to activate. Options are: 'inprocess', 'system', or 'all' |
| `OMNITRACE_BUFFER_SIZE_KB` | `1024000` | Size of perfetto buffer (in KB) |
| `OMNITRACE_COUT_OUTPUT` | `false` | Write output to stdout |
| `OMNITRACE_CRITICAL_TRACE` | `false` | Enable generation of the critical trace |
| `OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT` | `2000` | Number of critical trace records to store in thread-local memory before submitting to shared buffer |
| `OMNITRACE_CRITICAL_TRACE_COUNT` | `0` | Number of critical traces to export (0 == all) |
| `OMNITRACE_CRITICAL_TRACE_DEBUG` | `false` | Enable debugging for critical trace |
| `OMNITRACE_CRITICAL_TRACE_NUM_THREADS` | `8` | Number of threads to use when generating the critical trace |
| `OMNITRACE_CRITICAL_TRACE_PER_ROW` | `0` | How many critical traces per row in perfetto (0 == all in one row) |
| `OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES` | `false` | Include names in serialization of critical trace (mainly for debugging) |
| `OMNITRACE_DIFF_OUTPUT` | `false` | Generate a difference output vs. a pre-existing output (see also: TIMEMORY_INPUT_PATH and TIMEMORY_INPUT_PREFIX) |
| `OMNITRACE_FLAT_SAMPLING` | `false` | Ignore hierarchy in all statistical sampling entries |
| `OMNITRACE_INSTRUMENTATION_INTERVAL` | `1` | Instrumentation only takes measurements once every N function calls (not statistical) |
| `OMNITRACE_JSON_OUTPUT` | `true` | Write json output files |
| `OMNITRACE_MEMORY_PRECISION` | `-1` | Set the precision for components with 'is_memory_category' type-trait |
| `OMNITRACE_MEMORY_SCIENTIFIC` | `false` | Set the numerical reporting format for components with 'is_memory_category' type-trait |
| `OMNITRACE_MEMORY_UNITS` | `""` | Set the units for components with 'uses_memory_units' type-trait |
| `OMNITRACE_OUTPUT_FILE` | `""` | Perfetto filename |
| `OMNITRACE_OUTPUT_PATH` | `omnitrace-{EXE}-output` | Explicitly specify the output folder for results |
| `OMNITRACE_OUTPUT_PREFIX` | `""` | Explicitly specify a prefix for all output files |
| `OMNITRACE_PRECISION` | `-1` | Set the global output precision for components |
| `OMNITRACE_ROCTRACER_FLAT_PROFILE` | `false` | Ignore hierarchy in all kernels entries with timemory backend |
| `OMNITRACE_ROCTRACER_HSA_ACTIVITY` | `false` | Enable HSA activity tracing support |
| `OMNITRACE_ROCTRACER_HSA_API` | `false` | Enable HSA API tracing support |
| `OMNITRACE_ROCTRACER_HSA_API_TYPES` | `""` | HSA API type to collect |
| `OMNITRACE_ROCTRACER_TIMELINE_PROFILE` | `false` | Create unique entries for every kernel with timemory backend |
| `OMNITRACE_SAMPLING_DELAY` | `1e-06` | Number of seconds to delay activating the statistical sampling |
| `OMNITRACE_SAMPLING_FREQ` | `10` | Number of software interrupts per second when OMNITRACE_USE_SAMPLING=ON |
| `OMNITRACE_SCIENTIFIC` | `false` | Set the global numerical reporting to scientific format |
| `OMNITRACE_SETTINGS_DESC` | `false` | Provide descriptions when printing settings |
| `OMNITRACE_SHMEM_SIZE_HINT_KB` | `40960` | Hint for shared-memory buffer size in perfetto (in KB) |
| `OMNITRACE_TEXT_OUTPUT` | `true` | Write text output files |
| `OMNITRACE_TIMELINE_SAMPLING` | `false` | Create unique entries for every sample when statistical sampling is enabled |
| `OMNITRACE_TIMEMORY_COMPONENTS` | `wall_clock` | List of components to collect via timemory (see timemory-avail) |
| `OMNITRACE_TIME_FORMAT` | `%F_%I.%M_%p` | Customize the folder generation when TIMEMORY_TIME_OUTPUT is enabled (see also: strftime) |
| `OMNITRACE_TIME_OUTPUT` | `true` | Output data to subfolder w/ a timestamp (see also: TIMEMORY_TIME_FORMAT) |
| `OMNITRACE_TIMING_PRECISION` | `6` | Set the precision for components with 'is_timing_category' type-trait |
| `OMNITRACE_TIMING_SCIENTIFIC` | `false` | Set the numerical reporting format for components with 'is_timing_category' type-trait |
| `OMNITRACE_TIMING_UNITS` | `""` | Set the units for components with 'uses_timing_units' type-trait |
| `OMNITRACE_TREE_OUTPUT` | `true` | Write hierarchical json output files |
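
As a rough illustration of how the sampling-related settings in the table might be combined, the sketch below exports a few of them before launching an instrumented program. The variable names come from the table; the chosen values, the output folder name, and the `./my-app` binary are placeholders for illustration only, and the launch line simply follows the `omnitrace <omnitrace-options> -- <exe-or-library> <exe-options>` usage shown earlier.

```shell
#!/usr/bin/env bash
# Values below are illustrative, not recommended defaults.
export OMNITRACE_USE_SAMPLING=true       # enable statistical call-stack sampling
export OMNITRACE_SAMPLING_FREQ=50        # software interrupts per second
export OMNITRACE_SAMPLING_DELAY=0.5      # seconds to wait before sampling starts
export OMNITRACE_FLAT_SAMPLING=false     # keep the call hierarchy in the samples
export OMNITRACE_OUTPUT_PATH=omnitrace-sampling-output

# Show the settings a launched process would inherit.
env | grep '^OMNITRACE_' | sort

# Launch only if the omnitrace executable and the placeholder app exist.
if command -v omnitrace >/dev/null 2>&1 && [ -x ./my-app ]; then
    omnitrace -- ./my-app
fi
```

Because the settings are read from the environment, they can also be given inline for a single run instead of being exported for the whole shell session.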
### Example Omnitrace Instrumentation
@@ -165,7 +199,7 @@ variable. The special character sequences `%pid%` and `%rank%` will be replaced
## Merging the traces from rocprof and omnitrace
> NOTE: Using `rocprof` externally is deprecated. The current version has built-in support for
> NOTE: Using `rocprof` externally for tracing is deprecated. The current version has built-in support for
> recording the GPU activity and HIP API calls. If you want to use an external rocprof, either
> configure CMake with `-DOMNITRACE_USE_ROCTRACER=OFF` or explicitly set `TIMEMORY_ROCTRACER_ENABLED=OFF` in the
> environment.
+5
@@ -45,6 +45,11 @@ if(OMNITRACE_CLANG_FORMAT_EXE)
file(GLOB_RECURSE headers ${PROJECT_SOURCE_DIR}/include/*.hpp)
file(GLOB_RECURSE examples ${PROJECT_SOURCE_DIR}/examples/*.cpp
${PROJECT_SOURCE_DIR}/examples/*.hpp)
file(GLOB_RECURSE external ${PROJECT_SOURCE_DIR}/examples/lulesh/external/*.cpp
${PROJECT_SOURCE_DIR}/examples/lulesh/external/*.hpp)
if(external)
list(REMOVE_ITEM examples ${external})
endif()
add_custom_target(
format-omnitrace
${OMNITRACE_CLANG_FORMAT_EXE} -i ${sources} ${headers} ${examples}
+55 -6
@@ -13,6 +13,7 @@ omnitrace_add_interface_library(omnitrace-threading "Enables multithreading supp
omnitrace_add_interface_library(
omnitrace-dyninst
"Provides flags and libraries for Dyninst (dynamic instrumentation)")
omnitrace_add_interface_library(omnitrace-hip "Provides flags and libraries for HIP")
omnitrace_add_interface_library(omnitrace-roctracer
"Provides flags and libraries for roctracer")
omnitrace_add_interface_library(omnitrace-mpi "Provides MPI or MPI headers")
@@ -24,6 +25,9 @@ target_include_directories(omnitrace-headers INTERFACE ${PROJECT_SOURCE_DIR}/inc
# include threading because of rooflines
target_link_libraries(omnitrace-headers INTERFACE omnitrace-threading)
# ensure the env overrides the appending /opt/rocm later
string(REPLACE ":" ";" CMAKE_PREFIX_PATH "$ENV{CMAKE_PREFIX_PATH};${CMAKE_PREFIX_PATH}")
# ----------------------------------------------------------------------------------------#
#
# Threading
@@ -47,6 +51,19 @@ if(pthread_LIBRARY AND NOT WIN32)
target_link_libraries(omnitrace-threading INTERFACE ${pthread_LIBRARY})
endif()
# ----------------------------------------------------------------------------------------#
#
# HIP
#
# ----------------------------------------------------------------------------------------#
if(OMNITRACE_USE_HIP)
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
find_package(hip ${omnitrace_FIND_QUIETLY} REQUIRED)
target_compile_definitions(omnitrace-hip INTERFACE OMNITRACE_USE_HIP)
target_link_libraries(omnitrace-hip INTERFACE hip::host)
endif()
# ----------------------------------------------------------------------------------------#
#
# roctracer
@@ -56,9 +73,9 @@ endif()
if(OMNITRACE_USE_ROCTRACER)
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
find_package(roctracer ${omnitrace_FIND_QUIETLY} REQUIRED)
find_package(hip ${omnitrace_FIND_QUIETLY} REQUIRED)
target_compile_definitions(omnitrace-roctracer INTERFACE OMNITRACE_USE_ROCTRACER)
target_link_libraries(omnitrace-roctracer INTERFACE hip::host roctracer::roctracer)
target_link_libraries(omnitrace-roctracer INTERFACE roctracer::roctracer
omnitrace::omnitrace-hip)
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_RPATH}:${roctracer_LIBRARY_DIRS}")
endif()
@@ -297,45 +314,77 @@ set(TIMEMORY_BUILD_TOOLS
set(TIMEMORY_BUILD_EXCLUDE_FROM_ALL
ON
CACHE BOOL "Set timemory to only build dependencies")
set(TIMEMORY_BUILD_HIDDEN_VISIBILITY
ON
CACHE BOOL "Build timemory with hidden visibility")
set(TIMEMORY_QUIET_CONFIG
ON
CACHE BOOL "Make timemory configuration quieter")
# timemory feature settings
set(TIMEMORY_USE_MPI
${OMNITRACE_USE_MPI}
CACHE BOOL "Enable MPI support in timemory" FORCE)
set(TIMEMORY_USE_GOTCHA
ON
CACHE BOOL "Enable GOTCHA support in timemory")
set(TIMEMORY_USE_PERFETTO
OFF
CACHE BOOL "Disable perfetto support in timemory")
set(TIMEMORY_USE_LIBUNWIND
ON
CACHE BOOL "Enable libunwind support in timemory")
# timemory feature build settings
set(TIMEMORY_BUILD_GOTCHA
ON
CACHE BOOL "Enable building GOTCHA library from submodule")
set(TIMEMORY_BUILD_LIBUNWIND
ON
CACHE BOOL "Enable building libunwind library from submodule")
set(TIMEMORY_BUILD_EXTRA_OPTIMIZATIONS
${OMNITRACE_BUILD_EXTRA_OPTIMIZATIONS}
CACHE BOOL "Enable extra optimizations when building timemory" FORCE)
# timemory build settings
set(TIMEMORY_TLS_MODEL
"global-dynamic"
CACHE STRING "Thread-local static model" FORCE)
set(TIMEMORY_SETTINGS_PREFIX
"OMNITRACE_"
CACHE STRING "Prefix used for settings and environment variables")
mark_as_advanced(TIMEMORY_SETTINGS_PREFIX)
omnitrace_checkout_git_submodule(
RELATIVE_PATH external/timemory
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
REPO_URL https://github.com/NERSC/timemory.git
REPO_BRANCH gpu-kernel-instrumentation)
omnitrace_save_variables(BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
CMAKE_POSITION_INDEPENDENT_CODE)
omnitrace_save_variables(
BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
CMAKE_POSITION_INDEPENDENT_CODE CMAKE_PREFIX_PATH)
# ensure timemory builds PIC static libs so that we don't have to install timemory shared
# lib
set(BUILD_SHARED_LIBS OFF)
set(BUILD_STATIC_LIBS ON)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(TIMEMORY_CTP_OPTIONS GLOBAL)
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
# results in undefined symbols to component::base<T>::load()
set(TIMEMORY_BUILD_HIDDEN_VISIBILITY
OFF
CACHE BOOL "" FORCE)
endif()
add_subdirectory(external/timemory)
omnitrace_restore_variables(BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
CMAKE_POSITION_INDEPENDENT_CODE)
omnitrace_restore_variables(
BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
CMAKE_POSITION_INDEPENDENT_CODE CMAKE_PREFIX_PATH)
# ----------------------------------------------------------------------------------------#
#
+3 -2
@@ -13,12 +13,13 @@ WORKDIR /tmp
SHELL [ "/bin/bash", "-c" ]
ARG EXTRA_PACKAGES=""
ARG ROCM_REPO_VERSION="debian"
RUN apt-get update && \
apt-get dist-upgrade -y && \
apt-get install -y build-essential cmake libnuma-dev wget gnupg2 m4 bash-completion git-core && \
apt-get install -y build-essential cmake libnuma-dev wget gnupg2 m4 bash-completion git-core autoconf libtool autotools-dev && \
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - && \
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | tee /etc/apt/sources.list.d/rocm.list && \
echo "deb [arch=amd64] https://repo.radeon.com/rocm/apt/${ROCM_REPO_VERSION}/ ubuntu main" | tee /etc/apt/sources.list.d/rocm.list && \
apt-get update && \
apt-get dist-upgrade -y && \
apt-get install -y rocm-dev rocm-utils roctracer-dev rocprofiler-dev hip-base hsa-amd-aqlprofile hsa-rocr-dev hsakmt-roct-dev ${EXTRA_PACKAGES}
+22
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
if [ ! -f CMakeLists.txt ]; then
echo "Error! Execute script from source directory"
exit 1
fi
set -e
build-release()
{
CONTAINER=$1
ROCM_VERSION=$2
CODE_VERSION=$3
docker run -it --rm -v ${PWD}:/home/omnitrace --env ROCM_VERSION=${ROCM_VERSION} --env VERSION=${CODE_VERSION} ${CONTAINER} /home/omnitrace/scripts/build-release.sh
}
CODE_VERSION=$(cat VERSION)
build-release jrmadsen/omnitrace-base-rocm-4.5 4.5.0 ${CODE_VERSION}
build-release jrmadsen/omnitrace-base-rocm-4.3 4.3.0 ${CODE_VERSION}
build-release jrmadsen/omnitrace-base-rocm-4.3.1 4.3.1 ${CODE_VERSION}
+8
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
: ${ROCM_VERSIONS:="4.5 4.3 4.3.1"}
for i in ${ROCM_VERSIONS}
do
docker build . --tag jrmadsen/omnitrace-base-rocm-${i} --build-arg ROCM_REPO_VERSION=${i}
done
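
The `ROCM_VERSIONS` default above can be overridden from the environment. The sketch below illustrates that behaviour with a hypothetical helper, `list_tags`, which only prints the image tags the loop would build and never calls docker; the tag format is copied from the script, everything else is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the image tags the loop above would build,
# without invoking docker.
list_tags() {
    # ':-' falls back to the default when ROCM_VERSIONS is unset or empty,
    # the same condition under which ':=' assigns it in the script above.
    local versions="${ROCM_VERSIONS:-4.5 4.3 4.3.1}"
    local i
    for i in ${versions}; do
        echo "jrmadsen/omnitrace-base-rocm-${i}"
    done
}

list_tags                       # uses the default list unless ROCM_VERSIONS is set
ROCM_VERSIONS="4.5" list_tags   # one-off override from the environment
```

The same one-off override should apply to the script itself, for example by prefixing its invocation with `ROCM_VERSIONS="4.5"`.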
+4
@@ -7,3 +7,7 @@ set(CMAKE_CXX_VISIBILITY_PRESET "default")
add_subdirectory(transpose)
add_subdirectory(parallel-overhead)
option(BUILD_SHARED_LIBS "Build dynamic libraries" ON)
add_subdirectory(lulesh)
+60
@@ -0,0 +1,60 @@
cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
project(lulesh LANGUAGES C CXX)
list(INSERT CMAKE_MODULE_PATH 0 ${PROJECT_SOURCE_DIR}/cmake/Modules)
add_subdirectory(external)
set(CMAKE_CXX_EXTENSIONS OFF)
if("${CMAKE_BUILD_TYPE}" STREQUAL "")
set(CMAKE_BUILD_TYPE
"RelWithDebInfo"
CACHE STRING "CMake build type" FORCE)
endif()
if(DEFINED OMNITRACE_USE_MPI)
option(LULESH_USE_MPI "Enable MPI" ${OMNITRACE_USE_MPI})
else()
option(LULESH_USE_MPI "Enable MPI" OFF)
endif()
add_library(lulesh-mpi INTERFACE)
if(LULESH_USE_MPI)
find_package(MPI REQUIRED)
target_compile_definitions(lulesh-mpi INTERFACE USE_MPI=1)
target_link_libraries(lulesh-mpi INTERFACE MPI::MPI_C MPI::MPI_CXX)
else()
target_compile_definitions(lulesh-mpi INTERFACE USE_MPI=0)
endif()
if(NOT TARGET Kokkos::kokkos)
find_package(Kokkos REQUIRED)
endif()
file(GLOB headers ${PROJECT_SOURCE_DIR}/*.h ${PROJECT_SOURCE_DIR}/*.hxx)
file(GLOB sources ${PROJECT_SOURCE_DIR}/*.cc)
add_executable(${PROJECT_NAME} ${sources} ${headers})
target_include_directories(${PROJECT_NAME} PRIVATE ${PROJECT_SOURCE_DIR}/includes)
target_link_libraries(${PROJECT_NAME} PRIVATE Kokkos::kokkos lulesh-mpi)
if(NOT CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME)
set_target_properties(${PROJECT_NAME} PROPERTIES RUNTIME_OUTPUT_DIRECTORY
${CMAKE_BINARY_DIR})
endif()
enable_testing()
if(LULESH_USE_MPI)
add_test(
NAME lulesh
COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} 8
$<TARGET_FILE:${PROJECT_NAME}> -i 100 -s 20 -p
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
else()
add_test(
NAME lulesh
COMMAND $<TARGET_FILE:${PROJECT_NAME}> -i 100 -s 20 -p
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
endif()
+315
@@ -0,0 +1,315 @@
# include guard
include_guard(DIRECTORY)
# MacroUtilities - useful macros and functions for generic tasks
#
include(CMakeDependentOption)
include(CMakeParseArguments)
# -----------------------------------------------------------------------
# function - capitalize - make a string capitalized (first letter is capital)
# usage:
#   capitalize("SHARED" CShared)
#   message(STATUS "-- CShared is \"${CShared}\"")
#   $ -- CShared is "Shared"
function(CAPITALIZE str var)
# make string lower
string(TOLOWER "${str}" str)
string(SUBSTRING "${str}" 0 1 _first)
string(TOUPPER "${_first}" _first)
string(SUBSTRING "${str}" 1 -1 _remainder)
string(CONCAT str "${_first}" "${_remainder}")
set(${var}
"${str}"
PARENT_SCOPE)
endfunction()
# ----------------------------------------------------------------------------------------#
# function CHECKOUT_GIT_SUBMODULE()
#
# Run "git submodule update" if a file in a submodule does not exist
#
# ARGS:
#   RECURSIVE (option) -- add "--recursive" flag
#   RELATIVE_PATH (one value) -- typically the relative path to submodule from
#     PROJECT_SOURCE_DIR
#   WORKING_DIRECTORY (one value) -- (default: PROJECT_SOURCE_DIR)
#   TEST_FILE (one value) -- file to check for (default: CMakeLists.txt)
#   REPO_URL (one value) -- repository to clone if the submodule update is not possible
#   REPO_BRANCH (one value) -- branch to clone (default: master)
#   ADDITIONAL_CMDS (many value) -- any additional commands to pass
#
function(CHECKOUT_GIT_SUBMODULE)
# parse args
cmake_parse_arguments(
CHECKOUT "RECURSIVE"
"RELATIVE_PATH;WORKING_DIRECTORY;TEST_FILE;REPO_URL;REPO_BRANCH"
"ADDITIONAL_CMDS" ${ARGN})
if(NOT CHECKOUT_WORKING_DIRECTORY)
set(CHECKOUT_WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
endif()
if(NOT CHECKOUT_TEST_FILE)
set(CHECKOUT_TEST_FILE "CMakeLists.txt")
endif()
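# default assumption
if(NOT CHECKOUT_REPO_BRANCH)
set(CHECKOUT_REPO_BRANCH "master")
endif()
# translate the RECURSIVE option into the git flag used by the commands below
if(CHECKOUT_RECURSIVE)
set(_RECURSE --recursive)
endif()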
find_package(Git)
set(_DIR "${CHECKOUT_WORKING_DIRECTORY}/${CHECKOUT_RELATIVE_PATH}")
# ensure the (possibly empty) directory exists
if(NOT EXISTS "${_DIR}")
if(NOT CHECKOUT_REPO_URL)
message(FATAL_ERROR "submodule directory does not exist")
endif()
endif()
# if this file exists, the project has been checked out; if it does not exist, the
# project has not been checked out
set(_TEST_FILE "${_DIR}/${CHECKOUT_TEST_FILE}")
# assuming a .gitmodules file exists
set(_SUBMODULE "${PROJECT_SOURCE_DIR}/.gitmodules")
set(_TEST_FILE_EXISTS OFF)
if(EXISTS "${_TEST_FILE}" AND NOT IS_DIRECTORY "${_TEST_FILE}")
set(_TEST_FILE_EXISTS ON)
endif()
if(_TEST_FILE_EXISTS)
return()
endif()
find_package(Git REQUIRED)
set(_SUBMODULE_EXISTS OFF)
if(EXISTS "${_SUBMODULE}" AND NOT IS_DIRECTORY "${_SUBMODULE}")
set(_SUBMODULE_EXISTS ON)
endif()
set(_HAS_REPO_URL OFF)
if(NOT "${CHECKOUT_REPO_URL}" STREQUAL "")
set(_HAS_REPO_URL ON)
endif()
# if the module has not been checked out
if(NOT _TEST_FILE_EXISTS AND _SUBMODULE_EXISTS)
# perform the checkout
execute_process(
COMMAND ${GIT_EXECUTABLE} submodule update --init ${_RECURSE}
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_RELATIVE_PATH}
WORKING_DIRECTORY ${CHECKOUT_WORKING_DIRECTORY}
RESULT_VARIABLE RET)
# check the return code
if(RET GREATER 0)
set(_CMD "${GIT_EXECUTABLE} submodule update --init ${_RECURSE}
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_RELATIVE_PATH}")
message(STATUS "function(CHECKOUT_GIT_SUBMODULE) failed.")
message(FATAL_ERROR "Command: \"${_CMD}\"")
else()
set(_TEST_FILE_EXISTS ON)
endif()
endif()
if(NOT _TEST_FILE_EXISTS AND _HAS_REPO_URL)
message(
STATUS "Checking out '${CHECKOUT_REPO_URL}' @ '${CHECKOUT_REPO_BRANCH}'...")
# remove the existing directory
if(EXISTS "${_DIR}")
execute_process(COMMAND ${CMAKE_COMMAND} -E remove_directory ${_DIR})
endif()
# perform the checkout
execute_process(
COMMAND
${GIT_EXECUTABLE} clone -b ${CHECKOUT_REPO_BRANCH}
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_REPO_URL} ${CHECKOUT_RELATIVE_PATH}
WORKING_DIRECTORY ${CHECKOUT_WORKING_DIRECTORY}
RESULT_VARIABLE RET)
# perform the submodule update
if(CHECKOUT_RECURSIVE
AND EXISTS "${_DIR}"
AND IS_DIRECTORY "${_DIR}")
execute_process(
COMMAND ${GIT_EXECUTABLE} submodule update --init ${_RECURSE}
WORKING_DIRECTORY ${_DIR}
RESULT_VARIABLE RET)
endif()
# check the return code
if(RET GREATER 0)
set(_CMD
"${GIT_EXECUTABLE} clone -b ${CHECKOUT_REPO_BRANCH}
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_REPO_URL} ${CHECKOUT_RELATIVE_PATH}"
)
message(STATUS "function(CHECKOUT_GIT_SUBMODULE) failed.")
message(FATAL_ERROR "Command: \"${_CMD}\"")
else()
set(_TEST_FILE_EXISTS ON)
endif()
endif()
if(NOT EXISTS "${_TEST_FILE}" OR NOT _TEST_FILE_EXISTS)
message(
FATAL_ERROR
"Error checking out submodule: '${CHECKOUT_RELATIVE_PATH}' to '${_DIR}'")
endif()
endfunction()
# ----------------------------------------------------------------------------------------#
# require variable
#
function(CHECK_REQUIRED VAR)
if(NOT DEFINED ${VAR} OR "${${VAR}}" STREQUAL "")
message(FATAL_ERROR "Variable '${VAR}' must be defined and not empty")
endif()
endfunction()
# -----------------------------------------------------------------------
# function add_feature(<NAME> <DOCSTRING>) Add a project feature, whose activation is
# specified by the existence of the variable <NAME>, to the list of enabled/disabled
# features, plus a docstring describing the feature
#
function(ADD_FEATURE _var _description)
set(EXTRA_DESC "")
foreach(currentArg ${ARGN})
if(NOT "${currentArg}" STREQUAL "${_var}" AND NOT "${currentArg}" STREQUAL
"${_description}")
set(EXTRA_DESC "${EXTRA_DESC}${currentArg}")
endif()
endforeach()
set_property(GLOBAL APPEND PROPERTY ${PROJECT_NAME}_FEATURES ${_var})
set_property(GLOBAL PROPERTY ${_var}_DESCRIPTION "${_description}${EXTRA_DESC}")
if("CMAKE_DEFINE" IN_LIST ARGN)
set_property(GLOBAL APPEND PROPERTY ${PROJECT_NAME}_CMAKE_DEFINES
"${_var} @${_var}@")
endif()
endfunction()
# ----------------------------------------------------------------------------------------#
# function add_option(<OPTION_NAME> <DOCSTRING> <DEFAULT_SETTING> [NO_FEATURE]) Add an
# option and add as a feature if NO_FEATURE is not provided
#
function(ADD_OPTION _NAME _MESSAGE _DEFAULT)
option(${_NAME} "${_MESSAGE}" ${_DEFAULT})
if("NO_FEATURE" IN_LIST ARGN)
mark_as_advanced(${_NAME})
else()
add_feature(${_NAME} "${_MESSAGE}")
endif()
if("ADVANCED" IN_LIST ARGN)
mark_as_advanced(${_NAME})
endif()
endfunction()
# ----------------------------------------------------------------------------------------#
# function print_enabled_features() Print enabled features plus their docstrings.
#
function(PRINT_ENABLED_FEATURES)
set(_basemsg "The following features are defined/enabled (+):")
set(_currentFeatureText "${_basemsg}")
get_property(_features GLOBAL PROPERTY ${PROJECT_NAME}_FEATURES)
if(NOT "${_features}" STREQUAL "")
list(REMOVE_DUPLICATES _features)
list(SORT _features)
endif()
foreach(_feature ${_features})
if(${_feature})
# add feature to text
set(_currentFeatureText "${_currentFeatureText}\n ${_feature}")
# get description
get_property(_desc GLOBAL PROPERTY ${_feature}_DESCRIPTION)
# print description, if not standard ON/OFF, print what is set to
if(_desc)
if(NOT "${${_feature}}" STREQUAL "ON" AND NOT "${${_feature}}" STREQUAL
"TRUE")
set(_currentFeatureText
"${_currentFeatureText}: ${_desc} -- [\"${${_feature}}\"]")
else()
string(REGEX REPLACE "^${PROJECT_NAME}_USE_" "" _feature_tmp
"${_feature}")
string(TOLOWER "${_feature_tmp}" _feature_tmp_l)
capitalize("${_feature_tmp}" _feature_tmp_c)
foreach(_var _feature _feature_tmp _feature_tmp_l _feature_tmp_c)
set(_ver "${${${_var}}_VERSION}")
if(NOT "${_ver}" STREQUAL "")
set(_desc "${_desc} -- [found version ${_ver}]")
break()
endif()
unset(_ver)
endforeach()
set(_currentFeatureText "${_currentFeatureText}: ${_desc}")
endif()
set(_desc NOTFOUND)
endif()
endif()
endforeach()
if(NOT "${_currentFeatureText}" STREQUAL "${_basemsg}")
message(STATUS "${_currentFeatureText}\n")
endif()
endfunction()
# ----------------------------------------------------------------------------------------#
# function print_disabled_features() Print disabled features plus their docstrings.
#
function(PRINT_DISABLED_FEATURES)
set(_basemsg "The following features are NOT defined/enabled (-):")
set(_currentFeatureText "${_basemsg}")
get_property(_features GLOBAL PROPERTY ${PROJECT_NAME}_FEATURES)
if(NOT "${_features}" STREQUAL "")
list(REMOVE_DUPLICATES _features)
list(SORT _features)
endif()
foreach(_feature ${_features})
if(NOT ${_feature})
set(_currentFeatureText "${_currentFeatureText}\n ${_feature}")
get_property(_desc GLOBAL PROPERTY ${_feature}_DESCRIPTION)
if(_desc)
set(_currentFeatureText "${_currentFeatureText}: ${_desc}")
set(_desc NOTFOUND)
endif(_desc)
endif()
endforeach(_feature)
if(NOT "${_currentFeatureText}" STREQUAL "${_basemsg}")
message(STATUS "${_currentFeatureText}\n")
endif()
endfunction()
# ----------------------------------------------------------------------------------------#
# function print_features() Print all features plus their docstrings.
#
function(PRINT_FEATURES)
message(STATUS "")
print_enabled_features()
print_disabled_features()
endfunction()
# ----------------------------------------------------------------------------------------#
# macro ADD_SUBPROJECT() Does a git submodule update + add_subdirectory
#
macro(ADD_SUBPROJECT PACKAGE_NAME)
# parse args
cmake_parse_arguments(PACKAGE "SUBMODULE" "DIRECTORY" "" ${ARGN})
if(NOT PACKAGE_DIRECTORY)
set(PACKAGE_DIRECTORY ${PACKAGE_NAME})
endif()
# if specified in options
if("${PACKAGE_NAME}" IN_LIST PROJECTS)
if(PACKAGE_SUBMODULE)
checkout_git_submodule(RECURSIVE RELATIVE_PATH ${PACKAGE_DIRECTORY})
endif()
if(NOT EXISTS "${PROJECT_SOURCE_DIR}/${PACKAGE_DIRECTORY}/CMakeLists.txt")
message(
STATUS
"Warning! '${PROJECT_SOURCE_DIR}/${PACKAGE_DIRECTORY}/CMakeLists.txt' does not exist!"
)
else()
add_subdirectory(${PACKAGE_DIRECTORY})
endif()
endif()
endmacro()
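
The decision order inside `checkout_git_submodule` above can be easier to follow outside of CMake. The sketch below is a simplified shell rendering of it, not a replacement: `checkout_plan` is a hypothetical helper that only prints the git command that would be run. It assumes `.gitmodules` lives in the working directory (the CMake function looks in `PROJECT_SOURCE_DIR`), and it omits the early "submodule directory does not exist" check and the `--recursive` handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the decision order of checkout_git_submodule.
# It prints the action that would be taken instead of running git.
checkout_plan() {
    local workdir="$1" relpath="$2" test_file="$3" repo_url="$4"
    local branch="${5:-master}"   # same default branch as the CMake function
    if [ -f "${workdir}/${relpath}/${test_file}" ]; then
        # test file present: the project is already checked out
        echo "nothing to do"
    elif [ -f "${workdir}/.gitmodules" ]; then
        # a registered submodule is preferred over a fresh clone
        echo "git submodule update --init ${relpath}"
    elif [ -n "${repo_url}" ]; then
        # fall back to cloning the requested branch
        echo "git clone -b ${branch} ${repo_url} ${relpath}"
    else
        echo "error: cannot check out ${relpath}" >&2
        return 1
    fi
}

# Arguments taken from the Kokkos checkout in this change; the result
# depends on the state of the current directory.
checkout_plan "${PWD}" external/kokkos CMakeLists.txt \
    https://github.com/kokkos/kokkos.git develop
```

Printing the plan rather than executing it keeps the sketch safe to run in any directory.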
+28
@@ -0,0 +1,28 @@
set(Kokkos_ENABLE_SERIAL
ON
CACHE BOOL "Enable Serial")
set(Kokkos_ENABLE_OPENMP
ON
CACHE BOOL "Enable OpenMP")
if(USE_CUDA)
set(Kokkos_ENABLE_CUDA
ON
CACHE BOOL "Enable CUDA")
set(Kokkos_ENABLE_CUDA_UVM
ON
CACHE BOOL "Enable CUDA UVM")
set(Kokkos_ENABLE_CUDA_LAMBDA
ON
CACHE BOOL "Enable CUDA lambda")
set(Kokkos_ENABLE_CUDA_CONSTEXPR
ON
CACHE BOOL "Enable CUDA constexpr")
endif()
checkout_git_submodule(
RELATIVE_PATH external/kokkos WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} REPO_URL
https://github.com/kokkos/kokkos.git REPO_BRANCH develop)
set(CMAKE_SKIP_INSTALL_ALL_DEPENDENCY ON)
add_subdirectory(kokkos)
Vendored Submodule
+1
Submodule projects/rocprofiler-systems/examples/lulesh/external/kokkos added at 56468253ef
+127
@@ -0,0 +1,127 @@
/*!
******************************************************************************
*
* \file
*
* \brief RAJA header file for simple class that can be used to
* time code sections.
*
* \author Rich Hornung, Center for Applied Scientific Computing, LLNL
* \author Jeff Keasler, Applications, Simulations And Quality, LLNL
*
******************************************************************************
*/
#ifndef RAJA_Timer_HXX
#define RAJA_Timer_HXX
#if defined(RAJA_USE_CYCLE)
# include "./cycle.h"
typedef ticks TimeType;
#elif defined(RAJA_USE_CLOCK)
# include <time.h>
typedef clock_t TimeType;
#elif defined(RAJA_USE_GETTIME)
# include <time.h>
typedef timespec TimeType;
#else
# error RAJA_TIMER_TYPE is undefined!
#endif
namespace RAJA
{
/*!
******************************************************************************
*
* \brief Simple timer class to time code sections.
*
******************************************************************************
*/
class Timer
{
public:
#if defined(RAJA_USE_CYCLE) || defined(RAJA_USE_CLOCK)
Timer()
: telapsed(0)
{
;
}
#endif
#if defined(RAJA_USE_GETTIME)
Timer()
: telapsed(0)
, stime_elapsed(0)
, nstime_elapsed(0)
{
;
}
#endif
#if defined(RAJA_USE_CYCLE)
void start() { tstart = getticks(); }
void stop()
{
tstop = getticks();
set_elapsed();
}
long double elapsed() { return static_cast<long double>(telapsed); }
#endif
#if defined(RAJA_USE_CLOCK)
void start() { tstart = clock(); }
void stop()
{
tstop = clock();
set_elapsed();
}
long double elapsed() { return static_cast<long double>(telapsed) / CLOCKS_PER_SEC; }
#endif
#if defined(RAJA_USE_GETTIME)
# if 0
void start() { clock_gettime(CLOCK_REALTIME, &tstart); }
void stop() { clock_gettime(CLOCK_REALTIME, &tstop); set_elapsed(); }
# else
void start() { clock_gettime(CLOCK_MONOTONIC, &tstart); }
void stop()
{
clock_gettime(CLOCK_MONOTONIC, &tstop);
set_elapsed();
}
# endif
long double elapsed() { return (stime_elapsed + nstime_elapsed); }
#endif
private:
TimeType tstart;
TimeType tstop;
long double telapsed;
#if defined(RAJA_USE_CYCLE) || defined(RAJA_USE_CLOCK)
void set_elapsed() { telapsed += (tstop - tstart); }
#elif defined(RAJA_USE_GETTIME)
long double stime_elapsed;
long double nstime_elapsed;
void set_elapsed()
{
stime_elapsed += static_cast<long double>(tstop.tv_sec - tstart.tv_sec);
nstime_elapsed +=
static_cast<long double>(tstop.tv_nsec - tstart.tv_nsec) / 1000000000.0;
}
#endif
};
} // namespace RAJA
#endif // closing endif for header file include guard
+545
@@ -0,0 +1,545 @@
/*
* Copyright (c) 2003, 2007-8 Matteo Frigo
* Copyright (c) 2003, 2007-8 Massachusetts Institute of Technology
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
/* machine-dependent cycle counters code. Needs to be inlined. */
/***************************************************************************/
/* To use the cycle counters in your code, simply #include "cycle.h" (this
file), and then use the functions/macros:
ticks getticks(void);
ticks is an opaque typedef defined below, representing the current time.
You extract the elapsed time between two calls to getticks() via:
double elapsed(ticks t1, ticks t0);
which returns a double-precision variable in arbitrary units. You
are not expected to convert this into human units like seconds; it
is intended only for *comparisons* of time intervals.
(In order to use some of the OS-dependent timer routines like
Solaris' gethrtime, you need to paste the autoconf snippet below
into your configure.ac file and #include "config.h" before cycle.h,
or define the relevant macros manually if you are not using autoconf.)
*/
/***************************************************************************/
/* This file uses macros like HAVE_GETHRTIME that are assumed to be
defined according to whether the corresponding function/type/header
is available on your system. The necessary macros are most
conveniently defined if you are using GNU autoconf, via the tests:
dnl ---------------------------------------------------------------------
AC_C_INLINE
AC_HEADER_TIME
AC_CHECK_HEADERS([sys/time.h c_asm.h intrinsics.h mach/mach_time.h])
AC_CHECK_TYPE([hrtime_t],[AC_DEFINE(HAVE_HRTIME_T, 1, [Define to 1 if hrtime_t is
defined in <sys/time.h>])],,[#if HAVE_SYS_TIME_H #include <sys/time.h> #endif])
AC_CHECK_FUNCS([gethrtime read_real_time time_base_to_time clock_gettime
mach_absolute_time])
dnl Cray UNICOS _rtc() (real-time clock) intrinsic
AC_MSG_CHECKING([for _rtc intrinsic])
rtc_ok=yes
AC_TRY_LINK([#ifdef HAVE_INTRINSICS_H
#include <intrinsics.h>
#endif], [_rtc()], [AC_DEFINE(HAVE__RTC,1,[Define if you have the UNICOS _rtc()
intrinsic.])], [rtc_ok=no]) AC_MSG_RESULT($rtc_ok)
dnl ---------------------------------------------------------------------
*/
/***************************************************************************/
#if TIME_WITH_SYS_TIME
# include <sys/time.h>
# include <time.h>
#else
# if HAVE_SYS_TIME_H
# include <sys/time.h>
# else
# include <time.h>
# endif
#endif
#define INLINE_ELAPSED(INL) \
static INL double elapsed(ticks t1, ticks t0) { return (double) t1 - (double) t0; }
/*----------------------------------------------------------------*/
/* Solaris */
#if defined(HAVE_GETHRTIME) && defined(HAVE_HRTIME_T) && !defined(HAVE_TICK_COUNTER)
typedef hrtime_t ticks;
# define getticks gethrtime
INLINE_ELAPSED(inline)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/* AIX v. 4+ routines to read the real-time clock or time-base register */
#if defined(HAVE_READ_REAL_TIME) && defined(HAVE_TIME_BASE_TO_TIME) && \
!defined(HAVE_TICK_COUNTER)
typedef timebasestruct_t ticks;
static __inline ticks
getticks(void)
{
ticks t;
read_real_time(&t, TIMEBASE_SZ);
return t;
}
static __inline double
elapsed(ticks t1, ticks t0) /* time in nanoseconds */
{
time_base_to_time(&t1, TIMEBASE_SZ);
time_base_to_time(&t0, TIMEBASE_SZ);
return (((double) t1.tb_high - (double) t0.tb_high) * 1.0e9 +
((double) t1.tb_low - (double) t0.tb_low));
}
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/*
* PowerPC ``cycle'' counter using the time base register.
*/
#if((((defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))) || \
(defined(__MWERKS__) && defined(macintosh)))) || \
(defined(__IBM_GCC_ASM) && (defined(__powerpc__) || defined(__ppc__)))) && \
!defined(HAVE_TICK_COUNTER)
typedef unsigned long long ticks;
static __inline__ ticks
getticks(void)
{
unsigned int tbl, tbu0, tbu1;
do
{
__asm__ __volatile__("mftbu %0" : "=r"(tbu0));
__asm__ __volatile__("mftb %0" : "=r"(tbl));
__asm__ __volatile__("mftbu %0" : "=r"(tbu1));
} while(tbu0 != tbu1);
return (((unsigned long long) tbu0) << 32) | tbl;
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/* MacOS/Mach (Darwin) time-base register interface (unlike UpTime,
from Carbon, requires no additional libraries to be linked). */
#if defined(HAVE_MACH_ABSOLUTE_TIME) && defined(HAVE_MACH_MACH_TIME_H) && \
!defined(HAVE_TICK_COUNTER)
# include <mach/mach_time.h>
typedef uint64_t ticks;
# define getticks mach_absolute_time
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/*
* Pentium cycle counter
*/
#if(defined(__GNUC__) || defined(__ICC)) && defined(__i386__) && \
!defined(HAVE_TICK_COUNTER)
typedef unsigned long long ticks;
static __inline__ ticks
getticks(void)
{
ticks ret;
__asm__ __volatile__("rdtsc" : "=A"(ret));
/* no input, nothing else clobbered */
return ret;
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
# define TIME_MIN 5000.0 /* unreliable pentium IV cycle counter */
#endif
/* Visual C++ -- thanks to Morten Nissov for his help with this */
#if _MSC_VER >= 1200 && _M_IX86 >= 500 && !defined(HAVE_TICK_COUNTER)
# include <windows.h>
typedef LARGE_INTEGER ticks;
# define RDTSC __asm __emit 0fh __asm __emit 031h /* hack for VC++ 5.0 */
static __inline ticks
getticks(void)
{
ticks retval;
__asm {
RDTSC
mov retval.HighPart, edx
mov retval.LowPart, eax
}
return retval;
}
static __inline double
elapsed(ticks t1, ticks t0)
{
return (double) t1.QuadPart - (double) t0.QuadPart;
}
# define HAVE_TICK_COUNTER
# define TIME_MIN 5000.0 /* unreliable pentium IV cycle counter */
#endif
/*----------------------------------------------------------------*/
/*
* X86-64 cycle counter
*/
#if(defined(__GNUC__) || defined(__ICC) || defined(__SUNPRO_C)) && \
defined(__x86_64__) && !defined(HAVE_TICK_COUNTER)
typedef unsigned long long ticks;
static __inline__ ticks
getticks(void)
{
unsigned a, d;
__asm__ volatile("rdtsc" : "=a"(a), "=d"(d));
return ((ticks) a) | (((ticks) d) << 32);
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/* PGI compiler, courtesy Cristiano Calonaci, Andrea Tarsi, & Roberto Gori.
NOTE: this code will fail to link unless you use the -Masmkeyword compiler
option (grrr). */
#if defined(__PGI) && defined(__x86_64__) && !defined(HAVE_TICK_COUNTER)
typedef unsigned long long ticks;
static ticks
getticks(void)
{
asm(" rdtsc; shl $0x20,%rdx; mov %eax,%eax; or %rdx,%rax; ");
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/* Visual C++, courtesy of Dirk Michaelis */
#if _MSC_VER >= 1400 && (defined(_M_AMD64) || defined(_M_X64)) && \
!defined(HAVE_TICK_COUNTER)
# include <intrin.h>
# pragma intrinsic(__rdtsc)
typedef unsigned __int64 ticks;
# define getticks __rdtsc
INLINE_ELAPSED(__inline)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/*
* IA64 cycle counter
*/
/* intel's icc/ecc compiler */
#if(defined(__EDG_VERSION) || defined(__ECC)) && defined(__ia64__) && \
!defined(HAVE_TICK_COUNTER)
typedef unsigned long ticks;
# include <ia64intrin.h>
static __inline__ ticks
getticks(void)
{
return __getReg(_IA64_REG_AR_ITC);
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/* gcc */
#if defined(__GNUC__) && defined(__ia64__) && !defined(HAVE_TICK_COUNTER)
typedef unsigned long ticks;
static __inline__ ticks
getticks(void)
{
ticks ret;
__asm__ __volatile__("mov %0=ar.itc" : "=r"(ret));
return ret;
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/* HP/UX IA64 compiler, courtesy Teresa L. Johnson: */
#if defined(__hpux) && defined(__ia64) && !defined(HAVE_TICK_COUNTER)
# include <machine/sys/inline.h>
typedef unsigned long ticks;
static inline ticks
getticks(void)
{
ticks ret;
ret = _Asm_mov_from_ar(_AREG_ITC);
return ret;
}
INLINE_ELAPSED(inline)
# define HAVE_TICK_COUNTER
#endif
/* Microsoft Visual C++ */
#if defined(_MSC_VER) && defined(_M_IA64) && !defined(HAVE_TICK_COUNTER)
typedef unsigned __int64 ticks;
# ifdef __cplusplus
extern "C"
# endif
ticks
__getReg(int whichReg);
# pragma intrinsic(__getReg)
static __inline ticks
getticks(void)
{
volatile ticks temp;
temp = __getReg(3116);
return temp;
}
INLINE_ELAPSED(inline)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/*
* PA-RISC cycle counter
*/
#if(defined(__hppa__) || defined(__hppa)) && !defined(HAVE_TICK_COUNTER)
typedef unsigned long ticks;
# ifdef __GNUC__
static __inline__ ticks
getticks(void)
{
ticks ret;
__asm__ __volatile__("mfctl 16, %0" : "=r"(ret));
/* no input, nothing else clobbered */
return ret;
}
# else
# include <machine/inline.h>
static inline unsigned long
getticks(void)
{
register ticks ret;
_MFCTL(16, ret);
return ret;
}
# endif
INLINE_ELAPSED(inline)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/* S390, courtesy of James Treacy */
#if defined(__GNUC__) && defined(__s390__) && !defined(HAVE_TICK_COUNTER)
typedef unsigned long long ticks;
static __inline__ ticks
getticks(void)
{
ticks cycles;
__asm__("stck 0(%0)" : : "a"(&(cycles)) : "memory", "cc");
return cycles;
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
#if defined(__GNUC__) && defined(__alpha__) && !defined(HAVE_TICK_COUNTER)
/*
* The 32-bit cycle counter on alpha overflows pretty quickly,
* unfortunately. A 1GHz machine overflows in 4 seconds.
*/
typedef unsigned int ticks;
static __inline__ ticks
getticks(void)
{
unsigned long cc;
__asm__ __volatile__("rpcc %0" : "=r"(cc));
return (cc & 0xFFFFFFFF);
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
#if defined(__GNUC__) && defined(__sparc_v9__) && !defined(HAVE_TICK_COUNTER)
typedef unsigned long ticks;
static __inline__ ticks
getticks(void)
{
ticks ret;
__asm__ __volatile__("rd %%tick, %0" : "=r"(ret));
return ret;
}
INLINE_ELAPSED(__inline__)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
#if(defined(__DECC) || defined(__DECCXX)) && defined(__alpha) && \
defined(HAVE_C_ASM_H) && !defined(HAVE_TICK_COUNTER)
# include <c_asm.h>
typedef unsigned int ticks;
static __inline ticks
getticks(void)
{
unsigned long cc;
cc = asm("rpcc %v0");
return (cc & 0xFFFFFFFF);
}
INLINE_ELAPSED(__inline)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/* SGI/Irix */
#if defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_SGI_CYCLE) && !defined(HAVE_TICK_COUNTER)
typedef struct timespec ticks;
static inline ticks
getticks(void)
{
struct timespec t;
clock_gettime(CLOCK_SGI_CYCLE, &t);
return t;
}
static inline double
elapsed(ticks t1, ticks t0)
{
return ((double) t1.tv_sec - (double) t0.tv_sec) * 1.0E9 +
((double) t1.tv_nsec - (double) t0.tv_nsec);
}
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/* Cray UNICOS _rtc() intrinsic function */
#if defined(HAVE__RTC) && !defined(HAVE_TICK_COUNTER)
# ifdef HAVE_INTRINSICS_H
# include <intrinsics.h>
# endif
typedef long long ticks;
# define getticks _rtc
INLINE_ELAPSED(inline)
# define HAVE_TICK_COUNTER
#endif
/*----------------------------------------------------------------*/
/* MIPS ZBus */
#if HAVE_MIPS_ZBUS_TIMER
# if defined(__mips__) && !defined(HAVE_TICK_COUNTER)
# include <fcntl.h>
# include <sys/mman.h>
# include <unistd.h>
typedef uint64_t ticks;
static inline ticks
getticks(void)
{
static uint64_t* addr = 0;
if(addr == 0)
{
uint32_t rq_addr = 0x10030000;
int fd;
int pgsize;
pgsize = getpagesize();
fd = open("/dev/mem", O_RDONLY | O_SYNC, 0);
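if(fd < 0)
{
perror("open");
return 0; /* ticks is an integer type, so return 0 rather than NULL */
}
addr = mmap(0, pgsize, PROT_READ, MAP_SHARED, fd, rq_addr);
close(fd);
if(addr == (uint64_t*) -1)
{
perror("mmap");
return 0; /* ticks is an integer type, so return 0 rather than NULL */
}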
}
return *addr;
}
INLINE_ELAPSED(inline)
# define HAVE_TICK_COUNTER
# endif
#endif /* HAVE_MIPS_ZBUS_TIMER */
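// Usage note (a minimal sketch, not part of the original header): every
// platform branch above exposes the same three names: a `ticks` type,
// getticks(), and elapsed(t1, t0). The stand-in below reproduces that
// interface with std::chrono::steady_clock so the calling pattern can be
// tried on any platform. The steady_clock backend and the time_call helper
// are assumptions made for illustration; the result is in nanoseconds,
// not CPU cycles.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable stand-in for the header's interface.
using ticks = std::chrono::steady_clock::time_point;

// Read the current timestamp.
static inline ticks getticks() { return std::chrono::steady_clock::now(); }

// Difference t1 - t0 as a double (nanoseconds in this stand-in).
static inline double elapsed(ticks t1, ticks t0)
{
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}

// Time a callable with the getticks()/elapsed() pattern.
template <typename Func>
double time_call(Func&& f)
{
    const ticks t0 = getticks();
    f();
    const ticks t1 = getticks();
    const double dt = elapsed(t1, t0);
    assert(dt >= 0.0);  // steady_clock never goes backwards
    return dt;
}
```

// With the real header, only the unit changes: elapsed() returns raw counter
// ticks there, so results are best compared against each other rather than
// converted to seconds.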
@@ -0,0 +1,886 @@
#include <math.h>
#if USE_MPI
# include <mpi.h>
#endif
#if _OPENMP
# include <omp.h>
#endif
#include "lulesh.h"
#include <cstdlib>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static KOKKOS_INLINE_FUNCTION Real_t
CalcElemVolume(const Real_t x0, const Real_t x1, const Real_t x2, const Real_t x3,
const Real_t x4, const Real_t x5, const Real_t x6, const Real_t x7,
const Real_t y0, const Real_t y1, const Real_t y2, const Real_t y3,
const Real_t y4, const Real_t y5, const Real_t y6, const Real_t y7,
const Real_t z0, const Real_t z1, const Real_t z2, const Real_t z3,
const Real_t z4, const Real_t z5, const Real_t z6, const Real_t z7)
{
Real_t twelveth = Real_t(1.0) / Real_t(12.0);
Real_t dx61 = x6 - x1;
Real_t dy61 = y6 - y1;
Real_t dz61 = z6 - z1;
Real_t dx70 = x7 - x0;
Real_t dy70 = y7 - y0;
Real_t dz70 = z7 - z0;
Real_t dx63 = x6 - x3;
Real_t dy63 = y6 - y3;
Real_t dz63 = z6 - z3;
Real_t dx20 = x2 - x0;
Real_t dy20 = y2 - y0;
Real_t dz20 = z2 - z0;
Real_t dx50 = x5 - x0;
Real_t dy50 = y5 - y0;
Real_t dz50 = z5 - z0;
Real_t dx64 = x6 - x4;
Real_t dy64 = y6 - y4;
Real_t dz64 = z6 - z4;
Real_t dx31 = x3 - x1;
Real_t dy31 = y3 - y1;
Real_t dz31 = z3 - z1;
Real_t dx72 = x7 - x2;
Real_t dy72 = y7 - y2;
Real_t dz72 = z7 - z2;
Real_t dx43 = x4 - x3;
Real_t dy43 = y4 - y3;
Real_t dz43 = z4 - z3;
Real_t dx57 = x5 - x7;
Real_t dy57 = y5 - y7;
Real_t dz57 = z5 - z7;
Real_t dx14 = x1 - x4;
Real_t dy14 = y1 - y4;
Real_t dz14 = z1 - z4;
Real_t dx25 = x2 - x5;
Real_t dy25 = y2 - y5;
Real_t dz25 = z2 - z5;
#define TRIPLE_PRODUCT(x1, y1, z1, x2, y2, z2, x3, y3, z3) \
((x1) * ((y2) * (z3) - (z2) * (y3)) + (x2) * ((z1) * (y3) - (y1) * (z3)) + \
(x3) * ((y1) * (z2) - (z1) * (y2)))
Real_t volume = TRIPLE_PRODUCT(dx31 + dx72, dx63, dx20, dy31 + dy72, dy63, dy20,
dz31 + dz72, dz63, dz20) +
TRIPLE_PRODUCT(dx43 + dx57, dx64, dx70, dy43 + dy57, dy64, dy70,
dz43 + dz57, dz64, dz70) +
TRIPLE_PRODUCT(dx14 + dx25, dx61, dx50, dy14 + dy25, dy61, dy50,
dz14 + dz25, dz61, dz50);
#undef TRIPLE_PRODUCT
volume *= twelveth;
return volume;
}
/******************************************/
KOKKOS_INLINE_FUNCTION
Real_t
CalcElemVolume(const Real_t x[8], const Real_t y[8], const Real_t z[8])
{
return CalcElemVolume(x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], y[0], y[1],
y[2], y[3], y[4], y[5], y[6], y[7], z[0], z[1], z[2], z[3],
z[4], z[5], z[6], z[7]);
}
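// Sanity check (a minimal sketch, assuming plain double in place of Real_t
// and no Kokkos): the volume formula above sums three triple products and
// divides by 12. The stand-alone copy below evaluates it for a unit cube in
// the node ordering used by Domain::BuildMesh, where the expected volume is
// 1. The names triple_product, hex_volume and unit_cube_volume are
// hypothetical helpers, not part of LULESH.

```cpp
#include <cassert>

// Same argument order as the TRIPLE_PRODUCT macro.
static double triple_product(double x1, double y1, double z1,
                             double x2, double y2, double z2,
                             double x3, double y3, double z3)
{
    return x1 * (y2 * z3 - z2 * y3) + x2 * (z1 * y3 - y1 * z3) +
           x3 * (y1 * z2 - z1 * y2);
}

// Volume of a hexahedron given its 8 corner coordinates.
double hex_volume(const double x[8], const double y[8], const double z[8])
{
    assert(x && y && z);
    // d(c, a, b) is the difference c[a] - c[b], e.g. dx61 == d(x, 6, 1).
    auto d = [](const double* c, int a, int b) { return c[a] - c[b]; };
    const double sum =
        triple_product(d(x, 3, 1) + d(x, 7, 2), d(x, 6, 3), d(x, 2, 0),
                       d(y, 3, 1) + d(y, 7, 2), d(y, 6, 3), d(y, 2, 0),
                       d(z, 3, 1) + d(z, 7, 2), d(z, 6, 3), d(z, 2, 0)) +
        triple_product(d(x, 4, 3) + d(x, 5, 7), d(x, 6, 4), d(x, 7, 0),
                       d(y, 4, 3) + d(y, 5, 7), d(y, 6, 4), d(y, 7, 0),
                       d(z, 4, 3) + d(z, 5, 7), d(z, 6, 4), d(z, 7, 0)) +
        triple_product(d(x, 1, 4) + d(x, 2, 5), d(x, 6, 1), d(x, 5, 0),
                       d(y, 1, 4) + d(y, 2, 5), d(y, 6, 1), d(y, 5, 0),
                       d(z, 1, 4) + d(z, 2, 5), d(z, 6, 1), d(z, 5, 0));
    return sum / 12.0;
}

// Unit cube: nodes 0-3 walk the bottom face, nodes 4-7 the top face.
double unit_cube_volume()
{
    const double x[8] = { 0, 1, 1, 0, 0, 1, 1, 0 };
    const double y[8] = { 0, 0, 1, 1, 0, 0, 1, 1 };
    const double z[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };
    return hex_volume(x, y, z);
}
```

// Each of the three triple products evaluates to 4 for the unit cube, so the
// sum of 12 divided by 12 gives a volume of 1.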
/////////////////////////////////////////////////////////////////////
Domain::Domain(Int_t numRanks, Index_t colLoc, Index_t rowLoc, Index_t planeLoc,
Index_t nx, int tp, int nr, int balance, Int_t cost)
: m_e_cut(Real_t(1.0e-7))
, m_p_cut(Real_t(1.0e-7))
, m_q_cut(Real_t(1.0e-7))
, m_v_cut(Real_t(1.0e-10))
, m_u_cut(Real_t(1.0e-7))
, m_hgcoef(Real_t(3.0))
, m_ss4o3(Real_t(4.0) / Real_t(3.0))
, m_qstop(Real_t(1.0e+12))
, m_monoq_max_slope(Real_t(1.0))
, m_monoq_limiter_mult(Real_t(2.0))
, m_qlc_monoq(Real_t(0.5))
, m_qqc_monoq(Real_t(2.0) / Real_t(3.0))
, m_qqc(Real_t(2.0))
, m_eosvmax(Real_t(1.0e+9))
, m_eosvmin(Real_t(1.0e-9))
, m_pmin(Real_t(0.))
, m_emin(Real_t(-1.0e+15))
, m_dvovmax(Real_t(0.1))
, m_refdens(Real_t(1.0))
,
//
// set pointers to (potentially) "new'd" arrays to null to
// simplify deallocation.
//
m_regNumList(0)
, m_nodeElemStart(0)
, m_nodeElemCornerList(0)
, m_regElemSize(0)
, m_regElemlist(0)
#if USE_MPI
, commDataSend(0)
, commDataRecv(0)
#endif
{
Index_t edgeElems = nx;
Index_t edgeNodes = edgeElems + 1;
this->cost() = cost;
m_tp = tp;
m_numRanks = numRanks;
///////////////////////////////
// Initialize Sedov Mesh
///////////////////////////////
// construct a uniform box for this processor
m_colLoc = colLoc;
m_rowLoc = rowLoc;
m_planeLoc = planeLoc;
m_sizeX = edgeElems;
m_sizeY = edgeElems;
m_sizeZ = edgeElems;
m_numElem = edgeElems * edgeElems * edgeElems;
m_numNode = edgeNodes * edgeNodes * edgeNodes;
m_regNumList = Allocate<Index_t>(numElem()); // material indexset
// Elem-centered
AllocateElemPersistent(numElem());
// Node-centered
AllocateNodePersistent(numNode());
SetupCommBuffers(edgeNodes);
// Basic Field Initialization
for(Index_t i = 0; i < numElem(); ++i)
{
e(i) = Real_t(0.0);
p(i) = Real_t(0.0);
q(i) = Real_t(0.0);
ss(i) = Real_t(0.0);
}
// Note - v initializes to 1.0, not 0.0!
for(Index_t i = 0; i < numElem(); ++i)
{
v(i) = Real_t(1.0);
}
for(Index_t i = 0; i < numNode(); ++i)
{
xd(i) = Real_t(0.0);
yd(i) = Real_t(0.0);
zd(i) = Real_t(0.0);
}
for(Index_t i = 0; i < numNode(); ++i)
{
xdd(i) = Real_t(0.0);
ydd(i) = Real_t(0.0);
zdd(i) = Real_t(0.0);
}
for(Index_t i = 0; i < numNode(); ++i)
{
nodalMass(i) = Real_t(0.0);
}
BuildMesh(nx, edgeNodes, edgeElems);
#if _OPENMP
SetupThreadSupportStructures();
#else
// These arrays are not used if we're not threaded
m_nodeElemStart = NULL;
m_nodeElemCornerList = NULL;
#endif
// Setup region index sets. For now, these are constant sized
// throughout the run, but could be changed every cycle to
// simulate effects of ALE on the lagrange solver
CreateRegionIndexSets(nr, balance);
// Setup symmetry nodesets
SetupSymmetryPlanes(edgeNodes);
// Setup element connectivities
SetupElementConnectivities(edgeElems);
// Setup symmetry planes and free surface boundary arrays
SetupBoundaryConditions(edgeElems);
// Setup defaults
// These can be changed (requires recompile) if you want to run
// with a fixed timestep, or to a different end time, but it's
// probably easier/better to just run a fixed number of timesteps
// using the -i flag in 2.x
dtfixed() = Real_t(-1.0e-6); // Negative means use courant condition
stoptime() = Real_t(1.0e-2); // *Real_t(edgeElems*tp/45.0) ;
// Initial conditions
deltatimemultlb() = Real_t(1.1);
deltatimemultub() = Real_t(1.2);
dtcourant() = Real_t(1.0e+20);
dthydro() = Real_t(1.0e+20);
dtmax() = Real_t(1.0e-2);
time() = Real_t(0.);
cycle() = Int_t(0);
// initialize field data
for(Index_t i = 0; i < numElem(); ++i)
{
Real_t x_local[8], y_local[8], z_local[8];
Index_t* elemToNode = nodelist(i);
for(Index_t lnode = 0; lnode < 8; ++lnode)
{
Index_t gnode = elemToNode[lnode];
x_local[lnode] = x(gnode);
y_local[lnode] = y(gnode);
z_local[lnode] = z(gnode);
}
// volume calculations
Real_t volume = CalcElemVolume(x_local, y_local, z_local);
volo(i) = volume;
elemMass(i) = volume;
for(Index_t j = 0; j < 8; ++j)
{
Index_t idx = elemToNode[j];
nodalMass(idx) += volume / Real_t(8.0);
}
}
// deposit initial energy
// An energy of 3.948746e+7 is correct for a problem with
// 45 zones along a side - we need to scale it
const Real_t ebase = Real_t(3.948746e+7);
Real_t scale = (nx * m_tp) / Real_t(45.0);
Real_t einit = ebase * scale * scale * scale;
if(m_rowLoc + m_colLoc + m_planeLoc == 0)
{
// Dump into the first zone (which we know is in the corner)
// of the domain that sits at the origin
e(0) = einit;
}
// set initial deltatime based on analytic CFL calculation
deltatime() = (Real_t(.5) * cbrt(volo(0))) / sqrt(Real_t(2.0) * einit);
} // End constructor
////////////////////////////////////////////////////////////////////////////////
Domain::~Domain()
{
/* Release(&m_regNumList);
Release(&m_nodeElemStart);
Release(&m_nodeElemCornerList);
Release(&m_regElemSize);
for (Index_t i=0 ; i<numReg() ; ++i) {
Release(&m_regElemlist[i]);
}
Release(&m_regElemlist);
#if USE_MPI
Release(&commDataSend);
Release(&commDataRecv);
#endif
*/
} // End destructor
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
void
Domain::BuildMesh(Int_t nx, Int_t edgeNodes, Int_t edgeElems)
{
Index_t meshEdgeElems = m_tp * nx;
// initialize nodal coordinates
Index_t nidx = 0;
Real_t tz = Real_t(1.125) * Real_t(m_planeLoc * nx) / Real_t(meshEdgeElems);
for(Index_t plane = 0; plane < edgeNodes; ++plane)
{
Real_t ty = Real_t(1.125) * Real_t(m_rowLoc * nx) / Real_t(meshEdgeElems);
for(Index_t row = 0; row < edgeNodes; ++row)
{
Real_t tx = Real_t(1.125) * Real_t(m_colLoc * nx) / Real_t(meshEdgeElems);
for(Index_t col = 0; col < edgeNodes; ++col)
{
x(nidx) = tx;
y(nidx) = ty;
z(nidx) = tz;
++nidx;
// tx += ds ; // may accumulate roundoff...
tx = Real_t(1.125) * Real_t(m_colLoc * nx + col + 1) /
Real_t(meshEdgeElems);
}
// ty += ds ; // may accumulate roundoff...
ty = Real_t(1.125) * Real_t(m_rowLoc * nx + row + 1) / Real_t(meshEdgeElems);
}
// tz += ds ; // may accumulate roundoff...
tz = Real_t(1.125) * Real_t(m_planeLoc * nx + plane + 1) / Real_t(meshEdgeElems);
}
// embed hexahedral elements in nodal point lattice
Index_t zidx = 0;
nidx = 0;
for(Index_t plane = 0; plane < edgeElems; ++plane)
{
for(Index_t row = 0; row < edgeElems; ++row)
{
for(Index_t col = 0; col < edgeElems; ++col)
{
Index_t* localNode = nodelist(zidx);
localNode[0] = nidx;
localNode[1] = nidx + 1;
localNode[2] = nidx + edgeNodes + 1;
localNode[3] = nidx + edgeNodes;
localNode[4] = nidx + edgeNodes * edgeNodes;
localNode[5] = nidx + edgeNodes * edgeNodes + 1;
localNode[6] = nidx + edgeNodes * edgeNodes + edgeNodes + 1;
localNode[7] = nidx + edgeNodes * edgeNodes + edgeNodes;
++zidx;
++nidx;
}
++nidx;
}
nidx += edgeNodes;
}
}
////////////////////////////////////////////////////////////////////////////////
void
Domain::SetupThreadSupportStructures()
{
// set up node-centered indexing of elements
Index_t* nodeElemCount = Allocate<Index_t>(numNode());
for(Index_t i = 0; i < numNode(); ++i)
{
nodeElemCount[i] = 0;
}
for(Index_t i = 0; i < numElem(); ++i)
{
Index_t* nl = nodelist(i);
for(Index_t j = 0; j < 8; ++j)
{
++(nodeElemCount[nl[j]]);
}
}
m_nodeElemStart = Allocate<Index_t>(numNode() + 1);
m_nodeElemStart[0] = 0;
for(Index_t i = 1; i <= numNode(); ++i)
{
m_nodeElemStart[i] = m_nodeElemStart[i - 1] + nodeElemCount[i - 1];
}
m_nodeElemCornerList = Allocate<Index_t>(m_nodeElemStart[numNode()]);
for(Index_t i = 0; i < numNode(); ++i)
{
nodeElemCount[i] = 0;
}
for(Index_t i = 0; i < numElem(); ++i)
{
Index_t* nl = nodelist(i);
for(Index_t j = 0; j < 8; ++j)
{
Index_t m = nl[j];
Index_t k = i * 8 + j;
Index_t offset = m_nodeElemStart[m] + nodeElemCount[m];
m_nodeElemCornerList[offset] = k;
++(nodeElemCount[m]);
}
}
Index_t clSize = m_nodeElemStart[numNode()];
for(Index_t i = 0; i < clSize; ++i)
{
Index_t clv = m_nodeElemCornerList[i];
if((clv < 0) || (clv >= numElem() * 8)) // valid corner ids are 0 .. 8*numElem-1
{
fprintf(
stderr,
"AllocateNodeElemIndexes(): nodeElemCornerList entry out of range!\n");
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, -1);
#else
exit(-1);
#endif
}
}
Release<Index_t>(&nodeElemCount);
}
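// Illustration (a minimal sketch, assuming std::vector in place of the
// Allocate/Release helpers): SetupThreadSupportStructures builds a
// compressed node-to-corner map in three steps: count corners per node,
// prefix-sum the counts into start offsets, then scatter the corner ids
// (elem * 8 + local node). The NodeCorners struct and build_node_corners
// function are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct NodeCorners
{
    std::vector<int> start;    // size numNode + 1, offsets into `corners`
    std::vector<int> corners;  // corner ids grouped by node
};

// nodelist holds 8 node ids per element, flattened.
NodeCorners build_node_corners(const std::vector<int>& nodelist, int numNode)
{
    assert(nodelist.size() % 8 == 0);
    const int numElem = static_cast<int>(nodelist.size() / 8);

    // Step 1: count how many element corners touch each node.
    std::vector<int> count(numNode, 0);
    for(int n : nodelist)
        ++count[n];

    // Step 2: prefix sum gives the start offset of each node's corner list.
    NodeCorners nc;
    nc.start.assign(numNode + 1, 0);
    for(int n = 0; n < numNode; ++n)
        nc.start[n + 1] = nc.start[n] + count[n];

    // Step 3: reuse `count` as a per-node write cursor and scatter corner ids.
    nc.corners.assign(nc.start[numNode], 0);
    std::fill(count.begin(), count.end(), 0);
    for(int e = 0; e < numElem; ++e)
    {
        for(int j = 0; j < 8; ++j)
        {
            const int n = nodelist[e * 8 + j];
            nc.corners[nc.start[n] + count[n]++] = e * 8 + j;
        }
    }
    return nc;
}
```

// For node n, the corners touching it are
// corners[start[n]] .. corners[start[n + 1] - 1], which lets a threaded loop
// gather per-node contributions without write conflicts.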
////////////////////////////////////////////////////////////////////////////////
void
Domain::SetupCommBuffers(Int_t edgeNodes)
{
// allocate a buffer large enough for nodal ghost data
Index_t maxEdgeSize = MAX(this->sizeX(), MAX(this->sizeY(), this->sizeZ())) + 1;
m_maxPlaneSize = CACHE_ALIGN_REAL(maxEdgeSize * maxEdgeSize);
m_maxEdgeSize = CACHE_ALIGN_REAL(maxEdgeSize);
// assume communication to 6 neighbors by default
m_rowMin = (m_rowLoc == 0) ? 0 : 1;
m_rowMax = (m_rowLoc == m_tp - 1) ? 0 : 1;
m_colMin = (m_colLoc == 0) ? 0 : 1;
m_colMax = (m_colLoc == m_tp - 1) ? 0 : 1;
m_planeMin = (m_planeLoc == 0) ? 0 : 1;
m_planeMax = (m_planeLoc == m_tp - 1) ? 0 : 1;
#if USE_MPI
// account for face communication
Index_t comBufSize =
(m_rowMin + m_rowMax + m_colMin + m_colMax + m_planeMin + m_planeMax) *
m_maxPlaneSize * MAX_FIELDS_PER_MPI_COMM;
// account for edge communication
comBufSize +=
((m_rowMin & m_colMin) + (m_rowMin & m_planeMin) + (m_colMin & m_planeMin) +
(m_rowMax & m_colMax) + (m_rowMax & m_planeMax) + (m_colMax & m_planeMax) +
(m_rowMax & m_colMin) + (m_rowMin & m_planeMax) + (m_colMin & m_planeMax) +
(m_rowMin & m_colMax) + (m_rowMax & m_planeMin) + (m_colMax & m_planeMin)) *
m_maxEdgeSize * MAX_FIELDS_PER_MPI_COMM;
// account for corner communication
// factor of 16 is so each buffer has its own cache line
comBufSize +=
((m_rowMin & m_colMin & m_planeMin) + (m_rowMin & m_colMin & m_planeMax) +
(m_rowMin & m_colMax & m_planeMin) + (m_rowMin & m_colMax & m_planeMax) +
(m_rowMax & m_colMin & m_planeMin) + (m_rowMax & m_colMin & m_planeMax) +
(m_rowMax & m_colMax & m_planeMin) + (m_rowMax & m_colMax & m_planeMax)) *
CACHE_COHERENCE_PAD_REAL;
this->commDataSend = Allocate<Real_t>(comBufSize);
this->commDataRecv = Allocate<Real_t>(comBufSize);
// prevent floating point exceptions
memset(this->commDataSend, 0, comBufSize * sizeof(Real_t));
memset(this->commDataRecv, 0, comBufSize * sizeof(Real_t));
#endif
// Boundary nodesets
if(m_colLoc == 0)
m_symmX.resize(edgeNodes * edgeNodes);
if(m_rowLoc == 0)
m_symmY.resize(edgeNodes * edgeNodes);
if(m_planeLoc == 0)
m_symmZ.resize(edgeNodes * edgeNodes);
}
////////////////////////////////////////////////////////////////////////////////
void
Domain::CreateRegionIndexSets(Int_t nr, Int_t balance)
{
#if USE_MPI
Index_t myRank;
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
srand(myRank);
#else
srand(0);
Index_t myRank = 0;
#endif
this->numReg() = nr;
m_regElemSize = Allocate<Index_t>(numReg());
m_regElemlist = Allocate<Index_t*>(numReg());
Index_t nextIndex = 0;
// if we only have one region just fill it
// Fill out the regNumList with material numbers, which are always
// the region index plus one
if(numReg() == 1)
{
while(nextIndex < numElem())
{
this->regNumList(nextIndex) = 1;
nextIndex++;
}
regElemSize(0) = 0;
}
// If we have more than one region distribute the elements.
else
{
Int_t regionNum;
Int_t regionVar;
Int_t lastReg = -1;
Int_t binSize;
Index_t elements;
Index_t runto = 0;
Int_t costDenominator = 0;
Int_t* regBinEnd = Allocate<Int_t>(numReg());
// Determine the relative weights of all the regions. The weights are derived
// from balance, the value passed to the -b flag.
for(Index_t i = 0; i < numReg(); ++i)
{
regElemSize(i) = 0;
costDenominator += pow((i + 1), balance); // Total sum of all regions weights
regBinEnd[i] =
costDenominator; // Chance of hitting a given region is (regBinEnd[i] -
// regBinEnd[i-1])/costDenominator
}
// Until all elements are assigned
while(nextIndex < numElem())
{
// pick the region
regionVar = rand() % costDenominator;
Index_t i = 0;
while(regionVar >= regBinEnd[i])
i++;
// rotate the regions based on MPI rank. Rotation is Rank % NumRegions, which
// gives each domain a different region with the highest representation
regionNum = ((i + myRank) % numReg()) + 1;
// make sure we don't pick the same region twice in a row
while(regionNum == lastReg)
{
regionVar = rand() % costDenominator;
i = 0;
while(regionVar >= regBinEnd[i])
i++;
regionNum = ((i + myRank) % numReg()) + 1;
}
// Pick the bin size of the region and determine the number of elements.
binSize = rand() % 1000;
if(binSize < 773)
{
elements = rand() % 15 + 1;
}
else if(binSize < 937)
{
elements = rand() % 16 + 16;
}
else if(binSize < 970)
{
elements = rand() % 32 + 32;
}
else if(binSize < 974)
{
elements = rand() % 64 + 64;
}
else if(binSize < 978)
{
elements = rand() % 128 + 128;
}
else if(binSize < 981)
{
elements = rand() % 256 + 256;
}
else
elements = rand() % 1537 + 512;
runto = elements + nextIndex;
// Store the elements. If we hit the end before we run out of elements then
// just stop.
while(nextIndex < runto && nextIndex < numElem())
{
this->regNumList(nextIndex) = regionNum;
nextIndex++;
}
lastReg = regionNum;
}
}
// Convert regNumList to region index sets
// First, count size of each region
for(Index_t i = 0; i < numElem(); ++i)
{
int r = this->regNumList(i) - 1; // region index == regnum-1
regElemSize(r)++;
}
// Second, allocate each region index set
for(Index_t i = 0; i < numReg(); ++i)
{
m_regElemlist[i] = Allocate<Int_t>(regElemSize(i));
regElemSize(i) = 0;
}
// Third, fill index sets
for(Index_t i = 0; i < numElem(); ++i)
{
Index_t r = regNumList(i) - 1; // region index == regnum-1
Index_t regndx = regElemSize(r)++; // Note increment
regElemlist(r, regndx) = i;
}
}
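// Illustration (a minimal sketch, not the LULESH code itself): region
// selection in CreateRegionIndexSets gives region i the weight
// (i + 1)^balance, stores the running totals in regBinEnd, and maps a random
// draw to the first bin whose end exceeds it. The helpers below isolate
// those two steps. The names region_bin_ends and pick_region are
// hypothetical, and an integer power loop is swapped in for pow().

```cpp
#include <cassert>
#include <vector>

// Cumulative weights: binEnd[i] = sum over k <= i of (k + 1)^balance.
std::vector<long> region_bin_ends(int numReg, int balance)
{
    assert(numReg > 0 && balance >= 0);
    std::vector<long> binEnd(numReg);
    long total = 0;
    for(int i = 0; i < numReg; ++i)
    {
        long weight = 1;
        for(int k = 0; k < balance; ++k)
            weight *= (i + 1);  // integer (i + 1)^balance
        total += weight;
        binEnd[i] = total;
    }
    return binEnd;
}

// Map a draw in [0, binEnd.back()) to a zero-based region index.
int pick_region(const std::vector<long>& binEnd, long draw)
{
    assert(!binEnd.empty() && draw >= 0 && draw < binEnd.back());
    int i = 0;
    while(draw >= binEnd[i])
        ++i;
    return i;
}
```

// With 3 regions and balance 1 the bin ends are 1, 3, 6, so region 0 is
// chosen for 1 draw in 6, region 1 for 2 in 6 and region 2 for 3 in 6.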
/////////////////////////////////////////////////////////////
void
Domain::SetupSymmetryPlanes(Int_t edgeNodes)
{
Index_t nidx = 0;
for(Index_t i = 0; i < edgeNodes; ++i)
{
Index_t planeInc = i * edgeNodes * edgeNodes;
Index_t rowInc = i * edgeNodes;
for(Index_t j = 0; j < edgeNodes; ++j)
{
if(m_planeLoc == 0)
{
m_symmZ[nidx] = rowInc + j;
}
if(m_rowLoc == 0)
{
m_symmY[nidx] = planeInc + j;
}
if(m_colLoc == 0)
{
m_symmX[nidx] = planeInc + j * edgeNodes;
}
++nidx;
}
}
}
/////////////////////////////////////////////////////////////
void
Domain::SetupElementConnectivities(Int_t edgeElems)
{
lxim(0) = 0;
for(Index_t i = 1; i < numElem(); ++i)
{
lxim(i) = i - 1;
lxip(i - 1) = i;
}
lxip(numElem() - 1) = numElem() - 1;
for(Index_t i = 0; i < edgeElems; ++i)
{
letam(i) = i;
letap(numElem() - edgeElems + i) = numElem() - edgeElems + i;
}
for(Index_t i = edgeElems; i < numElem(); ++i)
{
letam(i) = i - edgeElems;
letap(i - edgeElems) = i;
}
for(Index_t i = 0; i < edgeElems * edgeElems; ++i)
{
lzetam(i) = i;
lzetap(numElem() - edgeElems * edgeElems + i) =
numElem() - edgeElems * edgeElems + i;
}
for(Index_t i = edgeElems * edgeElems; i < numElem(); ++i)
{
lzetam(i) = i - edgeElems * edgeElems;
lzetap(i - edgeElems * edgeElems) = i;
}
}
/////////////////////////////////////////////////////////////
void
Domain::SetupBoundaryConditions(Int_t edgeElems)
{
Index_t ghostIdx[6]; // offsets to ghost locations
// set up boundary condition information
for(Index_t i = 0; i < numElem(); ++i)
{
elemBC(i) = Int_t(0);
}
for(Index_t i = 0; i < 6; ++i)
{
ghostIdx[i] = INT_MIN;
}
Int_t pidx = numElem();
if(m_planeMin != 0)
{
ghostIdx[0] = pidx;
pidx += sizeX() * sizeY();
}
if(m_planeMax != 0)
{
ghostIdx[1] = pidx;
pidx += sizeX() * sizeY();
}
if(m_rowMin != 0)
{
ghostIdx[2] = pidx;
pidx += sizeX() * sizeZ();
}
if(m_rowMax != 0)
{
ghostIdx[3] = pidx;
pidx += sizeX() * sizeZ();
}
if(m_colMin != 0)
{
ghostIdx[4] = pidx;
pidx += sizeY() * sizeZ();
}
if(m_colMax != 0)
{
ghostIdx[5] = pidx;
}
// symmetry plane or free surface BCs
for(Index_t i = 0; i < edgeElems; ++i)
{
Index_t planeInc = i * edgeElems * edgeElems;
Index_t rowInc = i * edgeElems;
for(Index_t j = 0; j < edgeElems; ++j)
{
if(m_planeLoc == 0)
{
elemBC(rowInc + j) |= ZETA_M_SYMM;
}
else
{
elemBC(rowInc + j) |= ZETA_M_COMM;
lzetam(rowInc + j) = ghostIdx[0] + rowInc + j;
}
if(m_planeLoc == m_tp - 1)
{
elemBC(rowInc + j + numElem() - edgeElems * edgeElems) |= ZETA_P_FREE;
}
else
{
elemBC(rowInc + j + numElem() - edgeElems * edgeElems) |= ZETA_P_COMM;
lzetap(rowInc + j + numElem() - edgeElems * edgeElems) =
ghostIdx[1] + rowInc + j;
}
if(m_rowLoc == 0)
{
elemBC(planeInc + j) |= ETA_M_SYMM;
}
else
{
elemBC(planeInc + j) |= ETA_M_COMM;
letam(planeInc + j) = ghostIdx[2] + rowInc + j;
}
if(m_rowLoc == m_tp - 1)
{
elemBC(planeInc + j + edgeElems * edgeElems - edgeElems) |= ETA_P_FREE;
}
else
{
elemBC(planeInc + j + edgeElems * edgeElems - edgeElems) |= ETA_P_COMM;
letap(planeInc + j + edgeElems * edgeElems - edgeElems) =
ghostIdx[3] + rowInc + j;
}
if(m_colLoc == 0)
{
elemBC(planeInc + j * edgeElems) |= XI_M_SYMM;
}
else
{
elemBC(planeInc + j * edgeElems) |= XI_M_COMM;
lxim(planeInc + j * edgeElems) = ghostIdx[4] + rowInc + j;
}
if(m_colLoc == m_tp - 1)
{
elemBC(planeInc + j * edgeElems + edgeElems - 1) |= XI_P_FREE;
}
else
{
elemBC(planeInc + j * edgeElems + edgeElems - 1) |= XI_P_COMM;
lxip(planeInc + j * edgeElems + edgeElems - 1) = ghostIdx[5] + rowInc + j;
}
}
}
}
///////////////////////////////////////////////////////////////////////////
void
InitMeshDecomp(Int_t numRanks, Int_t myRank, Int_t* col, Int_t* row, Int_t* plane,
Int_t* side)
{
Int_t testProcs;
Int_t dx, dy, dz;
Int_t myDom;
// Assume cube processor layout for now
testProcs = Int_t(cbrt(Real_t(numRanks)) + 0.5);
if(testProcs * testProcs * testProcs != numRanks)
{
printf("Num processors must be a cube of an integer (1, 8, 27, ...)\n");
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, -1);
#else
exit(-1);
#endif
}
if(sizeof(Real_t) != 4 && sizeof(Real_t) != 8)
{
printf("MPI operations only support float and double right now...\n");
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, -1);
#else
exit(-1);
#endif
}
if(MAX_FIELDS_PER_MPI_COMM > CACHE_COHERENCE_PAD_REAL)
{
printf("corner element comm buffers too small. Fix code.\n");
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, -1);
#else
exit(-1);
#endif
}
dx = testProcs;
dy = testProcs;
dz = testProcs;
// temporary test
if(dx * dy * dz != numRanks)
{
printf("error -- must have as many domains as procs\n");
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, -1);
#else
exit(-1);
#endif
}
Int_t remainder = dx * dy * dz % numRanks;
if(myRank < remainder)
{
myDom = myRank * (1 + (dx * dy * dz / numRanks));
}
else
{
myDom = remainder * (1 + (dx * dy * dz / numRanks)) +
(myRank - remainder) * (dx * dy * dz / numRanks);
}
*col = myDom % dx;
*row = (myDom / dx) % dy;
*plane = myDom / (dx * dy);
*side = testProcs;
return;
}
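// Illustration (a minimal sketch, assuming plain int in place of Int_t and a
// bool return instead of MPI_Abort/exit): because InitMeshDecomp requires
// dx * dy * dz == numRanks, the remainder is always 0 and myDom equals
// myRank, so the mapping reduces to splitting the rank into base-`side`
// digits. The DomainCoords struct and cube_decomp function are hypothetical
// names introduced for this example.

```cpp
#include <cassert>
#include <cmath>

struct DomainCoords
{
    int col;
    int row;
    int plane;
    int side;
};

// Returns false when numRanks is not a perfect cube.
bool cube_decomp(int numRanks, int myRank, DomainCoords& out)
{
    assert(numRanks > 0 && myRank >= 0 && myRank < numRanks);
    // Round the cube root to the nearest integer, then verify it.
    const int side = static_cast<int>(std::cbrt(static_cast<double>(numRanks)) + 0.5);
    if(side * side * side != numRanks)
        return false;
    out.col   = myRank % side;           // fastest-varying index
    out.row   = (myRank / side) % side;  // middle index
    out.plane = myRank / (side * side);  // slowest-varying index
    out.side  = side;
    return true;
}
```

// For 27 ranks, rank 14 lands at col 2, row 1, plane 1 of a 3 x 3 x 3 grid.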
@@ -0,0 +1,273 @@
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if USE_MPI
# include <mpi.h>
#endif
#include "lulesh.h"
/* Helper function for converting strings to ints, with error checking */
int
StrToInt(const char* token, int* retVal)
{
const char* c;
char* endptr;
const int decimal_base = 10;
if(token == NULL)
return 0;
c = token;
*retVal = (int) strtol(c, &endptr, decimal_base);
if((endptr != c) && ((*endptr == ' ') || (*endptr == '\0')))
return 1;
else
return 0;
}
static void
PrintCommandLineOptions(char* execname, int myRank)
{
if(myRank == 0)
{
printf("Usage: %s [opts]\n", execname);
printf(" where [opts] is one or more of:\n");
printf(" -q : quiet mode - suppress all stdout\n");
printf(" -i <iterations> : number of cycles to run\n");
printf(" -s <size> : length of cube mesh along side\n");
printf(" -r <numregions> : Number of distinct regions (def: 11)\n");
printf(" -b <balance> : Load balance between regions of a domain (def: 1)\n");
printf(" -c <cost> : Extra cost of more expensive regions (def: 1)\n");
printf(" -f <numfiles> : Number of files to split viz dump into (def: "
"(np+10)/9)\n");
printf(" -p : Print out progress\n");
printf(
" -v : Output viz file (requires compiling with -DVIZ_MESH)\n");
printf(" -h : This message\n");
printf("\n\n");
}
}
static void
ParseError(const char* message, int myRank)
{
if(myRank == 0)
{
printf("%s\n", message);
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, -1);
#else
exit(-1);
#endif
}
}
void
ParseCommandLineOptions(int argc, char* argv[], int myRank, struct cmdLineOpts* opts)
{
if(argc > 1)
{
int i = 1;
while(i < argc)
{
int ok;
/* -i <iterations> */
if(strcmp(argv[i], "-i") == 0)
{
if(i + 1 >= argc)
{
ParseError("Missing integer argument to -i", myRank);
}
ok = StrToInt(argv[i + 1], &(opts->its));
if(!ok)
{
ParseError("Parse Error on option -i integer value required after "
"argument\n",
myRank);
}
i += 2;
}
/* -s <size, sidelength> */
else if(strcmp(argv[i], "-s") == 0)
{
if(i + 1 >= argc)
{
ParseError("Missing integer argument to -s\n", myRank);
}
ok = StrToInt(argv[i + 1], &(opts->nx));
if(!ok)
{
ParseError("Parse Error on option -s integer value required after "
"argument\n",
myRank);
}
i += 2;
}
/* -r <numregions> */
else if(strcmp(argv[i], "-r") == 0)
{
if(i + 1 >= argc)
{
ParseError("Missing integer argument to -r\n", myRank);
}
ok = StrToInt(argv[i + 1], &(opts->numReg));
if(!ok)
{
ParseError("Parse Error on option -r integer value required after "
"argument\n",
myRank);
}
i += 2;
}
/* -f <numfilepieces> */
else if(strcmp(argv[i], "-f") == 0)
{
if(i + 1 >= argc)
{
ParseError("Missing integer argument to -f\n", myRank);
}
ok = StrToInt(argv[i + 1], &(opts->numFiles));
if(!ok)
{
ParseError("Parse Error on option -f integer value required after "
"argument\n",
myRank);
}
i += 2;
}
/* -p */
else if(strcmp(argv[i], "-p") == 0)
{
opts->showProg = 1;
i++;
}
/* -q */
else if(strcmp(argv[i], "-q") == 0)
{
opts->quiet = 1;
i++;
}
/* -a */
else if(strcmp(argv[i], "-a") == 0)
{
opts->do_atomic = 1;
i++;
}
/* -b <balance> */
else if(strcmp(argv[i], "-b") == 0)
{
if(i + 1 >= argc)
{
ParseError("Missing integer argument to -b\n", myRank);
}
ok = StrToInt(argv[i + 1], &(opts->balance));
if(!ok)
{
ParseError("Parse Error on option -b integer value required after "
"argument\n",
myRank);
}
i += 2;
}
/* -c <cost> */
else if(strcmp(argv[i], "-c") == 0)
{
if(i + 1 >= argc)
{
ParseError("Missing integer argument to -c\n", myRank);
}
ok = StrToInt(argv[i + 1], &(opts->cost));
if(!ok)
{
ParseError("Parse Error on option -c integer value required after "
"argument\n",
myRank);
}
i += 2;
}
/* -v */
else if(strcmp(argv[i], "-v") == 0)
{
#if VIZ_MESH
opts->viz = 1;
#else
ParseError("Use of -v requires compiling with -DVIZ_MESH\n", myRank);
#endif
i++;
}
/* -h */
else if(strcmp(argv[i], "-h") == 0)
{
PrintCommandLineOptions(argv[0], myRank);
#if USE_MPI
MPI_Abort(MPI_COMM_WORLD, 0);
#else
exit(0);
#endif
}
else
{
char msg[80];
PrintCommandLineOptions(argv[0], myRank);
/* Bounded write: a long argv[i] must not overflow the 80-byte buffer */
snprintf(msg, sizeof(msg), "ERROR: Unknown command line argument: %s\n", argv[i]);
ParseError(msg, myRank);
}
}
}
}
/////////////////////////////////////////////////////////////////////
void
VerifyAndWriteFinalOutput(Real_t elapsed_time, Domain& locDom, Int_t nx, Int_t numRanks)
{
// GrindTime1 only takes a single domain into account, and is thus a good way to
// measure processor speed independent of MPI parallelism. GrindTime2 takes into
// account speedups from MPI parallelism
Real_t grindTime1 = ((elapsed_time * 1e6) / locDom.cycle()) / (nx * nx * nx);
Real_t grindTime2 =
((elapsed_time * 1e6) / locDom.cycle()) / (nx * nx * nx * numRanks);
Index_t ElemId = 0;
printf("Run completed: \n");
printf(" Problem size = %i \n", nx);
printf(" MPI tasks = %i \n", numRanks);
printf(" Iteration count = %i \n", locDom.cycle());
printf(" Final Origin Energy = %12.6e \n", locDom.e(ElemId));
Real_t MaxAbsDiff = Real_t(0.0);
Real_t TotalAbsDiff = Real_t(0.0);
Real_t MaxRelDiff = Real_t(0.0);
for(Index_t j = 0; j < nx; ++j)
{
for(Index_t k = j + 1; k < nx; ++k)
{
Real_t AbsDiff = FABS(locDom.e(j * nx + k) - locDom.e(k * nx + j));
TotalAbsDiff += AbsDiff;
if(MaxAbsDiff < AbsDiff)
MaxAbsDiff = AbsDiff;
Real_t RelDiff = AbsDiff / locDom.e(k * nx + j);
if(MaxRelDiff < RelDiff)
MaxRelDiff = RelDiff;
}
}
// Quick symmetry check
printf(" Testing Plane 0 of Energy Array on rank 0:\n");
printf(" MaxAbsDiff = %12.6e\n", MaxAbsDiff);
printf(" TotalAbsDiff = %12.6e\n", TotalAbsDiff);
printf(" MaxRelDiff = %12.6e\n\n", MaxRelDiff);
// Timing information
printf("\nElapsed time = %10.2f (s)\n", elapsed_time);
printf("Grind time (us/z/c) = %10.8g (per dom) (%10.8g overall)\n", grindTime1,
grindTime2);
printf("FOM = %10.8g (z/s)\n\n",
1000.0 / grindTime2); // zones per second
return;
}
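// The grind-time arithmetic above can be checked by hand. The sketch below
// restates it as a free function. GrindTime is a hypothetical name used only
// for illustration, and it computes the zone count in double rather than int.

```cpp
#include <cassert>
#include <cmath>

// Microseconds per zone per cycle. Pass numRanks = 1 for the per-domain
// figure (grindTime1) or the real rank count for the overall one (grindTime2).
static double
GrindTime(double elapsed_s, int cycles, int nx, int numRanks)
{
    double zones = static_cast<double>(nx) * nx * nx * numRanks;
    return (elapsed_s * 1e6 / cycles) / zones;
}
```

// For example, 2 s over 100 cycles on a 10^3 mesh gives 20 us/z/c per domain,
// and 2.5 us/z/c overall when 8 ranks each hold such a domain.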
@@ -0,0 +1,422 @@
#include "lulesh.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef VIZ_MESH
# ifdef __cplusplus
extern "C"
{
# endif
# include "silo.h"
# if USE_MPI
# include "pmpio.h"
# endif
# ifdef __cplusplus
}
# endif
// Function prototypes
static void
DumpDomainToVisit(DBfile* db, Domain& domain, int myRank);
static
# if USE_MPI
// For some reason, earlier versions of g++ (e.g. 4.2) won't let me
// put the 'static' qualifier on this prototype, even if it's done
// consistently in the prototype and definition
void
DumpMultiblockObjects(DBfile* db, PMPIO_baton_t* bat, char basename[], int numRanks);
// Callback prototypes for PMPIO interface (only useful if we're
// running parallel)
static void*
LULESH_PMPIO_Create(const char* fname, const char* dname, void* udata);
static void*
LULESH_PMPIO_Open(const char* fname, const char* dname, PMPIO_iomode_t ioMode,
void* udata);
static void
LULESH_PMPIO_Close(void* file, void* udata);
# else
void
DumpMultiblockObjects(DBfile* db, char basename[], int numRanks);
# endif
/**********************************************************************/
void
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks)
{
char subdirName[32];
char basename[32];
DBfile* db;
sprintf(basename, "lulesh_plot_c%d", domain.cycle());
sprintf(subdirName, "data_%d", myRank);
# if USE_MPI
PMPIO_baton_t* bat =
PMPIO_Init(numFiles, PMPIO_WRITE, MPI_COMM_WORLD, 10101, LULESH_PMPIO_Create,
LULESH_PMPIO_Open, LULESH_PMPIO_Close, NULL);
int myiorank = PMPIO_GroupRank(bat, myRank);
char fileName[64];
if(myiorank == 0)
strcpy(fileName, basename);
else
sprintf(fileName, "%s.%03d", basename, myiorank);
db = (DBfile*) PMPIO_WaitForBaton(bat, fileName, subdirName);
DumpDomainToVisit(db, domain, myRank);
// Processor 0 writes out a bit of extra data to its file that
// describes how to stitch all the pieces together
if(myRank == 0)
{
DumpMultiblockObjects(db, bat, basename, numRanks);
}
PMPIO_HandOffBaton(bat, db);
PMPIO_Finish(bat);
# else
db = (DBfile*) DBCreate(basename, DB_CLOBBER, DB_LOCAL, NULL, DB_HDF5X);
if(db)
{
DBMkDir(db, subdirName);
DBSetDir(db, subdirName);
DumpDomainToVisit(db, domain, myRank);
DumpMultiblockObjects(db, basename, numRanks);
DBClose(db);  // close the file created above; nothing else owns this handle
}
else
{
printf("Error writing out viz file - rank %d\n", myRank);
}
# endif
}
/**********************************************************************/
static void
DumpDomainToVisit(DBfile* db, Domain& domain, int myRank)
{
int ok = 0;
/* Create an option list that will give some hints to VisIt for
* printing out the cycle and time in the annotations */
DBoptlist* optlist;
/* Write out the mesh connectivity in fully unstructured format */
int shapetype[1] = { DB_ZONETYPE_HEX };
int shapesize[1] = { 8 };
int shapecnt[1] = { domain.numElem() };
int* conn = Allocate<int>(domain.numElem() * 8);
int ci = 0;
for(int ei = 0; ei < domain.numElem(); ++ei)
{
Index_t* elemToNode = domain.nodelist(ei);
for(int ni = 0; ni < 8; ++ni)
{
conn[ci++] = elemToNode[ni];
}
}
ok += DBPutZonelist2(db, "connectivity", domain.numElem(), 3, conn,
domain.numElem() * 8, 0, 0, 0, /* Not carrying ghost zones */
shapetype, shapesize, shapecnt, 1, NULL);
Release<int>(&conn);
/* Write out the mesh coordinates associated with the mesh */
const char* coordnames[3] = { "X", "Y", "Z" };
float* coords[3];
coords[0] = Allocate<float>(domain.numNode());
coords[1] = Allocate<float>(domain.numNode());
coords[2] = Allocate<float>(domain.numNode());
for(int ni = 0; ni < domain.numNode(); ++ni)
{
coords[0][ni] = float(domain.x(ni));
coords[1][ni] = float(domain.y(ni));
coords[2][ni] = float(domain.z(ni));
}
optlist = DBMakeOptlist(2);
ok += DBAddOption(optlist, DBOPT_DTIME, &domain.time());
ok += DBAddOption(optlist, DBOPT_CYCLE, &domain.cycle());
ok += DBPutUcdmesh(db, "mesh", 3, (char**) &coordnames[0], (float**) coords,
domain.numNode(), domain.numElem(), "connectivity", 0, DB_FLOAT,
optlist);
ok += DBFreeOptlist(optlist);
Release<float>(&coords[2]);
Release<float>(&coords[1]);
Release<float>(&coords[0]);
/* Write out the materials */
int* matnums = Allocate<int>(domain.numReg());
int dims[1] = { domain.numElem() }; // No mixed elements
for(int i = 0; i < domain.numReg(); ++i)
matnums[i] = i + 1;
ok += DBPutMaterial(db, "regions", "mesh", domain.numReg(), matnums,
domain.regNumList(), dims, 1, NULL, NULL, NULL, NULL, 0, DB_FLOAT,
NULL);
Release<int>(&matnums);
/* Write out pressure, energy, relvol, q */
float* e = Allocate<float>(domain.numElem());
for(int ei = 0; ei < domain.numElem(); ++ei)
{
e[ei] = float(domain.e(ei));
}
ok += DBPutUcdvar1(db, "e", "mesh", e, domain.numElem(), NULL, 0, DB_FLOAT,
DB_ZONECENT, NULL);
Release<float>(&e);
float* p = Allocate<float>(domain.numElem());
for(int ei = 0; ei < domain.numElem(); ++ei)
{
p[ei] = float(domain.p(ei));
}
ok += DBPutUcdvar1(db, "p", "mesh", p, domain.numElem(), NULL, 0, DB_FLOAT,
DB_ZONECENT, NULL);
Release<float>(&p);
float* v = Allocate<float>(domain.numElem());
for(int ei = 0; ei < domain.numElem(); ++ei)
{
v[ei] = float(domain.v(ei));
}
ok += DBPutUcdvar1(db, "v", "mesh", v, domain.numElem(), NULL, 0, DB_FLOAT,
DB_ZONECENT, NULL);
Release<float>(&v);
float* q = Allocate<float>(domain.numElem());
for(int ei = 0; ei < domain.numElem(); ++ei)
{
q[ei] = float(domain.q(ei));
}
ok += DBPutUcdvar1(db, "q", "mesh", q, domain.numElem(), NULL, 0, DB_FLOAT,
DB_ZONECENT, NULL);
Release<float>(&q);
/* Write out nodal speed, velocities */
float* zd = Allocate<float>(domain.numNode());
float* yd = Allocate<float>(domain.numNode());
float* xd = Allocate<float>(domain.numNode());
float* speed = Allocate<float>(domain.numNode());
for(int ni = 0; ni < domain.numNode(); ++ni)
{
xd[ni] = float(domain.xd(ni));
yd[ni] = float(domain.yd(ni));
zd[ni] = float(domain.zd(ni));
speed[ni] =
float(sqrt((xd[ni] * xd[ni]) + (yd[ni] * yd[ni]) + (zd[ni] * zd[ni])));
}
ok += DBPutUcdvar1(db, "speed", "mesh", speed, domain.numNode(), NULL, 0, DB_FLOAT,
DB_NODECENT, NULL);
Release<float>(&speed);
ok += DBPutUcdvar1(db, "xd", "mesh", xd, domain.numNode(), NULL, 0, DB_FLOAT,
DB_NODECENT, NULL);
Release<float>(&xd);
ok += DBPutUcdvar1(db, "yd", "mesh", yd, domain.numNode(), NULL, 0, DB_FLOAT,
DB_NODECENT, NULL);
Release<float>(&yd);
ok += DBPutUcdvar1(db, "zd", "mesh", zd, domain.numNode(), NULL, 0, DB_FLOAT,
DB_NODECENT, NULL);
Release<float>(&zd);
if(ok != 0)
{
printf("Error writing out viz file - rank %d\n", myRank);
}
}
/**********************************************************************/
# if USE_MPI
void
DumpMultiblockObjects(DBfile* db, PMPIO_baton_t* bat, char basename[], int numRanks)
# else
void
DumpMultiblockObjects(DBfile* db, char basename[], int numRanks)
# endif
{
/* MULTIBLOCK objects to tie together multiple files */
char** multimeshObjs;
char** multimatObjs;
char*** multivarObjs;
int* blockTypes;
int* varTypes;
int ok = 0;
// Make sure this list matches what's written out above
char vars[][10] = { "p", "e", "v", "q", "speed", "xd", "yd", "zd" };
int numvars = sizeof(vars) / sizeof(vars[0]);
// Reset to the root directory of the silo file
DBSetDir(db, "/");
// Allocate a bunch of space for building up the string names
multimeshObjs = Allocate<char*>(numRanks);
multimatObjs = Allocate<char*>(numRanks);
multivarObjs = Allocate<char**>(numvars);
blockTypes = Allocate<int>(numRanks);
varTypes = Allocate<int>(numRanks);
for(int v = 0; v < numvars; ++v)
{
multivarObjs[v] = Allocate<char*>(numRanks);
}
for(int i = 0; i < numRanks; ++i)
{
multimeshObjs[i] = Allocate<char>(64);
multimatObjs[i] = Allocate<char>(64);
for(int v = 0; v < numvars; ++v)
{
multivarObjs[v][i] = Allocate<char>(64);
}
blockTypes[i] = DB_UCDMESH;
varTypes[i] = DB_UCDVAR;
}
// Build up the multiobject names
for(int i = 0; i < numRanks; ++i)
{
# if USE_MPI
int iorank = PMPIO_GroupRank(bat, i);
# else
int iorank = 0;
# endif
// delete multivarObjs[i];
if(iorank == 0)
{
snprintf(multimeshObjs[i], 64, "/data_%d/mesh", i);
snprintf(multimatObjs[i], 64, "/data_%d/regions", i);
for(int v = 0; v < numvars; ++v)
{
snprintf(multivarObjs[v][i], 64, "/data_%d/%s", i, vars[v]);
}
}
else
{
snprintf(multimeshObjs[i], 64, "%s.%03d:/data_%d/mesh", basename, iorank, i);
snprintf(multimatObjs[i], 64, "%s.%03d:/data_%d/regions", basename, iorank,
i);
for(int v = 0; v < numvars; ++v)
{
snprintf(multivarObjs[v][i], 64, "%s.%03d:/data_%d/%s", basename, iorank,
i, vars[v]);
}
}
}
// Now write out the objects
ok += DBPutMultimesh(db, "mesh", numRanks, (char**) multimeshObjs, blockTypes, NULL);
ok += DBPutMultimat(db, "regions", numRanks, (char**) multimatObjs, NULL);
for(int v = 0; v < numvars; ++v)
{
ok += DBPutMultivar(db, vars[v], numRanks, (char**) multivarObjs[v], varTypes,
NULL);
}
for(int v = 0; v < numvars; ++v)
{
for(int i = 0; i < numRanks; i++)
{
Release<char>(&multivarObjs[v][i]);
}
Release<char*>(&multivarObjs[v]);
}
// Clean up
for(int i = 0; i < numRanks; i++)
{
Release<char>(&multimeshObjs[i]);
Release<char>(&multimatObjs[i]);
}
Release<char*>(&multimeshObjs);
Release<char*>(&multimatObjs);
Release<char**>(&multivarObjs);
Release<int>(&blockTypes);
Release<int>(&varTypes);
if(ok != 0)
{
printf("Error writing out multiXXX objs to viz file - rank 0\n");
}
}
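// The naming scheme used for the multiblock objects above can be sketched in
// isolation. BlockObjectName is a hypothetical helper, not part of LULESH or
// Silo. It only reproduces the two snprintf formats used in the loop.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Objects stored in the root file (iorank == 0) are referenced by a path
// inside that file. Objects in other files are prefixed with "<file>:".
static std::string
BlockObjectName(const char* basename, int iorank, int rank, const char* object)
{
    char buf[64];
    if(iorank == 0)
        std::snprintf(buf, sizeof(buf), "/data_%d/%s", rank, object);
    else
        std::snprintf(buf, sizeof(buf), "%s.%03d:/data_%d/%s", basename, iorank, rank,
                      object);
    return std::string(buf);
}
```

// Under these assumptions, rank 3 in the root file maps to "/data_3/mesh",
// and rank 5 in I/O group 2 maps to "lulesh_plot_c10.002:/data_5/p".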
# if USE_MPI
/**********************************************************************/
static void*
LULESH_PMPIO_Create(const char* fname, const char* dname, void* udata)
{
/* Create the file */
DBfile* db = DBCreate(fname, DB_CLOBBER, DB_LOCAL, NULL, DB_HDF5X);
/* Put the data in a subdirectory, so VisIt only sees the multimesh
* objects we write out in the base file */
if(db)
{
DBMkDir(db, dname);
DBSetDir(db, dname);
}
return (void*) db;
}
/**********************************************************************/
static void*
LULESH_PMPIO_Open(const char* fname, const char* dname, PMPIO_iomode_t ioMode,
void* udata)
{
/* Open the file */
DBfile* db = DBOpen(fname, DB_UNKNOWN, DB_APPEND);
/* Put the data in a subdirectory, so VisIt only sees the multimesh
* objects we write out in the base file */
if(db)
{
DBMkDir(db, dname);
DBSetDir(db, dname);
}
return (void*) db;
}
/**********************************************************************/
static void
LULESH_PMPIO_Close(void* file, void* udata)
{
DBfile* db = (DBfile*) file;
if(db)
DBClose(db);
}
# endif
#else
void
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks)
{
if(myRank == 0)
{
printf("Must enable -DVIZ_MESH at compile time to call DumpDomain\n");
}
}
#endif
File diff suppressed because it is too large
@@ -0,0 +1,836 @@
#if !defined(USE_MPI)
# error "You should specify USE_MPI=0 or USE_MPI=1 on the compile line"
#endif
// OpenMP will be compiled in if this flag is set to 1 AND the compiler being
// used supports it (i.e. the _OPENMP symbol is defined)
#define USE_OMP 1
#if USE_MPI
# include <mpi.h>
/*
define one of these three symbols:
SEDOV_SYNC_POS_VEL_NONE
SEDOV_SYNC_POS_VEL_EARLY
SEDOV_SYNC_POS_VEL_LATE
*/
# define SEDOV_SYNC_POS_VEL_EARLY 1
#endif
#include <Kokkos_Core.hpp>
#include <Kokkos_Vector.hpp>
#include <math.h>
#include <vector>
//**************************************************
// Allow flexibility for arithmetic representations
//**************************************************
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
// Precision specification
typedef float real4;
typedef double real8;
typedef long double real10; // 10 bytes on x86
typedef int Index_t; // array subscript and loop index
typedef real8 Real_t; // floating point representation
typedef int Int_t; // integer representation
enum
{
VolumeError = -1,
QStopError = -2
};
inline real4
SQRT(real4 arg)
{
return sqrtf(arg);
}
inline real8
SQRT(real8 arg)
{
return sqrt(arg);
}
inline real10
SQRT(real10 arg)
{
return sqrtl(arg);
}
inline real4
CBRT(real4 arg)
{
return cbrtf(arg);
}
inline real8
CBRT(real8 arg)
{
return cbrt(arg);
}
inline real10
CBRT(real10 arg)
{
return cbrtl(arg);
}
inline real4
FABS(real4 arg)
{
return fabsf(arg);
}
inline real8
FABS(real8 arg)
{
return fabs(arg);
}
inline real10
FABS(real10 arg)
{
return fabsl(arg);
}
// Stuff needed for boundary conditions
// 3 BC flags (SYMM, FREE, COMM) on each of 6 hexahedral faces (18 bits)
#define XI_M 0x00007
#define XI_M_SYMM 0x00001
#define XI_M_FREE 0x00002
#define XI_M_COMM 0x00004
#define XI_P 0x00038
#define XI_P_SYMM 0x00008
#define XI_P_FREE 0x00010
#define XI_P_COMM 0x00020
#define ETA_M 0x001c0
#define ETA_M_SYMM 0x00040
#define ETA_M_FREE 0x00080
#define ETA_M_COMM 0x00100
#define ETA_P 0x00e00
#define ETA_P_SYMM 0x00200
#define ETA_P_FREE 0x00400
#define ETA_P_COMM 0x00800
#define ZETA_M 0x07000
#define ZETA_M_SYMM 0x01000
#define ZETA_M_FREE 0x02000
#define ZETA_M_COMM 0x04000
#define ZETA_P 0x38000
#define ZETA_P_SYMM 0x08000
#define ZETA_P_FREE 0x10000
#define ZETA_P_COMM 0x20000
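// Each face owns a 3-bit field, so masking with the face-wide value isolates
// that face's flag. A minimal sketch follows. The constexpr names and
// XiMinusCondition are hypothetical stand-ins for the macros above, chosen so
// they do not collide with them.

```cpp
#include <cassert>

constexpr int kXiM      = 0x00007;  // same value as XI_M
constexpr int kXiMSymm  = 0x00001;  // same value as XI_M_SYMM
constexpr int kXiMFree  = 0x00002;  // same value as XI_M_FREE
constexpr int kEtaPFree = 0x00400;  // same value as ETA_P_FREE

// Returns the boundary-condition flag stored for the xi-minus face,
// or 0 when no flag is set on that face.
static int
XiMinusCondition(int elemBC)
{
    return elemBC & kXiM;
}
```

// An element flagged kXiMSymm | kEtaPFree reports kXiMSymm for the xi-minus
// face, and the flag on the eta-plus face does not leak into the result.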
// MPI Message Tags
#define MSG_COMM_SBN 1024
#define MSG_SYNC_POS_VEL 2048
#define MSG_MONOQ 3072
#define MAX_FIELDS_PER_MPI_COMM 6
// Assume 128 byte coherence
// Assume Real_t is an "integral power of 2" bytes wide
#define CACHE_COHERENCE_PAD_REAL (128 / sizeof(Real_t))
#define CACHE_ALIGN_REAL(n) \
(((n) + (CACHE_COHERENCE_PAD_REAL - 1)) & ~(CACHE_COHERENCE_PAD_REAL - 1))
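// The macro above rounds a count up to the next multiple of the pad size,
// which works because the pad is a power of two. A sketch of the same
// arithmetic follows, assuming Real_t is an 8-byte double so the pad is 16
// elements. kPad and CacheAlignReal are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

typedef double Real_t;  // assumption: 8 bytes wide

constexpr std::size_t kPad = 128 / sizeof(Real_t);

// Round n up to a multiple of kPad using the add-then-mask idiom.
constexpr std::size_t
CacheAlignReal(std::size_t n)
{
    return (n + (kPad - 1)) & ~(kPad - 1);
}
```

// With a pad of 16, the counts 1 and 16 both round to 16, and 17 rounds to 32.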
//////////////////////////////////////////////////////
// Primary data structure
//////////////////////////////////////////////////////
/*
* The implementation of the data abstraction used for lulesh
* resides entirely in the Domain class below. You can change
* grouping and interleaving of fields here to maximize data layout
* efficiency for your underlying architecture or compiler.
*
* For example, fields can be implemented as STL objects or
* raw array pointers. As another example, individual fields
* m_x, m_y, m_z could be bundled into
*
* struct { Real_t x, y, z ; } *m_coord ;
*
* allowing accessor functions such as
*
* "Real_t &x(Index_t idx) { return m_coord[idx].x ; }"
* "Real_t &y(Index_t idx) { return m_coord[idx].y ; }"
* "Real_t &z(Index_t idx) { return m_coord[idx].z ; }"
*/
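// The bundled layout described in the comment above might look like the
// following sketch. Coord and BundledCoords are hypothetical names, and
// std::vector stands in for whatever storage the Domain class would use.

```cpp
#include <cassert>
#include <vector>

typedef double Real_t;
typedef int    Index_t;

// One record per node: x, y and z sit next to each other in memory.
struct Coord
{
    Real_t x, y, z;
};

class BundledCoords
{
public:
    explicit BundledCoords(Index_t numNode)
    : m_coord(numNode)  // value-initialized, so all coordinates start at 0
    {}

    // Accessors keep the same call syntax as the separate-array layout.
    Real_t& x(Index_t idx) { return m_coord[idx].x; }
    Real_t& y(Index_t idx) { return m_coord[idx].y; }
    Real_t& z(Index_t idx) { return m_coord[idx].z; }

private:
    std::vector<Coord> m_coord;
};
```

// Callers still write c.x(i), c.y(i), c.z(i), so switching layouts does not
// change code outside the class.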
class Domain
{
public:
// Constructor
Domain(Int_t numRanks, Index_t colLoc, Index_t rowLoc, Index_t planeLoc, Index_t nx,
Int_t tp, Int_t nr, Int_t balance, Int_t cost);
// Destructor
~Domain();
//
// ALLOCATION
//
void AllocateNodePersistent(Int_t numNode) // Node-centered
{
m_x.resize(numNode); // coordinates
m_y.resize(numNode);
m_z.resize(numNode);
m_xd.resize(numNode); // velocities
m_yd.resize(numNode);
m_zd.resize(numNode);
m_xdd.resize(numNode); // accelerations
m_ydd.resize(numNode);
m_zdd.resize(numNode);
m_fx.resize(numNode); // forces
m_fy.resize(numNode);
m_fz.resize(numNode);
m_nodalMass.resize(numNode); // mass
m_c_x = m_x.d_view;
m_c_y = m_y.d_view;
m_c_z = m_z.d_view;
m_c_xd = m_xd.d_view;
m_c_yd = m_yd.d_view;
m_c_zd = m_zd.d_view;
}
void AllocateElemPersistent(Int_t numElem) // Elem-centered
{
m_nodelist.resize(8 * numElem);
// elem connectivities through face
m_lxim.resize(numElem);
m_lxip.resize(numElem);
m_letam.resize(numElem);
m_letap.resize(numElem);
m_lzetam.resize(numElem);
m_lzetap.resize(numElem);
m_elemBC.resize(numElem);
m_e.resize(numElem);
m_p.resize(numElem);
m_q.resize(numElem);
m_ql.resize(numElem);
m_qq.resize(numElem);
m_v.resize(numElem);
m_volo.resize(numElem);
m_delv.resize(numElem);
m_vdov.resize(numElem);
m_arealg.resize(numElem);
m_ss.resize(numElem);
m_elemMass.resize(numElem);
m_vnew.resize(numElem);
m_c_e = m_e.d_view;
m_c_p = m_p.d_view;
m_c_q = m_q.d_view;
m_c_ql = m_ql.d_view;
m_c_qq = m_qq.d_view;
m_c_delv = m_delv.d_view;
}
void AllocateGradients(Int_t numElem, Int_t allElem)
{
// Position gradients
m_delx_xi.resize(numElem);
m_delx_eta.resize(numElem);
m_delx_zeta.resize(numElem);
// Velocity gradients
m_delv_xi.resize(allElem);
m_delv_eta.resize(allElem);
m_delv_zeta.resize(allElem);
}
void DeallocateGradients()
{
m_delx_zeta.clear();
m_delx_eta.clear();
m_delx_xi.clear();
m_delv_zeta.clear();
m_delv_eta.clear();
m_delv_xi.clear();
}
void AllocateStrains(Int_t numElem)
{
m_dxx.resize(numElem);
m_dyy.resize(numElem);
m_dzz.resize(numElem);
}
void DeallocateStrains()
{
m_dzz.clear();
m_dyy.clear();
m_dxx.clear();
}
//
// ACCESSORS
//
// Node-centered
// Nodal coordinates
KOKKOS_INLINE_FUNCTION Real_t& x(const Index_t idx) const { return m_x[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& y(const Index_t idx) const { return m_y[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& z(const Index_t idx) const { return m_z[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_x(const Index_t idx) const { return m_c_x[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_y(const Index_t idx) const { return m_c_y[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_z(const Index_t idx) const { return m_c_z[idx]; }
// Nodal velocities
KOKKOS_INLINE_FUNCTION Real_t& xd(const Index_t idx) const { return m_xd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& yd(const Index_t idx) const { return m_yd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& zd(const Index_t idx) const { return m_zd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_xd(const Index_t idx) const { return m_c_xd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_yd(const Index_t idx) const { return m_c_yd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_zd(const Index_t idx) const { return m_c_zd[idx]; }
// Nodal accelerations
KOKKOS_INLINE_FUNCTION Real_t& xdd(const Index_t idx) const { return m_xdd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& ydd(const Index_t idx) const { return m_ydd[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& zdd(const Index_t idx) const { return m_zdd[idx]; }
// Nodal forces
KOKKOS_INLINE_FUNCTION Real_t& fx(const Index_t idx) const { return m_fx[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& fy(const Index_t idx) const { return m_fy[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& fz(const Index_t idx) const { return m_fz[idx]; }
// Nodal mass
KOKKOS_INLINE_FUNCTION Real_t& nodalMass(const Index_t idx) const
{
return m_nodalMass[idx];
}
// Nodes on symmetry planes
Index_t symmX(const Index_t idx) const { return m_symmX[idx]; }
Index_t symmY(const Index_t idx) const { return m_symmY[idx]; }
Index_t symmZ(const Index_t idx) const { return m_symmZ[idx]; }
bool symmXempty() { return m_symmX.empty(); }
bool symmYempty() { return m_symmY.empty(); }
bool symmZempty() { return m_symmZ.empty(); }
//
// Element-centered
//
Index_t& regElemSize(Index_t idx) { return m_regElemSize[idx]; }
Index_t& regNumList(Index_t idx) { return m_regNumList[idx]; }
Index_t* regNumList() { return &m_regNumList[0]; }
Index_t* regElemlist(Int_t r) { return m_regElemlist[r]; }
Index_t& regElemlist(const Int_t r, Index_t idx) const
{
return m_regElemlist[r][idx];
}
Index_t* nodelist(Index_t idx) const { return &m_nodelist[Index_t(8) * idx]; }
// elem connectivities through face
Index_t& lxim(const Index_t idx) const { return m_lxim[idx]; }
Index_t& lxip(const Index_t idx) const { return m_lxip[idx]; }
Index_t& letam(const Index_t idx) const { return m_letam[idx]; }
Index_t& letap(const Index_t idx) const { return m_letap[idx]; }
Index_t& lzetam(const Index_t idx) const { return m_lzetam[idx]; }
Index_t& lzetap(const Index_t idx) const { return m_lzetap[idx]; }
// elem face symm/free-surface flag
Int_t& elemBC(const Index_t idx) const { return m_elemBC[idx]; }
// Principal strains - temporary
KOKKOS_INLINE_FUNCTION Real_t& dxx(const Index_t idx) const { return m_dxx[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& dyy(const Index_t idx) const { return m_dyy[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& dzz(const Index_t idx) const { return m_dzz[idx]; }
// New relative volume - temporary
KOKKOS_INLINE_FUNCTION Real_t& vnew(const Index_t idx) const { return m_vnew[idx]; }
// Velocity gradient - temporary
KOKKOS_INLINE_FUNCTION Real_t& delv_xi(const Index_t idx) const
{
return m_delv_xi[idx];
}
KOKKOS_INLINE_FUNCTION Real_t& delv_eta(const Index_t idx) const
{
return m_delv_eta[idx];
}
KOKKOS_INLINE_FUNCTION Real_t& delv_zeta(const Index_t idx) const
{
return m_delv_zeta[idx];
}
// Position gradient - temporary
KOKKOS_INLINE_FUNCTION Real_t& delx_xi(const Index_t idx) const
{
return m_delx_xi[idx];
}
KOKKOS_INLINE_FUNCTION Real_t& delx_eta(const Index_t idx) const
{
return m_delx_eta[idx];
}
KOKKOS_INLINE_FUNCTION Real_t& delx_zeta(const Index_t idx) const
{
return m_delx_zeta[idx];
}
// Energy
KOKKOS_INLINE_FUNCTION Real_t& e(const Index_t idx) const { return m_e[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_e(const Index_t idx) const { return m_c_e[idx]; }
// Pressure
KOKKOS_INLINE_FUNCTION Real_t& p(const Index_t idx) const { return m_p[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_p(const Index_t idx) const { return m_c_p[idx]; }
// Artificial viscosity
KOKKOS_INLINE_FUNCTION Real_t& q(const Index_t idx) const { return m_q[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_q(const Index_t idx) const { return m_c_q[idx]; }
// Linear term for q
KOKKOS_INLINE_FUNCTION Real_t& ql(const Index_t idx) const { return m_ql[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_ql(const Index_t idx) const { return m_c_ql[idx]; }
// Quadratic term for q
KOKKOS_INLINE_FUNCTION Real_t& qq(const Index_t idx) const { return m_qq[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_qq(const Index_t idx) const { return m_c_qq[idx]; }
// Relative volume
KOKKOS_INLINE_FUNCTION Real_t& v(const Index_t idx) const { return m_v[idx]; }
KOKKOS_INLINE_FUNCTION Real_t& delv(const Index_t idx) const { return m_delv[idx]; }
KOKKOS_INLINE_FUNCTION Real_t c_delv(const Index_t idx) const
{
return m_c_delv[idx];
}
// Reference volume
KOKKOS_INLINE_FUNCTION Real_t& volo(Index_t idx) const { return m_volo[idx]; }
// volume derivative over volume
KOKKOS_INLINE_FUNCTION Real_t& vdov(Index_t idx) const { return m_vdov[idx]; }
// Element characteristic length
KOKKOS_INLINE_FUNCTION Real_t& arealg(Index_t idx) const { return m_arealg[idx]; }
// Sound speed
KOKKOS_INLINE_FUNCTION Real_t& ss(const Index_t idx) const { return m_ss[idx]; }
// Element mass
KOKKOS_INLINE_FUNCTION Real_t& elemMass(const Index_t idx) const
{
return m_elemMass[idx];
}
KOKKOS_INLINE_FUNCTION Index_t nodeElemCount(Index_t idx) const
{
return m_nodeElemStart[idx + 1] - m_nodeElemStart[idx];
}
KOKKOS_INLINE_FUNCTION Index_t* nodeElemCornerList(Index_t idx) const
{
return &m_nodeElemCornerList[m_nodeElemStart[idx]];
}
// Parameters
// Cutoffs
KOKKOS_INLINE_FUNCTION Real_t u_cut() const { return m_u_cut; }
KOKKOS_INLINE_FUNCTION Real_t e_cut() const { return m_e_cut; }
KOKKOS_INLINE_FUNCTION Real_t p_cut() const { return m_p_cut; }
KOKKOS_INLINE_FUNCTION Real_t q_cut() const { return m_q_cut; }
KOKKOS_INLINE_FUNCTION Real_t v_cut() const { return m_v_cut; }
// Other constants (usually are settable via input file in real codes)
KOKKOS_INLINE_FUNCTION Real_t hgcoef() const { return m_hgcoef; }
KOKKOS_INLINE_FUNCTION Real_t qstop() const { return m_qstop; }
KOKKOS_INLINE_FUNCTION Real_t monoq_max_slope() const { return m_monoq_max_slope; }
KOKKOS_INLINE_FUNCTION Real_t monoq_limiter_mult() const
{
return m_monoq_limiter_mult;
}
KOKKOS_INLINE_FUNCTION Real_t ss4o3() const { return m_ss4o3; }
KOKKOS_INLINE_FUNCTION Real_t qlc_monoq() const { return m_qlc_monoq; }
KOKKOS_INLINE_FUNCTION Real_t qqc_monoq() const { return m_qqc_monoq; }
KOKKOS_INLINE_FUNCTION Real_t qqc() const { return m_qqc; }
KOKKOS_INLINE_FUNCTION Real_t eosvmax() const { return m_eosvmax; }
KOKKOS_INLINE_FUNCTION Real_t eosvmin() const { return m_eosvmin; }
KOKKOS_INLINE_FUNCTION Real_t pmin() const { return m_pmin; }
KOKKOS_INLINE_FUNCTION Real_t emin() const { return m_emin; }
KOKKOS_INLINE_FUNCTION Real_t dvovmax() const { return m_dvovmax; }
KOKKOS_INLINE_FUNCTION Real_t refdens() const { return m_refdens; }
// Timestep controls, etc...
Real_t& time() { return m_time; }
Real_t& deltatime() { return m_deltatime; }
Real_t& deltatimemultlb() { return m_deltatimemultlb; }
Real_t& deltatimemultub() { return m_deltatimemultub; }
Real_t& stoptime() { return m_stoptime; }
Real_t& dtcourant() { return m_dtcourant; }
Real_t& dthydro() { return m_dthydro; }
Real_t& dtmax() { return m_dtmax; }
Real_t& dtfixed() { return m_dtfixed; }
Int_t& cycle() { return m_cycle; }
Index_t& numRanks() { return m_numRanks; }
Index_t& colLoc() { return m_colLoc; }
Index_t& rowLoc() { return m_rowLoc; }
Index_t& planeLoc() { return m_planeLoc; }
Index_t& tp() { return m_tp; }
Index_t& sizeX() { return m_sizeX; }
Index_t& sizeY() { return m_sizeY; }
Index_t& sizeZ() { return m_sizeZ; }
Index_t& numReg() { return m_numReg; }
Int_t& cost() { return m_cost; }
Index_t& numElem() { return m_numElem; }
Index_t& numNode() { return m_numNode; }
Index_t& maxPlaneSize() { return m_maxPlaneSize; }
Index_t& maxEdgeSize() { return m_maxEdgeSize; }
//
// MPI-Related additional data
//
#if USE_MPI
// Communication Work space
Real_t* commDataSend;
Real_t* commDataRecv;
// Maximum number of block neighbors
MPI_Request recvRequest[26]; // 6 faces + 12 edges + 8 corners
MPI_Request sendRequest[26]; // 6 faces + 12 edges + 8 corners
#endif
private:
void BuildMesh(Int_t nx, Int_t edgeNodes, Int_t edgeElems);
void SetupThreadSupportStructures();
void CreateRegionIndexSets(Int_t nreg, Int_t balance);
void SetupCommBuffers(Int_t edgeNodes);
void SetupSymmetryPlanes(Int_t edgeNodes);
void SetupElementConnectivities(Int_t edgeElems);
void SetupBoundaryConditions(Int_t edgeElems);
//
// IMPLEMENTATION
//
/* Node-centered */
Kokkos::vector<Real_t> m_x; /* coordinates */
Kokkos::vector<Real_t> m_y;
Kokkos::vector<Real_t> m_z;
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_x; /* coordinates */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_y; /* coordinates */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_z; /* coordinates */
Kokkos::vector<Real_t> m_xd; /* velocities */
Kokkos::vector<Real_t> m_yd;
Kokkos::vector<Real_t> m_zd;
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_xd; /* velocities */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_yd; /* velocities */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_zd; /* velocities */
Kokkos::vector<Real_t> m_xdd; /* accelerations */
Kokkos::vector<Real_t> m_ydd;
Kokkos::vector<Real_t> m_zdd;
Kokkos::vector<Real_t> m_fx; /* forces */
Kokkos::vector<Real_t> m_fy;
Kokkos::vector<Real_t> m_fz;
Kokkos::vector<Real_t> m_nodalMass; /* mass */
Kokkos::vector<Index_t> m_symmX; /* symmetry plane nodesets */
Kokkos::vector<Index_t> m_symmY;
Kokkos::vector<Index_t> m_symmZ;
// Element-centered
// Region information
Int_t m_numReg;
Int_t m_cost; // imbalance cost
Index_t* m_regElemSize; // Size of region sets
Index_t* m_regNumList; // Region number per domain element
Index_t** m_regElemlist; // region indexset
Kokkos::vector<Index_t> m_nodelist; /* elemToNode connectivity */
Kokkos::vector<Index_t> m_lxim; /* element connectivity across each face */
Kokkos::vector<Index_t> m_lxip;
Kokkos::vector<Index_t> m_letam;
Kokkos::vector<Index_t> m_letap;
Kokkos::vector<Index_t> m_lzetam;
Kokkos::vector<Index_t> m_lzetap;
Kokkos::vector<Int_t> m_elemBC; /* symmetry/free-surface flags for each elem face */
Kokkos::vector<Real_t> m_dxx; /* principal strains -- temporary */
Kokkos::vector<Real_t> m_dyy;
Kokkos::vector<Real_t> m_dzz;
Kokkos::vector<Real_t> m_delv_xi; /* velocity gradient -- temporary */
Kokkos::vector<Real_t> m_delv_eta;
Kokkos::vector<Real_t> m_delv_zeta;
Kokkos::vector<Real_t> m_delx_xi; /* coordinate gradient -- temporary */
Kokkos::vector<Real_t> m_delx_eta;
Kokkos::vector<Real_t> m_delx_zeta;
Kokkos::vector<Real_t> m_e; /* energy */
Kokkos::vector<Real_t> m_p; /* pressure */
Kokkos::vector<Real_t> m_q; /* q */
Kokkos::vector<Real_t> m_ql; /* linear term for q */
Kokkos::vector<Real_t> m_qq; /* quadratic term for q */
Kokkos::vector<Real_t> m_v; /* relative volume */
Kokkos::vector<Real_t> m_volo; /* reference volume */
Kokkos::vector<Real_t> m_vnew; /* new relative volume -- temporary */
Kokkos::vector<Real_t> m_delv; /* m_vnew - m_v */
Kokkos::vector<Real_t> m_vdov; /* volume derivative over volume */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_e; /* energy */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_p; /* pressure */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_q; /* q */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_ql; /* linear term for q */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_qq; /* quadratic term for q */
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
m_c_delv; /* m_vnew - m_v */
Kokkos::vector<Real_t> m_arealg; /* characteristic length of an element */
Kokkos::vector<Real_t> m_ss; /* "sound speed" */
Kokkos::vector<Real_t> m_elemMass; /* mass */
// Cutoffs (treat as constants)
const Real_t m_e_cut; // energy tolerance
const Real_t m_p_cut; // pressure tolerance
const Real_t m_q_cut; // q tolerance
const Real_t m_v_cut; // relative volume tolerance
const Real_t m_u_cut; // velocity tolerance
// Other constants (usually settable, but hardcoded in this proxy app)
const Real_t m_hgcoef; // hourglass control
const Real_t m_ss4o3;
const Real_t m_qstop; // excessive q indicator
const Real_t m_monoq_max_slope;
const Real_t m_monoq_limiter_mult;
const Real_t m_qlc_monoq; // linear term coef for q
const Real_t m_qqc_monoq; // quadratic term coef for q
const Real_t m_qqc;
const Real_t m_eosvmax;
const Real_t m_eosvmin;
const Real_t m_pmin; // pressure floor
const Real_t m_emin; // energy floor
const Real_t m_dvovmax; // maximum allowable volume change
const Real_t m_refdens; // reference density
// Variables to keep track of timestep, simulation time, and cycle
Real_t m_dtcourant; // courant constraint
Real_t m_dthydro; // volume change constraint
Int_t m_cycle; // iteration count for simulation
Real_t m_dtfixed; // fixed time increment
Real_t m_time; // current time
Real_t m_deltatime; // variable time increment
Real_t m_deltatimemultlb;
Real_t m_deltatimemultub;
Real_t m_dtmax; // maximum allowable time increment
Real_t m_stoptime; // end time for simulation
Int_t m_numRanks;
Index_t m_colLoc;
Index_t m_rowLoc;
Index_t m_planeLoc;
Index_t m_tp;
Index_t m_sizeX;
Index_t m_sizeY;
Index_t m_sizeZ;
Index_t m_numElem;
Index_t m_numNode;
Index_t m_maxPlaneSize;
Index_t m_maxEdgeSize;
// OMP hack
Index_t* m_nodeElemStart;
Index_t* m_nodeElemCornerList;
// Used in setup
Index_t m_rowMin, m_rowMax;
Index_t m_colMin, m_colMax;
Index_t m_planeMin, m_planeMax;
};
typedef Real_t& (Domain::*Domain_member)(Index_t) const;
struct cmdLineOpts
{
Int_t its; // -i
Int_t nx; // -s
Int_t numReg; // -r
Int_t numFiles; // -f
Int_t showProg; // -p
Int_t quiet; // -q
Int_t viz; // -v
Int_t cost; // -c
Int_t balance; // -b
Int_t do_atomic; // -a
};
// Function Prototypes
// lulesh-par
/*Real_t CalcElemVolume( const Real_t x[8],
const Real_t y[8],
const Real_t z[8]);*/
// lulesh-util
void
ParseCommandLineOptions(int argc, char* argv[], Int_t myRank, struct cmdLineOpts* opts);
void
VerifyAndWriteFinalOutput(Real_t elapsed_time, Domain& locDom, Int_t nx, Int_t numRanks);
// lulesh-viz
void
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks);
// lulesh-comm
void
CommRecv(Domain& domain, Int_t msgType, Index_t xferFields, Index_t dx, Index_t dy,
Index_t dz, bool doRecv, bool planeOnly);
void
CommSend(Domain& domain, Int_t msgType, Index_t xferFields, Domain_member* fieldData,
Index_t dx, Index_t dy, Index_t dz, bool doSend, bool planeOnly);
void
CommSBN(Domain& domain, Int_t xferFields, Domain_member* fieldData);
void
CommSyncPosVel(Domain& domain);
void
CommMonoQ(Domain& domain);
// lulesh-init
void
InitMeshDecomp(Int_t numRanks, Int_t myRank, Int_t* col, Int_t* row, Int_t* plane,
Int_t* side);
/*********************************/
/* Data structure implementation */
/*********************************/
/* might want to add access methods so that memory can be */
/* better managed, as in luleshFT */
template <typename T>
T*
Allocate(size_t size)
{
return static_cast<T*>(Kokkos::kokkos_malloc<>(sizeof(T) * size));
}
template <typename T>
void
Release(T** ptr)
{
if(*ptr != NULL)
{
Kokkos::kokkos_free<>(*ptr);
*ptr = NULL;
}
}
struct MinFinder
{
Real_t val;
int i;
KOKKOS_INLINE_FUNCTION
MinFinder()
: val(100000000000000000000.0000)
, i(-1)
{}
KOKKOS_INLINE_FUNCTION
MinFinder(const double& val_, const int& i_)
: val(val_)
, i(i_)
{}
KOKKOS_INLINE_FUNCTION
MinFinder(const MinFinder& src)
: val(src.val)
, i(src.i)
{}
// overloading += operator to do the min assignment (keeps the smaller val and its index)
KOKKOS_INLINE_FUNCTION
void operator+=(MinFinder& src)
{
if(src.val < val)
{
val = src.val;
i = src.i;
}
}
KOKKOS_INLINE_FUNCTION
void operator+=(const volatile MinFinder& src) volatile
{
if(src.val < val)
{
val = src.val;
i = src.i;
}
}
};
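// Illustrative sketch, not part of this commit: a host-only stand-in showing how
// the overloaded "+=" above acts as a min-with-index combine step. HostMinFinder
// and find_min are hypothetical names; the Kokkos macros are dropped so the
// snippet builds with a plain C++ compiler, and the serial loop only
// approximates what a parallel reduction would do with MinFinder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only counterpart of MinFinder (no KOKKOS_INLINE_FUNCTION).
struct HostMinFinder
{
    double val = 1.0e20;  // large sentinel, standing in for MinFinder's default
    int    i   = -1;      // index of the current minimum, -1 if nothing was seen

    // "+=" keeps whichever operand holds the smaller value.
    void operator+=(const HostMinFinder& src)
    {
        if(src.val < val)
        {
            val = src.val;
            i   = src.i;
        }
    }
};

// Serial equivalent of a min-location reduction over `data`.
inline HostMinFinder
find_min(const std::vector<double>& data)
{
    HostMinFinder result;
    for(std::size_t n = 0; n < data.size(); ++n)
    {
        HostMinFinder candidate;
        candidate.val = data[n];
        candidate.i   = static_cast<int>(n);
        result += candidate;  // combine step: smaller value wins
    }
    return result;
}
```

// Because the comparison is a strict "<", ties keep the first index encountered
// in this serial version; a parallel reduction makes no such ordering promise.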
struct reduce_double3
{
double x, y, z;
KOKKOS_INLINE_FUNCTION
reduce_double3()
{
x = 0.0;
y = 0.0;
z = 0.0;
}
KOKKOS_INLINE_FUNCTION
void operator+=(const reduce_double3& src)
{
x += src.x;
y += src.y;
z += src.z;
}
};
+651
@@ -0,0 +1,651 @@
#if !defined(USE_MPI)
# error "You should specify USE_MPI=0 or USE_MPI=1 on the compile line"
#endif
// OpenMP will be compiled in if this flag is set to 1 AND the compiler being
// used supports it (i.e. the _OPENMP symbol is defined)
#define USE_OMP 1
#if USE_MPI
# include <mpi.h>
#endif
#include <mpi.h>
/*
define one of these three symbols:
SEDOV_SYNC_POS_VEL_NONE
SEDOV_SYNC_POS_VEL_EARLY
SEDOV_SYNC_POS_VEL_LATE
*/
#define SEDOV_SYNC_POS_VEL_EARLY 1
#include <math.h>
#include <vector>
//**************************************************
// Allow flexibility for arithmetic representations
//**************************************************
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
// Precision specification
typedef float real4;
typedef double real8;
typedef long double real10; // 10 bytes on x86
typedef int Index_t; // array subscript and loop index
typedef real8 Real_t; // floating point representation
typedef int Int_t; // integer representation
enum
{
VolumeError = -1,
QStopError = -2
};
inline real4
SQRT(real4 arg)
{
return sqrtf(arg);
}
inline real8
SQRT(real8 arg)
{
return sqrt(arg);
}
inline real10
SQRT(real10 arg)
{
return sqrtl(arg);
}
inline real4
CBRT(real4 arg)
{
return cbrtf(arg);
}
inline real8
CBRT(real8 arg)
{
return cbrt(arg);
}
inline real10
CBRT(real10 arg)
{
return cbrtl(arg);
}
inline real4
FABS(real4 arg)
{
return fabsf(arg);
}
inline real8
FABS(real8 arg)
{
return fabs(arg);
}
inline real10
FABS(real10 arg)
{
return fabsl(arg);
}
// Stuff needed for boundary conditions
// 3 flags (SYMM, FREE, COMM) on each of 6 hexahedral faces (18 bits)
#define XI_M 0x00007
#define XI_M_SYMM 0x00001
#define XI_M_FREE 0x00002
#define XI_M_COMM 0x00004
#define XI_P 0x00038
#define XI_P_SYMM 0x00008
#define XI_P_FREE 0x00010
#define XI_P_COMM 0x00020
#define ETA_M 0x001c0
#define ETA_M_SYMM 0x00040
#define ETA_M_FREE 0x00080
#define ETA_M_COMM 0x00100
#define ETA_P 0x00e00
#define ETA_P_SYMM 0x00200
#define ETA_P_FREE 0x00400
#define ETA_P_COMM 0x00800
#define ZETA_M 0x07000
#define ZETA_M_SYMM 0x01000
#define ZETA_M_FREE 0x02000
#define ZETA_M_COMM 0x04000
#define ZETA_P 0x38000
#define ZETA_P_SYMM 0x08000
#define ZETA_P_FREE 0x10000
#define ZETA_P_COMM 0x20000
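// Illustrative sketch, not part of this commit: how the per-face masks above can
// be used to pack and query boundary flags in a single integer. The constant
// values are copied from the #defines; the k-prefixed names and the helper
// xi_m_condition are hypothetical.

```cpp
#include <cassert>

// Values copied from the XI_M and ZETA_P groups above.
constexpr int kXiM       = 0x00007;  // all three bits of the xi-minus face
constexpr int kXiMSymm   = 0x00001;
constexpr int kXiMFree   = 0x00002;
constexpr int kXiMComm   = 0x00004;
constexpr int kZetaP     = 0x38000;  // all three bits of the zeta-plus face
constexpr int kZetaPFree = 0x10000;

// Isolate the xi-minus bits of a packed flag word; the result can be compared
// against kXiMSymm, kXiMFree, or kXiMComm.
constexpr int
xi_m_condition(int elemBC)
{
    return elemBC & kXiM;
}
```

// Each face owns a disjoint 3-bit field, so flags for different faces can be
// OR-ed together and recovered independently with the face mask.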
// MPI Message Tags
#define MSG_COMM_SBN 1024
#define MSG_SYNC_POS_VEL 2048
#define MSG_MONOQ 3072
#define MAX_FIELDS_PER_MPI_COMM 6
// Assume 128 byte coherence
// Assume Real_t is an "integral power of 2" bytes wide
#define CACHE_COHERENCE_PAD_REAL (128 / sizeof(Real_t))
#define CACHE_ALIGN_REAL(n) \
(((n) + (CACHE_COHERENCE_PAD_REAL - 1)) & ~(CACHE_COHERENCE_PAD_REAL - 1))
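// Illustrative sketch, not part of this commit: the same round-up arithmetic as
// CACHE_ALIGN_REAL written as a constexpr function. It assumes Real_t is double
// and that sizeof(double) == 8, so one 128-byte line holds 16 values; kPad and
// cache_align_real are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Number of 8-byte reals per 128-byte coherence unit (assumed: 16).
constexpr std::size_t kPad = 128 / sizeof(double);

// Round n up to the next multiple of kPad. The mask trick is only valid
// because kPad is a power of two, as the comment above the macro assumes.
constexpr std::size_t
cache_align_real(std::size_t n)
{
    return (n + (kPad - 1)) & ~(kPad - 1);
}
```

// Values that are already a multiple of the pad are returned unchanged; any
// other count is bumped to the next multiple.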
//////////////////////////////////////////////////////
// Primary data structure
//////////////////////////////////////////////////////
/*
* The implementation of the data abstraction used for lulesh
* resides entirely in the Domain class below. You can change
* grouping and interleaving of fields here to maximize data layout
* efficiency for your underlying architecture or compiler.
*
* For example, fields can be implemented as STL objects or
* raw array pointers. As another example, individual fields
* m_x, m_y, m_z could be bundled into
*
* struct { Real_t x, y, z ; } *m_coord ;
*
* allowing accessor functions such as
*
* "Real_t &x(Index_t idx) { return m_coord[idx].x ; }"
* "Real_t &y(Index_t idx) { return m_coord[idx].y ; }"
* "Real_t &z(Index_t idx) { return m_coord[idx].z ; }"
*/
class Domain
{
public:
// Constructor
Domain(Int_t numRanks, Index_t colLoc, Index_t rowLoc, Index_t planeLoc, Index_t nx,
Int_t tp, Int_t nr, Int_t balance, Int_t cost);
//
// ALLOCATION
//
void AllocateNodePersistent(Int_t numNode) // Node-centered
{
m_coord.resize(numNode); // coordinates
m_vel.resize(numNode); // velocities
m_acc.resize(numNode); // accelerations
m_force.resize(numNode); // forces
m_nodalMass.resize(numNode); // mass
}
void AllocateElemPersistent(Int_t numElem) // Elem-centered
{
m_nodelist.resize(8 * numElem);
// elem connectivities through face
m_faceToElem.resize(numElem);
m_elemBC.resize(numElem);
m_e.resize(numElem);
m_pq.resize(numElem);
m_qlqq.resize(numElem);
m_vol.resize(numElem);
m_delv.resize(numElem);
m_vdov.resize(numElem);
m_arealg.resize(numElem);
m_ss.resize(numElem);
m_elemMass.resize(numElem);
}
void AllocateGradients(Int_t numElem, Int_t allElem)
{
// Position gradients
m_delx_xi.resize(numElem);
m_delx_eta.resize(numElem);
m_delx_zeta.resize(numElem);
// Velocity gradients
m_delv_xi.resize(allElem);
m_delv_eta.resize(allElem);
m_delv_zeta.resize(allElem);
}
void DeallocateGradients()
{
m_delx_zeta.clear();
m_delx_eta.clear();
m_delx_xi.clear();
m_delv_zeta.clear();
m_delv_eta.clear();
m_delv_xi.clear();
}
void AllocateStrains(Int_t numElem)
{
m_dxx.resize(numElem);
m_dyy.resize(numElem);
m_dzz.resize(numElem);
}
void DeallocateStrains()
{
m_dzz.clear();
m_dyy.clear();
m_dxx.clear();
}
//
// ACCESSORS
//
// Node-centered
// Nodal coordinates
Real_t& x(Index_t idx) { return m_coord[idx].x; }
Real_t& y(Index_t idx) { return m_coord[idx].y; }
Real_t& z(Index_t idx) { return m_coord[idx].z; }
// Nodal velocities
Real_t& xd(Index_t idx) { return m_vel[idx].x; }
Real_t& yd(Index_t idx) { return m_vel[idx].y; }
Real_t& zd(Index_t idx) { return m_vel[idx].z; }
// Nodal accelerations
Real_t& xdd(Index_t idx) { return m_acc[idx].x; }
Real_t& ydd(Index_t idx) { return m_acc[idx].y; }
Real_t& zdd(Index_t idx) { return m_acc[idx].z; }
// Nodal forces
Real_t& fx(Index_t idx) { return m_force[idx].x; }
Real_t& fy(Index_t idx) { return m_force[idx].y; }
Real_t& fz(Index_t idx) { return m_force[idx].z; }
// Nodal mass
Real_t& nodalMass(Index_t idx) { return m_nodalMass[idx]; }
// Nodes on symmetry planes
Index_t symmX(Index_t idx) { return m_symmX[idx]; }
Index_t symmY(Index_t idx) { return m_symmY[idx]; }
Index_t symmZ(Index_t idx) { return m_symmZ[idx]; }
bool symmXempty() { return m_symmX.empty(); }
bool symmYempty() { return m_symmY.empty(); }
bool symmZempty() { return m_symmZ.empty(); }
//
// Element-centered
//
Index_t& regElemSize(Index_t idx) { return m_regElemSize[idx]; }
Index_t& regNumList(Index_t idx) { return m_regNumList[idx]; }
Index_t* regNumList() { return &m_regNumList[0]; }
Index_t* regElemlist(Int_t r) { return m_regElemlist[r]; }
Index_t& regElemlist(Int_t r, Index_t idx) { return m_regElemlist[r][idx]; }
Index_t* nodelist(Index_t idx) { return &m_nodelist[Index_t(8) * idx]; }
// elem connectivities through face
Index_t& lxim(Index_t idx) { return m_faceToElem[idx].lxim; }
Index_t& lxip(Index_t idx) { return m_faceToElem[idx].lxip; }
Index_t& letam(Index_t idx) { return m_faceToElem[idx].letam; }
Index_t& letap(Index_t idx) { return m_faceToElem[idx].letap; }
Index_t& lzetam(Index_t idx) { return m_faceToElem[idx].lzetam; }
Index_t& lzetap(Index_t idx) { return m_faceToElem[idx].lzetap; }
// elem face symm/free-surface flag
Int_t& elemBC(Index_t idx) { return m_elemBC[idx]; }
// Principal strains - temporary
Real_t& dxx(Index_t idx) { return m_dxx[idx]; }
Real_t& dyy(Index_t idx) { return m_dyy[idx]; }
Real_t& dzz(Index_t idx) { return m_dzz[idx]; }
// Velocity gradient - temporary
Real_t& delv_xi(Index_t idx) { return m_delv_xi[idx]; }
Real_t& delv_eta(Index_t idx) { return m_delv_eta[idx]; }
Real_t& delv_zeta(Index_t idx) { return m_delv_zeta[idx]; }
// Position gradient - temporary
Real_t& delx_xi(Index_t idx) { return m_delx_xi[idx]; }
Real_t& delx_eta(Index_t idx) { return m_delx_eta[idx]; }
Real_t& delx_zeta(Index_t idx) { return m_delx_zeta[idx]; }
// Energy
Real_t& e(Index_t idx) { return m_e[idx]; }
// Pressure
Real_t& p(Index_t idx) { return m_pq[idx].p; }
// Artificial viscosity
Real_t& q(Index_t idx) { return m_pq[idx].q; }
// Linear term for q
Real_t& ql(Index_t idx) { return m_qlqq[idx].ql; }
// Quadratic term for q
Real_t& qq(Index_t idx) { return m_qlqq[idx].qq; }
Real_t& delv(Index_t idx) { return m_delv[idx]; }
// Relative volume
Real_t& v(Index_t idx) { return m_vol[idx].v; }
// Reference volume
Real_t& volo(Index_t idx) { return m_vol[idx].volo; }
// volume derivative over volume
Real_t& vdov(Index_t idx) { return m_vdov[idx]; }
// Element characteristic length
Real_t& arealg(Index_t idx) { return m_arealg[idx]; }
// Sound speed
Real_t& ss(Index_t idx) { return m_ss[idx]; }
// Element mass
Real_t& elemMass(Index_t idx) { return m_elemMass[idx]; }
Index_t nodeElemCount(Index_t idx)
{
return m_nodeElemStart[idx + 1] - m_nodeElemStart[idx];
}
Index_t* nodeElemCornerList(Index_t idx)
{
return &m_nodeElemCornerList[m_nodeElemStart[idx]];
}
// Parameters
// Cutoffs
Real_t u_cut() const { return m_u_cut; }
Real_t e_cut() const { return m_e_cut; }
Real_t p_cut() const { return m_p_cut; }
Real_t q_cut() const { return m_q_cut; }
Real_t v_cut() const { return m_v_cut; }
// Other constants (usually are settable via input file in real codes)
Real_t hgcoef() const { return m_hgcoef; }
Real_t qstop() const { return m_qstop; }
Real_t monoq_max_slope() const { return m_monoq_max_slope; }
Real_t monoq_limiter_mult() const { return m_monoq_limiter_mult; }
Real_t ss4o3() const { return m_ss4o3; }
Real_t qlc_monoq() const { return m_qlc_monoq; }
Real_t qqc_monoq() const { return m_qqc_monoq; }
Real_t qqc() const { return m_qqc; }
Real_t eosvmax() const { return m_eosvmax; }
Real_t eosvmin() const { return m_eosvmin; }
Real_t pmin() const { return m_pmin; }
Real_t emin() const { return m_emin; }
Real_t dvovmax() const { return m_dvovmax; }
Real_t refdens() const { return m_refdens; }
// Timestep controls, etc...
Real_t& time() { return m_time; }
Real_t& deltatime() { return m_deltatime; }
Real_t& deltatimemultlb() { return m_deltatimemultlb; }
Real_t& deltatimemultub() { return m_deltatimemultub; }
Real_t& stoptime() { return m_stoptime; }
Real_t& dtcourant() { return m_dtcourant; }
Real_t& dthydro() { return m_dthydro; }
Real_t& dtmax() { return m_dtmax; }
Real_t& dtfixed() { return m_dtfixed; }
Int_t& cycle() { return m_cycle; }
Index_t& numRanks() { return m_numRanks; }
Index_t& colLoc() { return m_colLoc; }
Index_t& rowLoc() { return m_rowLoc; }
Index_t& planeLoc() { return m_planeLoc; }
Index_t& tp() { return m_tp; }
Index_t& sizeX() { return m_sizeX; }
Index_t& sizeY() { return m_sizeY; }
Index_t& sizeZ() { return m_sizeZ; }
Index_t& numReg() { return m_numReg; }
Int_t& cost() { return m_cost; }
Index_t& numElem() { return m_numElem; }
Index_t& numNode() { return m_numNode; }
Index_t& maxPlaneSize() { return m_maxPlaneSize; }
Index_t& maxEdgeSize() { return m_maxEdgeSize; }
//
// MPI-Related additional data
//
#if USE_MPI
// Communication Work space
Real_t* commDataSend;
Real_t* commDataRecv;
// Maximum number of block neighbors
MPI_Request recvRequest[26]; // 6 faces + 12 edges + 8 corners
MPI_Request sendRequest[26]; // 6 faces + 12 edges + 8 corners
#endif
private:
void BuildMesh(Int_t nx, Int_t edgeNodes, Int_t edgeElems);
void SetupThreadSupportStructures();
void CreateRegionIndexSets(Int_t nreg, Int_t balance);
void SetupCommBuffers(Int_t edgeNodes);
void SetupSymmetryPlanes(Int_t edgeNodes);
void SetupElementConnectivities(Int_t edgeElems);
void SetupBoundaryConditions(Int_t edgeElems);
//
// IMPLEMENTATION
//
/* Node-centered */
struct Tuple3
{
Real_t x, y, z;
};
Kokkos::vector<Tuple3> m_coord; /* coordinates */
Kokkos::vector<Tuple3> m_vel; /* velocities */
Kokkos::vector<Tuple3> m_acc; /* accelerations */
Kokkos::vector<Tuple3> m_force; /* forces */
Kokkos::vector<Real_t> m_nodalMass; /* mass */
Kokkos::vector<Index_t> m_symmX; /* symmetry plane nodesets */
Kokkos::vector<Index_t> m_symmY;
Kokkos::vector<Index_t> m_symmZ;
// Element-centered
// Region information
Int_t m_numReg;
Int_t m_cost; // imbalance cost
Index_t* m_regElemSize; // Size of region sets
Index_t* m_regNumList; // Region number per domain element
Index_t** m_regElemlist; // region indexset
Kokkos::vector<Index_t> m_nodelist; /* elemToNode connectivity */
struct FaceElemConn
{
Index_t lxim, lxip, letam, letap, lzetam, lzetap;
};
Kokkos::vector<FaceElemConn> m_faceToElem; /* element conn across faces */
Kokkos::vector<Int_t> m_elemBC; /* symmetry/free-surface flags for each elem face */
Kokkos::vector<Real_t> m_dxx; /* principal strains -- temporary */
Kokkos::vector<Real_t> m_dyy;
Kokkos::vector<Real_t> m_dzz;
Kokkos::vector<Real_t> m_delv_xi; /* velocity gradient -- temporary */
Kokkos::vector<Real_t> m_delv_eta;
Kokkos::vector<Real_t> m_delv_zeta;
Kokkos::vector<Real_t> m_delx_xi; /* coordinate gradient -- temporary */
Kokkos::vector<Real_t> m_delx_eta;
Kokkos::vector<Real_t> m_delx_zeta;
Kokkos::vector<Real_t> m_e; /* energy */
struct Pcomponents
{
Real_t p, q;
};
Kokkos::vector<Pcomponents> m_pq; /* pressure and artificial viscosity */
struct Qcomponents
{
Real_t ql, qq;
};
Kokkos::vector<Qcomponents> m_qlqq; /* linear and quadratic terms for q */
struct Volume
{
Real_t v, volo;
};
Kokkos::vector<Volume> m_vol; /* relative and reference volume */
Kokkos::vector<Real_t> m_vnew; /* new relative volume -- temporary */
Kokkos::vector<Real_t> m_delv; /* m_vnew - m_v */
Kokkos::vector<Real_t> m_vdov; /* volume derivative over volume */
Kokkos::vector<Real_t> m_arealg; /* characteristic length of an element */
Kokkos::vector<Real_t> m_ss; /* "sound speed" */
Kokkos::vector<Real_t> m_elemMass; /* mass */
// Cutoffs (treat as constants)
const Real_t m_e_cut; // energy tolerance
const Real_t m_p_cut; // pressure tolerance
const Real_t m_q_cut; // q tolerance
const Real_t m_v_cut; // relative volume tolerance
const Real_t m_u_cut; // velocity tolerance
// Other constants (usually settable, but hardcoded in this proxy app)
const Real_t m_hgcoef; // hourglass control
const Real_t m_ss4o3;
const Real_t m_qstop; // excessive q indicator
const Real_t m_monoq_max_slope;
const Real_t m_monoq_limiter_mult;
const Real_t m_qlc_monoq; // linear term coef for q
const Real_t m_qqc_monoq; // quadratic term coef for q
const Real_t m_qqc;
const Real_t m_eosvmax;
const Real_t m_eosvmin;
const Real_t m_pmin; // pressure floor
const Real_t m_emin; // energy floor
const Real_t m_dvovmax; // maximum allowable volume change
const Real_t m_refdens; // reference density
// Variables to keep track of timestep, simulation time, and cycle
Real_t m_dtcourant; // courant constraint
Real_t m_dthydro; // volume change constraint
Int_t m_cycle; // iteration count for simulation
Real_t m_dtfixed; // fixed time increment
Real_t m_time; // current time
Real_t m_deltatime; // variable time increment
Real_t m_deltatimemultlb;
Real_t m_deltatimemultub;
Real_t m_dtmax; // maximum allowable time increment
Real_t m_stoptime; // end time for simulation
Int_t m_numRanks;
Index_t m_colLoc;
Index_t m_rowLoc;
Index_t m_planeLoc;
Index_t m_tp;
Index_t m_sizeX;
Index_t m_sizeY;
Index_t m_sizeZ;
Index_t m_numElem;
Index_t m_numNode;
Index_t m_maxPlaneSize;
Index_t m_maxEdgeSize;
// OMP hack
Index_t* m_nodeElemStart;
Index_t* m_nodeElemCornerList;
// Used in setup
Index_t m_rowMin, m_rowMax;
Index_t m_colMin, m_colMax;
Index_t m_planeMin, m_planeMax;
};
typedef Real_t& (Domain::*Domain_member)(Index_t);
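// Illustrative sketch, not part of this commit: how a pointer-to-member-function
// typedef like Domain_member lets a routine work on a caller-chosen list of
// field accessors. MiniDomain, MiniDomain_member, and sum_fields are
// hypothetical stand-ins; the bodies of CommSend/CommSBN are not shown in this
// diff, so this only demonstrates the calling pattern their signatures suggest.

```cpp
#include <cassert>
#include <vector>

using Real_t  = double;
using Index_t = int;

// Hypothetical two-field stand-in for Domain.
class MiniDomain
{
public:
    explicit MiniDomain(Index_t n)
    : m_x(n, 0.0)
    , m_y(n, 0.0)
    {}

    Real_t& x(Index_t idx) { return m_x[idx]; }
    Real_t& y(Index_t idx) { return m_y[idx]; }

private:
    std::vector<Real_t> m_x;
    std::vector<Real_t> m_y;
};

// Same shape as Domain_member: a non-const accessor taking an index.
typedef Real_t& (MiniDomain::*MiniDomain_member)(Index_t);

// Visit entry `idx` of every listed field through its accessor and sum them.
inline Real_t
sum_fields(MiniDomain& d, MiniDomain_member* fields, int nfields, Index_t idx)
{
    Real_t total = 0.0;
    for(int f = 0; f < nfields; ++f)
        total += (d.*fields[f])(idx);  // call the accessor via the member pointer
    return total;
}
```

// Because the accessors return references, the same member pointers can be used
// to write into a field as well as read from it.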
struct cmdLineOpts
{
Int_t its; // -i
Int_t nx; // -s
Int_t numReg; // -r
Int_t numFiles; // -f
Int_t showProg; // -p
Int_t quiet; // -q
Int_t viz; // -v
Int_t cost; // -c
Int_t balance; // -b
};
// Function Prototypes
// lulesh-par
Real_t
CalcElemVolume(const Real_t x[8], const Real_t y[8], const Real_t z[8]);
// lulesh-util
void
ParseCommandLineOptions(int argc, char* argv[], Int_t myRank, struct cmdLineOpts* opts);
void
VerifyAndWriteFinalOutput(Real_t elapsed_time, Domain& locDom, Int_t nx, Int_t numRanks);
// lulesh-viz
void
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks);
// lulesh-comm
void
CommRecv(Domain& domain, Int_t msgType, Index_t xferFields, Index_t dx, Index_t dy,
Index_t dz, bool doRecv, bool planeOnly);
void
CommSend(Domain& domain, Int_t msgType, Index_t xferFields, Domain_member* fieldData,
Index_t dx, Index_t dy, Index_t dz, bool doSend, bool planeOnly);
void
CommSBN(Domain& domain, Int_t xferFields, Domain_member* fieldData);
void
CommSyncPosVel(Domain& domain);
void
CommMonoQ(Domain& domain);
// lulesh-init
void
InitMeshDecomp(Int_t numRanks, Int_t myRank, Int_t* col, Int_t* row, Int_t* plane,
Int_t* side);
+1 -1
@@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.13 FATAL_ERROR)
cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
project(omnitrace-parallel-overhead LANGUAGES CXX)
+5 -1
@@ -36,6 +36,9 @@ main(int argc, char** argv)
if(argc > 2) nthread = atol(argv[2]);
if(argc > 3) nitr = atol(argv[3]);
printf("[%s] Threads: %zu\n[%s] Iterations: %zu\n[%s] fibonacci(%li)...\n", argv[0],
nthread, argv[0], nitr, argv[0], nfib);
std::vector<std::thread> threads{};
for(size_t i = 0; i < nthread; ++i)
{
@@ -43,10 +46,11 @@ main(int argc, char** argv)
threads.emplace_back(&run, _nitr, nfib);
}
run(nitr - 0.25 * nitr, nfib - 0.1 * nfib);
for(auto& itr : threads)
itr.join();
printf("fibonacci(%li) x %lu = %li\n", nfib, nthread, total.load());
printf("[%s] fibonacci(%li) x %lu = %li\n", argv[0], nfib, nthread, total.load());
return 0;
}
+1 -1
@@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.13 FATAL_ERROR)
cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
project(omnitrace-transpose LANGUAGES CXX)
+54 -31
@@ -21,6 +21,7 @@ THE SOFTWARE.
*/
#include "hip/hip_runtime.h"
#include <cfloat>
#include <chrono>
#include <cmath>
@@ -29,14 +30,19 @@ THE SOFTWARE.
#include <fstream>
#include <iomanip>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
static std::mutex print_lock{};
using auto_lock_t = std::unique_lock<std::mutex>;
#define HIP_API_CALL(CALL) \
{ \
hipError_t error_ = (CALL); \
if(error_ != hipSuccess) \
{ \
auto_lock_t _lk{ print_lock }; \
fprintf(stderr, "%s:%d :: HIP error : %s\n", __FILE__, __LINE__, \
hipGetErrorString(error_)); \
exit(EXIT_FAILURE); \
@@ -49,6 +55,7 @@ check_hip_error(void)
hipError_t err = hipGetLastError();
if(err != hipSuccess)
{
auto_lock_t _lk{ print_lock };
std::cerr << "Error: " << hipGetErrorString(err) << std::endl;
exit(err);
}
@@ -63,6 +70,7 @@ verify(int* in, int* out, int M, int N)
int col = rand() % N;
if(in[row * N + col] != out[col * M + row])
{
auto_lock_t _lk{ print_lock };
std::cout << "mismatch: " << row << ", " << col << " : " << in[row * N + col]
<< " | " << out[col * M + row] << "\n";
}
@@ -85,19 +93,23 @@ transpose_a(int* in, int* out, int M, int N)
}
void
run(int rank, int argc, char** argv)
run(int rank, int tid, hipStream_t stream, int argc, char** argv)
{
(void) argc;
(void) argv;
unsigned int M = 4960 * 2;
unsigned int N = 4960 * 2;
size_t nitr = 5000;
unsigned int M = 4960 * 2;
unsigned int N = 4960 * 2;
if(argc > 2) nitr = atoll(argv[2]);
auto_lock_t _lk{ print_lock };
std::cout << "[" << rank << "][" << tid << "] M: " << M << " N: " << N << std::endl;
_lk.unlock();
std::cout << "[" << rank << "] M: " << M << " N: " << N << std::endl;
size_t size = sizeof(int) * M * N;
int* matrix = (int*) malloc(size);
int* matrix = new int[size];
for(size_t i = 0; i < M * N; i++)
matrix[i] = rand() % 1002;
int *in, *out;
int* in = nullptr;
int* out = nullptr;
std::chrono::high_resolution_clock::time_point t1, t2;
@@ -106,37 +118,36 @@ run(int rank, int argc, char** argv)
HIP_API_CALL(hipMemset(in, 0, size));
HIP_API_CALL(hipMemset(out, 0, size));
HIP_API_CALL(hipMemcpy(in, matrix, size, hipMemcpyHostToDevice));
HIP_API_CALL(hipDeviceSynchronize());
hipDeviceProp_t props;
HIP_API_CALL(hipGetDeviceProperties(&props, 0));
dim3 grid(M / 32, N / 32, 1);
dim3 block(32, 32, 1); // transpose_a
t1 = std::chrono::high_resolution_clock::now();
const unsigned times = 10000;
auto _func = [&](hipStream_t stream) {
for(size_t i = 0; i < times / 2; i++)
{
transpose_a<<<grid, block, 0, stream>>>(in, out, M, N);
check_hip_error();
}
HIP_API_CALL(hipStreamSynchronize(stream));
};
hipStream_t _stream{};
HIP_API_CALL(hipStreamCreate(&_stream));
std::thread _t{ _func, _stream };
_t.join();
_func(0);
HIP_API_CALL(hipDeviceSynchronize());
t1 = std::chrono::high_resolution_clock::now();
for(size_t i = 0; i < nitr; i++)
{
transpose_a<<<grid, block, 0, stream>>>(in, out, M, N);
check_hip_error();
}
HIP_API_CALL(hipStreamSynchronize(stream));
t2 = std::chrono::high_resolution_clock::now();
double time =
std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count();
float GB = (float) size * times * 2 / (1 << 30);
std::cout << "[" << rank << "] Runtime of transpose is " << time << " sec\n"
float GB = (float) size * nitr * 2 / (1 << 30);
print_lock.lock();
std::cout << "[" << rank << "][" << tid << "] Runtime of transpose is " << time
<< " sec\n"
<< "The average performance of transpose is " << GB / time << " GBytes/sec"
<< std::endl;
print_lock.unlock();
int* out_matrix = (int*) malloc(size);
HIP_API_CALL(hipDeviceSynchronize());
int* out_matrix = new int[size];
HIP_API_CALL(hipMemcpy(out_matrix, out, size, hipMemcpyDeviceToHost));
// cpu_transpose(matrix, out_matrix, M, N);
@@ -145,8 +156,8 @@ run(int rank, int argc, char** argv)
HIP_API_CALL(hipFree(in));
HIP_API_CALL(hipFree(out));
free(matrix);
free(out_matrix);
delete[] matrix;
delete[] out_matrix;
}
#if defined(USE_MPI)
@@ -174,12 +185,16 @@ main(int argc, char** argv)
int rank = 0;
int size = 1;
int nthreads = 2;
int nitr = 5000;
if(argc > 1) nthreads = atoi(argv[1]);
if(argc > 2) nitr = atoi(argv[2]);
#if defined(USE_MPI)
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
#else
(void) size;
#endif
// this is a temporary workaround in omnitrace when HIP + MPI is enabled
int ndevice = 0;
@@ -193,16 +208,24 @@ main(int argc, char** argv)
if(rank == devid && rank < ndevice)
{
std::vector<std::thread> _threads{};
std::vector<hipStream_t> _streams(nthreads);
for(int i = 0; i < nthreads; ++i)
HIP_API_CALL(hipStreamCreate(&_streams.at(i)));
for(int i = 1; i < nthreads; ++i)
_threads.emplace_back(run, rank, argc, argv);
run(rank, argc, argv);
_threads.emplace_back(run, rank, i, _streams.at(i), argc, argv);
run(rank, 0, _streams.at(0), argc, argv);
for(auto& itr : _threads)
itr.join();
for(int i = 0; i < nthreads; ++i)
HIP_API_CALL(hipStreamDestroy(_streams.at(i)));
}
#if defined(USE_MPI)
MPI_Barrier(MPI_COMM_WORLD);
do_a2a(rank);
MPI_Finalize();
#endif
HIP_API_CALL(hipDeviceSynchronize());
HIP_API_CALL(hipDeviceReset());
return 0;
}
+1 -1
Submodule projects/rocprofiler-systems/external/PTL updated: dd1b67829c...61f873cf79
Submodule projects/rocprofiler-systems/external/dyninst updated: 076d8bdef4...82b10fdcf5
Submodule projects/rocprofiler-systems/external/timemory updated: c040fe7022...335abea0c5
+337
@@ -0,0 +1,337 @@
// MIT License
//
// Copyright (c) 2020, The Regents of the University of California,
// through Lawrence Berkeley National Laboratory (subject to receipt of any
// required approvals from the U.S. Dept. of Energy). All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal in the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
// IN THE SOFTWARE.
/** \file timemory/tools/available.hpp
* \headerfile tools/available.hpp "tools/available.hpp"
* Handles serializing the settings
*
*/
#pragma once
#define TIMEMORY_DISABLE_BANNER
#define TIMEMORY_DISABLE_COMPONENT_STORAGE_INIT
#include "timemory/settings/macros.hpp"
#include "timemory/tpls/cereal/archives.hpp"
#include "timemory/tpls/cereal/cereal/external/base64.hpp"
#include "timemory/utility/demangle.hpp"
#include <algorithm>
#include <array>
#include <functional>
#include <iomanip>
#include <map>
#include <set>
#include <sstream>
#include <stack>
#include <string>
#include <tuple>
#include <utility>
#include <vector>
#if !defined(TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME)
# define TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME epilogue
#endif
#if !defined(TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME)
# define TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME prologue
#endif
//======================================================================================//
namespace tim
{
namespace cereal
{
class SettingsTextArchive
: public OutputArchive<SettingsTextArchive>
, public traits::TextArchive
{
public:
using width_type = std::vector<uint64_t>;
using value_type = std::string;
using entry_type = std::map<std::string, value_type>;
using array_type = std::vector<entry_type>;
using unique_set = std::set<std::string>;
using int_stack = std::stack<uint32_t>;
public:
//! Construct, outputting to the provided stream
/// \param stream The array of output data
SettingsTextArchive(array_type& stream, unique_set exclude)
: OutputArchive<SettingsTextArchive>(this)
, output_stream(&stream)
, exclude_stream(std::move(exclude))
{
name_counter.push(0);
}
~SettingsTextArchive() override = default;
void saveBinaryValue(const void* data, size_t size, const char* name = nullptr)
{
setNextName(name);
writeName();
auto base64string =
base64::encode(reinterpret_cast<const unsigned char*>(data), size);
saveValue(base64string);
}
void startNode() { name_counter.push(0); }
void finishNode() { name_counter.pop(); }
//! Sets the name for the next node created with startNode
void setNextName(const char* name)
{
if(exclude_stream.count(name) > 0) return;
if((current_entry != nullptr) && value_keys.count(name) > 0)
{
current_entry->insert({ name, "" });
current_value = &((*current_entry)[name]);
return;
}
if(value_keys.count(name) > 0)
{
return;
}
current_value = nullptr;
output_stream->push_back(entry_type{});
current_entry = &(output_stream->back());
current_entry->insert({ "identifier", name });
std::string func = name;
const std::string prefix = TIMEMORY_SETTINGS_PREFIX;
func = func.erase(0, prefix.length());
std::transform(func.begin(), func.end(), func.begin(),
[](char& c) { return tolower(c); });
{
std::stringstream ss;
ss << "settings::" << func << "()";
current_entry->insert({ "static_accessor", ss.str() });
}
{
std::stringstream ss;
ss << "settings::instance()->get_" << func << "()";
current_entry->insert({ "member_accessor", ss.str() });
}
{
std::stringstream ss;
ss << "settings." << func;
current_entry->insert({ "python_accessor", ss.str() });
}
}
void setNextType(const char*) {}
public:
template <typename Tp>
inline void saveValue(Tp _val)
{
std::stringstream ssval;
ssval << std::boolalpha << _val;
if(current_value)
{
*current_value = ssval.str();
}
}
void writeName() {}
void makeArray() {}
private:
value_type* current_value = nullptr;
entry_type* current_entry = nullptr;
array_type* output_stream = nullptr;
unique_set exclude_stream = {};
int_stack name_counter;
unique_set value_keys = { "name", "value", "description", "count",
"environ", "max_count", "cmdline", "data_type",
"initial", "categories" };
};
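// Illustrative sketch, not part of this commit: the name handling performed in
// setNextName, pulled out into free functions. The prefix is passed in as a
// parameter because the value of TIMEMORY_SETTINGS_PREFIX is not shown in this
// diff; "TIMEMORY_" in the usage note is an assumption. All function names here
// are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Drop the first prefix.length() characters and lowercase the rest. Like the
// archive code, this does not check that `name` actually starts with `prefix`.
inline std::string
accessor_stem(std::string name, const std::string& prefix)
{
    name.erase(0, prefix.length());
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return name;
}

// The three accessor strings stored for each settings entry.
inline std::string
static_accessor(const std::string& stem)
{
    return "settings::" + stem + "()";
}

inline std::string
member_accessor(const std::string& stem)
{
    return "settings::instance()->get_" + stem + "()";
}

inline std::string
python_accessor(const std::string& stem)
{
    return "settings." + stem;
}
```

// Under the assumed prefix, an identifier such as "TIMEMORY_MAX_DEPTH" would
// yield the stem "max_depth", from which the three accessor strings are built.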
//======================================================================================//
//
// prologue and epilogue functions
//
//======================================================================================//
//--------------------------------------------------------------------------------------//
//! Prologue for NVPs for settings archive
/*! NVPs do not start or finish nodes - they just set up the names */
template <typename T>
inline void
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive&, const NameValuePair<T>&)
{}
//--------------------------------------------------------------------------------------//
//! Epilogue for NVPs for settings archive
/*! NVPs do not start or finish nodes - they just set up the names */
template <typename T>
inline void
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const NameValuePair<T>&)
{}
//--------------------------------------------------------------------------------------//
//! Prologue for deferred data for settings archive
/*! Do nothing for the defer wrapper */
template <typename T>
inline void
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive&, const DeferredData<T>&)
{}
//--------------------------------------------------------------------------------------//
//! Epilogue for deferred data for settings archive
/*! Do nothing for the defer wrapper */
template <typename T>
inline void
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const DeferredData<T>&)
{}
//--------------------------------------------------------------------------------------//
//! Prologue for SizeTags for settings archive
/*! SizeTags only mark the node as an array (makeArray is a no-op in this archive) */
template <typename T>
inline void
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive& ar, const SizeTag<T>&)
{
ar.makeArray();
}
//--------------------------------------------------------------------------------------//
//! Epilogue for SizeTags for settings archive
/*! SizeTags are ignored */
template <typename T>
inline void
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const SizeTag<T>&)
{}
//--------------------------------------------------------------------------------------//
//! Prologue for all other types for settings archive
/*! Starts a new node, named either automatically or by some NVP,
that may be given data by the type about to be archived*/
template <typename T>
inline void
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive& ar, const T&)
{
ar.startNode();
}
//--------------------------------------------------------------------------------------//
//! Epilogue for all other types for settings archive
/*! Finishes the node created in the prologue*/
template <typename T>
inline void
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive& ar, const T&)
{
ar.finishNode();
}
//--------------------------------------------------------------------------------------//
//! Prologue for nullptr for settings archive
inline void
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive&, const std::nullptr_t&)
{}
//--------------------------------------------------------------------------------------//
//! Epilogue for nullptr for settings archive
inline void
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const std::nullptr_t&)
{}
//======================================================================================//
//
// Common serialization functions
//
//======================================================================================//
//! Serializing NVP types
template <typename T>
inline void
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive& ar, const NameValuePair<T>& t)
{
ar.setNextName(t.name);
if(std::is_same<T, std::string>::value)
{
ar.setNextType("string");
}
else
{
ar.setNextType(tim::demangle<T>().c_str());
}
ar(t.value);
}
template <typename CharT, typename Traits, typename Alloc>
inline void
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(
SettingsTextArchive& ar,
const NameValuePair<std::basic_string<CharT, Traits, Alloc>>& t)
{
ar.setNextName(t.name);
ar.setNextType("string");
ar(t.value);
}
//! Saving for nullptr
inline void
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive&, const std::nullptr_t&)
{}
//! Saving for arithmetic
template <typename T, traits::EnableIf<std::is_arithmetic<T>::value> = traits::sfinae>
inline void
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive& ar, const T& t)
{
if(std::is_same<T, std::string>::value) ar.setNextType("string");
ar.saveValue(t);
}
//! saving string
template <typename CharT, typename Traits, typename Alloc>
inline void
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive& ar,
const std::basic_string<CharT, Traits, Alloc>& str)
{
ar.setNextType("string");
ar.saveValue(str);
}
//--------------------------------------------------------------------------------------//
//! Saving SizeTags
template <typename T>
inline void
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive&, const SizeTag<T>&)
{
// nothing to do here, we don't explicitly save the size
}
} // namespace cereal
} // namespace tim
// register archives for polymorphic support
TIMEMORY_CEREAL_REGISTER_ARCHIVE(SettingsTextArchive)
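Review note: `setNextName` above turns a setting identifier into three accessor strings by erasing the settings prefix, lowercasing the remainder, and formatting it. A minimal standalone sketch of that transformation follows. The names `accessor_names` and `derive_accessors` are hypothetical, the prefix is passed in rather than taken from `TIMEMORY_SETTINGS_PREFIX` (whose value is not shown in this diff), and unlike the original the sketch only erases the prefix when the identifier actually starts with it.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper type (not part of timemory): the three accessor strings
struct accessor_names
{
    std::string static_accessor;
    std::string member_accessor;
    std::string python_accessor;
};

// Strip `prefix` from `identifier` when present, lowercase the rest, and
// format the same three accessor strings that setNextName stores.
inline accessor_names
derive_accessors(std::string identifier, const std::string& prefix)
{
    if(identifier.compare(0, prefix.length(), prefix) == 0)
        identifier.erase(0, prefix.length());
    // go through unsigned char so std::tolower never sees a negative value
    std::transform(identifier.begin(), identifier.end(), identifier.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return { "settings::" + identifier + "()",
             "settings::instance()->get_" + identifier + "()",
             "settings." + identifier };
}
```

Calling `derive_accessors("EXAMPLE_ENABLED", "EXAMPLE_")` yields the strings `settings::enabled()`, `settings::instance()->get_enabled()`, and `settings.enabled`.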
+8 -5
@@ -34,10 +34,10 @@
// clang-format on
#include "library/timemory.hpp"
#include "library/roctracer.hpp"
#include "library/components/roctracer.hpp"
#include "library/api.hpp"
#include "library/fork_gotcha.hpp"
#include "library/mpi_gotcha.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/api.hpp"
#include "library/common.hpp"
#include "library/state.hpp"
@@ -51,6 +51,8 @@
#include <mutex>
namespace omnitrace
{
template <critical_trace::Device DevID, critical_trace::Phase PhaseID,
bool UpdateStack = true>
inline void
@@ -74,7 +76,7 @@ add_critical_trace(int64_t _tid, size_t _cpu_cid, size_t _gpu_cid, size_t _paren
if constexpr(PhaseID != critical_trace::Phase::NONE)
{
// unique lock per thread
auto& _mtx = type_mutex<critical_insert, omnitrace, num_mutexes>(_tid);
auto& _mtx = type_mutex<critical_insert, api::omnitrace, num_mutexes>(_tid);
auto_lock_t _lk{ _mtx };
auto& _critical_trace = critical_trace::get(_tid);
@@ -86,7 +88,7 @@ add_critical_trace(int64_t _tid, size_t _cpu_cid, size_t _gpu_cid, size_t _paren
if constexpr(UpdateStack)
{
// unique lock per thread
auto& _mtx = type_mutex<cpu_cid_stack, omnitrace, num_mutexes>(_tid);
auto& _mtx = type_mutex<cpu_cid_stack, api::omnitrace, num_mutexes>(_tid);
if constexpr(PhaseID == critical_trace::Phase::NONE)
{
@@ -110,3 +112,4 @@ add_critical_trace(int64_t _tid, size_t _cpu_cid, size_t _gpu_cid, size_t _paren
tim::consume_parameters(_tid, _cpu_cid, _gpu_cid, _parent_cid, _ts_beg, _ts_val,
_hash, _depth, _prio);
}
} // namespace omnitrace
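Review note: the hunks above look up a mutex with `type_mutex<Tag, api::omnitrace, num_mutexes>(_tid)`, and the comments describe it as a unique lock per thread. The implementation of `type_mutex` is not part of this diff, so the following is only a sketch of one way such a lookup could work: a static pool of mutexes per tag type, indexed by thread id. The name `indexed_mutex` and the tag structs are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the type_mutex<Tag, Api, N>(tid) call in the diff.
// Each distinct Tag gets its own static pool of N mutexes, and the thread id
// selects one slot, so threads with different slots never contend.
template <typename Tag, std::size_t N>
std::mutex&
indexed_mutex(std::int64_t tid)
{
    static_assert(N > 0, "pool must contain at least one mutex");
    static std::array<std::mutex, N> mutexes;
    return mutexes[static_cast<std::size_t>(tid) % N];
}

// Tag types that keep the pools separate (names are illustrative)
struct critical_insert_tag
{};
struct cpu_cid_stack_tag
{};
```

In this sketch two thread ids share a mutex only when they map to the same slot, and the tag type keeps the lock for inserting trace entries independent of the lock for updating the id stack.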
+2
@@ -28,6 +28,8 @@
#pragma once
#include "library/defines.hpp"
#include <timemory/compat/macros.h>
// forward decl of the API
+9 -3
@@ -28,6 +28,8 @@
#pragma once
#include "library/defines.hpp"
#include <timemory/api.hpp>
#include <timemory/backends/dmp.hpp>
#include <timemory/backends/process.hpp>
@@ -45,6 +47,10 @@
#include <utility>
#include <vector>
// timemory api struct
struct omnitrace : tim::concepts::api
{};
TIMEMORY_DEFINE_NS_API(api, omnitrace)
TIMEMORY_DEFINE_NS_API(api, sampling)
namespace omnitrace
{
namespace api = tim::api; // NOLINT
}
+125
@@ -0,0 +1,125 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#pragma once
#include "library/common.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "library/timemory.hpp"
#include <timemory/components/base.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/mpl/concepts.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/variadic/types.hpp>
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>
namespace omnitrace
{
namespace component
{
struct backtrace
: tim::component::empty_base
, tim::concepts::component
{
static constexpr size_t num_hw_counters = 8;
using data_t = std::array<char[512], 128>;
using clock_type = std::chrono::steady_clock;
using time_point_type = typename clock_type::time_point;
using value_type = void;
using hw_counters = tim::component::papi_array<num_hw_counters>;
using hw_counter_data_t = typename hw_counters::value_type;
using system_clock = std::chrono::system_clock;
using system_time_point = typename system_clock::time_point;
static void preinit();
static std::string label();
static std::string description();
backtrace() = default;
~backtrace() = default;
backtrace(backtrace&&) = default;
backtrace(const backtrace&) = default;
backtrace& operator=(const backtrace&) = default;
backtrace& operator=(backtrace&&) = default;
bool operator<(const backtrace& rhs) const;
static std::set<int> configure(bool, int64_t _tid = threading::get_id());
static void post_process(int64_t _tid = threading::get_id());
static hw_counter_data_t& get_last_hwcounters();
static void start();
static void stop();
void sample(int = -1);
bool empty() const;
size_t size() const;
std::vector<std::string> get() const;
time_point_type get_timestamp() const;
int64_t get_thread_cpu_timestamp() const;
private:
int64_t m_tid = 0;
int64_t m_thr_cpu_ts = 0;
size_t m_size = 0;
time_point_type m_ts = {};
data_t m_data = {};
hw_counter_data_t m_hw_counter = {};
};
} // namespace component
} // namespace omnitrace
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
double)
#endif
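Review note: the `backtrace` component timestamps samples with `std::chrono::steady_clock`, which matches the commit message ("steady_clock instead of system_clock in sampler"). A steady clock never goes backwards, so the interval between two consecutive samples cannot come out negative when the wall-clock time is adjusted. A minimal sketch of computing such an interval, where the helper name `elapsed_ns` is hypothetical:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using clock_type      = std::chrono::steady_clock;
using time_point_type = clock_type::time_point;

// Nanoseconds from `beg` to `end`. With a steady clock this is non-negative
// whenever `end` was taken after `beg`.
inline std::int64_t
elapsed_ns(time_point_type beg, time_point_type end)
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - beg).count();
}
```

The same subtraction with `std::chrono::system_clock` time points could yield a negative value if the system time were adjusted between the two samples.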
@@ -29,8 +29,11 @@
#pragma once
#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/timemory.hpp"
namespace omnitrace
{
// this is used to wrap fork()
struct fork_gotcha : comp::base<fork_gotcha, void>
{
@@ -38,11 +41,18 @@ struct fork_gotcha : comp::base<fork_gotcha, void>
TIMEMORY_DEFAULT_OBJECT(fork_gotcha)
// string id for component
static std::string label() { return "fork_gotcha"; }
// generate the gotcha wrappers
static void configure();
// this will get called right before fork
void audit(const gotcha_data_t& _data, audit::incoming);
static void audit(const gotcha_data_t& _data, audit::incoming);
// this will get called right after fork with the return value
void audit(const gotcha_data_t& _data, audit::outgoing, pid_t _pid);
static void audit(const gotcha_data_t& _data, audit::outgoing, pid_t _pid);
};
using fork_gotcha_t = comp::gotcha<4, tim::component_tuple<fork_gotcha>, omnitrace>;
using fork_gotcha_t = comp::gotcha<4, tim::component_tuple<fork_gotcha>, api::omnitrace>;
} // namespace omnitrace
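Review note: `fork_gotcha` declares two `audit` overloads, one tagged `audit::incoming` that runs right before `fork` and one tagged `audit::outgoing` that runs right after it with the return value. The sketch below shows that tag-dispatch pattern in isolation. It does not use the GOTCHA library or timemory; `wrap_call`, `call_logger`, and the `audit_sketch` namespace are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace audit_sketch
{
// Empty tag types select which audit overload is called
struct incoming
{};
struct outgoing
{};

// Hypothetical component that records when each hook fires
struct call_logger
{
    static std::vector<std::string>& log()
    {
        static std::vector<std::string> entries;
        return entries;
    }
    // runs right before the wrapped function
    static void audit(const std::string& name, incoming)
    {
        log().push_back("before " + name);
    }
    // runs right after the wrapped function, with its return value
    static void audit(const std::string& name, outgoing, int ret)
    {
        log().push_back("after " + name + " -> " + std::to_string(ret));
    }
};

// Hypothetical wrapper: audit, call through, audit again with the result
template <typename Component, typename Func, typename... Args>
int
wrap_call(const std::string& name, Func&& func, Args&&... args)
{
    Component::audit(name, incoming{});
    int ret = std::forward<Func>(func)(std::forward<Args>(args)...);
    Component::audit(name, outgoing{}, ret);
    return ret;
}
}  // namespace audit_sketch
```

Because the hooks in the sketch are static, the wrapper needs no component instance, which is consistent with the change above that makes both `audit` overloads `static`.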
+121
@@ -0,0 +1,121 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#pragma once
#include "library/defines.hpp"
#include "timemory/components/user_bundle/types.hpp"
#include <timemory/components/data_tracker/types.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/enum.h>
TIMEMORY_DECLARE_COMPONENT(roctracer)
namespace omnitrace
{
namespace component
{
template <typename... Tp>
using data_tracker = tim::component::data_tracker<Tp...>;
struct omnitrace;
struct backtrace;
struct backtrace_wall_clock
{};
struct backtrace_cpu_clock
{};
struct backtrace_fraction
{};
using sampling_wall_clock = data_tracker<double, backtrace_wall_clock>;
using sampling_cpu_clock = data_tracker<double, backtrace_cpu_clock>;
using sampling_percent = data_tracker<double, backtrace_fraction>;
using roctracer = tim::component::roctracer;
} // namespace component
} // namespace omnitrace
#if !defined(OMNITRACE_USE_ROCTRACER)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer, false_type)
#endif
#if !defined(TIMEMORY_USE_LIBUNWIND)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::api::sampling, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::sampling::backtrace, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_wall_clock,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_cpu_clock,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_percent,
false_type)
#endif
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::omnitrace, OMNITRACE_COMPONENT,
"omnitrace", "omnitrace_component")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::roctracer, OMNITRACE_ROCTRACER,
"roctracer", "omnitrace_roctracer")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_wall_clock,
OMNITRACE_SAMPLING_WALL_CLOCK, "sampling_wall_clock", "")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_cpu_clock,
OMNITRACE_SAMPLING_CPU_CLOCK, "sampling_cpu_clock", "")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_percent,
OMNITRACE_SAMPLING_PERCENT, "sampling_percent", "")
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::sampling_wall_clock,
"sampling_wall_clock", "Wall-clock timing",
"derived from statistical sampling")
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::sampling_cpu_clock,
"sampling_cpu_clock", "CPU-clock timing",
"derived from statistical sampling")
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::sampling_percent,
"sampling_percent",
"Fraction of wall-clock time spent in functions",
"derived from statistical sampling")
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::roctracer, "roctracer",
"High-precision ROCm API and kernel tracing", "")
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_wall_clock, double)
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_cpu_clock, double)
// enable timing units
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category,
omnitrace::component::sampling_wall_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units,
omnitrace::component::sampling_wall_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category,
omnitrace::component::sampling_cpu_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units,
omnitrace::component::sampling_cpu_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, omnitrace::component::sampling_percent,
true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_mean, omnitrace::component::sampling_percent,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_units, omnitrace::component::sampling_percent,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_statistics, omnitrace::component::sampling_percent,
false_type)
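Review note: `sampling_wall_clock`, `sampling_cpu_clock`, and `sampling_percent` all store a `double`, and the empty structs `backtrace_wall_clock`, `backtrace_cpu_clock`, and `backtrace_fraction` exist only to make the three `data_tracker` instantiations distinct types. A minimal sketch of why the tag parameter matters, using a hypothetical `tracker` template in place of `tim::component::data_tracker`:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for data_tracker: TagT is never used for storage,
// it only distinguishes one instantiation from another.
template <typename ValueT, typename TagT>
struct tracker
{
    ValueT value{};
    void   store(ValueT v) { value = v; }
};

struct wall_clock_tag
{};
struct cpu_clock_tag
{};

using wall_tracker = tracker<double, wall_clock_tag>;
using cpu_tracker  = tracker<double, cpu_clock_tag>;

// Anything keyed on the type (traits, static storage, labels) is now
// separate for each alias even though both hold a double.
template <typename TrackerT>
int&
instance_count()
{
    static int count = 0;
    return count;
}

static_assert(!std::is_same<wall_tracker, cpu_tracker>::value,
              "different tags produce different types");
```

This is what allows the trait and metadata specializations above to be written per alias: each one names a different type.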
@@ -29,8 +29,11 @@
#pragma once
#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/timemory.hpp"
namespace omnitrace
{
// this is used to wrap MPI_Init and MPI_Init_thread
struct mpi_gotcha : comp::base<mpi_gotcha, void>
{
@@ -39,6 +42,12 @@ struct mpi_gotcha : comp::base<mpi_gotcha, void>
TIMEMORY_DEFAULT_OBJECT(mpi_gotcha)
// string id for component
static std::string label() { return "mpi_gotcha"; }
// generate the gotcha wrappers
static void configure();
// called right before MPI_Init with that function's arguments
static void audit(const gotcha_data_t& _data, audit::incoming, int*, char***);
@@ -56,9 +65,10 @@ struct mpi_gotcha : comp::base<mpi_gotcha, void>
void audit(const gotcha_data_t& _data, audit::outgoing, int _retval);
private:
comm_t m_comm = tim::mpi::comm_world_v;
int* m_rank = nullptr;
int* m_size = nullptr;
void* m_comm = nullptr;
int* m_rank = nullptr;
int* m_size = nullptr;
};
using mpi_gotcha_t = comp::gotcha<5, tim::component_tuple<mpi_gotcha>, omnitrace>;
using mpi_gotcha_t = comp::gotcha<5, tim::component_tuple<mpi_gotcha>, api::omnitrace>;
} // namespace omnitrace
@@ -28,16 +28,29 @@
#pragma once
#include "library/defines.hpp"
#include "library/timemory.hpp"
namespace omnitrace
{
namespace component
{
// timemory component which calls omnitrace functions
// (used in gotcha wrappers)
struct omnitrace_component : comp::base<omnitrace_component, void>
struct omnitrace : comp::base<omnitrace, void>
{
void start();
void stop();
void set_prefix(const char*);
static std::string label() { return "omnitrace"; }
void start();
void stop();
void set_prefix(const char*);
private:
const char* m_prefix = nullptr;
};
} // namespace component
} // namespace omnitrace
TIMEMORY_METADATA_SPECIALIZATION(
omnitrace::component::omnitrace, "omnitrace",
"Invokes instrumentation functions 'omnitrace_push_trace' and 'omnitrace_pop_trace'",
"Used by gotcha wrappers")
+75
@@ -0,0 +1,75 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#pragma once
#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/timemory.hpp"
#include <future>
namespace omnitrace
{
struct pthread_gotcha : tim::component::base<pthread_gotcha, void>
{
struct wrapper
{
using routine_t = void* (*) (void*);
using promise_t = std::promise<void>;
wrapper(routine_t _routine, void* _arg, bool, promise_t*);
void* operator()() const;
static void* wrap(void* _arg);
private:
bool m_enable_sampling = false;
routine_t m_routine = nullptr;
void* m_arg = nullptr;
promise_t* m_promise = nullptr;
};
TIMEMORY_DEFAULT_OBJECT(pthread_gotcha)
// string id for component
static std::string label() { return "pthread_gotcha"; }
// generate the gotcha wrappers
static void configure();
// threads can set this to avoid starting sampling on child threads
static bool& enable_sampling_on_child_threads();
// pthread_create
int operator()(pthread_t* thread, const pthread_attr_t* attr,
void* (*start_routine)(void*), void* arg) const;
};
using pthread_gotcha_t = tim::component::gotcha<2, std::tuple<>, pthread_gotcha>;
} // namespace omnitrace
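Review note: `pthread_gotcha::wrapper` stores the user's start routine and argument and exposes a static `wrap(void*)` whose signature matches what `pthread_create` expects. The definitions are not in this header, so the sketch below only illustrates the general trampoline pattern under that assumption. It leaves out the sampling flag and the `std::promise` member, and `routine_wrapper` and `add_one` are hypothetical names.

```cpp
#include <cassert>

struct routine_wrapper
{
    using routine_t = void* (*) (void*);

    routine_wrapper(routine_t routine, void* arg)
    : m_routine{ routine }
    , m_arg{ arg }
    {}

    // invoke the original start routine with its original argument
    void* operator()() const
    {
        assert(m_routine != nullptr);
        return m_routine(m_arg);
    }

    // Trampoline with the void*(*)(void*) signature of a thread start routine:
    // recover the wrapper from the opaque pointer, then call through it.
    // Per-thread setup and teardown would surround this call.
    static void* wrap(void* arg)
    {
        auto* self = static_cast<routine_wrapper*>(arg);
        return (*self)();
    }

private:
    routine_t m_routine = nullptr;
    void*     m_arg     = nullptr;
};

// Example start routine: increments the int it is given
inline void*
add_one(void* arg)
{
    ++(*static_cast<int*>(arg));
    return arg;
}
```

In real use `&routine_wrapper::wrap` and a pointer to the wrapper would be passed to `pthread_create` in place of the user's routine and argument, and the wrapper object must stay alive until the new thread has finished using it.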
@@ -28,20 +28,17 @@
#pragma once
#include "timemory/api.hpp"
#include "timemory/components/base.hpp"
#include "timemory/components/data_tracker/components.hpp"
#include "timemory/components/macros.hpp"
#include "timemory/enum.h"
#include "timemory/macros/os.hpp"
#include "timemory/mpl/type_traits.hpp"
#include "timemory/mpl/types.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
TIMEMORY_DECLARE_COMPONENT(roctracer)
#if !defined(OMNITRACE_USE_ROCTRACER)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer, false_type)
#endif
#include <timemory/api.hpp>
#include <timemory/components/base.hpp>
#include <timemory/components/data_tracker/components.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/enum.h>
#include <timemory/macros/os.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/mpl/types.hpp>
namespace tim
{
@@ -86,7 +83,12 @@ TIMEMORY_SET_COMPONENT_API(component::roctracer_data, project::timemory, categor
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, component::roctracer_data, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::roctracer_data, true_type)
#include "timemory/operations.hpp"
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(roctracer, false, void)
TIMEMORY_DECLARE_EXTERN_COMPONENT(roctracer_data, true, double)
#endif
@@ -28,12 +28,12 @@
#pragma once
#include "library/components/roctracer.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/dynamic_library.hpp"
#include "library/perfetto.hpp"
#include "library/ptl.hpp"
#include "library/roctracer.hpp"
#include <roctracer.h>
#include <roctracer_ext.h>
@@ -58,12 +58,15 @@
} \
} while(0)
using hsa_timer_t = hsa_rt_utils::Timer;
using timestamp_t = hsa_timer_t::timestamp_t;
using roctracer_bundle_t = tim::component_bundle<omnitrace, comp::roctracer_data,
comp::wall_clock, quirk::explicit_pop>;
using roctracer_hsa_bundle_t = tim::component_bundle<omnitrace, comp::roctracer_data>;
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
namespace omnitrace
{
using hsa_timer_t = hsa_rt_utils::Timer;
using timestamp_t = hsa_timer_t::timestamp_t;
using roctracer_bundle_t =
tim::component_bundle<api::omnitrace, comp::roctracer_data, comp::wall_clock>;
using roctracer_hsa_bundle_t =
tim::component_bundle<api::omnitrace, comp::roctracer_data>;
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
std::unique_ptr<hsa_timer_t>&
get_hsa_timer();
@@ -94,3 +97,4 @@ roctracer_setup_routines();
roctracer_functions_t&
roctracer_tear_down_routines();
} // namespace omnitrace
+48 -19
@@ -30,9 +30,11 @@
#include "library/api.hpp"
#include "library/common.hpp"
#include "library/fork_gotcha.hpp"
#include "library/mpi_gotcha.hpp"
#include "library/roctracer.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/components/roctracer.hpp"
#include "library/defines.hpp"
#include "library/state.hpp"
#include "library/timemory.hpp"
@@ -40,37 +42,42 @@
#include <string_view>
namespace omnitrace
{
// bundle of components around omnitrace_init and omnitrace_finalize
using main_bundle_t =
tim::lightweight_tuple<comp::wall_clock, comp::peak_rss, comp::cpu_clock,
comp::cpu_util, comp::roctracer, papi_tot_ins,
comp::user_global_bundle, fork_gotcha_t, mpi_gotcha_t>;
comp::cpu_util, comp::roctracer, comp::user_global_bundle,
fork_gotcha_t, mpi_gotcha_t, pthread_gotcha_t>;
// bundle of components used in instrumentation
using instrumentation_bundle_t =
tim::component_bundle<omnitrace, comp::wall_clock*, comp::user_global_bundle*>;
tim::component_bundle<api::omnitrace, comp::wall_clock*, comp::user_global_bundle*>;
// allocator for instrumentation_bundle_t
using bundle_allocator_t = tim::data::ring_buffer_allocator<instrumentation_bundle_t>;
// bundle of components around each thread
#if defined(TIMEMORY_RUSAGE_THREAD) && TIMEMORY_RUSAGE_THREAD > 0
using omnitrace_thread_bundle_t =
tim::lightweight_tuple<comp::wall_clock, comp::thread_cpu_clock,
comp::thread_cpu_util,
#if defined(TIMEMORY_RUSAGE_THREAD) && TIMEMORY_RUSAGE_THREAD > 0
comp::peak_rss,
comp::thread_cpu_util, comp::peak_rss>;
#else
using omnitrace_thread_bundle_t =
tim::lightweight_tuple<comp::wall_clock, comp::thread_cpu_clock,
comp::thread_cpu_util>;
#endif
papi_tot_ins>;
//
// Initialization routines
//
void
configure_settings();
configure_settings() TIMEMORY_VISIBILITY("default");
void
print_config_settings(std::ostream& _os,
std::function<bool(const std::string_view&)>&& _filter);
print_config_settings(
std::ostream& _os,
std::function<bool(const std::string_view&, const std::set<std::string>&)>&& _filter);
std::string&
get_exe_name();
@@ -81,24 +88,39 @@ get_exe_name();
std::string
get_config_file();
bool
get_debug_env();
bool
get_debug();
bool
bool&
get_use_perfetto();
bool
bool&
get_use_timemory();
bool&
get_use_roctracer();
bool&
get_use_sampling();
bool&
get_use_pid();
bool
bool&
get_use_mpip();
bool
bool&
get_use_critical_trace();
bool
get_timeline_sampling();
bool
get_flat_sampling();
bool
get_roctracer_timeline_profile();
@@ -135,14 +157,20 @@ get_trace_hsa_api_types();
std::string&
get_backend();
std::string
std::string&
get_perfetto_output_filename();
int64_t
get_critical_trace_count();
size_t&
get_sample_rate();
get_instrumentation_interval();
double&
get_sampling_freq();
double&
get_sampling_delay();
int64_t
get_critical_trace_per_row();
@@ -161,3 +189,4 @@ get_cpu_cid();
std::unique_ptr<std::vector<uint64_t>>&
get_cpu_cid_stack(int64_t _tid = threading::get_id());
} // namespace omnitrace
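Review note: several getters in this header change from returning `bool` to returning `bool&` (for example `get_use_perfetto`, `get_use_timemory`, `get_use_sampling`), which lets callers assign through the getter. The definitions are not shown here, so the sketch below is only one common way to back such an accessor, a function-local static. The default of `false` and the `config_sketch` namespace are assumptions for illustration.

```cpp
#include <cassert>

namespace config_sketch
{
// Returns a reference to a single shared flag. The static is initialized on
// first use, so there is no static-initialization-order problem.
inline bool&
get_use_sampling()
{
    static bool value = false;  // default chosen for the sketch only
    return value;
}
}  // namespace config_sketch
```

With a reference return, `config_sketch::get_use_sampling() = true;` updates the flag in place and every later call observes the new value. Concurrent writers would still need their own synchronization.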
+6 -1
@@ -29,8 +29,10 @@
#pragma once
#include "library/config.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "timemory/tpls/cereal/cereal/cereal.hpp"
#include <timemory/tpls/cereal/cereal/cereal.hpp>
#include <cstdint>
#include <cstdlib>
@@ -38,6 +40,8 @@
#include <string>
#include <vector>
namespace omnitrace
{
namespace critical_trace
{
enum class Device : short
@@ -207,3 +211,4 @@ struct id
{};
} // namespace critical_trace
} // namespace omnitrace
+10 -3
@@ -28,17 +28,23 @@
#pragma once
#include <cstdio>
#include "library/defines.hpp"
#include <timemory/api.hpp>
#include <timemory/backends/dmp.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/utility/utility.hpp>
#include <cstdio>
namespace omnitrace
{
bool
get_debug();
bool
get_critical_trace_debug();
} // namespace omnitrace
#if defined(TIMEMORY_USE_MPI)
# define OMNITRACE_CONDITIONAL_PRINT(COND, ...) \
@@ -74,7 +80,8 @@ get_critical_trace_debug();
fflush(stderr); \
}
#define OMNITRACE_DEBUG(...) OMNITRACE_CONDITIONAL_PRINT(get_debug(), __VA_ARGS__)
#define OMNITRACE_DEBUG(...) \
OMNITRACE_CONDITIONAL_PRINT(::omnitrace::get_debug(), __VA_ARGS__)
#define OMNITRACE_PRINT(...) OMNITRACE_CONDITIONAL_PRINT(true, __VA_ARGS__)
#define OMNITRACE_CT_DEBUG(...) \
OMNITRACE_CONDITIONAL_PRINT(get_critical_trace_debug(), __VA_ARGS__)
OMNITRACE_CONDITIONAL_PRINT(::omnitrace::get_critical_trace_debug(), __VA_ARGS__)
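Review note: once `get_debug()` moves into `namespace omnitrace`, the macros have to name it as `::omnitrace::get_debug()`, because a macro is expanded textually at the call site and an unqualified name would be looked up in whatever namespace the caller happens to be in. A minimal sketch of the difference, with hypothetical names throughout:

```cpp
#include <cassert>

namespace lib_sketch
{
inline bool
get_debug()
{
    return true;
}
}  // namespace lib_sketch

// Fully qualified, so the expansion resolves the same way everywhere
#define LIB_SKETCH_DEBUG_ENABLED() (::lib_sketch::get_debug())

namespace other
{
// An unrelated function with the same name. An unqualified get_debug()
// inside a macro expanded here would pick this one instead.
inline bool
get_debug()
{
    return false;
}

inline bool
macro_result()
{
    return LIB_SKETCH_DEBUG_ENABLED();
}
}  // namespace other
```

Here `other::macro_result()` returns `true` because the macro reaches `lib_sketch::get_debug()` regardless of the enclosing namespace.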
+11 -1
@@ -33,10 +33,20 @@
#define OMNITRACE_HIP_VERSION_MAJOR @HIP_VERSION_MAJOR@
#define OMNITRACE_HIP_VERSION_MINOR @HIP_VERSION_MINOR@
#define OMNITRACE_HIP_VERSION_PATCH @HIP_VERSION_PATCH@
// clang-format on
#if defined(OMNITRACE_USE_ROCTRACER)
# define OMNITRACE_ROCTRACER_LIBKFDWRAPPER "@roctracer_kfdwrapper_LIBRARY@"
#else
# define OMNITRACE_ROCTRACER_LIBKFDWRAPPER "/opt/rocm/roctracer/lib/libkfdwrapper64.so"
#endif
// clang-format on
#define TIMEMORY_USER_COMPONENT_ENUM \
OMNITRACE_SAMPLING_WALL_CLOCK_idx, OMNITRACE_SAMPLING_CPU_CLOCK_idx, \
OMNITRACE_SAMPLING_PERCENT_idx, OMNITRACE_COMPONENT_idx, OMNITRACE_ROCTRACER_idx,
#define OMNITRACE_COMPONENT OMNITRACE_COMPONENT_idx
#define OMNITRACE_ROCTRACER OMNITRACE_ROCTRACER_idx
#define OMNITRACE_SAMPLING_WALL_CLOCK OMNITRACE_SAMPLING_WALL_CLOCK_idx
#define OMNITRACE_SAMPLING_CPU_CLOCK OMNITRACE_SAMPLING_CPU_CLOCK_idx
#define OMNITRACE_SAMPLING_PERCENT OMNITRACE_SAMPLING_PERCENT_idx
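Review note: `TIMEMORY_USER_COMPONENT_ENUM` is defined as a comma-terminated list of `*_idx` names, and the following `#define`s map the public names onto them. How timemory consumes the macro is not shown in this diff. The trailing comma suggests it is spliced into an enumeration, and the sketch below illustrates that idea under that assumption, with hypothetical names only.

```cpp
#include <cassert>

// The user-provided list ends with a comma so it can be dropped into the
// middle of an enumerator list.
#define SKETCH_USER_COMPONENT_ENUM USER_A_idx, USER_B_idx,

enum sketch_component_id : int
{
    BUILTIN_A_idx = 0,
    BUILTIN_B_idx,
    SKETCH_USER_COMPONENT_ENUM  // expands to: USER_A_idx, USER_B_idx,
    SKETCH_COMPONENTS_END
};

// Public alias for the injected enumerator, mirroring the defines above
#define USER_A USER_A_idx
```

In this sketch the injected enumerators take the values that follow the built-in ones, so `USER_A_idx` is 2, `USER_B_idx` is 3, and `SKETCH_COMPONENTS_END` is 4.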
+6 -1
@@ -29,11 +29,15 @@
#pragma once
#include "library/debug.hpp"
#include "library/defines.hpp"
#include <timemory/environment.hpp>
#include <dlfcn.h>
#include <string>
#include <timemory/environment.hpp>
namespace omnitrace
{
struct dynamic_library
{
dynamic_library() = delete;
@@ -69,3 +73,4 @@ struct dynamic_library
int flags = 0;
void* handle = nullptr;
};
} // namespace omnitrace
+32
@@ -0,0 +1,32 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
namespace omnitrace
{
namespace gpu
{
int
device_count();
}
} // namespace omnitrace
+5
@@ -28,6 +28,8 @@
#pragma once
#include "library/defines.hpp"
#if defined(PERFETTO_CATEGORIES)
# error "PERFETTO_CATEGORIES is already defined. Please include \"" __FILE__ "\" before including any timemory files"
#endif
@@ -58,6 +60,8 @@
PERFETTO_DEFINE_CATEGORIES(PERFETTO_CATEGORIES);
#endif
namespace omnitrace
{
#if defined(CUSTOM_DATA_SOURCE)
class CustomDataSource : public perfetto::DataSource<CustomDataSource>
{
@@ -89,3 +93,4 @@ public:
PERFETTO_DECLARE_DATA_SOURCE_STATIC_MEMBERS(CustomDataSource);
#endif
} // namespace omnitrace
+6 -2
@@ -28,11 +28,14 @@
#pragma once
#include "PTL/PTL.hh"
#include "timemory/macros/attributes.hpp"
#include "library/defines.hpp"
#include <PTL/PTL.hh>
#include <mutex>
namespace omnitrace
{
namespace tasking
{
std::mutex&
@@ -53,3 +56,4 @@ get_critical_trace_thread_pool();
PTL::TaskGroup<void>&
get_critical_trace_task_group();
} // namespace tasking
} // namespace omnitrace
+76
@@ -0,0 +1,76 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#pragma once
#include "library/common.hpp"
#include "library/components/backtrace.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "library/timemory.hpp"
#include <timemory/macros/language.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/variadic/types.hpp>
#include <cstdint>
#include <memory>
#include <set>
namespace omnitrace
{
namespace sampling
{
using component::backtrace;
using component::backtrace_cpu_clock; // NOLINT
using component::backtrace_fraction; // NOLINT
using component::backtrace_wall_clock; // NOLINT
using component::sampling_cpu_clock;
using component::sampling_percent;
using component::sampling_wall_clock;
std::set<int>
setup();
std::set<int>
shutdown();
void block_signals(std::set<int> = {});
void unblock_signals(std::set<int> = {});
using bundle_t = tim::lightweight_tuple<backtrace>;
using sampler_t = tim::sampling::sampler<bundle_t, tim::sampling::dynamic>;
using sampler_instances = omnitrace_thread_data<sampler_t, api::sampling>;
std::unique_ptr<sampler_t>&
get_sampler(int64_t _tid = threading::get_id());
} // namespace sampling
} // namespace omnitrace
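The header above only declares `block_signals` and `unblock_signals`; their definitions are not shown in this hunk. A minimal sketch of how such helpers could be written on POSIX with `pthread_sigmask` follows. The `sketch` namespace, the `bool` return values, the `is_blocked` helper, and the choice of `SIGALRM` and `SIGPROF` as the default for an empty set are all assumptions made for illustration, not taken from the source.

```cpp
#include <pthread.h>
#include <set>
#include <signal.h>
#include <utility>

namespace sketch
{
// build a sigset_t from a set of signal numbers
// (assumption: an empty set means "the interval-timer signals")
inline sigset_t
make_sigset(std::set<int> _signals)
{
    if(_signals.empty()) _signals = { SIGALRM, SIGPROF };
    sigset_t _v;
    sigemptyset(&_v);
    for(int itr : _signals)
        sigaddset(&_v, itr);
    return _v;
}

// block delivery of the signals on the calling thread only
inline bool
block_signals(std::set<int> _signals = {})
{
    sigset_t _v = make_sigset(std::move(_signals));
    return pthread_sigmask(SIG_BLOCK, &_v, nullptr) == 0;
}

// re-enable delivery of the signals on the calling thread
inline bool
unblock_signals(std::set<int> _signals = {})
{
    sigset_t _v = make_sigset(std::move(_signals));
    return pthread_sigmask(SIG_UNBLOCK, &_v, nullptr) == 0;
}

// hypothetical helper: query whether a signal is currently blocked
inline bool
is_blocked(int _sig)
{
    sigset_t _cur;
    sigemptyset(&_cur);
    // a null "set" argument leaves the mask unchanged and only reports it
    pthread_sigmask(SIG_BLOCK, nullptr, &_cur);
    return sigismember(&_cur, _sig) == 1;
}
}  // namespace sketch
```

Because the mask is per-thread and inherited by threads created afterwards, blocking before spawning workers and unblocking once the sampler is configured is one way to keep timer signals away from threads that are not ready for them.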
+5
@@ -28,6 +28,10 @@
#pragma once
#include "library/defines.hpp"
namespace omnitrace
{
// used for specifying the state of omnitrace
enum class State : unsigned short
{
@@ -36,3 +40,4 @@ enum class State : unsigned short
Active,
Finalized
};
} // namespace omnitrace
+8 -2
@@ -29,6 +29,7 @@
#pragma once
#include "library/config.hpp"
#include "library/defines.hpp"
#include <array>
#include <cstdint>
@@ -40,6 +41,8 @@
# define OMNITRACE_MAX_THREADS 1024
#endif
namespace omnitrace
{
static constexpr size_t max_supported_threads = OMNITRACE_MAX_THREADS;
template <typename Tp, typename Tag = void, size_t MaxThreads = max_supported_threads>
@@ -63,8 +66,10 @@ template <typename... Args>
void
omnitrace_thread_data<Tp, Tag, MaxThreads>::construct(Args&&... _args)
{
static thread_local bool _v = [&_args...]() {
instances().at(threading::get_id()) =
// construct outside of lambda to prevent data-race
static auto& _instances = instances();
static thread_local bool _v = [&_args...]() {
_instances.at(threading::get_id()) =
std::make_unique<Tp>(std::forward<Args>(_args)...);
return true;
}();
@@ -124,3 +129,4 @@ struct instrumentation_bundles
static instance_array_t& instances();
};
} // namespace omnitrace
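The `construct` change above resolves `instances()` into a function-local static reference before the `thread_local` initializer runs, so the lambda only touches the calling thread's slot. A reduced sketch of that pattern is below. `sketch::thread_data` and `get_thread_index` are hypothetical stand-ins for `omnitrace_thread_data` and `threading::get_id()`, and the real class has more members than shown here.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>

namespace sketch
{
// hypothetical stand-in for threading::get_id(): a sequential index per thread
inline std::size_t
get_thread_index()
{
    static std::atomic<std::size_t> _count{ 0 };
    static thread_local std::size_t _idx = _count++;
    return _idx;
}

template <typename Tp, std::size_t MaxThreads = 1024>
struct thread_data
{
    using instance_array_t = std::array<std::unique_ptr<Tp>, MaxThreads>;

    static instance_array_t& instances()
    {
        static auto _v = instance_array_t{};
        return _v;
    }

    template <typename... Args>
    static void construct(Args&&... _args)
    {
        // resolve the shared array once, outside of the per-thread lambda
        static auto& _instances = instances();
        // the initializer runs once per thread; later calls are no-ops
        static thread_local bool _v = [&_args...]() {
            _instances.at(get_thread_index()) =
                std::make_unique<Tp>(std::forward<Args>(_args)...);
            return true;
        }();
        (void) _v;
    }

    static std::unique_ptr<Tp>& instance()
    {
        return instances().at(get_thread_index());
    }
};
}  // namespace sketch
```

Each thread writes only to its own array element, and the array itself is created under the usual thread-safe initialization of function-local statics.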
+16 -13
@@ -28,36 +28,39 @@
#pragma once
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include <timemory/api.hpp>
#include <timemory/backends/mpi.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/components.hpp>
#include <timemory/components/gotcha/mpip.hpp>
#include <timemory/components/papi/papi_tuple.hpp>
#include <timemory/config.hpp>
#include <timemory/environment.hpp>
#include <timemory/manager.hpp>
#include <timemory/mpl/apply.hpp>
#include <timemory/mpl.hpp>
#include <timemory/operations.hpp>
#include <timemory/runtime.hpp>
#include <timemory/settings.hpp>
#include <timemory/storage.hpp>
#include <timemory/variadic.hpp>
namespace audit = tim::audit;
namespace comp = tim::component;
namespace quirk = tim::quirk;
namespace threading = tim::threading;
namespace scope = tim::scope;
namespace dmp = tim::dmp;
namespace process = tim::process;
namespace units = tim::units;
namespace trait = tim::trait;
namespace omnitrace
{
namespace audit = tim::audit; // NOLINT
namespace comp = tim::component; // NOLINT
namespace quirk = tim::quirk; // NOLINT
namespace threading = tim::threading; // NOLINT
namespace scope = tim::scope; // NOLINT
namespace dmp = tim::dmp; // NOLINT
namespace process = tim::process; // NOLINT
namespace units = tim::units; // NOLINT
namespace trait = tim::trait; // NOLINT
// same sort of functionality as python's " ".join([...])
#if !defined(JOIN)
# define JOIN(...) tim::mpl::apply<std::string>::join(__VA_ARGS__)
#endif
using papi_tot_ins = comp::papi_tuple<PAPI_TOT_INS>;
} // namespace omnitrace
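The `JOIN` macro above is described as the counterpart of Python's `" ".join([...])`. A self-contained sketch of that idea in C++17 follows, with the delimiter passed as the first argument. This is not timemory's implementation of `tim::mpl::apply<std::string>::join`; `sketch::join` is a hypothetical name and the streaming-based formatting is an assumption.

```cpp
#include <sstream>
#include <string>

namespace sketch
{
// stream every argument into one string, separated by the delimiter
template <typename DelimT, typename... Args>
std::string
join(const DelimT& _delim, Args&&... _args)
{
    std::ostringstream _ss;
    bool               _first  = true;
    auto               _append = [&](const auto& _v) {
        if(!_first) _ss << _delim;  // delimiter goes between items only
        _ss << _v;
        _first = false;
    };
    // fold over the comma operator: calls _append once per argument, in order
    (_append(_args), ...);
    return _ss.str();
}
}  // namespace sketch
```

Streaming through `std::ostringstream` lets strings, integers, and anything else with an `operator<<` be mixed in a single call, for example `sketch::join("", "exe", "/thread-", 0)`.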
+67 -63
@@ -32,7 +32,7 @@
#include "timemory/environment.hpp"
#include "timemory/mpl/apply.hpp"
#include "timemory/utility/argparse.hpp"
#include "timemory/utility/macros.hpp"
#include "timemory/utility/demangle.hpp"
#include "timemory/utility/popen.hpp"
#include "timemory/variadic/macros.hpp"
@@ -49,9 +49,11 @@
#include <cstring>
#include <limits>
#include <memory>
#include <numeric>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>
//
@@ -121,23 +123,15 @@ omnitrace_prefork_callback(thread_t* parent, thread_t* child);
//
// boolean settings
//
static bool binary_rewrite = 0;
static bool loop_level_instr = false;
static bool werror = false;
static bool stl_func_instr = false;
static bool use_mpi = false;
static bool is_static_exe = false;
static bool use_return_info = false;
static bool use_args_info = false;
static bool use_file_info = false;
static bool use_line_info = false;
static bool use_return_info = false;
static bool use_args_info = false;
static bool use_file_info = false;
static bool use_line_info = false;
//
// integral settings
//
static bool debug_print = false;
static int expect_error = NO_ERROR;
static int error_print = 0;
static int verbose_level = tim::get_env<int>("TIMEMORY_RUN_VERBOSE", 0);
extern bool debug_print;
extern int verbose_level;
//
// string settings
//
@@ -150,7 +144,6 @@ static string_t prefer_library = {};
// global variables
//
static patch_pointer_t bpatch = {};
static call_expr_t* initialize_expr = nullptr;
static call_expr_t* terminate_expr = nullptr;
static snippet_vec_t init_names = {};
static snippet_vec_t fini_names = {};
@@ -161,18 +154,18 @@ static regexvec_t func_include = {};
static regexvec_t func_exclude = {};
static regexvec_t file_include = {};
static regexvec_t file_exclude = {};
static auto regex_opts = std::regex_constants::egrep | std::regex_constants::optimize;
//
//======================================================================================//
// control debug printf statements
#define dprintf(...) \
if(debug_print || verbose_level > 0) fprintf(stderr, __VA_ARGS__); \
if(debug_print || verbose_level > 0) \
fprintf(stderr, "[omnitrace][exe] " __VA_ARGS__); \
fflush(stderr);
// control verbose printf statements
#define verbprintf(LEVEL, ...) \
if(verbose_level >= LEVEL) fprintf(stdout, __VA_ARGS__); \
if(verbose_level >= LEVEL) fprintf(stdout, "[omnitrace][exe] " __VA_ARGS__); \
fflush(stdout);
//======================================================================================//
@@ -195,6 +188,9 @@ extern "C"
//======================================================================================//
strset_t
get_whole_function_names();
function_signature
get_func_file_line_info(module_t* mutatee_module, procedure_t* f);
@@ -217,7 +213,7 @@ void
errorFunc(error_level_t level, int num, const char** params);
procedure_t*
find_function(image_t* appImage, const string_t& functionName, strset_t = {});
find_function(image_t* appImage, const string_t& functionName, const strset_t& = {});
void
error_func_real(error_level_t level, int num, const char* const* params);
@@ -242,15 +238,15 @@ get_absolute_path(const char* fname)
if(!(p = strrchr((char*) fname, '/')))
{
auto ret = getcwd(abs_exe_path, sizeof(abs_exe_path));
auto* ret = getcwd(abs_exe_path, sizeof(abs_exe_path));
consume_parameters(ret);
}
else
{
auto rets = getcwd(path_save, sizeof(path_save));
auto retf = chdir(fname);
auto reta = getcwd(abs_exe_path, sizeof(abs_exe_path));
auto retp = chdir(path_save);
auto* rets = getcwd(path_save, sizeof(path_save));
auto retf = chdir(fname);
auto* reta = getcwd(abs_exe_path, sizeof(abs_exe_path));
auto retp = chdir(path_save);
consume_parameters(rets, retf, reta, retp);
}
return string_t(abs_exe_path);
@@ -285,34 +281,32 @@ struct function_signature
TIMEMORY_DEFAULT_OBJECT(function_signature)
function_signature(string_t _ret, string_t _name, string_t _file,
function_signature(string_t _ret, const string_t& _name, string_t _file,
location_t _row = { 0, 0 }, location_t _col = { 0, 0 },
bool _loop = false, bool _info_beg = false, bool _info_end = false)
: m_loop(_loop)
, m_info_beg(_info_beg)
, m_info_end(_info_end)
, m_row(_row)
, m_col(_col)
, m_return(_ret)
, m_row(std::move(_row))
, m_col(std::move(_col))
, m_return(std::move(_ret))
, m_name(tim::demangle(_name))
, m_file(_file)
, m_file(std::move(_file))
{
if(m_file.find('/') != string_t::npos)
m_file = m_file.substr(m_file.find_last_of('/') + 1);
}
function_signature(string_t _ret, string_t _name, string_t _file,
std::vector<string_t> _params, location_t _row = { 0, 0 },
location_t _col = { 0, 0 }, bool _loop = false,
function_signature(const string_t& _ret, const string_t& _name, const string_t& _file,
const std::vector<string_t>& _params, location_t&& _row = { 0, 0 },
location_t&& _col = { 0, 0 }, bool _loop = false,
bool _info_beg = false, bool _info_end = false)
: function_signature(_ret, _name, _file, _row, _col, _loop, _info_beg, _info_end)
{
std::stringstream ss;
ss << "(";
for(auto& itr : _params)
ss << itr << ", ";
m_params = ss.str();
m_params = m_params.substr(0, m_params.length() - 2);
m_params = "(";
for(const auto& itr : _params)
m_params.append(itr + ", ");
if(!_params.empty()) m_params = m_params.substr(0, m_params.length() - 2);
m_params += ")";
}
@@ -373,11 +367,11 @@ struct module_function
get_width()[2] = std::max<size_t>(get_width()[2], rhs.signature.get().length());
}
module_function(const string_t& _module, const string_t& _func,
const function_signature& _sign, procedure_t* proc)
: module(_module)
, function(_func)
, signature(_sign)
module_function(string_t _module, string_t _func, function_signature _sign,
procedure_t* proc)
: module(std::move(_module))
, function(std::move(_func))
, signature(std::move(_sign))
{
if(proc)
{
@@ -482,35 +476,43 @@ dump_info(std::ostream& _os, const fmodset_t& _data)
}
//
static inline void
dump_info(const string_t& _oname, const fmodset_t& _data, int _level)
dump_info(const string_t& _oname, const fmodset_t& _data, int _level, bool _fail)
{
if(!debug_print && verbose_level < _level) return;
std::ofstream ofs(_oname);
std::ofstream ofs{ _oname };
if(ofs)
{
verbprintf(_level, "Dumping '%s'... ", _oname.c_str());
dump_info(ofs, _data);
verbprintf(_level, "Done\n");
}
else
{
std::stringstream _msg{};
_msg << "[" << __FUNCTION__ << "] Error opening '" << _oname << " for output";
verbprintf(_level, "%s\n", _msg.str().c_str());
if(_fail) throw std::runtime_error(_msg.str());
}
ofs.close();
}
//
//======================================================================================//
//
template <typename Tp>
template <typename Tp, std::enable_if_t<!std::is_same<Tp, std::string>::value, int> = 0>
snippet_pointer_t
get_snippet(Tp arg)
{
return snippet_pointer_t(new const_expr_t(arg));
return std::make_shared<snippet_t>(const_expr_t{ arg });
}
//
//======================================================================================//
//
inline snippet_pointer_t
get_snippet(string_t arg)
template <typename Tp, std::enable_if_t<std::is_same<Tp, std::string>::value, int> = 0>
snippet_pointer_t
get_snippet(const Tp& arg)
{
return snippet_pointer_t(new const_expr_t(arg.c_str()));
return std::make_shared<snippet_t>(const_expr_t{ arg.c_str() });
}
//
//======================================================================================//
@@ -519,7 +521,7 @@ template <typename... Args>
snippet_pointer_vec_t
get_snippets(Args&&... args)
{
snippet_pointer_vec_t _tmp;
snippet_pointer_vec_t _tmp{};
TIMEMORY_FOLD_EXPRESSION(_tmp.push_back(get_snippet(std::forward<Args>(args))));
return _tmp;
}
@@ -587,8 +589,8 @@ private:
//======================================================================================//
//
static inline address_space_t*
omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
bool _rewrite, int _pid = -1, string_t _name = {})
omnitrace_get_address_space(patch_pointer_t& _bpatch, int _cmdc, char** _cmdv,
bool _rewrite, int _pid = -1, const string_t& _name = {})
{
address_space_t* mutatee = nullptr;
@@ -599,7 +601,8 @@ omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
if(!_name.empty()) mutatee = _bpatch->openBinary(_name.c_str(), false);
if(!mutatee)
{
fprintf(stderr, "[omnitrace]> Failed to open binary '%s'\n", _name.c_str());
fprintf(stderr, "[omnitrace][exe] Failed to open binary '%s'\n",
_name.c_str());
throw std::runtime_error("Failed to open binary");
}
verbprintf(1, "Done\n");
@@ -612,7 +615,8 @@ omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
mutatee = _bpatch->processAttach(_cmdv0, _pid);
if(!mutatee)
{
fprintf(stderr, "[omnitrace]> Failed to connect to process %i\n", (int) _pid);
fprintf(stderr, "[omnitrace][exe] Failed to connect to process %i\n",
(int) _pid);
throw std::runtime_error("Failed to attach to process");
}
verbprintf(1, "Done\n");
@@ -630,7 +634,7 @@ omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
if(!_cmdv[i]) continue;
ss << _cmdv[i] << " ";
}
fprintf(stderr, "[omnitrace]> Failed to create process: '%s'\n",
fprintf(stderr, "[omnitrace][exe] Failed to create process: '%s'\n",
ss.str().c_str());
throw std::runtime_error("Failed to create process");
}
@@ -651,7 +655,7 @@ omnitrace_thread_exit(thread_t* thread, BPatch_exitType exit_type)
if(!terminate_expr)
{
fprintf(stderr, "[omnitrace]> continuing execution\n");
fprintf(stderr, "[omnitrace][exe] continuing execution\n");
app->continueExecution();
return;
}
@@ -660,18 +664,18 @@ omnitrace_thread_exit(thread_t* thread, BPatch_exitType exit_type)
{
case ExitedNormally:
{
fprintf(stderr, "[omnitrace]> Thread exited normally\n");
fprintf(stderr, "[omnitrace][exe] Thread exited normally\n");
break;
}
case ExitedViaSignal:
{
fprintf(stderr, "[omnitrace]> Thread terminated unexpectedly\n");
fprintf(stderr, "[omnitrace][exe] Thread terminated unexpectedly\n");
break;
}
case NoExit:
default:
{
fprintf(stderr, "[omnitrace]> %s invoked with NoExit\n", __FUNCTION__);
fprintf(stderr, "[omnitrace][exe] %s invoked with NoExit\n", __FUNCTION__);
break;
}
}
@@ -679,7 +683,7 @@ omnitrace_thread_exit(thread_t* thread, BPatch_exitType exit_type)
// terminate_expr = nullptr;
thread->oneTimeCode(*terminate_expr);
fprintf(stderr, "[omnitrace]> continuing execution\n");
fprintf(stderr, "[omnitrace][exe] continuing execution\n");
app->continueExecution();
}
//
@@ -703,7 +707,7 @@ omnitrace_fork_callback(thread_t* parent, thread_t* child)
if(parent)
{
auto app = parent->getProcess();
auto* app = parent->getProcess();
if(app)
{
verbprintf(4, "Continuing execution on parent after fork callback...\n");
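The reworked `function_signature` constructor in this file builds the parameter string by appending each entry followed by `", "` and trimming the trailing separator only when the list is non-empty. The same logic as a free function is sketched below. `format_params` is a hypothetical name used only for illustration, and `std::string` is assumed in place of the `string_t` alias.

```cpp
#include <string>
#include <vector>

namespace sketch
{
// produce "(a, b, c)" from { "a", "b", "c" } and "()" from an empty list
inline std::string
format_params(const std::vector<std::string>& _params)
{
    std::string _v = "(";
    for(const auto& itr : _params)
        _v.append(itr + ", ");
    // drop the trailing ", " only if at least one parameter was appended
    if(!_params.empty()) _v = _v.substr(0, _v.length() - 2);
    _v += ")";
    return _v;
}
}  // namespace sketch
```

The explicit `empty()` check means the empty case no longer depends on `length() - 2` wrapping around as an unsigned value.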
+11 -17
@@ -1,29 +1,23 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
+61 -8
@@ -1,15 +1,68 @@
#!/bin/bash
#!/bin/bash -e
: ${EXTRA_ARGS:=""}
: ${EXTRA_TAGS:=""}
: ${VERSION:=0.0.3}
: ${ROCM_VERSION:=4.3.0}
: ${NJOBS:=8}
STANDARD_ARGS="-DCPACK_GENERATOR=STGZ -DCMAKE_BUILD_TYPE=Release -DOMNITRACE_BUILD_DYNINST=ON -DTIMEMORY_BUILD_PORTABLE=ON"
STANDARD_ARGS="-DCPACK_GENERATOR=STGZ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF -DOMNITRACE_MAX_THREADS=2048 -DOMNITRACE_BUILD_TESTING=OFF -DTIMEMORY_USE_LIBUNWIND=ON -DTIMEMORY_BUILD_LIBUNWIND=ON -DTIMEMORY_BUILD_PORTABLE=ON"
STANDARD_ARGS="${STANDARD_ARGS} -DOMNITRACE_BUILD_DYNINST=ON $(echo -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON)"
if [ -n "${EXTRA_ARGS}" ]; then
STANDARD_ARGS="${STANDARD_ARGS} ${EXTRA_ARGS}"
fi
cmake -B build-release/core ${STANDARD_ARGS} -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON -DDYNINST_USE_OpenMP=OFF -DOMNITRACE_USE_MPI_HEADERS=ON -DOMNITRACE_USE_ROCTRACER=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF -DOMNITRACE_MAX_THREADS=2048 .
cmake --build build-release/core --target package --parallel 8
cp build-release/core/omnitrace-${VERSION}-Linux.sh build-release/omnitrace-${VERSION}-Linux.sh
PACKAGE_BASE_TAG=omnitrace-${VERSION}-Linux
if [ -n "${EXTRA_TAGS}" ]; then
PACKAGE_BASE_TAG="${PACKAGE_BASE_TAG}-${EXTRA_TAGS}"
fi
cmake -B build-release/rocm-mpi ${STANDARD_ARGS} -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON -DDYNINST_USE_OpenMP=ON -DOMNITRACE_USE_MPI_HEADERS=ON -DOMNITRACE_USE_ROCTRACER=ON -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF -DOMNITRACE_MAX_THREADS=2048 .
cmake --build build-release/rocm-mpi --target package --parallel 8
cp build-release/rocm-mpi/omnitrace-${VERSION}-Linux.sh build-release/omnitrace-${VERSION}-Linux-ROCm-${ROCM_VERSION}.sh
SCRIPT_DIR=$(realpath $(dirname ${BASH_SOURCE[0]}))
cd $(dirname ${SCRIPT_DIR})
echo -e "Working directory: $(pwd)"
umask 000
if [ ! -f build-release/${PACKAGE_BASE_TAG}.sh ]; then
cmake -B build-release/core ${STANDARD_ARGS} -DCMAKE_INSTALL_PREFIX=build-release/core/install-release -DDYNINST_USE_OpenMP=OFF -DOMNITRACE_USE_MPI_HEADERS=OFF -DOMNITRACE_USE_ROCTRACER=OFF .
cmake --build build-release/core --target package --parallel ${NJOBS}
cp build-release/core/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}.sh
fi
apt-get install -y libmpich-dev mpich
STANDARD_ARGS="${STANDARD_ARGS} -DOMNITRACE_USE_ROCTRACER=ON -DOMNITRACE_USE_MPI_HEADERS=ON -DDYNINST_USE_OpenMP=ON"
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}.sh ]; then
cmake -B build-release/rocm-${ROCM_VERSION} -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}/install-release ${STANDARD_ARGS} .
cmake --build build-release/rocm-${ROCM_VERSION} --target package --parallel ${NJOBS}
cp build-release/rocm-${ROCM_VERSION}/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}.sh
fi
apt-get install -y libpapi-dev libpfm4-dev
STANDARD_ARGS="${STANDARD_ARGS} -DTIMEMORY_USE_PAPI=ON"
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI.sh ]; then
cmake -B build-release/rocm-${ROCM_VERSION}-papi -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}-papi/install-release ${STANDARD_ARGS} .
cmake --build build-release/rocm-${ROCM_VERSION}-papi --target package --parallel ${NJOBS}
cp build-release/rocm-${ROCM_VERSION}-papi/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI.sh
fi
STANDARD_ARGS="${STANDARD_ARGS} -DOMNITRACE_USE_MPI=ON"
apt-get install -y libmpich-dev mpich
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-MPICH.sh ]; then
cmake -B build-release/rocm-${ROCM_VERSION}-papi-mpich -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}-papi-mpich/install-release ${STANDARD_ARGS} .
cmake --build build-release/rocm-${ROCM_VERSION}-papi-mpich --target package --parallel ${NJOBS}
cp build-release/rocm-${ROCM_VERSION}-papi-mpich/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-MPICH.sh
fi
apt-get purge -y libmpich-dev mpich
apt-get install -y libopenmpi-dev openmpi-bin
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-OpenMPI.sh ]; then
cmake -B build-release/rocm-${ROCM_VERSION}-papi-openmpi -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}-papi-openmpi/install-release ${STANDARD_ARGS} .
cmake --build build-release/rocm-${ROCM_VERSION}-papi-openmpi --target package --parallel ${NJOBS}
cp build-release/rocm-${ROCM_VERSION}-papi-openmpi/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-OpenMPI.sh
fi
File diff suppressed because it is too large.
+209 -82
@@ -27,15 +27,26 @@
// THE SOFTWARE.
#include "library.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/gpu.hpp"
#include "library/sampling.hpp"
#include "library/thread_data.hpp"
#include "library/timemory.hpp"
#include <mutex>
#include <string_view>
using namespace omnitrace;
namespace
{
std::vector<bool>&
get_sample_data()
get_interval_data()
{
static thread_local auto _v = std::vector<bool>{};
return _v;
@@ -48,33 +59,29 @@ setup_gotchas()
if(_initialized) return;
_initialized = true;
OMNITRACE_DEBUG(
OMNITRACE_CONDITIONAL_PRINT(
get_debug_env(),
"[%s] Configuring gotcha wrapper around fork, MPI_Init, and MPI_Init_thread\n",
__FUNCTION__);
fork_gotcha_t::get_initializer() = []() {
TIMEMORY_C_GOTCHA(fork_gotcha_t, 0, fork);
};
mpi_gotcha_t::get_initializer() = []() {
mpi_gotcha_t::template configure<0, int, int*, char***>("MPI_Init");
mpi_gotcha_t::template configure<1, int, int*, char***, int, int*>(
"MPI_Init_thread");
mpi_gotcha_t::template configure<2, int>("MPI_Finalize");
// mpi_gotcha_t::template configure<3, int, tim::mpi::comm_t,
// int*>("MPI_Comm_rank");
// mpi_gotcha_t::template configure<4, int, tim::mpi::comm_t,
// int*>("MPI_Comm_size");
};
mpi_gotcha::configure();
fork_gotcha::configure();
pthread_gotcha::configure();
}
auto
ensure_finalization(bool _static_init = false)
{
auto _main_tid = threading::get_id();
(void) _main_tid;
if(!_static_init)
{
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
}
else
{
OMNITRACE_CONDITIONAL_PRINT(get_debug_env(), "[%s]\n", __FUNCTION__);
}
return scope::destructor{ []() { omnitrace_trace_finalize(); } };
}
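`ensure_finalization` returns a `scope::destructor` so that `omnitrace_trace_finalize()` runs when the returned object goes out of scope. A minimal sketch of such a scope guard is shown below. It is not timemory's `tim::scope::destructor`; the `sketch::destructor` type and its exact copy and move semantics are assumptions for illustration.

```cpp
#include <functional>
#include <utility>

namespace sketch
{
// invokes the stored callable exactly once, when the guard is destroyed
struct destructor
{
    explicit destructor(std::function<void()> _func)
    : m_func(std::move(_func))
    {}

    ~destructor()
    {
        if(m_func) m_func();
    }

    // non-copyable so the callable cannot run twice
    destructor(const destructor&) = delete;
    destructor& operator=(const destructor&) = delete;
    destructor& operator=(destructor&&) = delete;

    // moving transfers responsibility for the call to the new guard
    destructor(destructor&& rhs) noexcept
    : m_func(std::move(rhs.m_func))
    {
        rhs.m_func = nullptr;
    }

private:
    std::function<void()> m_func;
};
}  // namespace sketch
```

Binding such a guard to a function-local `static` or `static thread_local` variable ties the callback to program or thread exit, which matches how the guards are stored in this file.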
@@ -127,9 +134,52 @@ omnitrace_init_tooling()
if(get_state() != State::PreInit || _once) return false;
_once = true;
auto _tid = threading::get_id();
(void) _tid;
auto _mode = tim::get_env<std::string>("OMNITRACE_MODE", "");
OMNITRACE_CONDITIONAL_BASIC_PRINT(true, "Instrumentation mode: %s\n", _mode.c_str());
// configure the settings
configure_settings();
if(gpu::device_count() == 0)
{
OMNITRACE_DEBUG("No HIP devices were found: disabling roctracer...\n");
get_use_roctracer() = false;
}
if(_mode == "sampling")
{
OMNITRACE_PRINT(
"Disabling perfetto, timemory, and critical trace in %s mode...\n",
_mode.c_str());
get_use_sampling() = true;
get_use_timemory() = false;
get_use_perfetto() = false;
get_use_roctracer() = false;
get_use_critical_trace() = false;
}
auto _dtor = scope::destructor{ []() {
if(get_use_sampling())
{
pthread_gotcha::enable_sampling_on_child_threads() = false;
sampling::setup();
pthread_gotcha::enable_sampling_on_child_threads() = true;
sampling::unblock_signals();
}
} };
if(get_use_sampling())
{
pthread_gotcha::enable_sampling_on_child_threads() = false;
sampling::block_signals();
}
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
if(!get_use_timemory() && !get_use_perfetto())
if(!get_use_timemory() && !get_use_perfetto() && !get_use_sampling())
{
get_state() = State::Finalized;
OMNITRACE_DEBUG("[%s] Both perfetto and timemory are disabled. Setting the state "
@@ -138,20 +188,15 @@ omnitrace_init_tooling()
return false;
}
int _threadpool_verbose = (get_debug()) ? 4 : -1;
tasking::get_roctracer_thread_pool().set_verbose(_threadpool_verbose);
tasking::get_critical_trace_thread_pool().set_verbose(_threadpool_verbose);
// below will effectively do:
// get_cpu_cid_stack(0)->emplace_back(-1);
// plus query some env variables
add_critical_trace<Device::CPU, Phase::NONE>(0, -1, 0, 0, 0, 0, 0, 0);
// configure the settings
configure_settings();
tim::trait::runtime_enabled<comp::roctracer>::set(get_use_roctracer());
if(get_sample_rate() < 1) get_sample_rate() = 1;
get_sample_data().reserve(512);
if(get_instrumentation_interval() < 1) get_instrumentation_interval() = 1;
get_interval_data().reserve(512);
if(get_use_timemory())
{
@@ -178,7 +223,7 @@ omnitrace_init_tooling()
}
else
{
tim::trait::runtime_enabled<omnitrace>::set(false);
tim::trait::runtime_enabled<api::omnitrace>::set(false);
}
}
@@ -186,9 +231,13 @@ omnitrace_init_tooling()
auto& _main_bundle = get_main_bundle();
_main_bundle->start();
assert(_main_bundle->get<mpi_gotcha_t>()->get_is_running());
#if defined(OMNITRACE_USE_ROCTRACER)
assert(_main_bundle->get<comp::roctracer>() != nullptr);
assert(_main_bundle->get<comp::roctracer>()->get_is_running());
if(get_use_roctracer())
{
assert(_main_bundle->get<comp::roctracer>() != nullptr);
assert(_main_bundle->get<comp::roctracer>()->get_is_running());
}
#endif
perfetto::TracingInitArgs args{};
@@ -227,7 +276,13 @@ omnitrace_init_tooling()
omnitrace_thread_data<omnitrace_thread_bundle_t>::construct(
TIMEMORY_JOIN("", _exe, "/thread-", threading::get_id()),
quirk::config<quirk::auto_start>{});
if(get_use_sampling())
{
static thread_local auto _once = std::once_flag{};
std::call_once(_once, sampling::setup);
}
static thread_local auto _dtor = scope::destructor{ []() {
if(get_use_sampling()) sampling::shutdown();
omnitrace_thread_data<omnitrace_thread_bundle_t>::instance()->stop();
} };
(void) _dtor;
@@ -247,15 +302,7 @@ omnitrace_init_tooling()
static auto _push_perfetto = [](const char* name) {
_thread_init();
TRACE_EVENT_BEGIN("host", perfetto::StaticString(name),
[&](perfetto::EventContext ctx) {
// compile-time check
IF_CONSTEXPR(trait::is_available<papi_tot_ins>::value)
{
ctx.event()->set_thread_instruction_count_absolute(
papi_tot_ins::record().at(0));
}
});
TRACE_EVENT_BEGIN("host", perfetto::StaticString(name));
};
static auto _pop_timemory = [](const char* name) {
@@ -272,15 +319,7 @@ omnitrace_init_tooling()
_data.bundles.pop_back();
};
static auto _pop_perfetto = [](const char*) {
TRACE_EVENT_END("host", [&](perfetto::EventContext ctx) {
IF_CONSTEXPR(trait::is_available<papi_tot_ins>::value)
{
ctx.event()->set_thread_instruction_count_absolute(
papi_tot_ins::record().at(0));
}
});
};
static auto _pop_perfetto = [](const char*) { TRACE_EVENT_END("host"); };
if(get_use_perfetto() && get_use_timemory())
{
@@ -306,19 +345,50 @@ omnitrace_init_tooling()
if(dmp::rank() == 0)
{
static std::set<tim::string_view_t> _sample_options = {
"OMNITRACE_SAMPLING_FREQ", "OMNITRACE_SAMPLING_DELAY",
"OMNITRACE_FLAT_SAMPLING", "OMNITRACE_TIMELINE_SAMPLING",
"OMNITRACE_FLAT_SAMPLING", "OMNITRACE_TIMELINE_SAMPLING",
};
static std::set<tim::string_view_t> _perfetto_options = {
"OMNITRACE_OUTPUT_FILE",
"OMNITRACE_BACKEND",
"OMNITRACE_SHMEM_SIZE_HINT_KB",
"OMNITRACE_BUFFER_SIZE_KB",
};
static std::set<tim::string_view_t> _timemory_options = {
"OMNITRACE_ROCTRACER_FLAT_PROFILE", "OMNITRACE_ROCTRACER_TIMELINE_PROFILE"
};
// generic filter for filtering relevant options
auto _is_omnitrace_option = [](const auto& _v) {
#if !defined(OMNITRACE_USE_ROCTRACER)
if(_v.find("OMNITRACE_ROCTRACER_") == 0) return false;
#endif
auto _is_omnitrace_option = [](const auto& _v, const auto& _c) {
if(!get_use_roctracer() && _v.find("OMNITRACE_ROCTRACER_") == 0) return false;
if(!get_use_critical_trace() && _v.find("OMNITRACE_CRITICAL_TRACE_") == 0)
return false;
return (_v.find("OMNITRACE_") == 0) ||
((_v.find("TIMEMORY_") != 0) && (_v.find("SIGNAL_") != 0));
if(!get_use_perfetto() && _perfetto_options.count(_v) > 0) return false;
if(!get_use_timemory() && _timemory_options.count(_v) > 0) return false;
if(!get_use_sampling() && _sample_options.count(_v) > 0) return false;
const auto npos = std::string::npos;
if(_v.find("WIDTH") != npos || _v.find("SEPARATOR_FREQ") != npos ||
_v.find("AUTO_OUTPUT") != npos || _v.find("DART_OUTPUT") != npos ||
_v.find("FILE_OUTPUT") != npos || _v.find("PLOT_OUTPUT") != npos ||
_v.find("FLAMEGRAPH_OUTPUT") != npos)
return false;
if(!_c.empty())
{
if(_c.find("omnitrace") != _c.end()) return true;
if(_c.find("debugging") != _c.end() && _v.find("DEBUG") != npos)
return true;
if(_c.find("config") != _c.end()) return true;
if(_c.find("dart") != _c.end()) return false;
if(_c.find("io") != _c.end() && _v.find("_OUTPUT") != npos) return true;
if(_c.find("format") != _c.end()) return true;
return false;
}
return (_v.find("OMNITRACE_") == 0);
};
tim::print_env(std::cerr, [_is_omnitrace_option](const std::string& _v) {
return _is_omnitrace_option(_v);
return _is_omnitrace_option(_v, std::set<std::string>{});
});
print_config_settings(std::cerr, _is_omnitrace_option);
@@ -354,6 +424,7 @@ omnitrace_init_tooling()
static auto _ensure_finalization = ensure_finalization();
if(dmp::rank() == 0) puts("");
return true;
}
} // namespace
@@ -378,10 +449,10 @@ extern "C"
OMNITRACE_DEBUG("[%s] %s\n", __FUNCTION__, name);
}
static auto _sample_rate = std::max<size_t>(get_sample_rate(), 1);
static thread_local size_t _sample_idx = 0;
auto _enabled = (_sample_idx++ % _sample_rate == 0);
get_sample_data().emplace_back(_enabled);
static auto _sample_rate = std::max<size_t>(get_instrumentation_interval(), 1);
static thread_local size_t _sample_idx = 0;
auto _enabled = (_sample_idx++ % _sample_rate == 0);
get_interval_data().emplace_back(_enabled);
if(_enabled) get_functors().first(name);
if(get_use_critical_trace())
{
@@ -405,24 +476,27 @@ extern "C"
if(get_state() == State::Active)
{
OMNITRACE_DEBUG("[%s] %s\n", __FUNCTION__, name);
auto& _sample_data = get_sample_data();
if(!_sample_data.empty())
auto& _interval_data = get_interval_data();
if(!_interval_data.empty())
{
if(_sample_data.back()) get_functors().second(name);
_sample_data.pop_back();
if(_interval_data.back()) get_functors().second(name);
_interval_data.pop_back();
}
if(get_use_critical_trace())
{
if(get_cpu_cid_stack() && !get_cpu_cid_stack()->empty())
{
auto _ts = comp::wall_clock::record();
auto _cid = get_cpu_cid_stack()->back();
uint64_t _parent_cid = 0;
uint16_t _depth = 0;
std::tie(_parent_cid, _depth) = get_cpu_cid_parents().at(_cid);
add_critical_trace<Device::CPU, Phase::END>(
threading::get_id(), _cid, 0, _parent_cid, _ts, _ts,
critical_trace::add_hash_id(name), _depth);
auto _cid = get_cpu_cid_stack()->back();
if(get_cpu_cid_parents().find(_cid) != get_cpu_cid_parents().end())
{
uint64_t _parent_cid = 0;
uint16_t _depth = 0;
auto _ts = comp::wall_clock::record();
std::tie(_parent_cid, _depth) = get_cpu_cid_parents().at(_cid);
add_critical_trace<Device::CPU, Phase::END>(
threading::get_id(), _cid, 0, _parent_cid, _ts, _ts,
critical_trace::add_hash_id(name), _depth);
}
}
}
}
@@ -432,9 +506,10 @@ extern "C"
}
}
void omnitrace_trace_init(const char*, bool, const char*)
void omnitrace_trace_init(const char* _info, bool _b, const char* _extra)
{
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s | %s | %s\n",
__FUNCTION__, _info, (_b) ? "y" : "n", _extra);
omnitrace_init_tooling();
}
@@ -445,13 +520,26 @@ extern "C"
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
if(get_use_sampling())
{
OMNITRACE_DEBUG("[%s] Shutting down sampling...\n", __FUNCTION__);
pthread_gotcha::enable_sampling_on_child_threads() = false;
sampling::shutdown();
sampling::block_signals();
}
int _threadpool_verbose = (get_debug()) ? 4 : -1;
tasking::get_roctracer_thread_pool().set_verbose(_threadpool_verbose);
tasking::get_critical_trace_thread_pool().set_verbose(_threadpool_verbose);
if(dmp::rank() == 0) puts("");
get_state() = State::Finalized;
#if defined(OMNITRACE_USE_ROCTRACER)
OMNITRACE_DEBUG("[%s] Shutting down roctracer...\n", __FUNCTION__);
// ensure that threads running roctracer callbacks shutdown
comp::roctracer::tear_down();
if(get_use_roctracer()) comp::roctracer::tear_down();
#endif
// join extra thread(s) used by roctracer
@@ -459,6 +547,7 @@ extern "C"
__FUNCTION__);
tasking::get_roctracer_task_group().join();
OMNITRACE_DEBUG("[%s] Stopping main bundle...\n", __FUNCTION__);
// stop the main bundle and report the high-level metrics
if(get_main_bundle())
{
@@ -474,6 +563,7 @@ extern "C"
// if they are still running (e.g. thread-pool still alive), the
// thread-specific data will be wrong if we try to stop them from
// the main thread.
OMNITRACE_DEBUG("[%s] Destroying thread bundle data...\n", __FUNCTION__);
for(auto& itr : omnitrace_thread_data<omnitrace_thread_bundle_t>::instances())
{
if(itr && itr->get<comp::wall_clock>() &&
@@ -487,6 +577,8 @@ extern "C"
}
// ensure that all the MT instances are flushed
OMNITRACE_DEBUG("[%s] Stopping and destroying instrumentation bundles...\n",
__FUNCTION__);
for(auto& itr : instrumentation_bundles::instances())
{
while(!itr.bundles.empty())
@@ -499,8 +591,21 @@ extern "C"
}
}
// ensure that all the MT instances are flushed
if(get_use_sampling())
{
OMNITRACE_DEBUG("[%s] Post-processing the sampling backtraces...\n",
__FUNCTION__);
for(size_t i = 0; i < max_supported_threads; ++i)
{
sampling::backtrace::post_process(i);
sampling::get_sampler(i).reset();
}
}
if(get_use_critical_trace())
{
OMNITRACE_DEBUG("[%s] Generating the critical trace...\n", __FUNCTION__);
// increase the thread-pool size
tasking::get_critical_trace_thread_pool().initialize_threadpool(
get_critical_trace_num_threads());
@@ -540,12 +645,16 @@ extern "C"
bool _perfetto_output_error = false;
if(get_use_perfetto() && !is_system_backend())
{
OMNITRACE_DEBUG("[%s] Flushing perfetto...\n", __FUNCTION__);
// Make sure the last event is closed for this example.
perfetto::TrackEvent::Flush();
auto& tracing_session = get_trace_session();
OMNITRACE_DEBUG("[%s] Stopping the blocking perfetto trace sessions...\n",
__FUNCTION__);
tracing_session->StopBlocking();
OMNITRACE_DEBUG("[%s] Getting the trace data...\n", __FUNCTION__);
std::vector<char> trace_data{ tracing_session->ReadTraceBlocking() };
if(trace_data.empty())
@@ -558,7 +667,7 @@ extern "C"
// Write the trace into a file.
fprintf(stderr,
"[%s]> Outputting '%s'. Trace data: %lu B (%.2f KB / %.2f MB / %.2f "
"GB)...\n",
"GB)... ",
__FUNCTION__, get_perfetto_output_filename().c_str(),
(unsigned long) trace_data.size(),
static_cast<double>(trace_data.size()) / units::KB,
@@ -568,23 +677,33 @@ extern "C"
if(!tim::filepath::open(ofs, get_perfetto_output_filename(),
std::ios::out | std::ios::binary))
{
fprintf(stderr, "[%s]> Error opening '%s'...\n", __FUNCTION__,
fprintf(stderr, "\n[%s]> Error opening '%s'...\n", __FUNCTION__,
get_perfetto_output_filename().c_str());
_perfetto_output_error = true;
}
else
{
// Write the trace into a file.
fprintf(stderr, "Done\n");
ofs.write(&trace_data[0], trace_data.size());
}
ofs.close();
}
// these should be destroyed before timemory is finalized, especially the
// roctracer thread-pool
OMNITRACE_DEBUG("[%s] Destroying the thread pools...\n", __FUNCTION__);
tasking::get_roctracer_thread_pool().destroy_threadpool();
tasking::get_critical_trace_thread_pool().destroy_threadpool();
OMNITRACE_DEBUG("Finalizing timemory...\n");
if(get_use_sampling())
static_cast<tim::tsettings<bool>*>(
tim::settings::instance()->find("OMNITRACE_DEBUG")->second.get())
->set(false);
OMNITRACE_DEBUG("[%s] Finalizing timemory...\n", __FUNCTION__);
tim::timemory_finalize();
OMNITRACE_DEBUG("Finalizing timemory... Done\n");
OMNITRACE_DEBUG("[%s] Finalizing timemory... Done\n", __FUNCTION__);
if(_perfetto_output_error)
throw std::runtime_error("Unable to create perfetto output file");
@@ -592,25 +711,32 @@ extern "C"
void omnitrace_trace_set_env(const char* env_name, const char* env_val)
{
OMNITRACE_DEBUG("[%s] Setting env: %s=%s\n", __FUNCTION__, env_name, env_val);
// just search env to avoid initializing the settings
OMNITRACE_CONDITIONAL_PRINT(get_debug_env(), "[%s] Setting env: %s=%s\n",
__FUNCTION__, env_name, env_val);
tim::set_env(env_name, env_val, 0);
}
void omnitrace_trace_set_mpi(bool use, bool attached)
{
OMNITRACE_DEBUG("[%s] use: %s, attached: %s\n", __FUNCTION__, (use) ? "y" : "n",
(attached) ? "y" : "n");
if(use && !attached)
// just search env to avoid initializing the settings
OMNITRACE_CONDITIONAL_PRINT(get_debug_env(), "[%s] use: %s, attached: %s\n",
__FUNCTION__, (use) ? "y" : "n",
(attached) ? "y" : "n");
if(use && !attached &&
(get_state() == State::PreInit || get_state() == State::DelayedInit))
{
auto& _main_bundle = get_main_bundle();
_main_bundle->start();
get_use_pid() = true;
get_state() = State::DelayedInit;
tim::set_env("OMNITRACE_USE_PID", "ON", 1);
get_state() = State::DelayedInit;
}
}
}
namespace omnitrace
{
std::unique_ptr<main_bundle_t>&
get_main_bundle()
{
@@ -619,6 +745,7 @@ get_main_bundle()
"omnitrace", quirk::config<quirk::auto_start>{}));
return _v;
}
} // namespace omnitrace
namespace
{
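The interval gating in the push/pop hunks above (`get_instrumentation_interval()`, `get_interval_data()`) can be sketched in isolation as follows. This is a minimal single-threaded illustration, not code from this commit: `interval_gate` and its members are hypothetical names, and the real implementation keeps the counter and the stack thread-local.

```cpp
// Illustrative sketch only; not part of this commit.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread state behind get_interval_data().
struct interval_gate
{
    // An interval of 0 is clamped to 1, mirroring std::max<size_t>(..., 1).
    explicit interval_gate(std::size_t interval)
    : m_interval{ std::max<std::size_t>(interval, 1) }
    {}

    // Function entry: decide whether this call is instrumented and remember it.
    bool push()
    {
        bool enabled = (m_index++ % m_interval == 0);
        m_stack.push_back(enabled);
        return enabled;
    }

    // Function exit: return the decision made by the matching push().
    // An empty stack yields false, so an unmatched pop is a no-op.
    bool pop()
    {
        if(m_stack.empty()) return false;
        bool enabled = m_stack.back();
        m_stack.pop_back();
        return enabled;
    }

    std::size_t depth() const { return m_stack.size(); }

private:
    std::size_t       m_interval;
    std::size_t       m_index = 0;
    std::vector<bool> m_stack{};
};
```

Recording the decision on entry is what keeps begin and end events paired: the exit path never re-evaluates the counter, it only replays what the entry path stored.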
@@ -0,0 +1,590 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/components/fwd.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/ptl.hpp"
#include "library/sampling.hpp"
#include <timemory/backends/papi.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/components/data_tracker/components.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/components/papi/extern.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/components/papi/papi_vector.hpp>
#include <timemory/components/timing/backends.hpp>
#include <timemory/components/trip_count/extern.hpp>
#include <timemory/macros.hpp>
#include <timemory/math.hpp>
#include <timemory/mpl.hpp>
#include <timemory/mpl/quirks.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/storage.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/demangle.hpp>
#include <timemory/utility/types.hpp>
#include <timemory/variadic.hpp>
#include <array>
#include <cstring>
#include <ctime>
#include <initializer_list>
#include <mutex>
#include <regex>
#include <sstream>
#include <string>
#include <type_traits>
#include <pthread.h>
#include <signal.h>
namespace
{
template <typename... Tp>
struct ensure_storage
{
TIMEMORY_DEFAULT_OBJECT(ensure_storage)
void operator()() const { TIMEMORY_FOLD_EXPRESSION((*this)(tim::type_list<Tp>{})); }
private:
template <typename Up, std::enable_if_t<tim::trait::is_available<Up>::value, int> = 0>
void operator()(tim::type_list<Up>) const
{
using namespace tim;
static thread_local auto _storage = operation::get_storage<Up>{}();
static thread_local auto _tid = threading::get_id();
static thread_local auto _dtor =
scope::destructor{ []() { operation::set_storage<Up>{}(nullptr, _tid); } };
tim::operation::set_storage<Up>{}(_storage, _tid);
if(_tid == 0 && !_storage) tim::trait::runtime_enabled<Up>::set(false);
}
template <typename Up,
std::enable_if_t<!tim::trait::is_available<Up>::value, long> = 0>
void operator()(tim::type_list<Up>) const
{
tim::trait::runtime_enabled<Up>::set(false);
}
};
} // namespace
namespace omnitrace
{
namespace component
{
using signal_type_instances = omnitrace_thread_data<std::set<int>, api::sampling>;
using backtrace_init_instances = omnitrace_thread_data<backtrace, api::sampling>;
using sampler_running_instances = omnitrace_thread_data<bool, api::sampling>;
using papi_vector_instances = omnitrace_thread_data<comp::papi_vector, api::sampling>;
namespace
{
std::unique_ptr<comp::papi_vector>&
get_papi_vector(int64_t _tid)
{
static auto& _v = papi_vector_instances::instances();
if(_tid == threading::get_id()) papi_vector_instances::construct();
return _v.at(_tid);
}
std::unique_ptr<backtrace>&
get_backtrace_init(int64_t _tid)
{
static auto& _v = backtrace_init_instances::instances();
return _v.at(_tid);
}
std::unique_ptr<bool>&
get_sampler_running(int64_t _tid)
{
static auto& _v = sampler_running_instances::instances();
return _v.at(_tid);
}
std::unique_ptr<std::set<int>>&
get_signal_types(int64_t _tid)
{
static auto& _v = signal_type_instances::instances();
// on the main thread, use both SIGALRM and SIGPROF.
// on secondary threads, only use SIGPROF.
signal_type_instances::construct((_tid == 0) ? std::set<int>{ SIGALRM, SIGPROF }
: std::set<int>{ SIGPROF });
return _v.at(_tid);
}
} // namespace
bool
backtrace::operator<(const backtrace& rhs) const
{
return (m_ts == rhs.m_ts) ? (m_tid < rhs.m_tid) : (m_ts < rhs.m_ts);
}
std::vector<std::string>
backtrace::get() const
{
std::vector<std::string> _v{};
_v.reserve(m_size);
for(size_t i = 0; i < m_size; ++i)
_v.emplace_back(m_data.at(i));
return _v;
}
void
backtrace::preinit()
{
sampling_wall_clock::label() = "sampling_wall_clock";
sampling_wall_clock::description() = "Wall clock time (via sampling)";
sampling_cpu_clock::label() = "sampling_cpu_clock";
sampling_cpu_clock::description() = "CPU clock time (via sampling)";
sampling_percent::label() = "sampling_percent";
sampling_percent::description() = "Percentage of samples";
sampling_percent::display_unit() = "%";
}
std::string
backtrace::label()
{
return "backtrace";
}
std::string
backtrace::description()
{
return "Records backtrace data";
}
void
backtrace::start()
{}
void
backtrace::stop()
{}
bool
backtrace::empty() const
{
return (m_size == 0);
}
size_t
backtrace::size() const
{
return m_size;
}
backtrace::time_point_type
backtrace::get_timestamp() const
{
return m_ts;
}
int64_t
backtrace::get_thread_cpu_timestamp() const
{
return m_thr_cpu_ts;
}
void
backtrace::sample(int signum)
{
static bool _debug = tim::get_env<bool>("OMNITRACE_DEBUG_SAMPLING", get_debug());
if(_debug)
{
static auto _timestamp_str = [](const auto& _tp) {
char _repr[64];
std::memset(_repr, '\0', sizeof(_repr));
std::time_t _value = system_clock::to_time_t(_tp);
// alternative: "%c %Z"
if(std::strftime(_repr, sizeof(_repr), "%a %b %d %T %Y %Z",
std::localtime(&_value)) > 0)
return std::string{ _repr };
return std::string{};
};
static thread_local size_t _tot = 0;
static thread_local auto _last = system_clock::now();
auto _now = system_clock::now();
auto _diff = (_now - _last).count();
_last = _now;
_tot += _diff;
OMNITRACE_PRINT(
"Sample on signal %i taken at %s after interval %zu :: total %zu\n", signum,
_timestamp_str(_now).c_str(), _diff, _tot);
}
m_size = 0;
m_tid = threading::get_id();
m_ts = clock_type::now();
m_thr_cpu_ts = tim::get_clock_thread_now<int64_t, std::nano>();
m_data = tim::get_unw_backtrace<128, 4, false>();
auto* itr = m_data.begin();
for(; itr != m_data.end(); ++itr, ++m_size)
{
if(strlen(*itr) == 0) break;
}
std::reverse(m_data.begin(), itr);
if(!get_debug())
{
bool _ignore = false;
for(auto& itr : m_data)
{
if(strlen(itr) == 0) break;
if(strncmp(itr, "funlockfile", 11) == 0) _ignore = true;
if(_ignore && strlen(itr) > 0)
{
OMNITRACE_DEBUG("Discarding sample: '%s'...\n", itr);
itr[0] = '\0';
--m_size;
}
}
}
if constexpr(tim::trait::is_available<comp::papi_vector>::value)
{
assert(get_papi_vector(m_tid).get() != nullptr);
static thread_local auto& _pv = get_papi_vector(m_tid);
auto _hw_counter = _pv->record();
for(size_t i = 0; i < std::min<size_t>(_hw_counter.size(), num_hw_counters); ++i)
{
auto& _last = get_last_hwcounters().at(i);
auto itr = _hw_counter.at(i);
m_hw_counter[i] = itr - _last;
_last = itr;
}
}
}
std::set<int>
backtrace::configure(bool _setup, int64_t _tid)
{
auto& _sampler = sampling::get_sampler(_tid);
auto& _running = get_sampler_running(_tid);
bool _is_running = (!_running) ? false : *_running;
auto& _signal_types = get_signal_types(_tid);
ensure_storage<comp::trip_count, sampling_wall_clock, sampling_cpu_clock, hw_counters,
sampling_percent>{}();
if(_setup && !_sampler && !_is_running)
{
assert(_tid == threading::get_id());
sampling::block_signals(*_signal_types);
if constexpr(tim::trait::is_available<comp::papi_vector>::value)
{
OMNITRACE_DEBUG("HW COUNTER: starting...\n");
if(get_papi_vector(_tid)) get_papi_vector(_tid)->start();
}
auto _alrm_freq = 1.0 / std::min<double>(get_sampling_freq(), 10.0);
auto _prof_freq = 1.0 / get_sampling_freq();
auto _delay = std::max<double>(1.0e-3, get_sampling_delay());
OMNITRACE_DEBUG("Configuring sampler for thread %lu...\n", _tid);
sampler_running_instances::construct(true);
backtrace_init_instances::construct();
sampling::sampler_instances::construct("omnitrace", _tid, *_signal_types);
_sampler->set_signals(*_signal_types);
_sampler->set_flags(SA_RESTART);
_sampler->set_delay(_delay);
_sampler->set_frequency(_prof_freq, { SIGPROF });
_sampler->set_frequency(_alrm_freq, { SIGALRM });
OMNITRACE_DEBUG("Sampler for thread %lu will be triggered %5.1fx per second "
"(every %5.2e seconds)...\n",
_tid, _sampler->get_frequency(units::sec),
_sampler->get_rate(units::sec));
(void) sampling::sampler_t::get_samplers(_tid);
get_backtrace_init(_tid)->sample();
_sampler->configure(false);
_sampler->start();
}
else if(!_setup && _sampler && _is_running)
{
OMNITRACE_DEBUG("Destroying sampler for thread %lu...\n", _tid);
*_running = false;
if(_tid == threading::get_id())
{
sampling::block_signals(*_signal_types);
}
// this propagates to all threads
if(_tid == 0) _sampler->ignore(*_signal_types);
_sampler->stop();
_sampler->swap_data();
if constexpr(tim::trait::is_available<comp::papi_vector>::value)
{
if(_tid == threading::get_id())
{
if(get_papi_vector(_tid)) get_papi_vector(_tid)->stop();
OMNITRACE_DEBUG("HW COUNTER: stopped...\n");
}
}
}
return (_signal_types) ? *_signal_types : std::set<int>{};
}
backtrace::hw_counter_data_t&
backtrace::get_last_hwcounters()
{
static thread_local auto _v = hw_counter_data_t{ 0 };
return _v;
}
void
backtrace::post_process(int64_t _tid)
{
configure(false, _tid);
auto& _sampler = sampling::sampler_instances::instances().at(_tid);
if(!_sampler)
{
// this should be relatively common
OMNITRACE_DEBUG(
"Post-processing sampling entries for thread %lu skipped (no sampler)\n",
_tid);
return;
}
auto& _init = backtrace_init_instances::instances().at(_tid);
if(!_init)
{
// this is not common
OMNITRACE_PRINT(
"Post-processing sampling entries for thread %lu skipped (not initialized)\n",
_tid);
return;
}
// check whether the call-stack entry should be used:
// -1 means break, 0 means skip this entry (continue), 1 means use it
auto _use_label = [](const std::string& _lbl, bool _check_internal) -> short {
// debugging feature
static bool _keep_internal =
tim::get_env<bool>("OMNITRACE_SAMPLING_KEEP_INTERNAL", get_debug());
if(_keep_internal) return 1;
const auto _npos = std::string::npos;
if(_lbl.find("omnitrace_init_tooling") != _npos) return -1;
if(_check_internal)
{
if(std::regex_search(
_lbl, std::regex("(14pthread_gotcha7wrapper|default_error_condition)",
std::regex_constants::optimize)))
return 0;
else if(std::regex_search(
_lbl, std::regex("(8sampling9backtrace9configure|"
"8sampling15unblock_signals|pthread_sigmask)",
std::regex_constants::optimize)))
return 0;
}
return 1;
};
// in the dyninst binary rewrite runtime, instrumented functions are appended with
// "_dyninst", i.e. "main" will show up as "main_dyninst" in the backtrace.
auto _patch_label = [](std::string _lbl) -> std::string {
// debugging feature
static bool _keep_suffix =
tim::get_env<bool>("OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX", get_debug());
if(_keep_suffix) return _lbl;
const std::string _dyninst{ "_dyninst" };
auto _pos = _lbl.find(_dyninst);
if(_pos == std::string::npos) return _lbl;
return _lbl.replace(_pos, _dyninst.length(), "");
};
auto _data = _sampler->get_allocator().get_data();
// single sample that is useless (backtrace to unblocking signals)
if(_data.size() == 1 && _data.front().size() <= 1) _data.clear();
OMNITRACE_DEBUG("Post-processing %zu sampling entries for thread %lu...\n",
_data.size(), _tid);
std::map<int64_t, std::map<int64_t, int64_t>> _depth_sum = {};
auto _scope = tim::scope::config{};
if(get_timeline_sampling()) _scope += scope::timeline{};
if(get_flat_sampling()) _scope += scope::flat{};
time_point_type _last_wall_ts = _init->get_timestamp();
int64_t _last_cpu_ts = _init->get_thread_cpu_timestamp();
for(auto& ditr : _data)
{
using bundle_t = tim::lightweight_tuple<comp::trip_count, sampling_wall_clock,
sampling_cpu_clock, hw_counters>;
for(auto& ritr : ditr)
{
auto* _bt = ritr.get<backtrace>();
if(!_bt)
{
OMNITRACE_PRINT(
"Warning! Nullptr to backtrace instance for thread %lu...\n", _tid);
continue;
}
if(_bt->empty()) continue;
double _elapsed_wc = (_bt->m_ts - _last_wall_ts).count();
double _elapsed_cc = (_bt->m_thr_cpu_ts - _last_cpu_ts);
std::vector<bundle_t> _tc{};
_tc.reserve(_bt->size());
// generate the instances of the tuple of components and start them
for(const auto& itr : _bt->get())
{
auto _lbl = _patch_label(itr);
auto _use = _use_label(_lbl, !_tc.empty() &&
(_tc.back().key() == "start_thread" ||
_tc.back().key() == "clone"));
if(_use == -1) break;
if(_use == 0) continue;
_tc.emplace_back(tim::string_view_t{ _lbl }, _scope);
_tc.back().push(_bt->m_tid);
_tc.back().start();
}
// stop the instances and update the values as needed
for(size_t i = 0; i < _tc.size(); ++i)
{
auto& itr = _tc.at(_tc.size() - i - 1);
size_t _depth = 0;
_depth_sum[_bt->m_tid][_depth] += 1;
itr.stop();
if constexpr(tim::trait::is_available<sampling_wall_clock>::value)
{
auto* _sc = itr.get<sampling_wall_clock>();
if(_sc)
{
auto _value = _elapsed_wc / sampling_wall_clock::get_unit();
_sc->set_value(_value);
_sc->set_accum(_value);
}
}
if constexpr(tim::trait::is_available<sampling_cpu_clock>::value)
{
auto* _cc = itr.get<sampling_cpu_clock>();
if(_cc)
{
_cc->set_value(_elapsed_cc / sampling_cpu_clock::get_unit());
_cc->set_accum(_elapsed_cc / sampling_cpu_clock::get_unit());
}
}
if constexpr(tim::trait::is_available<hw_counters>::value)
{
auto* _hw_counter = itr.get<hw_counters>();
if(_hw_counter)
{
_hw_counter->set_value(_bt->m_hw_counter);
_hw_counter->set_accum(_bt->m_hw_counter);
}
}
itr.pop();
}
_last_wall_ts = _bt->m_ts;
_last_cpu_ts = _bt->m_thr_cpu_ts;
}
}
namespace quirk = tim::quirk;
for(auto&& ditr : _data)
{
using bundle_t =
tim::lightweight_tuple<sampling_percent, quirk::config<quirk::tree_scope>>;
for(auto& ritr : ditr)
{
auto* _bt = ritr.get<backtrace>();
if(!_bt)
{
OMNITRACE_PRINT(
"Warning! Nullptr to backtrace instance for thread %lu...\n", _tid);
continue;
}
if(_bt->empty()) continue;
std::vector<bundle_t> _tc{};
_tc.reserve(_bt->size());
// generate the instances of the tuple of components and start them
for(const auto& itr : _bt->get())
{
auto _lbl = _patch_label(itr);
auto _use =
_use_label(_lbl, !_tc.empty() && _tc.back().key() == "start_thread");
if(_use == -1) break;
if(_use == 0) continue;
_tc.emplace_back(tim::string_view_t{ _lbl });
_tc.back().push(_bt->m_tid);
_tc.back().start();
}
// stop the instances and update the values as needed
for(size_t i = 0; i < _tc.size(); ++i)
{
auto& itr = _tc.at(_tc.size() - i - 1);
size_t _depth = 0;
double _value = (1.0 / _depth_sum[_bt->m_tid][_depth]) * 100.0;
itr.store(std::plus<double>{}, _value);
itr.stop();
itr.pop();
}
}
}
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
double)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::backtrace)
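The `_patch_label` lambda in `backtrace::post_process` removes the `_dyninst` marker that the binary-rewrite runtime appends to instrumented function names. A minimal standalone sketch of that behavior, assuming the debugging override (`OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX`) is off; `strip_dyninst_suffix` is a hypothetical name and is not part of this commit:

```cpp
// Illustrative sketch only; not part of this commit.
#include <cassert>
#include <string>

// Hypothetical free-function version of the _patch_label lambda.
// Like the lambda, it removes the first occurrence of "_dyninst",
// wherever it appears in the label.
std::string
strip_dyninst_suffix(std::string label)
{
    const std::string marker{ "_dyninst" };
    auto              pos = label.find(marker);
    if(pos == std::string::npos) return label;
    label.erase(pos, marker.length());
    return label;
}
```

Note that only the first match is removed, so a label that happened to contain the marker twice would keep the second occurrence.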
@@ -26,21 +26,38 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/fork_gotcha.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
namespace omnitrace
{
void
fork_gotcha::configure()
{
fork_gotcha_t::get_initializer() = []() {
TIMEMORY_C_GOTCHA(fork_gotcha_t, 0, fork);
};
pthread_gotcha_t::get_initializer() = []() {
TIMEMORY_C_GOTCHA(pthread_gotcha_t, 0, pthread_create);
};
}
void
fork_gotcha::audit(const gotcha_data_t&, audit::incoming)
{
OMNITRACE_DEBUG(
OMNITRACE_CONDITIONAL_BASIC_PRINT(
get_debug_env(),
"Warning! Calling fork() within an OpenMPI application using libfabric "
"may result in a segmentation fault\n");
TIMEMORY_CONDITIONAL_DEMANGLED_BACKTRACE(get_debug(), 16);
TIMEMORY_CONDITIONAL_DEMANGLED_BACKTRACE(get_debug_env(), 16);
}
void
fork_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, pid_t _pid)
{
OMNITRACE_DEBUG("%s() return PID %i\n", _data.tool_id.c_str(), (int) _pid);
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "%s() return PID %i\n",
_data.tool_id.c_str(), (int) _pid);
}
} // namespace omnitrace
@@ -26,11 +26,14 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/mpi_gotcha.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/components/omnitrace.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/omnitrace_component.hpp"
#include "timemory/backends/mpi.hpp"
namespace omnitrace
{
namespace
{
uint64_t mpip_index = std::numeric_limits<uint64_t>::max();
@@ -45,10 +48,11 @@ omnitrace_mpi_set_attr()
return MPI_SUCCESS;
};
static auto _mpi_fini = [](MPI_Comm, int, void*, void*) {
OMNITRACE_DEBUG("MPI Comm attribute finalize\n");
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(),
"MPI Comm attribute finalize\n");
if(mpip_index != std::numeric_limits<uint64_t>::max())
comp::deactivate_mpip<tim::component_tuple<omnitrace_component>, omnitrace>(
mpip_index);
comp::deactivate_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
api::omnitrace>(mpip_index);
if(!mpi_init_string.empty()) omnitrace_pop_trace(mpi_init_string.c_str());
mpi_init_string = {};
omnitrace_trace_finalize();
@@ -65,46 +69,75 @@ omnitrace_mpi_set_attr()
}
} // namespace
void
mpi_gotcha::configure()
{
mpi_gotcha_t::get_initializer() = []() {
mpi_gotcha_t::template configure<0, int, int*, char***>("MPI_Init");
mpi_gotcha_t::template configure<1, int, int*, char***, int, int*>(
"MPI_Init_thread");
mpi_gotcha_t::template configure<2, int>("MPI_Finalize");
mpi_gotcha_t::template configure<3, int, tim::mpi::comm_t, int*>("MPI_Comm_rank");
mpi_gotcha_t::template configure<4, int, tim::mpi::comm_t, int*>("MPI_Comm_size");
};
}
void
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, int*, char***)
{
OMNITRACE_DEBUG("[%s] %s(int*, char***)\n", __FUNCTION__, _data.tool_id.c_str());
if(get_state() == ::State::DelayedInit)
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s(int*, char***)\n",
__FUNCTION__, _data.tool_id.c_str());
if(get_state() == ::omnitrace::State::DelayedInit)
{
get_state() = ::State::PreInit;
get_state() = ::omnitrace::State::PreInit;
mpi_init_string = _data.tool_id;
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
tim::mpi::is_initialized_callback() = []() { return true; };
tim::mpi::is_finalized() = false;
#endif
}
}
void
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, int*, char***, int, int*)
{
OMNITRACE_DEBUG("[%s] %s(int*, char***, int, int*)\n", __FUNCTION__,
_data.tool_id.c_str());
if(get_state() == ::State::DelayedInit)
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(),
"[%s] %s(int*, char***, int, int*)\n", __FUNCTION__,
_data.tool_id.c_str());
if(get_state() == ::omnitrace::State::DelayedInit)
{
get_state() = ::State::PreInit;
get_state() = ::omnitrace::State::PreInit;
mpi_init_string = _data.tool_id;
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
tim::mpi::is_initialized_callback() = []() { return true; };
tim::mpi::is_finalized() = false;
#endif
}
}
void
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming)
{
OMNITRACE_DEBUG("[%s] %s()\n", __FUNCTION__, _data.tool_id.c_str());
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s()\n", __FUNCTION__,
_data.tool_id.c_str());
if(mpip_index != std::numeric_limits<uint64_t>::max())
comp::deactivate_mpip<tim::component_tuple<omnitrace_component>, omnitrace>(
mpip_index);
comp::deactivate_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
api::omnitrace>(mpip_index);
if(!mpi_init_string.empty()) omnitrace_pop_trace(mpi_init_string.c_str());
mpi_init_string = {};
omnitrace_trace_finalize();
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
tim::mpi::is_initialized_callback() = []() { return false; };
tim::mpi::is_finalized() = true;
#endif
}
void
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, comm_t _comm, int* _val)
{
OMNITRACE_DEBUG("[%s] %s()\n", __FUNCTION__, _data.tool_id.c_str());
m_comm = _comm;
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s()\n", __FUNCTION__,
_data.tool_id.c_str());
m_comm = &_comm;
if(_data.tool_id == "MPI_Comm_rank")
{
m_rank = _val;
@@ -123,9 +156,9 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, comm_t _comm, int
void
mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
{
OMNITRACE_DEBUG("[%s] %s() returned %i\n", __FUNCTION__, _data.tool_id.c_str(),
(int) _retval);
if(_retval == tim::mpi::success_v && get_state() == ::State::PreInit &&
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s() returned %i\n",
__FUNCTION__, _data.tool_id.c_str(), (int) _retval);
if(_retval == tim::mpi::success_v && get_state() == ::omnitrace::State::PreInit &&
_data.tool_id.find("MPI_Init") == 0)
{
omnitrace_mpi_set_attr();
@@ -136,19 +169,27 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
// were excluded via a regex expression)
if(get_use_mpip())
{
OMNITRACE_DEBUG("[%s] Activating MPI wrappers...\n", __FUNCTION__);
comp::configure_mpip<tim::component_tuple<omnitrace_component>, omnitrace>();
mpip_index = comp::activate_mpip<tim::component_tuple<omnitrace_component>,
omnitrace>();
OMNITRACE_CONDITIONAL_BASIC_PRINT(
get_debug_env(), "[%s] Activating MPI wrappers...\n", __FUNCTION__);
comp::configure_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
api::omnitrace>();
mpip_index =
comp::activate_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
api::omnitrace>();
}
omnitrace_push_trace(_data.tool_id.c_str());
}
else if(_retval == tim::mpi::success_v && _data.tool_id.find("MPI_Comm_") == 0)
{
/*if(_data.tool_id == "MPI_Comm_rank")
if(_data.tool_id == "MPI_Comm_rank")
{
if(m_rank)
tim::mpi::set_rank(*m_rank, m_comm);
{
tim::mpi::set_rank(*m_rank, *static_cast<comm_t*>(m_comm));
tim::settings::default_process_suffix() = *m_rank;
get_perfetto_output_filename().clear();
(void) get_perfetto_output_filename();
}
else
{
OMNITRACE_PRINT("[%s] %s() returned %i :: nullptr to rank\n",
@@ -158,7 +199,7 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
else if(_data.tool_id == "MPI_Comm_size")
{
if(m_size)
tim::mpi::set_size(*m_size, m_comm);
tim::mpi::set_size(*m_size, *static_cast<comm_t*>(m_comm));
else
{
OMNITRACE_PRINT("[%s] %s() returned %i :: nullptr to size\n",
@@ -169,8 +210,9 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
{
OMNITRACE_PRINT("[%s] %s() returned %i :: unexpected function wrapper\n",
__FUNCTION__, _data.tool_id.c_str(), (int) _retval);
}*/
}
}
}
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(mpi_gotcha)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::mpi_gotcha)
@@ -26,25 +26,31 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/omnitrace_component.hpp"
#include "library/components/omnitrace.hpp"
#include "library/api.hpp"
namespace omnitrace
{
namespace component
{
void
omnitrace_component::start()
omnitrace::start()
{
if(m_prefix) omnitrace_push_trace(m_prefix);
}
void
omnitrace_component::stop()
omnitrace::stop()
{
if(m_prefix) omnitrace_pop_trace(m_prefix);
}
void
omnitrace_component::set_prefix(const char* _prefix)
omnitrace::set_prefix(const char* _prefix)
{
m_prefix = _prefix;
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(omnitrace_component)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::omnitrace)
+151
@@ -0,0 +1,151 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/components/pthread_gotcha.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/sampling.hpp"
#include <timemory/sampling/allocator.hpp>
#include <timemory/utility/types.hpp>
#include <pthread.h>
namespace omnitrace
{
namespace sampling
{
std::set<int>
setup();
std::set<int>
shutdown();
} // namespace sampling
pthread_gotcha::wrapper::wrapper(routine_t _routine, void* _arg, bool _enable_sampling,
promise_t* _p)
: m_enable_sampling{ _enable_sampling }
, m_routine{ _routine }
, m_arg{ _arg }
, m_promise{ _p }
{}
void*
pthread_gotcha::wrapper::operator()() const
{
std::set<int> _signals{};
auto& _enable_sampling = pthread_gotcha::enable_sampling_on_child_threads();
if(m_enable_sampling && _enable_sampling)
{
_enable_sampling = false;
_signals = sampling::setup();
_enable_sampling = true;
sampling::unblock_signals();
}
if(m_promise) m_promise->set_value();
// execute the original function
auto* _ret = m_routine(m_arg);
if(m_enable_sampling && _enable_sampling)
{
sampling::block_signals(_signals);
sampling::shutdown();
}
return _ret;
}
void*
pthread_gotcha::wrapper::wrap(void* _arg)
{
if(_arg == nullptr) return nullptr;
// convert the argument
wrapper* _wrapper = static_cast<wrapper*>(_arg);
// execute the original function
return (*_wrapper)();
}
void
pthread_gotcha::configure()
{
pthread_gotcha_t::get_initializer() = []() {
TIMEMORY_C_GOTCHA(pthread_gotcha_t, 0, pthread_create);
};
}
bool&
pthread_gotcha::enable_sampling_on_child_threads()
{
static thread_local bool _v = get_use_sampling();
return _v;
}
// pthread_create
int
pthread_gotcha::operator()(pthread_t* thread, const pthread_attr_t* attr,
void* (*start_routine)(void*), void* arg) const
{
auto _enable_sampling = enable_sampling_on_child_threads();
if(!_enable_sampling)
{
auto* _obj = new wrapper(start_routine, arg, _enable_sampling, nullptr);
// create the thread
return pthread_create(thread, attr, &wrapper::wrap, static_cast<void*>(_obj));
}
// block the signals in entire process
OMNITRACE_DEBUG("blocking signals...\n");
tim::sampling::block_signals({ SIGALRM, SIGPROF },
tim::sampling::sigmask_scope::process);
// promise set by thread when signal handler is configured
auto _promise = std::promise<void>{};
auto _fut = _promise.get_future();
auto* _obj = new wrapper(start_routine, arg, _enable_sampling, &_promise);
// create the thread
auto _ret = pthread_create(thread, attr, &wrapper::wrap, static_cast<void*>(_obj));
// wait for thread to set promise
OMNITRACE_DEBUG("waiting for child to signal it is setup...\n");
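// only wait when the thread was actually created; otherwise the promise is never set
if(_ret == 0) _fut.wait();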
// unblock the signals in the entire process
OMNITRACE_DEBUG("unblocking signals...\n");
tim::sampling::unblock_signals({ SIGALRM, SIGPROF },
tim::sampling::sigmask_scope::process);
OMNITRACE_DEBUG("returning success...\n");
return _ret;
}
} // namespace omnitrace
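The `pthread_create` wrapper above relies on a handshake: the parent blocks the sampling signals, creates the thread, and waits until the child reports that its setup is finished. A minimal standalone sketch of that handshake follows. All names in namespace `sketch` are hypothetical, and it uses plain `pthread_sigmask` on the calling thread instead of timemory's process-scoped `block_signals`, so it illustrates the idea rather than omnitrace's implementation.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <pthread.h>
#include <signal.h>

namespace sketch  // hypothetical names, not omnitrace's API
{
using routine_t = void* (*)(void*);

// Everything the child needs, allocated on the heap by the parent.
struct start_args
{
    routine_t                           routine;
    void*                               arg;
    std::shared_ptr<std::promise<void>> ready;
};

// The two signals used for sampling in the diff above.
inline sigset_t
sampling_signals()
{
    sigset_t _set;
    sigemptyset(&_set);
    sigaddset(&_set, SIGALRM);
    sigaddset(&_set, SIGPROF);
    return _set;
}

// Start routine handed to pthread_create in place of the user's routine.
inline void*
trampoline(void* _p)
{
    // take ownership so the arguments are released when the thread finishes
    std::unique_ptr<start_args> _args{ static_cast<start_args*>(_p) };
    // (a real tool would install its per-thread sampler here)
    sigset_t _set = sampling_signals();
    pthread_sigmask(SIG_UNBLOCK, &_set, nullptr);
    // tell the parent that setup is complete, then run the user's routine
    _args->ready->set_value();
    return _args->routine(_args->arg);
}

inline int
create_thread(pthread_t* _thread, routine_t _routine, void* _arg)
{
    assert(_thread != nullptr);
    // block in the caller; a new thread inherits its creator's signal mask
    sigset_t _set = sampling_signals();
    sigset_t _old;
    pthread_sigmask(SIG_BLOCK, &_set, &_old);

    auto  _ready = std::make_shared<std::promise<void>>();
    auto  _fut   = _ready->get_future();
    auto* _args  = new start_args{ _routine, _arg, _ready };

    int _ret = pthread_create(_thread, nullptr, &trampoline, _args);
    if(_ret == 0)
        _fut.wait();   // child has finished its setup
    else
        delete _args;  // no thread exists to release the arguments

    // restore the caller's previous mask
    pthread_sigmask(SIG_SETMASK, &_old, nullptr);
    return _ret;
}
}  // namespace sketch
```

The sketch shares the promise through a `std::shared_ptr` so the child never touches an object living on the parent's stack, and it waits on the future only when `pthread_create` succeeds, since otherwise nothing would ever fulfil the promise.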
@@ -26,12 +26,14 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/roctracer.hpp"
#include "library/components/roctracer.hpp"
#include "library/components/roctracer_callbacks.hpp"
#include "library/config.hpp"
#include "library/defines.hpp"
#include "library/roctracer_callbacks.hpp"
#include "library/thread_data.hpp"
using namespace omnitrace;
namespace tim
{
namespace component
@@ -26,7 +26,7 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/roctracer_callbacks.hpp"
#include "library/components/roctracer_callbacks.hpp"
#include "library.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
@@ -36,6 +36,8 @@
#include <cstdint>
TIMEMORY_DEFINE_API(roctracer)
namespace omnitrace
{
namespace api = tim::api;
std::unordered_set<uint64_t>&
@@ -364,9 +366,17 @@ hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
}
if(get_use_timemory())
{
get_roctracer_hip_data()->emplace(
data->correlation_id,
roctracer_bundle_t{ op_name, quirk::config<quirk::auto_start>{} });
auto itr = get_roctracer_hip_data()->emplace(data->correlation_id,
roctracer_bundle_t{ op_name });
if(itr.second)
{
itr.first->second.start();
}
else if(itr.first != get_roctracer_hip_data()->end())
{
itr.first->second.stop();
get_roctracer_hip_data()->erase(itr.first);
}
}
if(get_use_critical_trace())
{
@@ -403,7 +413,7 @@ hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
auto itr = _data->find(data->correlation_id);
if(itr != get_roctracer_hip_data()->end())
{
itr->second.stop().pop();
itr->second.stop();
_data->erase(itr);
return true;
}
@@ -597,3 +607,4 @@ roctracer_tear_down_routines()
static auto _v = roctracer_functions_t{};
return _v;
}
} // namespace omnitrace
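The `hip_api_callback` hunk above replaces the auto-start quirk with an explicit check of the pair returned by `emplace`: a fresh entry is started, an already existing one is stopped and erased. A small sketch of that pattern, assuming a hypothetical `timer` type standing in for `roctracer_bundle_t`:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

namespace sketch  // hypothetical names, not omnitrace's API
{
// Stand-in for the measurement bundle keyed by correlation id.
struct timer
{
    std::string name;
    bool        running = false;

    void start() { running = true; }
    void stop() { running = false; }
};

using timer_map_t = std::unordered_map<std::uint64_t, timer>;

// Returns true when a new entry was inserted and started,
// false when an existing entry was stopped and erased.
inline bool
toggle(timer_map_t& _data, std::uint64_t _id, const std::string& _name)
{
    // emplace returns {iterator, inserted}; the iterator is valid in both cases
    auto itr = _data.emplace(_id, timer{ _name });
    if(itr.second)
    {
        itr.first->second.start();
        return true;
    }
    itr.first->second.stop();
    _data.erase(itr.first);
    return false;
}
}  // namespace sketch
```

Calling `toggle` twice with the same id first starts and then removes the entry, which mirrors the enter/exit phases of an API callback.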
+198 -69
@@ -28,11 +28,16 @@
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "timemory/backends/dmp.hpp"
#include "timemory/backends/process.hpp"
#include "timemory/settings/types.hpp"
#include "timemory/utility/argparse.hpp"
#include <timemory/backends/dmp.hpp>
#include <timemory/backends/mpi.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/environment.hpp>
#include <timemory/settings.hpp>
#include <timemory/settings/types.hpp>
#include <timemory/utility/argparse.hpp>
#include <array>
#include <cstdint>
@@ -40,9 +45,9 @@
#include <numeric>
#include <ostream>
#include <string>
#include <timemory/environment.hpp>
#include <timemory/settings.hpp>
namespace omnitrace
{
using settings = tim::settings;
namespace
@@ -55,9 +60,10 @@ get_config()
(void) _once;
}
#define OMNITRACE_CONFIG_SETTING(TYPE, ENV_NAME, DESCRIPTION, INITIAL_VALUE) \
_config->insert<TYPE, TYPE>(ENV_NAME, ENV_NAME, DESCRIPTION, INITIAL_VALUE, \
std::vector<std::string>{})
#define OMNITRACE_CONFIG_SETTING(TYPE, ENV_NAME, DESCRIPTION, INITIAL_VALUE, ...) \
_config->insert<TYPE, TYPE>( \
ENV_NAME, ENV_NAME, DESCRIPTION, INITIAL_VALUE, \
std::set<std::string>{ "custom", "omnitrace", __VA_ARGS__ })
} // namespace
void
@@ -81,28 +87,49 @@ configure_settings()
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_CONFIG_FILE",
"Configuration file of omnitrace and timemory settings",
_default_config_file);
_default_config_file, "config");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_DEBUG", "Enable debugging output",
_config->get_debug());
_config->get_debug(), "debugging");
auto _omnitrace_debug = _config->get<bool>("OMNITRACE_DEBUG");
if(_omnitrace_debug) tim::set_env("TIMEMORY_DEBUG_SETTINGS", "1", 0);
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_PERFETTO", "Enable perfetto backend",
_default_perfetto_v);
_default_perfetto_v, "backend", "perfetto");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_TIMEMORY", "Enable timemory backend",
!_config->get<bool>("OMNITRACE_USE_PERFETTO"));
!_config->get<bool>("OMNITRACE_USE_PERFETTO"), "backend",
"timemory");
#if defined(OMNITRACE_USE_ROCTRACER)
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_ROCTRACER", "Enable ROCM tracing", true,
"backend", "roctracer");
#endif
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_SAMPLING",
"Enable statistical sampling of call-stack", false,
"backend", "sampling");
OMNITRACE_CONFIG_SETTING(
bool, "OMNITRACE_USE_PID",
"Enable tagging filenames with process identifier (either MPI rank or pid)",
true);
"Enable tagging filenames with process identifier (either MPI rank or pid)", true,
"io");
OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_INSTRUMENTATION_INTERVAL",
"Instrumentation only takes measurements once every N "
"function calls (not statistical)",
1, "instrumentation");
OMNITRACE_CONFIG_SETTING(
size_t, "OMNITRACE_SAMPLE_RATE",
"Counts every function call (N), only record function if (N % <VALUE> == 0)", 1);
double, "OMNITRACE_SAMPLING_FREQ",
"Number of software interrupts per second when OMNITRACE_USE_SAMPLING=ON", 10.0,
"sampling");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_SAMPLING_DELAY",
"Number of seconds to delay activating the statistical sampling", 0.05,
"sampling");
auto _backend = tim::get_env_choice<std::string>(
"OMNITRACE_BACKEND",
@@ -114,71 +141,84 @@ configure_settings()
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_BACKEND",
"Specify the perfetto backend to activate. Options are: "
"'inprocess', 'system', or 'all'",
_backend);
_backend, "perfetto");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_CRITICAL_TRACE",
"Enable generation of the critical trace", false);
"Enable generation of the critical trace", false, "feature");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_FLAT_SAMPLING",
"Ignore hierarchy in all statistical sampling entries",
_config->get_flat_profile(), "sampling", "data_layout");
OMNITRACE_CONFIG_SETTING(
bool, "OMNITRACE_ROCTRACER_TIMELINE_PROFILE",
"Create unique entries for every kernel with timemory backend",
_config->get_timeline_profile());
bool, "OMNITRACE_TIMELINE_SAMPLING",
"Create unique entries for every sample when statistical sampling is enabled",
_config->get_timeline_profile(), "sampling", "data_layout");
OMNITRACE_CONFIG_SETTING(
bool, "OMNITRACE_ROCTRACER_FLAT_PROFILE",
"Ignore hierarchy in all kernels entries with timemory backend",
_config->get_flat_profile());
_config->get_flat_profile(), "roctracer", "data_layout");
OMNITRACE_CONFIG_SETTING(
bool, "OMNITRACE_ROCTRACER_TIMELINE_PROFILE",
"Create unique entries for every kernel with timemory backend",
_config->get_timeline_profile(), "roctracer", "data_layout");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_ROCTRACER_HSA_ACTIVITY",
"Enable HSA activity tracing support", false);
"Enable HSA activity tracing support", false, "roctracer");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_ROCTRACER_HSA_API",
"Enable HSA API tracing support", false);
"Enable HSA API tracing support", false, "roctracer");
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_ROCTRACER_HSA_API_TYPES",
"HSA API type to collect", "");
"HSA API type to collect", "", "roctracer");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_CRITICAL_TRACE_DEBUG",
"Enable debugging for critical trace", _omnitrace_debug);
"Enable debugging for critical trace", _omnitrace_debug,
"debugging");
OMNITRACE_CONFIG_SETTING(
bool, "OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES",
"Include names in serialization of critical trace (mainly for debugging)",
_omnitrace_debug);
_omnitrace_debug, "debugging");
OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_SHMEM_SIZE_HINT_KB",
"Hint for shared-memory buffer size in perfetto (in KB)",
40960);
40960, "perfetto", "data");
OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_BUFFER_SIZE_KB",
"Size of perfetto buffer (in KB)", 1024000);
"Size of perfetto buffer (in KB)", 1024000, "perfetto",
"data");
OMNITRACE_CONFIG_SETTING(int64_t, "OMNITRACE_CRITICAL_TRACE_COUNT",
"Number of critical trace to export (0 == all)", 0);
"Number of critical trace to export (0 == all)", 0, "data");
OMNITRACE_CONFIG_SETTING(uint64_t, "OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT",
"Number of critical trace records to store in thread-local "
"memory before submitting to shared buffer",
2000);
2000, "data");
OMNITRACE_CONFIG_SETTING(
uint64_t, "OMNITRACE_CRITICAL_TRACE_NUM_THREADS",
"Number of threads to use when generating the critical trace",
std::min<uint64_t>(8, std::thread::hardware_concurrency()));
std::min<uint64_t>(8, std::thread::hardware_concurrency()), "parallelism");
OMNITRACE_CONFIG_SETTING(
int64_t, "OMNITRACE_CRITICAL_TRACE_PER_ROW",
"How many critical traces per row in perfetto (0 == all in one row)", 0);
"How many critical traces per row in perfetto (0 == all in one row)", 0, "io");
OMNITRACE_CONFIG_SETTING(
std::string, "OMNITRACE_COMPONENTS",
"List of components to collect via timemory (see timemory-avail)", "wall_clock");
std::string, "OMNITRACE_TIMEMORY_COMPONENTS",
"List of components to collect via timemory (see timemory-avail)", "wall_clock",
"timemory", "component");
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_OUTPUT_FILE", "Perfetto filename",
"");
"", "perfetto", "io");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_SETTINGS_DESC",
"Provide descriptions when printing settings", false);
"Provide descriptions when printing settings", false,
"debugging");
_config->get_flamegraph_output() = false;
_config->get_cout_output() = false;
@@ -199,7 +239,8 @@ configure_settings()
_config->read(itr);
}
_config->get_global_components() = _config->get<std::string>("OMNITRACE_COMPONENTS");
_config->get_global_components() =
_config->get<std::string>("OMNITRACE_TIMEMORY_COMPONENTS");
// always initialize timemory because gotcha wrappers are always used
auto _cmd = tim::read_command_line(process::get_id());
@@ -225,14 +266,23 @@ configure_settings()
settings::suppress_parsing() = true;
settings::suppress_config() = true;
settings::use_output_suffix() = _config->get<bool>("OMNITRACE_USE_PID");
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
if(tim::mpi::is_initialized()) settings::default_process_suffix() = tim::mpi::rank();
#endif
OMNITRACE_CONDITIONAL_BASIC_PRINT(true, "configuration complete\n");
}
void
print_config_settings(std::ostream& _os,
std::function<bool(const std::string_view&)>&& _filter)
print_config_settings(
std::ostream& _os,
std::function<bool(const std::string_view&, const std::set<std::string>&)>&& _filter)
{
OMNITRACE_CONDITIONAL_BASIC_PRINT(true, "configuration:\n");
auto _flags = _os.flags();
bool _md = tim::get_env<bool>("OMNITRACE_SETTINGS_DESC_MARKDOWN", false);
constexpr size_t nfields = 3;
using str_array_t = std::array<std::string, nfields>;
std::vector<str_array_t> _data{};
@@ -240,14 +290,17 @@ print_config_settings(std::ostream& _os,
_widths.fill(0);
for(const auto& itr : *get_config())
{
if(_filter(itr.first))
if(_filter(itr.first, itr.second->get_categories()))
{
auto _disp = itr.second->get_display(std::ios::boolalpha);
_data.emplace_back(str_array_t{ _disp.at("name"), _disp.at("value"),
_data.emplace_back(str_array_t{ _disp.at("env_name"), _disp.at("value"),
_disp.at("description") });
for(size_t i = 0; i < nfields; ++i)
_widths.at(i) =
std::max<size_t>(_widths.at(i), _data.back().at(i).length());
{
size_t _wextra = (_md && i < 2) ? 2 : 0;
_widths.at(i) = std::max<size_t>(_widths.at(i),
_data.back().at(i).length() + _wextra);
}
}
}
@@ -261,10 +314,8 @@ print_config_settings(std::ostream& _os,
auto _rhs_use = rhs.at(0).find("OMNITRACE_USE_");
if(_lhs_use != _rhs_use && _lhs_use < _rhs_use) return true;
if(_lhs_use != _rhs_use && _lhs_use > _rhs_use) return false;
// length sort followed by alphabetical sort
return (lhs.at(0).length() == rhs.at(0).length())
? (lhs.at(0) < rhs.at(0))
: (lhs.at(0).length() < rhs.at(0).length());
// alphabetical sort
return lhs.at(0) < rhs.at(0);
});
bool _print_desc = get_debug() || get_config()->get<bool>("OMNITRACE_SETTINGS_DESC");
@@ -272,15 +323,20 @@ print_config_settings(std::ostream& _os,
auto tot_width = std::accumulate(_widths.begin(), _widths.end(), 0);
if(!_print_desc) tot_width -= _widths.back() + 4;
size_t _spacer_extra = 9;
if(!_md)
_spacer_extra += 2;
else if(_md && _print_desc)
_spacer_extra -= 1;
std::stringstream _spacer{};
_spacer.fill('-');
_spacer << "#" << std::setw(tot_width + 11) << ""
_spacer << "#" << std::setw(tot_width + _spacer_extra) << ""
<< "#";
_os << _spacer.str() << "\n";
// _os << "# Omnitrace settings:" << std::setw(tot_width - 8) << "#" << "\n";
// _os << "# api::omnitrace settings:" << std::setw(tot_width - 8) << "#" << "\n";
for(const auto& itr : _data)
{
_os << "# ";
_os << ((_md) ? "| " : "# ");
for(size_t i = 0; i < nfields; ++i)
{
switch(i)
@@ -289,16 +345,28 @@ print_config_settings(std::ostream& _os,
case 1: _os << std::left; break;
case 2: _os << std::left; break;
}
_os << std::setw(_widths.at(i)) << itr.at(i) << " ";
if(!_print_desc && i == 1) break;
switch(i)
if(_md)
{
case 0: _os << "= "; break;
case 1: _os << "[ "; break;
case 2: _os << "]"; break;
std::stringstream _ss{};
_ss.setf(_os.flags());
std::string _extra = (i < 2) ? "`" : "";
_ss << _extra << itr.at(i) << _extra;
_os << std::setw(_widths.at(i)) << _ss.str() << " | ";
if(!_print_desc && i == 1) break;
}
else
{
_os << std::setw(_widths.at(i)) << itr.at(i) << " ";
if(!_print_desc && i == 1) break;
switch(i)
{
case 0: _os << "= "; break;
case 1: _os << "[ "; break;
case 2: _os << "]"; break;
}
}
}
_os << " #\n";
_os << ((_md) ? "\n" : " #\n");
}
_os << _spacer.str() << "\n";
@@ -319,6 +387,12 @@ get_config_file()
return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
}
bool
get_debug_env()
{
return tim::get_env<bool>("OMNITRACE_DEBUG", false);
}
bool
get_debug()
{
@@ -326,20 +400,47 @@ get_debug()
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool
bool&
get_use_perfetto()
{
static auto _v = get_config()->find("OMNITRACE_USE_PERFETTO");
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool
bool&
get_use_timemory()
{
static auto _v = get_config()->find("OMNITRACE_USE_TIMEMORY");
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool&
get_use_roctracer()
{
#if defined(OMNITRACE_USE_ROCTRACER)
static auto _v = get_config()->find("OMNITRACE_USE_ROCTRACER");
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
#else
static auto _v = false;
return _v;
#endif
}
bool&
get_use_sampling()
{
#if defined(TIMEMORY_USE_LIBUNWIND)
static auto _v = get_config()->find("OMNITRACE_USE_SAMPLING");
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
#else
static bool _v = false;
if(_v)
throw std::runtime_error("Error! sampling was enabled but omnitrace was not "
"built with libunwind support");
return _v;
#endif
}
bool&
get_use_pid()
{
@@ -347,14 +448,14 @@ get_use_pid()
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool
bool&
get_use_mpip()
{
static bool _v = tim::get_env("OMNITRACE_USE_MPIP", false, false);
return _v;
}
bool
bool&
get_use_critical_trace()
{
static auto _v = get_config()->find("OMNITRACE_CRITICAL_TRACE");
@@ -375,6 +476,20 @@ get_critical_trace_serialize_names()
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool
get_timeline_sampling()
{
static auto _v = get_config()->find("OMNITRACE_TIMELINE_SAMPLING");
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool
get_flat_sampling()
{
static auto _v = get_config()->find("OMNITRACE_FLAT_SAMPLING");
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}
bool
get_roctracer_timeline_profile()
{
@@ -456,7 +571,7 @@ get_backend()
return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
}
std::string
std::string&
get_perfetto_output_filename()
{
static auto _v = get_config()->find("OMNITRACE_OUTPUT_FILE");
@@ -464,9 +579,8 @@ get_perfetto_output_filename()
if(_t.get().empty())
{
// default name: perfetto-trace.<pid>.proto or perfetto-trace.<rank>.proto
auto _default_fname = settings::compose_output_filename(
"perfetto-trace", "proto", get_use_pid(),
(tim::dmp::is_initialized()) ? tim::dmp::rank() : process::get_id());
auto _default_fname =
settings::compose_output_filename("perfetto-trace", "proto", get_use_pid());
auto _pid_patch = std::string{ "/" } + std::to_string(tim::process::get_id()) +
"-perfetto-trace";
auto _dpos = _default_fname.find(_pid_patch);
@@ -483,12 +597,26 @@ get_perfetto_output_filename()
}
size_t&
get_sample_rate()
get_instrumentation_interval()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLE_RATE");
static auto _v = get_config()->find("OMNITRACE_INSTRUMENTATION_INTERVAL");
return static_cast<tim::tsettings<size_t>&>(*_v->second).get();
}
double&
get_sampling_freq()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_FREQ");
return static_cast<tim::tsettings<double>&>(*_v->second).get();
}
double&
get_sampling_delay()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_DELAY");
return static_cast<tim::tsettings<double>&>(*_v->second).get();
}
int64_t
get_critical_trace_count()
{
@@ -526,3 +654,4 @@ get_cpu_cid_stack(int64_t _tid)
return _v.at(_tid);
(void) _v_check;
}
} // namespace omnitrace
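With this change every setting carries a set of categories, and the filter passed to `print_config_settings` receives both the setting name and its categories. The sketch below shows how such a filter might be used to select settings by category. It works on a plain `std::map` rather than timemory's settings object, and `select_settings` and `has_category` are hypothetical helpers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <string_view>
#include <vector>

namespace sketch  // hypothetical names, not omnitrace's API
{
using categories_t = std::set<std::string>;
// same shape as the filter signature in the diff above
using filter_t = std::function<bool(const std::string_view&, const categories_t&)>;

// Collect the names of all settings accepted by the filter.
// std::map iterates in key order, so the result is sorted alphabetically.
inline std::vector<std::string>
select_settings(const std::map<std::string, categories_t>& _settings,
                const filter_t&                            _filter)
{
    std::vector<std::string> _names{};
    for(const auto& itr : _settings)
    {
        if(_filter(itr.first, itr.second)) _names.push_back(itr.first);
    }
    return _names;
}

// Build a filter that accepts any setting tagged with the given category.
inline filter_t
has_category(std::string _category)
{
    return [_category](const std::string_view&, const categories_t& _cats) {
        return _cats.count(_category) > 0;
    };
}
}  // namespace sketch
```

For example, `select_settings(cfg, has_category("sampling"))` would return only the entries whose category set contains `"sampling"`.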
+15 -10
@@ -27,18 +27,20 @@
// THE SOFTWARE.
#include "library/critical_trace.hpp"
#include "PTL/ThreadPool.hh"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/perfetto.hpp"
#include "library/ptl.hpp"
#include "timemory/backends/dmp.hpp"
#include "timemory/hash/types.hpp"
#include "timemory/tpls/cereal/cereal/archives/json.hpp"
#include "timemory/tpls/cereal/cereal/cereal.hpp"
#include "timemory/utility/macros.hpp"
#include "timemory/utility/types.hpp"
#include "timemory/utility/utility.hpp"
#include <PTL/ThreadPool.hh>
#include <timemory/backends/dmp.hpp>
#include <timemory/hash/types.hpp>
#include <timemory/tpls/cereal/cereal/archives/json.hpp>
#include <timemory/tpls/cereal/cereal/cereal.hpp>
#include <timemory/utility/macros.hpp>
#include <timemory/utility/types.hpp>
#include <timemory/utility/utility.hpp>
#include <cctype>
#include <cstdint>
@@ -47,6 +49,8 @@
#include <stdexcept>
#include <utility>
namespace omnitrace
{
namespace critical_trace
{
namespace
@@ -1165,7 +1169,7 @@ compute_critical_trace()
try
{
PTL::ThreadPool _tp{ get_critical_trace_num_threads(), []() { copy_hash_ids(); },
false };
[]() {} };
_tp.set_verbose(-1);
PTL::TaskGroup<void> _tg{ &_tp };
@@ -1191,7 +1195,7 @@ compute_critical_trace()
OMNITRACE_CT_DEBUG("%s\n", JOIN("", _perf).c_str());
_tg.run(
[](call_chain _chain, std::string _func) {
[](call_chain _chain, std::string _func) { // NOLINT
save_call_chain_json(tim::settings::compose_output_filename(
"call-chain", ".json", get_use_pid(),
(tim::dmp::is_initialized())
@@ -1298,3 +1302,4 @@ compute_critical_trace()
}
} // namespace
} // namespace critical_trace
} // namespace omnitrace
+51
@@ -0,0 +1,51 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/gpu.hpp"
#if defined(OMNITRACE_USE_HIP)
# if !defined(TIMEMORY_USE_HIP)
# define TIMEMORY_USE_HIP 1
# endif
# include "timemory/components/hip/backends.hpp"
#endif
namespace omnitrace
{
namespace gpu
{
#if defined(OMNITRACE_USE_HIP)
int
device_count()
{
return ::tim::hip::device_count();
}
#else
int
device_count()
{
return 0;
}
#endif
} // namespace gpu
} // namespace omnitrace
+26 -7
@@ -27,9 +27,32 @@
// THE SOFTWARE.
#include "library/ptl.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include <PTL/ThreadPool.hh>
#include <timemory/utility/declaration.hpp>
namespace omnitrace
{
namespace tasking
{
namespace
{
auto _thread_pool_cfg = []() {
PTL::ThreadPool::Config _v{};
_v.init = true;
_v.use_affinity = false;
_v.use_tbb = false;
_v.initializer = []() {};
_v.finalizer = []() {};
_v.priority = 5;
_v.pool_size = 1;
return _v;
}();
}
std::mutex&
get_roctracer_mutex()
{
@@ -40,7 +63,7 @@ get_roctracer_mutex()
PTL::ThreadPool&
get_roctracer_thread_pool()
{
static auto _v = PTL::ThreadPool{ 1 };
static auto _v = PTL::ThreadPool{ _thread_pool_cfg };
return _v;
}
@@ -61,7 +84,7 @@ get_critical_trace_mutex()
PTL::ThreadPool&
get_critical_trace_thread_pool()
{
static auto _v = PTL::ThreadPool{ 1 };
static auto _v = PTL::ThreadPool{ _thread_pool_cfg };
return _v;
}
@@ -72,9 +95,5 @@ get_critical_trace_task_group()
return _v;
}
namespace
{
bool _ptl_initialized =
(get_roctracer_thread_pool(), get_critical_trace_thread_pool(), true);
}
} // namespace tasking
} // namespace omnitrace
+219
@@ -0,0 +1,219 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.
#include "library/sampling.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/ptl.hpp"
#include <timemory/backends/papi.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/components/data_tracker/components.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/components/papi/extern.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/components/papi/papi_vector.hpp>
#include <timemory/components/timing/backends.hpp>
#include <timemory/components/trip_count/extern.hpp>
#include <timemory/macros.hpp>
#include <timemory/math.hpp>
#include <timemory/mpl.hpp>
#include <timemory/mpl/quirks.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/storage.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/demangle.hpp>
#include <timemory/utility/types.hpp>
#include <timemory/variadic.hpp>
#include <array>
#include <cstring>
#include <ctime>
#include <initializer_list>
#include <mutex>
#include <regex>
#include <sstream>
#include <string>
#include <type_traits>
#include <pthread.h>
#include <signal.h>
namespace omnitrace
{
namespace sampling
{
using bundle_t = tim::lightweight_tuple<backtrace>;
using sampler_t = tim::sampling::sampler<bundle_t, tim::sampling::dynamic>;
} // namespace sampling
} // namespace omnitrace
TIMEMORY_DEFINE_CONCRETE_TRAIT(check_signals, omnitrace::sampling::sampler_t,
std::false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(buffer_size, omnitrace::sampling::sampler_t,
TIMEMORY_ESC(std::integral_constant<size_t, 256>))
namespace omnitrace
{
namespace sampling
{
using signal_type_instances = omnitrace_thread_data<std::set<int>, api::sampling>;
using backtrace_init_instances = omnitrace_thread_data<backtrace, api::sampling>;
using sampler_running_instances = omnitrace_thread_data<bool, api::sampling>;
using papi_vector_instances = omnitrace_thread_data<comp::papi_vector, api::sampling>;
namespace
{
std::unique_ptr<comp::papi_vector>&
get_papi_vector(int64_t _tid)
{
static auto& _v = papi_vector_instances::instances();
if(_tid == threading::get_id()) papi_vector_instances::construct();
return _v.at(_tid);
}
std::unique_ptr<backtrace>&
get_backtrace_init(int64_t _tid)
{
static auto& _v = backtrace_init_instances::instances();
return _v.at(_tid);
}
std::unique_ptr<bool>&
get_sampler_running(int64_t _tid)
{
static auto& _v = sampler_running_instances::instances();
return _v.at(_tid);
}
std::unique_ptr<std::set<int>>&
get_signal_types(int64_t _tid)
{
static auto& _v = signal_type_instances::instances();
// on the main thread, use both SIGALRM and SIGPROF.
// on secondary threads, only use SIGPROF.
signal_type_instances::construct((_tid == 0) ? std::set<int>{ SIGALRM, SIGPROF }
: std::set<int>{ SIGPROF });
return _v.at(_tid);
}
template <typename... Args>
void
thread_sigmask(Args... _args)
{
auto _err = pthread_sigmask(_args...);
if(_err != 0)
{
errno = _err;
perror("pthread_sigmask");
exit(EXIT_FAILURE);
}
}
template <typename Tp>
sigset_t
get_signal_set(Tp&& _v)
{
sigset_t _sigset;
sigemptyset(&_sigset);
for(auto itr : _v)
sigaddset(&_sigset, itr);
return _sigset;
}
template <typename Tp>
std::string
get_signal_names(Tp&& _v)
{
std::string _sig_names{};
for(auto&& itr : _v)
_sig_names += std::get<0>(tim::signal_settings::get_info(
static_cast<tim::sys_signal>(itr))) +
" ";
return _sig_names.substr(0, _sig_names.length() - 1);
}
} // namespace
std::set<int>
setup()
{
return backtrace::configure(true);
}
std::set<int>
shutdown()
{
return backtrace::configure(false);
}
void
block_signals(std::set<int> _signals)
{
if(_signals.empty()) _signals = *get_signal_types(threading::get_id());
if(_signals.empty())
{
OMNITRACE_PRINT("No signals to block...\n");
return;
}
OMNITRACE_DEBUG("Blocking signals [%s] on thread #%lu...\n",
get_signal_names(_signals).c_str(), threading::get_id());
sigset_t _v = get_signal_set(_signals);
thread_sigmask(SIG_BLOCK, &_v, nullptr);
}
void
unblock_signals(std::set<int> _signals)
{
if(_signals.empty()) _signals = *get_signal_types(threading::get_id());
if(_signals.empty())
{
OMNITRACE_PRINT("No signals to unblock...\n");
return;
}
OMNITRACE_DEBUG("Unblocking signals [%s] on thread #%lu...\n",
get_signal_names(_signals).c_str(), threading::get_id());
sigset_t _v = get_signal_set(_signals);
thread_sigmask(SIG_UNBLOCK, &_v, nullptr);
}
std::unique_ptr<sampler_t>&
get_sampler(int64_t _tid)
{
static auto& _v = sampler_instances::instances();
return _v.at(_tid);
}
} // namespace sampling
} // namespace omnitrace
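The `block_signals` / `unblock_signals` helpers above come down to two steps: build a `sigset_t` from a set of signal numbers, then pass it to `pthread_sigmask` with `SIG_BLOCK` or `SIG_UNBLOCK`. The following is a minimal standalone sketch of that pattern, with no omnitrace or timemory dependency. The names `make_signal_set`, `set_blocked`, and `is_blocked` are hypothetical and do not appear in this PR. On some platforms the sketch may need to be built with `-pthread`.

```cpp
#include <pthread.h>
#include <signal.h>

#include <set>

// Build a sigset_t from a set of signal numbers (same idea as get_signal_set above).
inline sigset_t
make_signal_set(const std::set<int>& signals)
{
    sigset_t sigset;
    sigemptyset(&sigset);
    for(int sig : signals)
        sigaddset(&sigset, sig);
    return sigset;
}

// Block or unblock the given signals on the calling thread only.
// pthread_sigmask returns the error number directly (it does not set errno),
// which is why the PR code assigns errno before calling perror.
inline int
set_blocked(const std::set<int>& signals, bool block)
{
    sigset_t sigset = make_signal_set(signals);
    return pthread_sigmask(block ? SIG_BLOCK : SIG_UNBLOCK, &sigset, nullptr);
}

// Query the calling thread's current mask: with a null `set` argument,
// the mask is left unchanged and only reported through `oldset`.
inline bool
is_blocked(int sig)
{
    sigset_t current;
    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, sig) == 1;
}
```

A caller would typically wrap a region that must not be interrupted by the sampler, for example `set_blocked({ SIGPROF }, true)` before the region and `set_blocked({ SIGPROF }, false)` after it. Because the mask is per-thread, each thread has to do this for itself.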
+3
@@ -28,9 +28,12 @@
#include "library/thread_data.hpp"
namespace omnitrace
{
instrumentation_bundles::instance_array_t&
instrumentation_bundles::instances()
{
static auto _v = instance_array_t{};
return _v;
}
} // namespace omnitrace
+2
@@ -28,4 +28,6 @@
#include "library/timemory.hpp"
using namespace omnitrace;
TIMEMORY_INITIALIZE_STORAGE(comp::wall_clock, comp::user_global_bundle)
File diff suppressed because it is too large
+489 -30
@@ -28,6 +28,465 @@
#include "omnitrace.hpp"
static int expect_error = NO_ERROR;
static int error_print = 0;
static auto regex_opts = std::regex_constants::egrep | std::regex_constants::optimize;
// set of whole function names to exclude
strset_t
get_whole_function_names()
{
return strset_t{ "a64l",
"advance",
"aio_return",
"aio_return64",
"argp_error",
"argp_failure",
"argp_help",
"argp_parse",
"argp_state_help",
"argp_usage",
"argz_add",
"argz_add_sep",
"argz_append",
"argz_count",
"argz_create",
"argz_create_sep",
"argz_delete",
"argz_extract",
"argz_insert",
"argz_next",
"argz_replace",
"argz_stringify",
"atexit",
"atof",
"atoi",
"atol",
"atoll",
"atomic_flag_clear_explicit",
"atomic_flag_test_and_set_explicit",
"authdes_create",
"authdes_getucred",
"authdes_pk_create",
"authnone_create",
"authunix_create",
"authunix_create_default",
"backtrace",
"backtrace_symbols",
"backtrace_symbols_fd",
"bindresvport",
"bindtextdomain",
"bind_textdomain_codeset",
"bsearch",
"btowc",
"c16rtomb",
"callrpc",
"canonicalize_file_name",
"catclose",
"catgets",
"catopen",
"cfmakeraw",
"cfsetspeed",
"chflags",
"clearerr",
"clearerr_unlocked",
"clnt_broadcast",
"clnt_create",
"clnt_pcreateerror",
"clnt_perrno",
"clnt_perror",
"clntraw_create",
"clnt_spcreateerror",
"clnt_sperrno",
"clnt_sperror",
"clnttcp_create",
"clntudp_bufcreate",
"clntudp_create",
"clntunix_create",
"confstr",
"daemon",
"des_setparity",
"div",
"dysize",
"endutxent",
"envz_add",
"envz_entry",
"envz_get",
"envz_merge",
"envz_remove",
"envz_strip",
"ether_aton",
"ether_hostton",
"ether_line",
"ether_ntoa",
"ether_ntohost",
"execl",
"execle",
"execlp",
"execv",
"execvp",
"execvpe",
"explicit_bzero",
"fattach",
"fclose",
"fdetach",
"fdopen",
"feof_unlocked",
"ferror_unlocked",
"fflush",
"fflush_unlocked",
"fgetpos",
"fgets",
"fgets_unlocked",
"fgetws",
"fgetws_unlocked",
"_fini",
"fini",
"fmemopen",
"fopen",
"fopen64",
"fopencookie",
"fork",
"fork_alias",
"fork_compat",
"fputc_unlocked",
"fputs",
"fputs_unlocked",
"fputwc_unlocked",
"fputws",
"fputws_unlocked",
"fread",
"fread_unlocked",
"fsetpos",
"fsetpos64",
"ftell",
"fwrite",
"fwrite_unlocked",
"getdelim",
"getgrouplist",
"gethostbyname2",
"getmntent",
"getmsg",
"getnetname",
"getopt_long",
"getopt_long_only",
"getpmsg",
"getpublickey",
"gets",
"getsecretkey",
"glob_pattern_p",
"gnu_dev_major",
"gnu_dev_makedev",
"gnu_dev_minor",
"gnu_get_libc_release",
"gnu_get_libc_version",
"group_member",
"gtty",
"hcreate",
"hdestroy",
"herror",
"host2netname",
"hsearch",
"hstrerror",
"htons",
"iconv",
"iconv_close",
"iconv_open",
"inet6_opt_append",
"inet6_opt_find",
"inet6_opt_finish",
"inet6_opt_get_val",
"inet6_opt_init",
"inet6_option_alloc",
"inet6_option_append",
"inet6_option_find",
"inet6_option_init",
"inet6_option_next",
"inet6_option_space",
"inet6_opt_next",
"inet6_opt_set_val",
"inet6_rth_add",
"inet6_rth_getaddr",
"inet6_rth_init",
"inet6_rth_reverse",
"inet6_rth_segments",
"inet6_rth_space",
"inet_addr",
"inet_aton",
"inet_lnaof",
"inet_makeaddr",
"inet_netof",
"inet_network",
"inet_nsap_addr",
"inet_nsap_ntoa",
"inet_ntoa",
"inet_ntop",
"inet_pton",
"_init",
"init",
"initgroups",
"initstate",
"insque",
"iruserok",
"iruserok_af",
"key_decryptsession",
"key_decryptsession_pk",
"key_encryptsession",
"key_encryptsession_pk",
"key_gendes",
"key_get_conv",
"key_secretkey_is_set",
"key_setnet",
"key_setsecret",
"l64a",
"lchmod",
"lckpwdf",
"lfind",
"llabs",
"lldiv",
"localeconv",
"lockf",
"lsearch",
"mbrtoc16",
"mbrtoc32",
"mcheck",
"mcheck_check_all",
"mcheck_pedantic",
"mkdtemp",
"mkdtemp64",
"mkostemp",
"mkostemp64",
"mkostemps",
"mkostemps64",
"mkstemp",
"mkstemp64",
"mkstemps",
"mkstemps64",
"mktemp",
"mktemp64",
"moncontrol",
"monstartup",
"mprobe",
"mtrace",
"muntrace",
"nanosleep",
"netname2host",
"netname2user",
"nl_langinfo",
"nl_langinfo_l",
"ntohs",
"parse_printf_format",
"passwd2des",
"pclose",
"perror",
"pmap_getmaps",
"pmap_getport",
"pmap_rmtcall",
"pmap_set",
"pmap_unset",
"popen",
"printf_size",
"printf_size_info",
"psiginfo",
"psignal",
"putchar",
"putchar_unlocked",
"putc_unlocked",
"putenv",
"putgrent",
"putmsg",
"putpmsg",
"putpwent",
"puts",
"putsgent",
"putspent",
"pututxline",
"putw",
"putwc",
"putwchar",
"putwchar_unlocked",
"putwc_unlocked",
"rcmd",
"rcmd_af",
"reallocarray",
"realpath",
"re_comp",
"re_compile_fastmap",
"re_compile_pattern",
"re_exec",
"regcomp",
"regerror",
"regexec",
"register_printf_modifier",
"register_printf_type",
"registerrpc",
"re_match",
"re_match_2",
"remque",
"re_search",
"re_search_2",
"re_set_registers",
"re_set_syntax",
"revoke",
"rexec",
"rexec_af",
"rpmatch",
"rresvport",
"rresvport_af",
"ruserok",
"ruserok_af",
"ruserpass",
"secure_getenv",
"seed48",
"setbuffer",
"setstate",
"setvbuf",
"sgetsgent",
"sgetspent",
"sigcancel_handler",
"sighandler_setxid",
"sstk",
"step",
"stty",
"svcerr_auth",
"svcerr_decode",
"svcerr_noproc",
"svcerr_noprog",
"svcerr_progvers",
"svcerr_systemerr",
"svcerr_weakauth",
"svc_exit",
"svcfd_create",
"svc_getreq",
"svc_getreq_common",
"svc_getreq_poll",
"svc_getreqset",
"svcraw_create",
"svc_register",
"svc_run",
"svc_sendreply",
"svctcp_create",
"svcudp_bufcreate",
"svcudp_create",
"svcudp_enablecache",
"svcunix_create",
"svcunixfd_create",
"svc_unregister",
"swab",
"tcgetsid",
"tdelete",
"tdestroy",
"tempnam",
"textdomain",
"tfind",
"thrd_create",
"thrd_current",
"thrd_detach",
"thrd_equal",
"thrd_exit",
"thrd_join",
"thrd_sleep",
"thrd_yield",
"tmpnam",
"tolower",
"toupper",
"towctrans",
"towctrans_l",
"tr_break",
"tsearch",
"tss_create",
"tss_delete",
"tss_get",
"tss_set",
"ttyslot",
"twalk",
"twalk_r",
"tzset",
"ulckpwdf",
"ungetc",
"ungetwc",
"unwind_stop",
"updwtmpx",
"user2netname",
"utmpname",
"utmpxname",
"vlimit",
"vtimes",
"wait",
"wait3",
"waitpid",
"wordexp",
"xdecrypt",
"xdr_accepted_reply",
"xdr_array",
"xdr_authdes_cred",
"xdr_authdes_verf",
"xdr_authunix_parms",
"xdr_bool",
"xdr_bytes",
"xdr_callhdr",
"xdr_callmsg",
"xdr_char",
"xdr_cryptkeyarg",
"xdr_cryptkeyarg2",
"xdr_cryptkeyres",
"xdr_des_block",
"xdr_double",
"xdr_enum",
"xdr_float",
"xdr_getcredres",
"xdr_hyper",
"xdr_int",
"xdr_int16_t",
"xdr_int64_t",
"xdr_int8_t",
"xdr_keybuf",
"xdr_key_netstarg",
"xdr_key_netstres",
"xdr_keystatus",
"xdr_longlong_t",
"xdrmem_create",
"xdr_netnamestr",
"xdr_netobj",
"xdr_opaque",
"xdr_opaque_auth",
"xdr_pmap",
"xdr_pmaplist",
"xdr_pointer",
"xdr_quad_t",
"xdrrec_create",
"xdrrec_endofrecord",
"xdrrec_eof",
"xdrrec_skiprecord",
"xdr_reference",
"xdr_rejected_reply",
"xdr_replymsg",
"xdr_rmtcall_args",
"xdr_rmtcallres",
"xdr_short",
"xdr_sizeof",
"xdrstdio_create",
"xdr_string",
"xdr_u_char",
"xdr_u_hyper",
"xdr_u_int",
"xdr_uint16_t",
"xdr_uint64_t",
"xdr_uint8_t",
"xdr_u_long",
"xdr_u_longlong_t",
"xdr_union",
"xdr_unixcred",
"xdr_u_quad_t",
"xdr_u_short",
"xdr_vector",
"xdr_void",
"xdr_wrapstring",
"xencrypt",
"xprt_register",
"xprt_unregister" };
}
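The list above is described as a set of whole function names to exclude, which suggests an exact-match test rather than a substring or regex match. A minimal sketch of such a check follows. It assumes `strset_t` is an alias for `std::set<std::string>`, which this diff does not show, and the names `excluded_names` and `is_excluded` are hypothetical. Only a handful of entries from the list are reproduced.

```cpp
#include <set>
#include <string>

// Assumption: strset_t in the PR is a std::set<std::string>.
using strset_t = std::set<std::string>;

// A few entries from the exclusion list, cached in a function-local static
// so the set is built once. (The PR's function returns the set by value;
// caching here is a choice made for this sketch only.)
inline const strset_t&
excluded_names()
{
    static const strset_t names{ "atoi", "fopen", "fork", "waitpid" };
    return names;
}

// Exact, whole-name match: "fork" is excluded, "fork_worker" is not.
inline bool
is_excluded(const std::string& func)
{
    return excluded_names().count(func) > 0;
}
```

With whole-name matching, a user function such as `fork_worker` or `my_atoi` stays eligible for instrumentation even though it contains an excluded name as a substring.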
//======================================================================================//
//
// For selective instrumentation (unused)
@@ -49,11 +508,14 @@ get_loop_file_line_info(module_t* mutatee_module, procedure_t* f, flow_graph_t*
{
if(!cfGraph || !loopToInstrument || !f) return function_signature{ "", "", "" };
-char fname[MUTNAMELEN];
-char mname[MUTNAMELEN];
+char fname[FUNCNAMELEN + 1];
+char mname[FUNCNAMELEN + 1];
std::string typeName = {};
-mutatee_module->getName(mname, MUTNAMELEN);
+memset(fname, '\0', FUNCNAMELEN + 1);
+memset(mname, '\0', FUNCNAMELEN + 1);
+mutatee_module->getName(mname, FUNCNAMELEN);
bpvector_t<point_t*>* loopStartInst =
cfGraph->findLoopInstPoints(BPatch_locLoopStartIter, loopToInstrument);
@@ -69,7 +531,7 @@ get_loop_file_line_info(module_t* mutatee_module, procedure_t* f, flow_graph_t*
(unsigned long) loopExitInst->size(), (unsigned long) baseAddr,
(unsigned long) lastAddr);
-f->getName(fname, MUTNAMELEN);
+f->getName(fname, FUNCNAMELEN);
auto* returnType = f->getReturnType();
@@ -78,11 +540,11 @@ get_loop_file_line_info(module_t* mutatee_module, procedure_t* f, flow_graph_t*
typeName = returnType->getName();
}
-auto params = f->getParams();
+auto* params = f->getParams();
std::vector<string_t> _params;
if(params)
{
-for(auto itr : *params)
+for(auto* itr : *params)
{
string_t _name = itr->getType()->getName();
if(_name.empty()) _name = itr->getName();
@@ -147,18 +609,21 @@ get_func_file_line_info(module_t* mutatee_module, procedure_t* f)
{
bool info1, info2;
unsigned long baseAddr, lastAddr;
-char fname[MUTNAMELEN];
-char mname[MUTNAMELEN];
+char fname[FUNCNAMELEN + 1];
+char mname[FUNCNAMELEN + 1];
int row1, col1, row2, col2;
string_t filename = {};
string_t typeName = {};
-mutatee_module->getName(mname, MUTNAMELEN);
+memset(fname, '\0', FUNCNAMELEN + 1);
+memset(mname, '\0', FUNCNAMELEN + 1);
+mutatee_module->getName(mname, FUNCNAMELEN);
baseAddr = (unsigned long) (f->getBaseAddr());
f->getAddressRange(baseAddr, lastAddr);
bpvector_t<BPatch_statement> lines;
-f->getName(fname, MUTNAMELEN);
+f->getName(fname, FUNCNAMELEN);
auto* returnType = f->getReturnType();
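The commit message lists a stack-buffer-overflow fix in `get_*file_line_info`, and the hunks above show its shape: the buffer is one byte larger than the length passed to `getName`, and it is zero-filled first. A minimal sketch of why that combination is safe follows. It uses `std::strncpy` as a hypothetical stand-in for an API that copies at most `len` bytes and does not terminate on truncation. Whether Dyninst's `getName` behaves exactly this way is an assumption, and `NAME_LEN`, `copy_name`, and `bounded_name` are illustrative names only.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for FUNCNAMELEN, kept small so truncation is visible.
constexpr std::size_t NAME_LEN = 8;

// Stand-in for a name getter that writes at most `len` bytes into `buf`
// and leaves the buffer unterminated when the name is too long.
inline void
copy_name(char* buf, std::size_t len, const std::string& name)
{
    std::strncpy(buf, name.c_str(), len);
}

// Pattern from the hunk: allocate len + 1 bytes, zero them, copy at most len.
// The final byte is never written by copy_name, so the result is always
// a valid NUL-terminated string.
inline std::string
bounded_name(const std::string& name)
{
    char buf[NAME_LEN + 1];
    std::memset(buf, '\0', NAME_LEN + 1);
    copy_name(buf, NAME_LEN, name);
    return std::string{ buf };
}
```

With this layout, `bounded_name("a_very_long_function_name")` yields the first `NAME_LEN` characters instead of reading past the end of the buffer.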
@@ -167,11 +632,11 @@ get_func_file_line_info(module_t* mutatee_module, procedure_t* f)
typeName = returnType->getName();
}
-auto params = f->getParams();
+auto* params = f->getParams();
std::vector<string_t> _params;
if(params)
{
-for(auto itr : *params)
+for(auto* itr : *params)
{
string_t _name = itr->getType()->getName();
if(_name.empty()) _name = itr->getName();
@@ -238,27 +703,34 @@ errorFunc(error_level_t level, int num, const char** params)
// For compatibility purposes
//
procedure_t*
-find_function(image_t* app_image, const std::string& _name, strset_t _extra)
+find_function(image_t* app_image, const std::string& _name, const strset_t& _extra)
{
if(_name.empty()) return nullptr;
auto _find = [app_image](const string_t& _f) -> procedure_t* {
// Extract the vector of functions
bpvector_t<procedure_t*> _found;
-auto ret = app_image->findFunction(_f.c_str(), _found, false, true, true);
+auto* ret = app_image->findFunction(_f.c_str(), _found, false, true, true);
if(ret == nullptr || _found.empty()) return nullptr;
return _found.at(0);
};
procedure_t* _func = _find(_name);
auto itr = _extra.begin();
-while(!_func && itr != _extra.end())
+while(_func == nullptr && itr != _extra.end())
{
_func = _find(*itr);
++itr;
}
-if(!_func) verbprintf(2, "omnitrace: Unable to find function %s\n", _name.c_str());
+if(!_func)
+{
+verbprintf(1, "function: '%s' ... not found\n", _name.c_str());
+}
+else
+{
+verbprintf(1, "function: '%s' ... found\n", _name.c_str());
+}
return _func;
}
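`find_function` above tries the primary name first and then walks a set of alternative names until one resolves. The lookup-with-fallbacks pattern can be sketched without Dyninst as follows. Here `symbol_table` is a hypothetical stand-in for the image being searched, and `find_symbol` is an illustrative name. The sketch also assumes the alternatives arrive as a `std::set<std::string>`, so they are tried in sorted order, not in the order the caller wrote them.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the application image: symbol name -> id.
using symbol_table = std::map<std::string, int>;

// Try `name` first, then each entry of `extra` until a lookup succeeds.
// Returns nullptr if the name is empty or nothing matches.
inline const int*
find_symbol(const symbol_table& table, const std::string& name,
            const std::set<std::string>& extra)
{
    if(name.empty()) return nullptr;

    auto lookup = [&table](const std::string& key) -> const int* {
        auto it = table.find(key);
        return (it == table.end()) ? nullptr : &it->second;
    };

    const int* found = lookup(name);
    // std::set iterates in sorted order, so fallbacks are tried alphabetically.
    for(auto it = extra.begin(); found == nullptr && it != extra.end(); ++it)
        found = lookup(*it);
    return found;
}
```

Taking `extra` by const reference, as the updated signature in this hunk does, avoids copying the set on every lookup.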
@@ -271,7 +743,7 @@ error_func_real(error_level_t level, int num, const char* const* params)
if(num == 0)
{
// conditional reporting of warnings and informational messages
-if(error_print)
+if(error_print > 0)
{
if(level == BPatchInfo)
{
@@ -440,16 +912,3 @@ c_stdlib_function_constraint(const std::string& _func)
}
//======================================================================================//
//
-inline void
-consume()
-{
-consume_parameters(initialize_expr, bpatch, use_mpi, stl_func_instr, werror,
-loop_level_instr, error_print, binary_rewrite, debug_print,
-expect_error, is_static_exe, available_module_functions,
-instrumented_module_functions);
-}
-//
-namespace
-{
-static auto _consumed = (consume(), true);
-}
+216 -49
@@ -3,78 +3,245 @@ if(NOT OMNITRACE_DYNINST_API_RT_DIR AND OMNITRACE_DYNINST_API_RT)
DIRECTORY)
endif()
include(ProcessorCount)
if(NOT DEFINED NUM_PROCS_REAL)
processorcount(NUM_PROCS_REAL)
endif()
if(NOT DEFINED NUM_PROCS)
set(NUM_PROCS 2)
endif()
math(EXPR NUM_THREADS "${NUM_PROCS_REAL} + (${NUM_PROCS_REAL} / 2)")
if(NUM_THREADS GREATER 12)
set(NUM_THREADS 12)
endif()
if(OMNITRACE_BUILD_DYNINST)
set(OMNITRACE_DYNINST_API_RT_DIR
"${PROJECT_BINARY_DIR}/external/dyninst/dyninstAPI_RT:${PROJECT_BINARY_DIR}/external/dyninst/dyninstAPI"
)
endif()
set(_test_environment
set(_base_environment
"OMNITRACE_USE_PERFETTO=ON"
"OMNITRACE_USE_TIMEMORY=ON"
"OMNITRACE_USE_SAMPLING=OFF"
"OMNITRACE_TIME_OUTPUT=OFF"
"OMP_PROC_BIND=spread"
"OMP_PLACES=threads"
"OMP_NUM_THREADS=2"
"LD_LIBRARY_PATH=${PROJECT_BINARY_DIR}:${OMNITRACE_DYNINST_API_RT_DIR}:$ENV{LD_LIBRARY_PATH}"
)
if(TARGET transpose)
if(TRANSPOSE_USE_MPI AND NUM_PROCS GREATER 0)
set(COMMAND_PREFIX ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${NUM_PROCS})
set(_test_environment ${_base_environment} "OMNITRACE_CRITICAL_TRACE=OFF")
set(_fast_environment
"OMNITRACE_USE_PERFETTO=OFF"
"OMNITRACE_USE_TIMEMORY=OFF"
"OMNITRACE_USE_SAMPLING=OFF"
"OMNITRACE_CRITICAL_TRACE=OFF"
"OMNITRACE_TIME_OUTPUT=OFF"
"OMP_PROC_BIND=spread"
"OMP_PLACES=threads"
"OMP_NUM_THREADS=2"
"LD_LIBRARY_PATH=${PROJECT_BINARY_DIR}:${OMNITRACE_DYNINST_API_RT_DIR}:$ENV{LD_LIBRARY_PATH}"
)
function(OMNITRACE_ADD_TEST)
cmake_parse_arguments(
TEST
"" # options
"NAME;TARGET;MPI;NUM_PROCS" # single value args
"REWRITE_ARGS;RUNTIME_ARGS;RUN_ARGS;ENVIRONMENT;LABELS" # multiple value args
${ARGN})
if("${TEST_MPI}" STREQUAL "")
set(TEST_MPI OFF)
endif()
add_test(
NAME transpose-binary-rewrite
COMMAND
$<TARGET_FILE:omnitrace-exe> -o $<TARGET_FILE_DIR:transpose>/transpose.inst -v
1 -- $<TARGET_FILE:transpose>
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
if(NOT DEFINED TEST_NUM_PROCS)
set(TEST_NUM_PROCS ${NUM_PROCS})
endif()
add_test(
NAME transpose-binary-rewrite-run
COMMAND ${COMMAND_PREFIX} $<TARGET_FILE_DIR:transpose>/transpose.inst
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
if(NUM_PROCS EQUAL 0)
set(TEST_NUM_PROCS 0)
endif()
add_test(
NAME transpose-runtime-instrument
COMMAND $<TARGET_FILE:omnitrace-exe> -v 1 --label file line return args --
$<TARGET_FILE:transpose>
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
if(NOT DEFINED TEST_ENVIRONMENT OR "${TEST_ENVIRONMENT}" STREQUAL "")
set(TEST_ENVIRONMENT "${_test_environment}")
endif()
set_tests_properties(transpose-binary-rewrite-run PROPERTIES DEPENDS
transpose-binary-rewrite)
if(TARGET ${TEST_TARGET})
if(DEFINED TEST_MPI
AND ${TEST_MPI}
AND TEST_NUM_PROCS GREATER 0)
if(NOT TEST_NUM_PROCS GREATER NUM_PROCS_REAL)
set(COMMAND_PREFIX ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG}
${TEST_NUM_PROCS})
else()
set(COMMAND_PREFIX ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} 1)
endif()
else()
list(APPEND TEST_ENVIRONMENT "OMNITRACE_USE_PID=OFF")
endif()
set_tests_properties(
transpose-binary-rewrite transpose-binary-rewrite-run transpose-runtime-instrument
PROPERTIES ENVIRONMENT "${_test_environment}" TIMEOUT 600)
endif()
add_test(
NAME ${TEST_NAME}-baseline
COMMAND $<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
if(TARGET parallel-overhead)
add_test(
NAME parallel-overhead-binary-rewrite
COMMAND
$<TARGET_FILE:omnitrace-exe> -o
$<TARGET_FILE_DIR:parallel-overhead>/parallel-overhead.inst -v 1
--min-address-range-loop=72 --label file line return args --
$<TARGET_FILE:parallel-overhead>
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
add_test(
NAME ${TEST_NAME}-binary-rewrite
COMMAND
$<TARGET_FILE:omnitrace-exe> -o
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.inst ${TEST_REWRITE_ARGS}
-- $<TARGET_FILE:${TEST_TARGET}>
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
add_test(
NAME parallel-overhead-binary-rewrite-run
COMMAND $<TARGET_FILE_DIR:parallel-overhead>/parallel-overhead.inst
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
add_test(
NAME ${TEST_NAME}-binary-rewrite-sampling
COMMAND
$<TARGET_FILE:omnitrace-exe> -o
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.samp -M sampling
${TEST_REWRITE_ARGS} -- $<TARGET_FILE:${TEST_TARGET}>
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
add_test(
NAME parallel-overhead-runtime-instrument
COMMAND $<TARGET_FILE:omnitrace-exe> -v 1 -- $<TARGET_FILE:parallel-overhead>
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
add_test(
NAME ${TEST_NAME}-binary-rewrite-run
COMMAND ${COMMAND_PREFIX}
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.inst ${TEST_RUN_ARGS}
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
set_tests_properties(parallel-overhead-binary-rewrite-run
PROPERTIES DEPENDS parallel-overhead-binary-rewrite)
add_test(
NAME ${TEST_NAME}-binary-rewrite-run-sampling
COMMAND ${COMMAND_PREFIX}
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.samp ${TEST_RUN_ARGS}
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
set_tests_properties(
parallel-overhead-binary-rewrite parallel-overhead-binary-rewrite-run
parallel-overhead-runtime-instrument
PROPERTIES ENVIRONMENT "${_test_environment}" TIMEOUT 600)
endif()
add_test(
NAME ${TEST_NAME}-runtime-instrument
COMMAND $<TARGET_FILE:omnitrace-exe> ${TEST_RUNTIME_ARGS} --
$<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
add_test(
NAME ${TEST_NAME}-runtime-instrument-sampling
COMMAND
$<TARGET_FILE:omnitrace-exe> -M sampling --env
OMNITRACE_OUTPUT_PREFIX=sampling- ${TEST_RUNTIME_ARGS} --
$<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
set_tests_properties(${TEST_NAME}-binary-rewrite-run
PROPERTIES DEPENDS ${TEST_NAME}-binary-rewrite)
set_tests_properties(${TEST_NAME}-binary-rewrite-run-sampling
PROPERTIES DEPENDS ${TEST_NAME}-binary-rewrite-sampling)
foreach(
_TEST
baseline binary-rewrite binary-rewrite-run binary-rewrite-sampling
binary-rewrite-run-sampling runtime-instrument runtime-instrument-sampling)
string(REPLACE "-run-" "-" _PREFIX "${TEST_NAME}-${_TEST}-")
set(_environ "${TEST_ENVIRONMENT}")
list(APPEND _environ "OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
"OMNITRACE_OUTPUT_PREFIX=${_PREFIX}")
set(_LABELS "${_TEST}")
string(REPLACE "-run" "" _LABELS "${_TEST}")
string(REPLACE "-sampling" ";sampling" _LABELS "${_LABELS}")
set_tests_properties(
${TEST_NAME}-${_TEST} PROPERTIES ENVIRONMENT "${_environ}" TIMEOUT 600
LABELS "${_LABELS};${TEST_LABELS}")
endforeach()
endif()
endfunction()
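Inside the `foreach` loop of `OMNITRACE_ADD_TEST`, each test variant gets an output prefix and a label list derived by plain string replacement. The C++ sketch below mirrors those `string(REPLACE ...)` calls so the resulting values are easy to check by hand. The function names `replace_all`, `output_prefix`, and `test_labels` are hypothetical. The CMake code above is what actually runs.

```cpp
#include <string>

// Replace every non-overlapping occurrence of `from` with `to`, scanning left
// to right without rescanning replaced text (the behavior of string(REPLACE)).
inline std::string
replace_all(std::string text, const std::string& from, const std::string& to)
{
    if(from.empty()) return text;
    std::size_t pos = 0;
    while((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size();
    }
    return text;
}

// Mirrors: string(REPLACE "-run-" "-" _PREFIX "${TEST_NAME}-${_TEST}-")
inline std::string
output_prefix(const std::string& test_name, const std::string& variant)
{
    return replace_all(test_name + "-" + variant + "-", "-run-", "-");
}

// Mirrors the two REPLACE calls that build _LABELS from ${_TEST}:
// drop "-run", then turn "-sampling" into a separate list entry.
inline std::string
test_labels(const std::string& variant)
{
    return replace_all(replace_all(variant, "-run", ""), "-sampling", ";sampling");
}
```

For example, `output_prefix("transpose", "binary-rewrite-run-sampling")` gives `transpose-binary-rewrite-sampling-`, and `test_labels("binary-rewrite-run-sampling")` gives `binary-rewrite;sampling`. One visible effect is that a rewrite step and its matching run step end up with the same output prefix and the same labels.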
omnitrace_add_test(
NAME transpose
TARGET transpose
MPI ${TRANSPOSE_USE_MPI}
NUM_PROCS ${NUM_PROCS}
REWRITE_ARGS -e -v 1
RUNTIME_ARGS -e -v 1 --label file line return args
RUN_ARGS ""
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=ON")
omnitrace_add_test(
NAME transpose-no-save-fpr
TARGET transpose
MPI ${TRANSPOSE_USE_MPI}
NUM_PROCS ${NUM_PROCS}
REWRITE_ARGS -e -v 1 --dyninst-options DelayedParsing TypeChecking
RUNTIME_ARGS
-e
-v
1
--label
file
line
return
args
--dyninst-options
DelayedParsing
TypeChecking
RUN_ARGS ""
ENVIRONMENT "${_fast_environment}")
omnitrace_add_test(
NAME parallel-overhead
TARGET parallel-overhead
REWRITE_ARGS -e -v 1 --min-address-range-loop=64
RUNTIME_ARGS
-e
-v
1
--min-address-range-loop=64
--label
file
line
return
args
RUN_ARGS 10 ${NUM_THREADS}
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF")
omnitrace_add_test(
NAME parallel-overhead-no-save-fpr
TARGET parallel-overhead
REWRITE_ARGS -e -v 1 --min-address-range-loop=32 --dyninst-options DelayedParsing
TypeChecking
RUNTIME_ARGS
-e
-v
1
--min-address-range-loop=32
--label
file
line
return
args
--dyninst-options
DelayedParsing
TypeChecking
RUN_ARGS 20 ${NUM_THREADS}
ENVIRONMENT "${_fast_environment}")
omnitrace_add_test(
NAME lulesh
TARGET lulesh
MPI ${LULESH_USE_MPI}
NUM_PROCS 8
REWRITE_ARGS -e -v 1
RUNTIME_ARGS
-e
-v
1
--label
file
line
return
args
-ME
[==['lib(gomp|m-)']==]
RUN_ARGS -i 10 -s 20 -p
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF")