Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace
* Kokkos + Lulesh example/tests
* Sampling support + more
- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options
* Release workflow updates
* Updated settings printing
* Fixed defaults in README
* Tweak setting defaults in README
* CMake fixes
* cmake-format
* clang-format
* LULESH_USE_MPI OFF
* LULESH_USE_MPI fix
* timemory add_secondary fix
* timemory ambiguous internal namespace fix
* Update timemory submodule
* Handle output path/prefix in omnitrace
- updated timemory
- updated test environment
* sampling + papi fix
* Fix to sampling without PAPI
* Fix for using too many processors in CI
* formatting
* Updated CI
- minor cmake tweaks
- updated timemory submodule
* Updated CI
* Updated CI
* CI + timemory updates
- data race fixes
* CI updates + debug for sampling
* Sampling updates
- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality
* Minimum OMNITRACE_THREAD_COUNT of 128
* Handle multiple dims in sampler data
* Configure libunwind support for timemory
* Improved safeguards for sampling
- updated CI
- lulesh runtime-instrument test tweak
* formatting
* CI updates + sampler updates + misc
- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default
* Updated timemory submodule
- hidden visibility for timemory
- storage finalizers do not capture this
* Update timemory submodule
- component visibility updates
* Reworked header includes
- use <...> for timemory headers
- always include <library/defines.hpp>
* Rename some config options
* Update PTL submodule
* Update kokkos submodule
* Updated sampling
* Updated CI
* Reworked instrumentation exe
- lowered min-address-range threshold to 256
- extended whole function exclude
* CI fix + timemory submodule update
- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead
* Sampling flags + transpose update + CI update
- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads
* CI update
- removed ubuntu-focal-external-debug
- reduced data artifacts upload
* CI timeouts
- updated timemory submodule
- minor tweaks to omnitrace exe logging
* LICENSE updates (partial)
* CI Test stage timeout extension
* Docker and Packaging updates
* Miscellaneous fixes/tweaks
- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix
* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate
* cmake format
* Sampler deadlock fixes
* Removed debug messages from sampler
* Fix for MPI detection + test tweaks + misc
* Sampler deadlock fixes + misc
- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
- no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY
* omnitrace-avail + libunwind update + restructure
- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace
* Fix remaining reorganization issues
- removed some duplicate code
- fixed some trait specializations after implicit instantiation
- formatting
* ensure_storage fix + avail improvements
- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail
* Delay settings initialization
- slight tweak to tests w/ MPI
* Disable OpenMPI testing w/ ubuntu-bionic
- MPI testing is hanging bc of network interface issue on system:
> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
> Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
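Regarding the `RelWithDebugInfo -> RelWithDebInfo` item above: CMake generally does not reject an unrecognized `CMAKE_BUILD_TYPE`, so a misspelling like this can go unnoticed. A minimal sketch of a guard that a CI script could run before configuring is shown below. The helper name `check_build_type` is hypothetical and not part of omnitrace; the four accepted names are CMake's standard configurations.

```shell
# Hypothetical helper (not part of omnitrace): accept only CMake's four
# standard build types and report anything else on stderr.
check_build_type() {
    case "$1" in
        Debug|Release|RelWithDebInfo|MinSizeRel)
            return 0
            ;;
        *)
            echo "unknown CMAKE_BUILD_TYPE: $1" >&2
            return 1
            ;;
    esac
}

# Example: guard a configure step with the check.
check_build_type "RelWithDebInfo" && echo "build type ok"
```

A project that defines custom configurations would need to extend the accepted list.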
Committed by GitHub
Parent: 39cf760a4e
Commit: 778af2a760
@@ -18,6 +18,17 @@ parse:
kwargs:
VARIABLES: '*'
CONDITION: '*'
omnitrace_add_test:
kwargs:
NAME: '*'
TARGET: '*'
MPI: '*'
NUM_PROCS: '*'
REWRITE_ARGS: '*'
RUNTIME_ARGS: '*'
RUN_ARGS: '*'
ENVIRONMENT: '*'
LABELS: '*'
override_spec: {}
vartags: []
proptags: []
@@ -24,9 +24,9 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip libtbb-dev libboost-{atomic,system,thread,date-time,filesystem,timer}-dev ${{ matrix.compiler }} ${{ matrix.mpi }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip libtbb-dev libboost-{atomic,system,thread,date-time,filesystem,timer}-dev ${{ matrix.compiler }} ${{ matrix.mpi }} &&
python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'

- name: Configure Env
run:
@@ -44,15 +44,17 @@ jobs:
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_BUILD_DYNINST=ON
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_HIP=OFF
-DDYNINST_BUILD_ELFUTILS=ON
-DDYNINST_BUILD_LIBIBERTY=ON
-DDYNINST_BUILD_SHARED_LIBS=ON
-DDYNINST_BUILD_STATIC_LIBS=OFF

- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -61,31 +63,40 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2

- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal.log

- name: Test Install
timeout-minutes: 10
run:
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls

- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal.log
${{ github.workspace }}/build/*.log

- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt

ubuntu-bionic:
runs-on: ubuntu-18.04
strategy:
matrix:
compiler: ['g++-7', 'g++-8']
mpi: [ '', 'libmpich-dev mpich', 'libopenmpi-dev openmpi-bin libfabric-dev' ]
mpi: [ '', 'libmpich-dev mpich' ]

steps:
- uses: actions/checkout@v2
@@ -93,9 +104,9 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip ${{ matrix.compiler }} ${{ matrix.mpi }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip ${{ matrix.compiler }} ${{ matrix.mpi }} &&
python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'

- name: Configure Env
run:
@@ -113,15 +124,17 @@ jobs:
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_BUILD_DYNINST=ON
-DOMNITRACE_USE_MPI=${USE_MPI}
-DOMNITRACE_USE_HIP=OFF
-DDYNINST_BUILD_TBB=ON
-DDYNINST_BUILD_BOOST=ON
-DDYNINST_BUILD_ELFUTILS=ON
-DDYNINST_BUILD_LIBIBERTY=ON

- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -130,24 +143,33 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2

- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-bionic.log

- name: Test Install
timeout-minutes: 10
run:
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls

- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-bionic.log
${{ github.workspace }}/build/*.log

- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt

ubuntu-focal-external:
runs-on: ubuntu-20.04
@@ -161,15 +183,15 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip libboost-{atomic,system,thread,date-time,filesystem,timer}-dev libtbb-dev libiberty-dev ${{ matrix.compiler }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip libboost-{atomic,system,thread,date-time,filesystem,timer}-dev libtbb-dev libiberty-dev ${{ matrix.compiler }} &&
sudo python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'

- name: Configure Env
run:
echo "CC=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')" >> $GITHUB_ENV &&
echo "CXX=${{ matrix.compiler }}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/opt/dyninst:/opt/elfutils:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/dyninst:/opt/elfutils:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "/opt/omnitrace/bin:/opt/dyninst/bin:/opt/elfutils/bin:${HOME}/.local/bin" >> $GITHUB_PATH &&
echo "LD_LIBRARY_PATH=/opt/omnitrace/lib:/opt/dyninst/lib:/opt/elfutils/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
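The `Configure Env` step derives the C compiler name from the C++ compiler name by replacing every `+` with `c`. A minimal sketch of that substitution in isolation follows; the function name `cc_from_cxx` is hypothetical and only wraps the `sed` expression used in the workflow.

```shell
# Hypothetical wrapper around the workflow's sed expression:
# replace every '+' in the C++ compiler name with 'c'.
cc_from_cxx() {
    echo "$1" | sed 's/+/c/g'
}

cc_from_cxx 'g++'     # prints gcc
cc_from_cxx 'g++-8'   # prints gcc-8
```

This works for the `g++` variants in the job matrices shown here. It would not map `clang++` to `clang` (the result would be `clangcc`), so a different rule would be needed if Clang were added to a matrix.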
@@ -193,7 +215,7 @@ jobs:
cmake -B build
-DCMAKE_C_COMPILER=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=/opt/dyninst &&
cmake --build build --target all --parallel 2 &&
cmake --build build --target install --parallel 2 &&
@@ -205,12 +227,14 @@ jobs:
cmake -B ${{ github.workspace }}/build
-DCMAKE_C_COMPILER=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_BUILD_TYPE=RelWithDebInfo
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_USE_MPI=OFF
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_USE_HIP=OFF

- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -219,6 +243,7 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2

- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ldd ./omnitrace &&
@@ -226,20 +251,28 @@ jobs:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-external.log

- name: Test Install
timeout-minutes: 10
run:
ldd $(which omnitrace) &&
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls

- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-external.log
${{ github.workspace }}/build/*.log

- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt

ubuntu-focal-dyninst-package:
runs-on: ubuntu-20.04
@@ -253,15 +286,15 @@ jobs:
- name: Install Packages
run:
sudo apt-get update &&
sudo apt-get install -y build-essential python3-pip ${{ matrix.compiler }} &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip ${{ matrix.compiler }} &&
sudo python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.15.3'
python3 -m pip install 'cmake==3.16.3'

- name: Configure Env
run:
echo "CC=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')" >> $GITHUB_ENV &&
echo "CXX=${{ matrix.compiler }}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/opt/dyninst:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/dyninst:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "/opt/omnitrace/bin:/opt/dyninst/bin:${HOME}/.local/bin" >> $GITHUB_PATH &&
echo "LD_LIBRARY_PATH=/opt/omnitrace/lib:/opt/dyninst/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
@@ -292,10 +325,12 @@ jobs:
-DCMAKE_CXX_COMPILER=${{ matrix.compiler }}
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }}
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_BUILD_TESTING=ON
-DOMNITRACE_USE_MPI=OFF
-DOMNITRACE_USE_ROCTRACER=OFF
-DOMNITRACE_USE_HIP=OFF

- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1
@@ -304,6 +339,7 @@ jobs:
cmake --build ${{ github.workspace }}/build --target install --parallel 2

- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ldd ./omnitrace &&
@@ -311,17 +347,127 @@ jobs:
ctest -V --output-log ${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-dyninst-package.log

- name: Test Install
timeout-minutes: 10
run:
ldd $(which omnitrace) &&
omnitrace --help &&
omnitrace -- sleep 1 &&
omnitrace -o sleep.inst -- sleep &&
./sleep.inst 1 &&
rm ./sleep.inst
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls

- name: Artifacts
- name: CTest Artifacts
uses: actions/upload-artifact@v2
with:
name: ctest-log
path: |
${{ github.workspace }}/build/omnitrace-ctest-ubuntu-focal-dyninst-package.log
${{ github.workspace }}/build/*.log

- name: Data Artifacts
uses: actions/upload-artifact@v2
with:
name: data-files
path: |
${{ github.workspace }}/build/omnitrace-tests-output/*.txt

ubuntu-focal-external-rocm:
runs-on: ubuntu-20.04
strategy:
matrix:
compiler: ['g++']
rocm_version: ['4.3', '4.3.1', '4.5']
mpi: [ 'libmpich-dev mpich', 'libopenmpi-dev openmpi-bin libfabric-dev' ]

steps:
- uses: actions/checkout@v2

- name: Install Packages
run:
echo '1' | sudo tee /proc/sys/kernel/perf_event_paranoid &&
sudo apt-get update &&
sudo apt-get install -y software-properties-common wget gnupg2 &&
sudo wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add - &&
echo "deb [arch=amd64] https://repo.radeon.com/rocm/apt/${{ matrix.rocm_version }}/ ubuntu main" | sudo tee /etc/apt/sources.list.d/rocm.list &&
sudo apt-get update &&
sudo apt-get install -y build-essential m4 autoconf libtool python3-pip libboost-{atomic,system,thread,date-time,filesystem,timer}-dev libtbb-dev libiberty-dev ${{ matrix.compiler }} libnuma-dev rocm-dev rocm-utils roctracer-dev rocprofiler-dev hip-base hsa-amd-aqlprofile hsa-rocr-dev hsakmt-roct-dev ${{ matrix.mpi }} libpapi-dev &&
sudo python3 -m pip install --upgrade pip &&
python3 -m pip install 'cmake==3.16.3'

- name: Configure Env
run:
echo "CC=$(echo '${{ matrix.compiler }}' | sed 's/+/c/g')" >> $GITHUB_ENV &&
echo "CXX=${{ matrix.compiler }}" >> $GITHUB_ENV &&
echo "CMAKE_PREFIX_PATH=/opt/dyninst:/opt/elfutils:${CMAKE_PREFIX_PATH}" >> $GITHUB_ENV &&
echo "/opt/omnitrace/bin:/opt/dyninst/bin:/opt/elfutils/bin:${HOME}/.local/bin" >> $GITHUB_PATH &&
echo "LD_LIBRARY_PATH=/opt/omnitrace/lib:/opt/dyninst/lib:/opt/elfutils/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV

- name: Install ElfUtils
run:
pushd external &&
wget https://sourceware.org/elfutils/ftp/${ELFUTILS_DOWNLOAD_VERSION}/elfutils-${ELFUTILS_DOWNLOAD_VERSION}.tar.bz2 &&
tar xjf elfutils-${ELFUTILS_DOWNLOAD_VERSION}.tar.bz2 &&
pushd elfutils-${ELFUTILS_DOWNLOAD_VERSION} &&
CFLAGS="-O3" ./configure --enable-install-elfh --prefix=/opt/elfutils --disable-libdebuginfod --disable-debuginfod &&
make -j2 &&
make install -j2 &&
popd &&
rm -rf elfutils*

- name: Install Dyninst
run:
cmake --version &&
git submodule update --init external/dyninst &&
cd external/dyninst &&
cmake -B build
-DCMAKE_C_COMPILER=${CC}
-DCMAKE_CXX_COMPILER=${CXX}
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=/opt/dyninst &&
cmake --build build --target all --parallel 2 &&
cmake --build build --target install --parallel 2 &&
rm -rf build

- name: Configure CMake
run:
cmake --version &&
cmake -B ${{ github.workspace }}/build
-DCMAKE_C_COMPILER=${CC}
-DCMAKE_CXX_COMPILER=${CXX}
-DCMAKE_BUILD_TYPE=RelWithDebInfo
-DCMAKE_INSTALL_PREFIX=/opt/omnitrace
-DOMNITRACE_BUILD_TESTING=OFF
-DOMNITRACE_BUILD_DEVELOPER=ON
-DOMNITRACE_BUILD_EXTRA_OPTIMIZATIONS=OFF
-DOMNITRACE_BUILD_LTO=OFF
-DOMNITRACE_USE_MPI=OFF
-DOMNITRACE_USE_MPI_HEADERS=ON
-DOMNITRACE_USE_HIP=ON
-DOMNITRACE_MAX_THREADS=256
-DOMNITRACE_USE_SANITIZER=OFF
-DTIMEMORY_USE_PAPI=ON

- name: Build
timeout-minutes: 45
run:
cmake --build ${{ github.workspace }}/build --target all --parallel 2 -- VERBOSE=1

- name: Install
run:
cmake --build ${{ github.workspace }}/build --target install --parallel 2

- name: Test
timeout-minutes: 30
working-directory: ${{ github.workspace }}/build
run:
ldd ./omnitrace &&
./omnitrace --help

- name: Test Install
timeout-minutes: 10
run:
ldd $(which omnitrace) &&
omnitrace --help &&
omnitrace -e -v 1 -o ls.inst -- ls &&
./ls.inst &&
rm ./ls.inst &&
omnitrace -e -v 1 -- ls
@@ -13,3 +13,6 @@
[submodule "external/PTL"]
path = external/PTL
url = https://github.com/jrmadsen/PTL.git
[submodule "external/kokkos"]
path = examples/lulesh/external/kokkos
url = https://github.com/kokkos/kokkos.git
+101
-52
@@ -58,21 +58,37 @@ set(CMAKE_CXX_STANDARD
17
CACHE STRING "CXX language standard")
omnitrace_add_feature(CMAKE_CXX_STANDARD "CXX language standard")
omnitrace_add_feature(CMAKE_BUILD_TYPE "Build optimization level")
omnitrace_add_option(CMAKE_CXX_STANDARD_REQUIRED "Require C++ language standard" ON)
omnitrace_add_option(CMAKE_CXX_EXTENSIONS "Compiler specific language extensions" OFF)
omnitrace_add_option(CMAKE_INSTALL_RPATH_USE_LINK_PATH "Enable rpath to linked libraries"
ON)

omnitrace_add_option(OMNITRACE_USE_CLANG_TIDY "Enable clang-tidy" OFF)
omnitrace_add_option(OMNITRACE_USE_MPI "Enable MPI support" OFF)
omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF)
omnitrace_add_option(OMNITRACE_USE_ROCTRACER "Enable roctracer support" ON)
omnitrace_add_option(OMNITRACE_BUILD_DYNINST "Build dyninst from submodule" OFF)
omnitrace_add_option(OMNITRACE_USE_HIP "Enable HIP support" ON)
omnitrace_add_option(OMNITRACE_USE_ROCTRACER "Enable roctracer support"
${OMNITRACE_USE_HIP})
omnitrace_add_option(OMNITRACE_USE_MPI_HEADERS
"Enable wrapping MPI functions w/o enabling MPI dependency" OFF)
omnitrace_add_option(OMNITRACE_BUILD_DYNINST "Build dyninst from submodule" OFF)
omnitrace_add_option(OMNITRACE_BUILD_TESTING "Enable building the testing suite" OFF)
omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF)
omnitrace_add_option(OMNITRACE_BUILD_HIDDEN_VISIBILITY
"Build with hidden visibility (disable for Debug builds)" ON)

if(NOT OMNITRACE_USE_HIP)
set(OMNITRACE_USE_ROCTRACER
OFF
CACHE BOOL "Disabled via OMNITRACE_USE_HIP=OFF" FORCE)
endif()

include(ProcessorCount)
processorcount(OMNITRACE_PROCESSOR_COUNT)
math(EXPR OMNITRACE_THREAD_COUNT "8 * ${OMNITRACE_PROCESSOR_COUNT}")
math(EXPR OMNITRACE_THREAD_COUNT "16 * ${OMNITRACE_PROCESSOR_COUNT}")
if(OMNITRACE_THREAD_COUNT LESS 128)
set(OMNITRACE_THREAD_COUNT 128)
endif()
set(OMNITRACE_MAX_THREADS
"${OMNITRACE_THREAD_COUNT}"
CACHE
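The new default for `OMNITRACE_MAX_THREADS` in this hunk is 16 threads per processor with a floor of 128. A minimal shell mirror of that arithmetic, for checking what a given machine would get, is sketched below. The function name `omnitrace_thread_count` is hypothetical; the CMake code above is the actual implementation.

```shell
# Hypothetical shell mirror of the CMake logic above:
# 16 threads per processor, never fewer than 128.
omnitrace_thread_count() {
    count=$(( 16 * $1 ))
    if [ "${count}" -lt 128 ]; then
        count=128
    fi
    echo "${count}"
}

omnitrace_thread_count 2    # prints 128
omnitrace_thread_count 16   # prints 256
```

The floor also covers a processor count of 0, which the CMake documentation for `ProcessorCount` describes as the result when the count cannot be determined.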
@@ -81,24 +97,25 @@ set(OMNITRACE_MAX_THREADS
)
omnitrace_add_feature(
OMNITRACE_MAX_THREADS
"Maximum number of total threads supported in the host application (default: 8 * nproc)"
"Maximum number of total threads supported in the host application (default: max of 128 or 16 * nproc)"
)

# ensure synced
set(TIMEMORY_USE_MPI
${OMNITRACE_USE_MPI}
CACHE BOOL "Enable MPI support" FORCE)

# default visibility settings
set(CMAKE_C_VISIBILITY_PRESET "default")
set(CMAKE_CXX_VISIBILITY_PRESET "default")
set(CMAKE_VISIBILITY_INLINES_HIDDEN OFF)
set(CMAKE_C_VISIBILITY_PRESET
"default"
CACHE STRING "Visibility preset for non-inline C functions")
set(CMAKE_CXX_VISIBILITY_PRESET
"default"
CACHE STRING "Visibility preset for non-inline C++ functions/objects")
set(CMAKE_VISIBILITY_INLINES_HIDDEN
OFF
CACHE BOOL "Visibility preset for inline functions")
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

include(Formatting) # format target
include(Packages) # finds third-party libraries

if(OMNITRACE_USE_ROCTRACER)
if(OMNITRACE_USE_HIP OR OMNITRACE_USE_ROCTRACER)
find_package(HIP QUIET)
if(HIP_VERSION_MAJOR GREATER_EQUAL 4 AND HIP_VERSION_MINOR GREATER 3)
set(roctracer_kfdwrapper_LIBRARY)
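The HIP version condition in this hunk tests the major and minor components independently, so as written a version such as 5.0 (minor 0) would not satisfy `HIP_VERSION_MINOR GREATER 3`. Whether that matters depends on which ROCm releases the project intends to support. A minimal sketch of an ordered major/minor comparison is shown below in shell, purely to illustrate the logic; `version_at_least` is a hypothetical helper, not something taken from the project.

```shell
# Hypothetical helper: succeed when MAJOR.MINOR >= REQ_MAJOR.REQ_MINOR.
# usage: version_at_least MAJOR MINOR REQ_MAJOR REQ_MINOR
version_at_least() {
    [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}

# "4.4 or newer" is what the CMake condition expresses for major version 4.
if version_at_least 5 0 4 4; then
    echo "5.0 is at least 4.4"
fi
```

The same comparison can be written in CMake with `VERSION_GREATER_EQUAL` if a full version string is available.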
@@ -116,9 +133,11 @@ configure_file(${PROJECT_SOURCE_DIR}/include/library/defines.hpp.in
omnitrace_activate_clang_tidy()

# custom visibility settings
set(CMAKE_C_VISIBILITY_PRESET "hidden")
set(CMAKE_CXX_VISIBILITY_PRESET "hidden")
set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
if(OMNITRACE_BUILD_HIDDEN_VISIBILITY)
set(CMAKE_C_VISIBILITY_PRESET "hidden")
set(CMAKE_CXX_VISIBILITY_PRESET "hidden")
set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
endif()

if(OMNITRACE_BUILD_LTO)
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
@@ -134,13 +153,17 @@ set(library_sources
${CMAKE_CURRENT_LIST_DIR}/src/library.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/config.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/critical_trace.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/fork_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/omnitrace_component.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/mpi_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/gpu.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/perfetto.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/ptl.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/sampling.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/thread_data.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/timemory.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/backtrace.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/fork_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/mpi_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/omnitrace.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/pthread_gotcha.cpp
${perfetto_DIR}/sdk/perfetto.cc)

set(library_headers
@@ -150,49 +173,53 @@ set(library_headers
${CMAKE_CURRENT_LIST_DIR}/include/library/common.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/critical_trace.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/debug.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/fork_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/omnitrace_component.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/mpi_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/gpu.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/perfetto.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/ptl.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/sampling.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/state.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/thread_data.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/timemory.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/fwd.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/backtrace.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/fork_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/mpi_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/omnitrace.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/pthread_gotcha.hpp
${perfetto_DIR}/sdk/perfetto.h)

if(NOT TIMEMORY_USE_PERFETTO)

endif()

add_library(omnitrace-library SHARED ${library_sources} ${library_headers})

if(OMNITRACE_USE_ROCTRACER)
target_sources(
omnitrace-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/include/library/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/src/library/roctracer.cpp
${CMAKE_CURRENT_LIST_DIR}/include/library/roctracer_callbacks.hpp
${CMAKE_CURRENT_LIST_DIR}/src/library/roctracer_callbacks.cpp)
PRIVATE
${CMAKE_CURRENT_LIST_DIR}/src/library/components/roctracer.cpp
${CMAKE_CURRENT_LIST_DIR}/src/library/components/roctracer_callbacks.cpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/include/library/components/roctracer_callbacks.hpp)
endif()

target_include_directories(omnitrace-library SYSTEM PRIVATE ${perfetto_DIR}/sdk)

target_compile_definitions(
omnitrace-library
PRIVATE $<IF:$<BOOL:${OMNITRACE_CUSTOM_DATA_SOURCE}>,CUSTOM_DATA_SOURCE,>)
PRIVATE OMNITRACE_MAX_THREADS=${OMNITRACE_MAX_THREADS}
$<IF:$<BOOL:${OMNITRACE_CUSTOM_DATA_SOURCE}>,CUSTOM_DATA_SOURCE,>)

target_link_libraries(
omnitrace-library
PRIVATE omnitrace::omnitrace-headers
omnitrace::omnitrace-threading
omnitrace::omnitrace-compile-options
omnitrace::omnitrace-roctracer
omnitrace::omnitrace-mpi
omnitrace::omnitrace-ptl
$<BUILD_INTERFACE:timemory::timemory-headers>
$<BUILD_INTERFACE:timemory::timemory-gotcha>
$<BUILD_INTERFACE:timemory::timemory-cxx-shared>
$<IF:$<BOOL:${OMNITRACE_USE_SANITIZER}>,omnitrace::omnitrace-sanitizer,>)
PUBLIC $<BUILD_INTERFACE:omnitrace::omnitrace-headers>
$<BUILD_INTERFACE:omnitrace::omnitrace-threading>
$<BUILD_INTERFACE:omnitrace::omnitrace-compile-options>
$<BUILD_INTERFACE:omnitrace::omnitrace-hip>
$<BUILD_INTERFACE:omnitrace::omnitrace-roctracer>
$<BUILD_INTERFACE:omnitrace::omnitrace-mpi>
$<BUILD_INTERFACE:omnitrace::omnitrace-ptl>
$<BUILD_INTERFACE:timemory::timemory-headers>
$<BUILD_INTERFACE:timemory::timemory-gotcha>
$<BUILD_INTERFACE:timemory::timemory-cxx-shared>
$<IF:$<BOOL:${OMNITRACE_USE_SANITIZER}>,omnitrace::omnitrace-sanitizer,>)

if(OMNITRACE_DYNINST_API_RT)
get_filename_component(OMNITRACE_DYNINST_API_RT_DIR "${OMNITRACE_DYNINST_API_RT}"
@@ -200,14 +227,35 @@ if(OMNITRACE_DYNINST_API_RT)
endif()

set_target_properties(
omnitrace-library PROPERTIES OUTPUT_NAME omnitrace
INSTALL_RPATH "\$ORIGIN:\$ORIGIN/dyninst-tpls/libs")
omnitrace-library
PROPERTIES OUTPUT_NAME omnitrace
INSTALL_RPATH
"\$ORIGIN:\$ORIGIN/timemory/libunwind:\$ORIGIN/dyninst-tpls/libs")

install(
TARGETS omnitrace-library
DESTINATION ${CMAKE_INSTALL_LIBDIR}
OPTIONAL)

# ------------------------------------------------------------------------------#
#
# omnitrace-avail target
#
# ------------------------------------------------------------------------------#

add_executable(omnitrace-avail ${CMAKE_CURRENT_LIST_DIR}/src/avail.cpp
${CMAKE_CURRENT_LIST_DIR}/include/avail.hpp)

target_include_directories(omnitrace-avail PRIVATE ${CMAKE_CURRENT_LIST_DIR}/include)
target_compile_definitions(omnitrace-avail PRIVATE OMNITRACE_EXTERN_COMPONENTS=0)
target_link_libraries(omnitrace-avail PRIVATE omnitrace-library)
set_target_properties(omnitrace-avail PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)

install(
TARGETS omnitrace-avail
DESTINATION bin
OPTIONAL)

# ------------------------------------------------------------------------------#
#
# omnitrace-exe target
@@ -234,7 +282,7 @@ set_target_properties(
|
||||
OUTPUT_NAME omnitrace
|
||||
INSTALL_RPATH_USE_LINK_PATH ON
|
||||
INSTALL_RPATH
|
||||
"\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}/dyninst-tpls/lib"
|
||||
"\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}/timemory/libunwind:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}/dyninst-tpls/lib"
|
||||
)
|
||||
|
||||
install(
|
||||
@@ -242,9 +290,6 @@ install(
|
||||
DESTINATION ${CMAKE_INSTALL_BINDIR}
|
||||
OPTIONAL)
|
||||
|
||||
# build the timemory-avail exe
|
||||
add_dependencies(omnitrace-exe timemory-avail)
|
||||
|
||||
# ------------------------------------------------------------------------------#
|
||||
#
|
||||
# miscellaneous installs
|
||||
@@ -275,7 +320,9 @@ install(
|
||||
#
|
||||
# ------------------------------------------------------------------------------#
|
||||
|
||||
add_subdirectory(examples)
|
||||
if(OMNITRACE_BUILD_TESTING)
|
||||
add_subdirectory(examples)
|
||||
endif()
|
||||
|
||||
# ------------------------------------------------------------------------------#
|
||||
#
|
||||
@@ -283,10 +330,12 @@ add_subdirectory(examples)
|
||||
#
|
||||
# ------------------------------------------------------------------------------#
|
||||
|
||||
include(CTest)
|
||||
enable_testing()
|
||||
if(OMNITRACE_BUILD_TESTING)
|
||||
include(CTest)
|
||||
enable_testing()
|
||||
|
||||
add_subdirectory(tests)
|
||||
add_subdirectory(tests)
|
||||
endif()
|
||||
|
||||
# ------------------------------------------------------------------------------#
|
||||
#
|
||||
|
||||
@@ -1,27 +1,21 @@
|
||||
Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
with the Software without restriction, including without limitation the
|
||||
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
sell copies of the Software, and to permit persons to whom the Software is
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice,
|
||||
this list of conditions and the following disclaimers.
|
||||
|
||||
* Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimers in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
* Neither the names of Advanced Micro Devices, Inc. nor the names of its
|
||||
contributors may be used to endorse or promote products derived from
|
||||
this Software without specific prior written permission.
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
THE SOFTWARE.
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
|
||||
@@ -53,19 +53,53 @@ omnitrace <omnitrace-options> -- <exe-or-library> <exe-options>
|
||||
|
||||
## Omnitrace Library Environment Settings
|
||||
|
||||
| Environment Variable | Default Value | Description |
|-----------------------------|-------------------------------|----------------------------------------------------------------------------------|
| `OMNITRACE_DEBUG` | `false` | Enable debugging statements |
| `OMNITRACE_USE_PERFETTO` | `true` | Collect profiling data via perfetto |
| `OMNITRACE_USE_TIMEMORY` | `false` | Collect profiling data via timemory |
| `OMNITRACE_SAMPLE_RATE` | `1` | Invoke perfetto and/or timemory once every N function calls |
| `OMNITRACE_USE_MPI` | `true` | Label perfetto output files via rank instead of PID |
| `OMNITRACE_OUTPUT_FILE` | `perfetto-trace.%rank%.proto` | Output file for perfetto (may use `%pid%`) |
| `OMNITRACE_BACKEND` | `"inprocess"` | Configure perfetto to use either "inprocess" data management, "system", or "all" |
| `OMNITRACE_COMPONENTS` | `"wall_clock"` | Timemory components to activate when enabled |
| `OMNITRACE_SHMEM_SIZE_HINT` | `40960` | Hint for perfetto shared memory buffer |
| `OMNITRACE_BUFFER_SIZE_KB` | `1024000` | Maximum amount of memory perfetto will use to collect data in-process |
| `TIMEMORY_TIME_OUTPUT` | `true` | Create unique output subdirectory with date and launch time |
|
||||
| Environment Variable | Default Value | Description |
|--------------------------------------------|--------------------------|------------------------------------------------------------------------------------------------------------------|
| `OMNITRACE_USE_PERFETTO` | `false` | Enable perfetto backend |
| `OMNITRACE_USE_PID` | `true` | Enable tagging filenames with process identifier (either MPI rank or pid) |
| `OMNITRACE_USE_ROCTRACER` | `true` | Enable ROCM tracing |
| `OMNITRACE_USE_SAMPLING` | `true` | Enable statistical sampling of call-stack |
| `OMNITRACE_USE_TIMEMORY` | `false` | Enable timemory backend |
| `OMNITRACE_BACKEND` | `inprocess` | Specify the perfetto backend to activate. Options are: 'inprocess', 'system', or 'all' |
| `OMNITRACE_BUFFER_SIZE_KB` | `1024000` | Size of perfetto buffer (in KB) |
| `OMNITRACE_COUT_OUTPUT` | `false` | Write output to stdout |
| `OMNITRACE_CRITICAL_TRACE` | `false` | Enable generation of the critical trace |
| `OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT` | `2000` | Number of critical trace records to store in thread-local memory before submitting to shared buffer |
| `OMNITRACE_CRITICAL_TRACE_COUNT` | `0` | Number of critical traces to export (0 == all) |
| `OMNITRACE_CRITICAL_TRACE_DEBUG` | `false` | Enable debugging for critical trace |
| `OMNITRACE_CRITICAL_TRACE_NUM_THREADS` | `8` | Number of threads to use when generating the critical trace |
| `OMNITRACE_CRITICAL_TRACE_PER_ROW` | `0` | How many critical traces per row in perfetto (0 == all in one row) |
| `OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES` | `false` | Include names in serialization of critical trace (mainly for debugging) |
| `OMNITRACE_DIFF_OUTPUT` | `false` | Generate a difference output vs. a pre-existing output (see also: TIMEMORY_INPUT_PATH and TIMEMORY_INPUT_PREFIX) |
| `OMNITRACE_FLAT_SAMPLING` | `false` | Ignore hierarchy in all statistical sampling entries |
| `OMNITRACE_INSTRUMENTATION_INTERVAL` | `1` | Instrumentation only takes measurements once every N function calls (not statistical) |
| `OMNITRACE_JSON_OUTPUT` | `true` | Write json output files |
| `OMNITRACE_MEMORY_PRECISION` | `-1` | Set the precision for components with 'is_memory_category' type-trait |
| `OMNITRACE_MEMORY_SCIENTIFIC` | `false` | Set the numerical reporting format for components with 'is_memory_category' type-trait |
| `OMNITRACE_MEMORY_UNITS` | `""` | Set the units for components with 'uses_memory_units' type-trait |
| `OMNITRACE_OUTPUT_FILE` | `""` | Perfetto filename |
| `OMNITRACE_OUTPUT_PATH` | `omnitrace-{EXE}-output` | Explicitly specify the output folder for results |
| `OMNITRACE_OUTPUT_PREFIX` | `""` | Explicitly specify a prefix for all output files |
| `OMNITRACE_PRECISION` | `-1` | Set the global output precision for components |
| `OMNITRACE_ROCTRACER_FLAT_PROFILE` | `false` | Ignore hierarchy in all kernel entries with timemory backend |
| `OMNITRACE_ROCTRACER_HSA_ACTIVITY` | `false` | Enable HSA activity tracing support |
| `OMNITRACE_ROCTRACER_HSA_API` | `false` | Enable HSA API tracing support |
| `OMNITRACE_ROCTRACER_HSA_API_TYPES` | `""` | HSA API type to collect |
| `OMNITRACE_ROCTRACER_TIMELINE_PROFILE` | `false` | Create unique entries for every kernel with timemory backend |
| `OMNITRACE_SAMPLING_DELAY` | `1e-06` | Number of seconds to delay activating the statistical sampling |
| `OMNITRACE_SAMPLING_FREQ` | `10` | Number of software interrupts per second when OMNITRACE_USE_SAMPLING=ON |
| `OMNITRACE_SCIENTIFIC` | `false` | Set the global numerical reporting to scientific format |
| `OMNITRACE_SETTINGS_DESC` | `false` | Provide descriptions when printing settings |
| `OMNITRACE_SHMEM_SIZE_HINT_KB` | `40960` | Hint for shared-memory buffer size in perfetto (in KB) |
| `OMNITRACE_TEXT_OUTPUT` | `true` | Write text output files |
| `OMNITRACE_TIMELINE_SAMPLING` | `false` | Create unique entries for every sample when statistical sampling is enabled |
| `OMNITRACE_TIMEMORY_COMPONENTS` | `wall_clock` | List of components to collect via timemory (see timemory-avail) |
| `OMNITRACE_TIME_FORMAT` | `%F_%I.%M_%p` | Customize the folder generation when OMNITRACE_TIME_OUTPUT is enabled (see also: strftime) |
| `OMNITRACE_TIME_OUTPUT` | `true` | Output data to subfolder w/ a timestamp (see also: OMNITRACE_TIME_FORMAT) |
| `OMNITRACE_TIMING_PRECISION` | `6` | Set the precision for components with 'is_timing_category' type-trait |
| `OMNITRACE_TIMING_SCIENTIFIC` | `false` | Set the numerical reporting format for components with 'is_timing_category' type-trait |
| `OMNITRACE_TIMING_UNITS` | `""` | Set the units for components with 'uses_timing_units' type-trait |
| `OMNITRACE_TREE_OUTPUT` | `true` | Write hierarchical json output files |
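
As a rough sketch of how the settings in the table might be combined for a sampling run, the snippet below exports a handful of them and then launches the target. The values are illustrative only, `./my-app` (via the hypothetical `APP` variable) is a placeholder for your own executable, and the `omnitrace -- <exe>` form is assumed from the usage line earlier in this README.

```shell
#!/usr/bin/env bash
# Sketch: configure a sampling run through environment variables from the table.
# APP is a hypothetical placeholder for your own executable.
APP="${APP:-./my-app}"

export OMNITRACE_USE_SAMPLING=true      # statistical call-stack sampling
export OMNITRACE_SAMPLING_FREQ=50       # software interrupts per second
export OMNITRACE_SAMPLING_DELAY=0.5     # seconds before sampling is activated
export OMNITRACE_USE_TIMEMORY=true      # enable the timemory backend
export OMNITRACE_OUTPUT_PATH="omnitrace-sampling-output"

# Print the active OMNITRACE_* settings, sorted, so they end up in the run log
env | grep '^OMNITRACE_' | sort

# Launch only if omnitrace is on PATH and the target is executable
if command -v omnitrace > /dev/null 2>&1 && [ -x "${APP}" ]; then
    omnitrace -- "${APP}"
else
    echo "omnitrace or ${APP} not found; settings printed only"
fi
```

Keeping the settings in a small wrapper script like this makes a run reproducible; the same variables could equally be set inline on the command line.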
|
||||
|
||||
### Example Omnitrace Instrumentation
|
||||
|
||||
@@ -165,7 +199,7 @@ variable. The special character sequences `%pid%` and `%rank%` will be replaced
|
||||
|
||||
## Merging the traces from rocprof and omnitrace
|
||||
|
||||
> NOTE: Using `rocprof` externally is deprecated. The current version has built-in support for
|
||||
> NOTE: Using `rocprof` externally for tracing is deprecated. The current version has built-in support for
|
||||
> recording the GPU activity and HIP API calls. If you want to use an external rocprof, either
|
||||
> configure CMake with `-DOMNITRACE_USE_ROCTRACER=OFF` or explicitly set `TIMEMORY_ROCTRACER_ENABLED=OFF` in the
|
||||
> environment.
|
||||
|
||||
@@ -45,6 +45,11 @@ if(OMNITRACE_CLANG_FORMAT_EXE)
|
||||
file(GLOB_RECURSE headers ${PROJECT_SOURCE_DIR}/include/*.hpp)
|
||||
file(GLOB_RECURSE examples ${PROJECT_SOURCE_DIR}/examples/*.cpp
|
||||
${PROJECT_SOURCE_DIR}/examples/*.hpp)
|
||||
file(GLOB_RECURSE external ${PROJECT_SOURCE_DIR}/examples/lulesh/external/*.cpp
|
||||
${PROJECT_SOURCE_DIR}/examples/lulesh/external/*.hpp)
|
||||
if(external)
|
||||
list(REMOVE_ITEM examples ${external})
|
||||
endif()
|
||||
add_custom_target(
|
||||
format-omnitrace
|
||||
${OMNITRACE_CLANG_FORMAT_EXE} -i ${sources} ${headers} ${examples}
|
||||
|
||||
@@ -13,6 +13,7 @@ omnitrace_add_interface_library(omnitrace-threading "Enables multithreading supp
|
||||
omnitrace_add_interface_library(
|
||||
omnitrace-dyninst
|
||||
"Provides flags and libraries for Dyninst (dynamic instrumentation)")
|
||||
omnitrace_add_interface_library(omnitrace-hip "Provides flags and libraries for HIP")
|
||||
omnitrace_add_interface_library(omnitrace-roctracer
|
||||
"Provides flags and libraries for roctracer")
|
||||
omnitrace_add_interface_library(omnitrace-mpi "Provides MPI or MPI headers")
|
||||
@@ -24,6 +25,9 @@ target_include_directories(omnitrace-headers INTERFACE ${PROJECT_SOURCE_DIR}/inc
|
||||
# include threading because of rooflines
|
||||
target_link_libraries(omnitrace-headers INTERFACE omnitrace-threading)
|
||||
|
||||
# ensure CMAKE_PREFIX_PATH from the environment takes precedence over /opt/rocm, which is appended later
|
||||
string(REPLACE ":" ";" CMAKE_PREFIX_PATH "$ENV{CMAKE_PREFIX_PATH};${CMAKE_PREFIX_PATH}")
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
#
|
||||
# Threading
|
||||
@@ -47,6 +51,19 @@ if(pthread_LIBRARY AND NOT WIN32)
|
||||
target_link_libraries(omnitrace-threading INTERFACE ${pthread_LIBRARY})
|
||||
endif()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
#
|
||||
# HIP
|
||||
#
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
|
||||
if(OMNITRACE_USE_HIP)
|
||||
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
|
||||
find_package(hip ${omnitrace_FIND_QUIETLY} REQUIRED)
|
||||
target_compile_definitions(omnitrace-hip INTERFACE OMNITRACE_USE_HIP)
|
||||
target_link_libraries(omnitrace-hip INTERFACE hip::host)
|
||||
endif()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
#
|
||||
# roctracer
|
||||
@@ -56,9 +73,9 @@ endif()
|
||||
if(OMNITRACE_USE_ROCTRACER)
|
||||
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
|
||||
find_package(roctracer ${omnitrace_FIND_QUIETLY} REQUIRED)
|
||||
find_package(hip ${omnitrace_FIND_QUIETLY} REQUIRED)
|
||||
target_compile_definitions(omnitrace-roctracer INTERFACE OMNITRACE_USE_ROCTRACER)
|
||||
target_link_libraries(omnitrace-roctracer INTERFACE hip::host roctracer::roctracer)
|
||||
target_link_libraries(omnitrace-roctracer INTERFACE roctracer::roctracer
|
||||
omnitrace::omnitrace-hip)
|
||||
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_RPATH}:${roctracer_LIBRARY_DIRS}")
|
||||
endif()
|
||||
|
||||
@@ -297,45 +314,77 @@ set(TIMEMORY_BUILD_TOOLS
|
||||
set(TIMEMORY_BUILD_EXCLUDE_FROM_ALL
|
||||
ON
|
||||
CACHE BOOL "Set timemory to only build dependencies")
|
||||
set(TIMEMORY_BUILD_HIDDEN_VISIBILITY
|
||||
ON
|
||||
CACHE BOOL "Build timemory with hidden visibility")
|
||||
set(TIMEMORY_QUIET_CONFIG
|
||||
ON
|
||||
CACHE BOOL "Make timemory configuration quieter")
|
||||
|
||||
# timemory feature settings
|
||||
set(TIMEMORY_USE_MPI
|
||||
${OMNITRACE_USE_MPI}
|
||||
CACHE BOOL "Enable MPI support in timemory" FORCE)
|
||||
set(TIMEMORY_USE_GOTCHA
|
||||
ON
|
||||
CACHE BOOL "Enable GOTCHA support in timemory")
|
||||
set(TIMEMORY_USE_PERFETTO
|
||||
OFF
|
||||
CACHE BOOL "Disable perfetto support in timemory")
|
||||
set(TIMEMORY_USE_LIBUNWIND
|
||||
ON
|
||||
CACHE BOOL "Enable libunwind support in timemory")
|
||||
|
||||
# timemory feature build settings
|
||||
set(TIMEMORY_BUILD_GOTCHA
|
||||
ON
|
||||
CACHE BOOL "Enable building GOTCHA library from submodule")
|
||||
set(TIMEMORY_BUILD_LIBUNWIND
|
||||
ON
|
||||
CACHE BOOL "Enable building libunwind library from submodule")
|
||||
set(TIMEMORY_BUILD_EXTRA_OPTIMIZATIONS
|
||||
${OMNITRACE_BUILD_EXTRA_OPTIMIZATIONS}
|
||||
CACHE BOOL "Enable extra optimizations when building timemory" FORCE)
|
||||
|
||||
# timemory build settings
|
||||
set(TIMEMORY_TLS_MODEL
|
||||
"global-dynamic"
|
||||
CACHE STRING "Thread-local static model" FORCE)
|
||||
|
||||
set(TIMEMORY_SETTINGS_PREFIX
|
||||
"OMNITRACE_"
|
||||
CACHE STRING "Prefix used for settings and environment variables")
|
||||
mark_as_advanced(TIMEMORY_SETTINGS_PREFIX)
|
||||
|
||||
omnitrace_checkout_git_submodule(
|
||||
RELATIVE_PATH external/timemory
|
||||
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
|
||||
REPO_URL https://github.com/NERSC/timemory.git
|
||||
REPO_BRANCH gpu-kernel-instrumentation)
|
||||
|
||||
omnitrace_save_variables(BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
|
||||
CMAKE_POSITION_INDEPENDENT_CODE)
|
||||
omnitrace_save_variables(
|
||||
BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
|
||||
CMAKE_POSITION_INDEPENDENT_CODE CMAKE_PREFIX_PATH)
|
||||
|
||||
# ensure timemory builds PIC static libs so that we don't have to install timemory shared
|
||||
# lib
|
||||
set(BUILD_SHARED_LIBS ON)
|
||||
set(BUILD_STATIC_LIBS OFF)
|
||||
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
|
||||
set(TIMEMORY_CTP_OPTIONS GLOBAL)
|
||||
|
||||
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
|
||||
# results in undefined symbols to component::base<T>::load()
|
||||
set(TIMEMORY_BUILD_HIDDEN_VISIBILITY
|
||||
OFF
|
||||
CACHE BOOL "" FORCE)
|
||||
endif()
|
||||
|
||||
add_subdirectory(external/timemory)
|
||||
|
||||
omnitrace_restore_variables(BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
|
||||
CMAKE_POSITION_INDEPENDENT_CODE)
|
||||
omnitrace_restore_variables(
|
||||
BUILD_CONFIG VARIABLES BUILD_SHARED_LIBS BUILD_STATIC_LIBS
|
||||
CMAKE_POSITION_INDEPENDENT_CODE CMAKE_PREFIX_PATH)
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
#
|
||||
|
||||
@@ -13,12 +13,13 @@ WORKDIR /tmp
|
||||
SHELL [ "/bin/bash", "-c" ]
|
||||
|
||||
ARG EXTRA_PACKAGES=""
|
||||
ARG ROCM_REPO_VERSION="debian"
|
||||
|
||||
RUN apt-get update && \
|
||||
apt-get dist-upgrade -y && \
|
||||
apt-get install -y build-essential cmake libnuma-dev wget gnupg2 m4 bash-completion git-core && \
|
||||
apt-get install -y build-essential cmake libnuma-dev wget gnupg2 m4 bash-completion git-core autoconf libtool autotools-dev && \
|
||||
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - && \
|
||||
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | tee /etc/apt/sources.list.d/rocm.list && \
|
||||
echo "deb [arch=amd64] https://repo.radeon.com/rocm/apt/${ROCM_REPO_VERSION}/ ubuntu main" | tee /etc/apt/sources.list.d/rocm.list && \
|
||||
apt-get update && \
|
||||
apt-get dist-upgrade -y && \
|
||||
apt-get install -y rocm-dev rocm-utils roctracer-dev rocprofiler-dev hip-base hsa-amd-aqlprofile hsa-rocr-dev hsakmt-roct-dev ${EXTRA_PACKAGES}
|
||||
|
||||
Executable
@@ -0,0 +1,22 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
if [ ! -f CMakeLists.txt ]; then
|
||||
echo "Error! Execute script from source directory"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
set -e
|
||||
|
||||
build-release()
|
||||
{
|
||||
CONTAINER=$1
|
||||
ROCM_VERSION=$2
|
||||
CODE_VERSION=$3
|
||||
docker run -it --rm -v ${PWD}:/home/omnitrace --env ROCM_VERSION=${ROCM_VERSION} --env VERSION=${CODE_VERSION} ${CONTAINER} /home/omnitrace/scripts/build-release.sh
|
||||
}
|
||||
|
||||
CODE_VERSION=$(cat VERSION)
|
||||
|
||||
build-release jrmadsen/omnitrace-base-rocm-4.5 4.5.0 ${CODE_VERSION}
|
||||
build-release jrmadsen/omnitrace-base-rocm-4.3 4.3.0 ${CODE_VERSION}
|
||||
build-release jrmadsen/omnitrace-base-rocm-4.3.1 4.3.1 ${CODE_VERSION}
|
||||
Executable
@@ -0,0 +1,8 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
: ${ROCM_VERSIONS:="4.5 4.3 4.3.1"}
|
||||
|
||||
for i in ${ROCM_VERSIONS}
|
||||
do
|
||||
docker build . --tag jrmadsen/omnitrace-base-rocm-${i} --build-arg ROCM_REPO_VERSION=${i}
|
||||
done
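
The loop above can be previewed without invoking Docker. The following is a minimal dry-run sketch of the same loop: `build_image` and `DRY_RUN` are hypothetical helpers introduced only for illustration, while the image tag and the `ROCM_REPO_VERSION` build argument are taken from the script above.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the image-build loop: prints each docker command
# instead of executing it. DRY_RUN and build_image are hypothetical helpers.
: "${ROCM_VERSIONS:=4.5 4.3 4.3.1}"
: "${DRY_RUN:=1}"

build_image() {
    version="$1"
    tag="jrmadsen/omnitrace-base-rocm-${version}"
    cmd="docker build . --tag ${tag} --build-arg ROCM_REPO_VERSION=${version}"
    if [ "${DRY_RUN}" = "1" ]; then
        echo "${cmd}"    # only show what would be run
    else
        ${cmd}           # intentionally unquoted so the string splits into arguments
    fi
}

for i in ${ROCM_VERSIONS}; do
    build_image "${i}"
done
```

Setting `ROCM_VERSIONS="4.5"` in the environment restricts the loop to a single image, and `DRY_RUN=0` would execute the commands for real.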
|
||||
@@ -7,3 +7,7 @@ set(CMAKE_CXX_VISIBILITY_PRESET "default")
|
||||
|
||||
add_subdirectory(transpose)
|
||||
add_subdirectory(parallel-overhead)
|
||||
|
||||
option(BUILD_SHARED_LIBS "Build dynamic libraries" ON)
|
||||
|
||||
add_subdirectory(lulesh)
|
||||
|
||||
@@ -0,0 +1,60 @@
|
||||
cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
|
||||
|
||||
project(lulesh LANGUAGES C CXX)
|
||||
|
||||
list(INSERT CMAKE_MODULE_PATH 0 ${PROJECT_SOURCE_DIR}/cmake/Modules)
|
||||
|
||||
add_subdirectory(external)
|
||||
|
||||
set(CMAKE_CXX_EXTENSIONS OFF)
|
||||
|
||||
if("${CMAKE_BUILD_TYPE}" STREQUAL "")
|
||||
set(CMAKE_BUILD_TYPE
|
||||
"RelWithDebInfo"
|
||||
CACHE STRING "CMake build type" FORCE)
|
||||
endif()
|
||||
|
||||
if(DEFINED OMNITRACE_USE_MPI)
|
||||
option(LULESH_USE_MPI "Enable MPI" ${OMNITRACE_USE_MPI})
|
||||
else()
|
||||
option(LULESH_USE_MPI "Enable MPI" OFF)
|
||||
endif()
|
||||
|
||||
add_library(lulesh-mpi INTERFACE)
|
||||
if(LULESH_USE_MPI)
|
||||
find_package(MPI REQUIRED)
|
||||
target_compile_definitions(lulesh-mpi INTERFACE USE_MPI=1)
|
||||
target_link_libraries(lulesh-mpi INTERFACE MPI::MPI_C MPI::MPI_CXX)
|
||||
else()
|
||||
target_compile_definitions(lulesh-mpi INTERFACE USE_MPI=0)
|
||||
endif()
|
||||
|
||||
if(NOT TARGET Kokkos::kokkos)
|
||||
find_package(Kokkos REQUIRED)
|
||||
endif()
|
||||
|
||||
file(GLOB headers ${PROJECT_SOURCE_DIR}/*.h ${PROJECT_SOURCE_DIR}/*.hxx)
|
||||
file(GLOB sources ${PROJECT_SOURCE_DIR}/*.cc)
|
||||
|
||||
add_executable(${PROJECT_NAME} ${sources} ${headers})
|
||||
target_include_directories(${PROJECT_NAME} PRIVATE ${PROJECT_SOURCE_DIR}/includes)
|
||||
target_link_libraries(${PROJECT_NAME} PRIVATE Kokkos::kokkos lulesh-mpi)
|
||||
|
||||
if(NOT CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME)
|
||||
set_target_properties(${PROJECT_NAME} PROPERTIES RUNTIME_OUTPUT_DIRECTORY
|
||||
${CMAKE_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
enable_testing()
|
||||
if(LULESH_USE_MPI)
|
||||
add_test(
|
||||
NAME lulesh
|
||||
COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} 8
|
||||
$<TARGET_FILE:${PROJECT_NAME}> -i 100 -s 20 -p
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
else()
|
||||
add_test(
|
||||
NAME lulesh
|
||||
COMMAND $<TARGET_FILE:${PROJECT_NAME}> -i 100 -s 20 -p
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
@@ -0,0 +1,315 @@
|
||||
# include guard
|
||||
include_guard(DIRECTORY)
|
||||
|
||||
# MacroUtilities - useful macros and functions for generic tasks
|
||||
#
|
||||
|
||||
include(CMakeDependentOption)
|
||||
include(CMakeParseArguments)
|
||||
|
||||
# -----------------------------------------------------------------------
|
||||
# function - capitalize - make a string capitalized (first letter is capital)
# usage:
#   capitalize("SHARED" CShared)
#   message(STATUS "-- CShared is \"${CShared}\"")
#   $ -- CShared is "Shared"
|
||||
function(CAPITALIZE str var)
|
||||
# make string lower
|
||||
string(TOLOWER "${str}" str)
|
||||
string(SUBSTRING "${str}" 0 1 _first)
|
||||
string(TOUPPER "${_first}" _first)
|
||||
string(SUBSTRING "${str}" 1 -1 _remainder)
|
||||
string(CONCAT str "${_first}" "${_remainder}")
|
||||
set(${var}
|
||||
"${str}"
|
||||
PARENT_SCOPE)
|
||||
endfunction()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# function CHECKOUT_GIT_SUBMODULE()
|
||||
#
|
||||
# Run "git submodule update" if a file in a submodule does not exist
|
||||
#
|
||||
# ARGS:
#   RECURSIVE (option) -- add "--recursive" flag
#   RELATIVE_PATH (one value) -- typically the relative path to the submodule from PROJECT_SOURCE_DIR
#   WORKING_DIRECTORY (one value) -- (default: PROJECT_SOURCE_DIR)
#   TEST_FILE (one value) -- file to check for (default: Makefile)
#   REPO_URL (one value) -- repository to clone if the test file is still missing after the submodule update step
#   REPO_BRANCH (one value) -- branch to clone (default: master)
#   ADDITIONAL_CMDS (many value) -- any additional commands to pass
|
||||
#
|
||||
function(CHECKOUT_GIT_SUBMODULE)
|
||||
# parse args
|
||||
cmake_parse_arguments(
|
||||
CHECKOUT "RECURSIVE"
|
||||
"RELATIVE_PATH;WORKING_DIRECTORY;TEST_FILE;REPO_URL;REPO_BRANCH"
|
||||
"ADDITIONAL_CMDS" ${ARGN})
|
||||
|
||||
if(NOT CHECKOUT_WORKING_DIRECTORY)
|
||||
set(CHECKOUT_WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
|
||||
endif()
|
||||
|
||||
if(NOT CHECKOUT_TEST_FILE)
|
||||
set(CHECKOUT_TEST_FILE "Makefile")
|
||||
endif()
|
||||
|
||||
# default assumption
|
||||
if(NOT CHECKOUT_REPO_BRANCH)
|
||||
set(CHECKOUT_REPO_BRANCH "master")
|
||||
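endif()

# translate the RECURSIVE option into the flag passed to git below
set(_RECURSE "")
if(CHECKOUT_RECURSIVE)
set(_RECURSE --recursive)
endif()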
|
||||
|
||||
find_package(Git)
|
||||
set(_DIR "${CHECKOUT_WORKING_DIRECTORY}/${CHECKOUT_RELATIVE_PATH}")
|
||||
# ensure the (possibly empty) directory exists
|
||||
if(NOT EXISTS "${_DIR}")
|
||||
if(NOT CHECKOUT_REPO_URL)
|
||||
message(FATAL_ERROR "submodule directory does not exist")
|
||||
endif()
|
||||
endif()
|
||||
|
||||
# if this file exists, the project has been checked out;
# if it does not exist, the project has not been checked out
|
||||
set(_TEST_FILE "${_DIR}/${CHECKOUT_TEST_FILE}")
|
||||
# assuming a .gitmodules file exists
|
||||
set(_SUBMODULE "${PROJECT_SOURCE_DIR}/.gitmodules")
|
||||
|
||||
set(_TEST_FILE_EXISTS OFF)
|
||||
if(EXISTS "${_TEST_FILE}" AND NOT IS_DIRECTORY "${_TEST_FILE}")
|
||||
set(_TEST_FILE_EXISTS ON)
|
||||
endif()
|
||||
|
||||
if(_TEST_FILE_EXISTS)
|
||||
return()
|
||||
endif()
|
||||
|
||||
find_package(Git REQUIRED)
|
||||
|
||||
set(_SUBMODULE_EXISTS OFF)
|
||||
if(EXISTS "${_SUBMODULE}" AND NOT IS_DIRECTORY "${_SUBMODULE}")
|
||||
set(_SUBMODULE_EXISTS ON)
|
||||
endif()
|
||||
|
||||
set(_HAS_REPO_URL OFF)
|
||||
if(NOT "${CHECKOUT_REPO_URL}" STREQUAL "")
|
||||
set(_HAS_REPO_URL ON)
|
||||
endif()
|
||||
|
||||
# if the module has not been checked out
|
||||
if(NOT _TEST_FILE_EXISTS AND _SUBMODULE_EXISTS)
|
||||
# perform the checkout
|
||||
execute_process(
|
||||
COMMAND ${GIT_EXECUTABLE} submodule update --init ${_RECURSE}
|
||||
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_RELATIVE_PATH}
|
||||
WORKING_DIRECTORY ${CHECKOUT_WORKING_DIRECTORY}
|
||||
RESULT_VARIABLE RET)
|
||||
|
||||
# check the return code
|
||||
if(RET GREATER 0)
|
||||
set(_CMD "${GIT_EXECUTABLE} submodule update --init ${_RECURSE}
|
||||
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_RELATIVE_PATH}")
|
||||
message(STATUS "function(CHECKOUT_GIT_SUBMODULE) failed.")
|
||||
message(FATAL_ERROR "Command: \"${_CMD}\"")
|
||||
else()
|
||||
set(_TEST_FILE_EXISTS ON)
|
||||
endif()
|
||||
endif()
|
||||
|
||||
if(NOT _TEST_FILE_EXISTS AND _HAS_REPO_URL)
|
||||
message(
|
||||
STATUS "Checking out '${CHECKOUT_REPO_URL}' @ '${CHECKOUT_REPO_BRANCH}'...")
|
||||
|
||||
# remove the existing directory
|
||||
if(EXISTS "${_DIR}")
|
||||
execute_process(COMMAND ${CMAKE_COMMAND} -E remove_directory ${_DIR})
|
||||
endif()
|
||||
|
||||
# perform the checkout
|
||||
execute_process(
|
||||
COMMAND
|
||||
${GIT_EXECUTABLE} clone -b ${CHECKOUT_REPO_BRANCH}
|
||||
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_REPO_URL} ${CHECKOUT_RELATIVE_PATH}
|
||||
WORKING_DIRECTORY ${CHECKOUT_WORKING_DIRECTORY}
|
||||
RESULT_VARIABLE RET)
|
||||
|
||||
# perform the submodule update
|
||||
if(CHECKOUT_RECURSIVE
|
||||
AND EXISTS "${_DIR}"
|
||||
AND IS_DIRECTORY "${_DIR}")
|
||||
execute_process(
|
||||
COMMAND ${GIT_EXECUTABLE} submodule update --init ${_RECURSE}
|
||||
WORKING_DIRECTORY ${_DIR}
|
||||
RESULT_VARIABLE RET)
|
||||
endif()
|
||||
|
||||
# check the return code
|
||||
if(RET GREATER 0)
|
||||
set(_CMD
|
||||
"${GIT_EXECUTABLE} clone -b ${CHECKOUT_REPO_BRANCH}
|
||||
${CHECKOUT_ADDITIONAL_CMDS} ${CHECKOUT_REPO_URL} ${CHECKOUT_RELATIVE_PATH}"
|
||||
)
|
||||
message(STATUS "function(CHECKOUT_GIT_SUBMODULE) failed.")
|
||||
message(FATAL_ERROR "Command: \"${_CMD}\"")
|
||||
else()
|
||||
set(_TEST_FILE_EXISTS ON)
|
||||
endif()
|
||||
endif()
|
||||
|
||||
if(NOT EXISTS "${_TEST_FILE}" OR NOT _TEST_FILE_EXISTS)
|
||||
message(
|
||||
FATAL_ERROR
|
||||
"Error checking out submodule: '${CHECKOUT_RELATIVE_PATH}' to '${_DIR}'")
|
||||
endif()
|
||||
|
||||
endfunction()
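
The order in which the function above decides what to do can be sketched outside of CMake. The shell function below is a dry run that only reports the action that would be taken; `checkout_action` is a hypothetical helper written for illustration, and it is a simplification that omits the branch handling and return-code checks of the real function.

```shell
#!/usr/bin/env bash
# Sketch of the decision order in CHECKOUT_GIT_SUBMODULE, as a dry run that
# only reports which action would be taken. checkout_action is a hypothetical
# helper, not part of the project.
checkout_action() {
    src_dir="$1"      # plays the role of PROJECT_SOURCE_DIR / WORKING_DIRECTORY
    rel_path="$2"     # RELATIVE_PATH
    test_file="$3"    # TEST_FILE
    repo_url="$4"     # REPO_URL (may be empty)

    if [ -f "${src_dir}/${rel_path}/${test_file}" ]; then
        echo "skip"                 # test file present: already checked out
    elif [ -f "${src_dir}/.gitmodules" ]; then
        echo "submodule-update"     # git submodule update --init <path>
    elif [ -n "${repo_url}" ]; then
        echo "clone"                # git clone -b <branch> <url> <path>
    else
        echo "error"                # nothing to check out from
    fi
}

# Example: an empty scratch directory with a repository URL falls back to a clone
scratch="$(mktemp -d)"
checkout_action "${scratch}" external/timemory CMakeLists.txt \
    https://github.com/NERSC/timemory.git
rm -rf "${scratch}"
```

The point of the ordering is that an existing checkout is never touched, a registered submodule is preferred, and a direct clone is only the last resort.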
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# require variable
|
||||
#
|
||||
function(CHECK_REQUIRED VAR)
|
||||
if(NOT DEFINED ${VAR} OR "${${VAR}}" STREQUAL "")
|
||||
message(FATAL_ERROR "Variable '${VAR}' must be defined and not empty")
|
||||
endif()
|
||||
endfunction()
|
||||
|
||||
# -----------------------------------------------------------------------
|
||||
# function add_feature(<NAME> <DOCSTRING>) Add a project feature, whose activation is
|
||||
# specified by the existence of the variable <NAME>, to the list of enabled/disabled
|
||||
# features, plus a docstring describing the feature
|
||||
#
|
||||
function(ADD_FEATURE _var _description)
|
||||
set(EXTRA_DESC "")
|
||||
foreach(currentArg ${ARGN})
|
||||
if(NOT "${currentArg}" STREQUAL "${_var}" AND NOT "${currentArg}" STREQUAL
|
||||
"${_description}")
|
||||
set(EXTRA_DESC "${EXTRA_DESC}${currentArg}")
|
||||
endif()
|
||||
endforeach()
|
||||
|
||||
set_property(GLOBAL APPEND PROPERTY ${PROJECT_NAME}_FEATURES ${_var})
|
||||
set_property(GLOBAL PROPERTY ${_var}_DESCRIPTION "${_description}${EXTRA_DESC}")
|
||||
|
||||
if("CMAKE_DEFINE" IN_LIST ARGN)
|
||||
set_property(GLOBAL APPEND PROPERTY ${PROJECT_NAME}_CMAKE_DEFINES
|
||||
"${_var} @${_var}@")
|
||||
endif()
|
||||
endfunction()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# function add_option(<OPTION_NAME> <DOCSTRING> <DEFAULT_SETTING> [NO_FEATURE]) Add an
|
||||
# option and add as a feature if NO_FEATURE is not provided
|
||||
#
|
||||
function(ADD_OPTION _NAME _MESSAGE _DEFAULT)
|
||||
option(${_NAME} "${_MESSAGE}" ${_DEFAULT})
|
||||
if("NO_FEATURE" IN_LIST ARGN)
|
||||
mark_as_advanced(${_NAME})
|
||||
else()
|
||||
add_feature(${_NAME} "${_MESSAGE}")
|
||||
endif()
|
||||
if("ADVANCED" IN_LIST ARGN)
|
||||
mark_as_advanced(${_NAME})
|
||||
endif()
|
||||
endfunction()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# function print_enabled_features() Print enabled features plus their docstrings.
|
||||
#
|
||||
function(PRINT_ENABLED_FEATURES)
|
||||
set(_basemsg "The following features are defined/enabled (+):")
|
||||
set(_currentFeatureText "${_basemsg}")
|
||||
get_property(_features GLOBAL PROPERTY ${PROJECT_NAME}_FEATURES)
|
||||
if(NOT "${_features}" STREQUAL "")
|
||||
list(REMOVE_DUPLICATES _features)
|
||||
list(SORT _features)
|
||||
endif()
|
||||
foreach(_feature ${_features})
|
||||
if(${_feature})
|
||||
# add feature to text
|
||||
set(_currentFeatureText "${_currentFeatureText}\n ${_feature}")
|
||||
# get description
|
||||
get_property(_desc GLOBAL PROPERTY ${_feature}_DESCRIPTION)
|
||||
# print the description; if the value is not a standard ON/TRUE, also print what it is set to
|
||||
if(_desc)
|
||||
if(NOT "${${_feature}}" STREQUAL "ON" AND NOT "${${_feature}}" STREQUAL
|
||||
"TRUE")
|
||||
set(_currentFeatureText
|
||||
"${_currentFeatureText}: ${_desc} -- [\"${${_feature}}\"]")
|
||||
else()
|
||||
string(REGEX REPLACE "^${PROJECT_NAME}_USE_" "" _feature_tmp
|
||||
"${_feature}")
|
||||
string(TOLOWER "${_feature_tmp}" _feature_tmp_l)
|
||||
capitalize("${_feature_tmp}" _feature_tmp_c)
|
||||
foreach(_var _feature _feature_tmp _feature_tmp_l _feature_tmp_c)
|
||||
set(_ver "${${${_var}}_VERSION}")
|
||||
if(NOT "${_ver}" STREQUAL "")
|
||||
set(_desc "${_desc} -- [found version ${_ver}]")
|
||||
break()
|
||||
endif()
|
||||
unset(_ver)
|
||||
endforeach()
|
||||
set(_currentFeatureText "${_currentFeatureText}: ${_desc}")
|
||||
endif()
|
||||
set(_desc NOTFOUND)
|
||||
endif()
|
||||
endif()
|
||||
endforeach()
|
||||
|
||||
if(NOT "${_currentFeatureText}" STREQUAL "${_basemsg}")
|
||||
message(STATUS "${_currentFeatureText}\n")
|
||||
endif()
|
||||
endfunction()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# function print_disabled_features() Print disabled features plus their docstrings.
|
||||
#
|
||||
function(PRINT_DISABLED_FEATURES)
|
||||
set(_basemsg "The following features are NOT defined/enabled (-):")
|
||||
set(_currentFeatureText "${_basemsg}")
|
||||
get_property(_features GLOBAL PROPERTY ${PROJECT_NAME}_FEATURES)
|
||||
if(NOT "${_features}" STREQUAL "")
|
||||
list(REMOVE_DUPLICATES _features)
|
||||
list(SORT _features)
|
||||
endif()
|
||||
foreach(_feature ${_features})
|
||||
if(NOT ${_feature})
|
||||
set(_currentFeatureText "${_currentFeatureText}\n ${_feature}")
|
||||
get_property(_desc GLOBAL PROPERTY ${_feature}_DESCRIPTION)
|
||||
if(_desc)
|
||||
set(_currentFeatureText "${_currentFeatureText}: ${_desc}")
|
||||
set(_desc NOTFOUND)
|
||||
endif(_desc)
|
||||
endif()
|
||||
endforeach(_feature)
|
||||
|
||||
if(NOT "${_currentFeatureText}" STREQUAL "${_basemsg}")
|
||||
message(STATUS "${_currentFeatureText}\n")
|
||||
endif()
|
||||
endfunction()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# function print_features() Print all features plus their docstrings.
|
||||
#
|
||||
function(PRINT_FEATURES)
|
||||
message(STATUS "")
|
||||
print_enabled_features()
|
||||
print_disabled_features()
|
||||
endfunction()
|
||||
|
||||
# ----------------------------------------------------------------------------------------#
|
||||
# macro ADD_SUBPROJECT() Does a git submodule update + add_subdirectory
|
||||
#
|
||||
macro(ADD_SUBPROJECT PACKAGE_NAME)
|
||||
# parse args
|
||||
cmake_parse_arguments(PACKAGE "SUBMODULE" "DIRECTORY" "" ${ARGN})
|
||||
if(NOT PACKAGE_DIRECTORY)
|
||||
set(PACKAGE_DIRECTORY ${PACKAGE_NAME})
|
||||
endif()
|
||||
# if specified in options
|
||||
if("${PACKAGE_NAME}" IN_LIST PROJECTS)
|
||||
if(PACKAGE_SUBMODULE)
|
||||
checkout_git_submodule(RECURSIVE RELATIVE_PATH ${PACKAGE_DIRECTORY})
|
||||
endif()
|
||||
if(NOT EXISTS "${PROJECT_SOURCE_DIR}/${PACKAGE_DIRECTORY}/CMakeLists.txt")
|
||||
message(
|
||||
STATUS
|
||||
"Warning! '${PROJECT_SOURCE_DIR}/${PACKAGE_DIRECTORY}/CMakeLists.txt' does not exist!"
|
||||
)
|
||||
else()
|
||||
add_subdirectory(${PACKAGE_DIRECTORY})
|
||||
endif()
|
||||
endif()
|
||||
endmacro()
|
||||
+28
@@ -0,0 +1,28 @@
set(Kokkos_ENABLE_SERIAL
    ON
    CACHE BOOL "Enable Serial")
set(Kokkos_ENABLE_OPENMP
    ON
    CACHE BOOL "Enable OpenMP")
if(USE_CUDA)
    set(Kokkos_ENABLE_CUDA
        ON
        CACHE BOOL "Enable CUDA")
    set(Kokkos_ENABLE_CUDA_UVM
        ON
        CACHE BOOL "Enable CUDA UVM")
    set(Kokkos_ENABLE_CUDA_LAMBDA
        ON
        CACHE BOOL "Enable CUDA Lambda")
    set(Kokkos_ENABLE_CUDA_CONSTEXPR
        ON
        CACHE BOOL "Enable CUDA constexpr")
endif()

checkout_git_submodule(
    RELATIVE_PATH external/kokkos WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} REPO_URL
    https://github.com/kokkos/kokkos.git REPO_BRANCH develop)

set(CMAKE_SKIP_INSTALL_ALL_DEPENDENCY ON)

add_subdirectory(kokkos)
+1
Submodule examples/lulesh/external/kokkos added at 56468253ef
@@ -0,0 +1,127 @@
|
||||
/*!
|
||||
******************************************************************************
|
||||
*
|
||||
* \file
|
||||
*
|
||||
* \brief RAJA header file for simple class that can be used to
|
||||
* time code sections.
|
||||
*
|
||||
* \author Rich Hornung, Center for Applied Scientific Computing, LLNL
|
||||
* \author Jeff Keasler, Applications, Simulations And Quality, LLNL
|
||||
*
|
||||
******************************************************************************
|
||||
*/
|
||||
|
||||
#ifndef RAJA_Timer_HXX
|
||||
#define RAJA_Timer_HXX
|
||||
|
||||
#if defined(RAJA_USE_CYCLE)
|
||||
# include "./cycle.h"
|
||||
typedef ticks TimeType;
|
||||
|
||||
#elif defined(RAJA_USE_CLOCK)
|
||||
# include <time.h>
|
||||
typedef clock_t TimeType;
|
||||
|
||||
#elif defined(RAJA_USE_GETTIME)
|
||||
# include <time.h>
|
||||
typedef timespec TimeType;
|
||||
|
||||
#else
|
||||
# error RAJA_TIMER_TYPE is undefined!
|
||||
|
||||
#endif
|
||||
|
||||
namespace RAJA
|
||||
{
|
||||
/*!
|
||||
******************************************************************************
|
||||
*
|
||||
* \brief Simple timer class to time code sections.
|
||||
*
|
||||
******************************************************************************
|
||||
*/
|
||||
class Timer
|
||||
{
|
||||
public:
|
||||
#if defined(RAJA_USE_CYCLE) || defined(RAJA_USE_CLOCK)
|
||||
Timer()
|
||||
: telapsed(0)
|
||||
{
|
||||
;
|
||||
}
|
||||
#endif
|
||||
#if defined(RAJA_USE_GETTIME)
|
||||
Timer()
|
||||
: telapsed(0)
|
||||
, stime_elapsed(0)
|
||||
, nstime_elapsed(0)
|
||||
{
|
||||
;
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(RAJA_USE_CYCLE)
|
||||
void start() { tstart = getticks(); }
|
||||
void stop()
|
||||
{
|
||||
tstop = getticks();
|
||||
set_elapsed();
|
||||
}
|
||||
|
||||
long double elapsed() { return static_cast<long double>(telapsed); }
|
||||
#endif
|
||||
|
||||
#if defined(RAJA_USE_CLOCK)
|
||||
void start() { tstart = clock(); }
|
||||
void stop()
|
||||
{
|
||||
tstop = clock();
|
||||
set_elapsed();
|
||||
}
|
||||
|
||||
long double elapsed() { return static_cast<long double>(telapsed) / CLOCKS_PER_SEC; }
|
||||
#endif
|
||||
|
||||
#if defined(RAJA_USE_GETTIME)
|
||||
|
||||
# if 0
|
||||
void start() { clock_gettime(CLOCK_REALTIME, &tstart); }
|
||||
void stop() { clock_gettime(CLOCK_REALTIME, &tstop); set_elapsed(); }
|
||||
# else
|
||||
void start() { clock_gettime(CLOCK_MONOTONIC, &tstart); }
|
||||
void stop()
|
||||
{
|
||||
clock_gettime(CLOCK_MONOTONIC, &tstop);
|
||||
set_elapsed();
|
||||
}
|
||||
# endif
|
||||
|
||||
long double elapsed() { return (stime_elapsed + nstime_elapsed); }
|
||||
|
||||
#endif
|
||||
|
||||
private:
|
||||
TimeType tstart;
|
||||
TimeType tstop;
|
||||
long double telapsed;
|
||||
|
||||
#if defined(RAJA_USE_CYCLE) || defined(RAJA_USE_CLOCK)
|
||||
void set_elapsed() { telapsed += (tstop - tstart); }
|
||||
|
||||
#elif defined(RAJA_USE_GETTIME)
|
||||
long double stime_elapsed;
|
||||
long double nstime_elapsed;
|
||||
|
||||
void set_elapsed()
|
||||
{
|
||||
stime_elapsed += static_cast<long double>(tstop.tv_sec - tstart.tv_sec);
|
||||
nstime_elapsed +=
|
||||
static_cast<long double>(tstop.tv_nsec - tstart.tv_nsec) / 1000000000.0;
|
||||
}
|
||||
#endif
|
||||
};
|
||||
|
||||
} // namespace RAJA
|
||||
|
||||
#endif // closing endif for header file include guard
|
||||
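The RAJA_USE_GETTIME branch of the timer above keeps two running sums, one of whole-second differences and one of nanosecond differences, and adds them in elapsed(). The following is a minimal sketch of that bookkeeping, not the RAJA implementation: `IntervalAccumulator` and `demo_interval_accumulator` are hypothetical names, and the type takes the two `timespec` values directly so the arithmetic can be checked without a real clock.

```cpp
#include <cassert>
#include <cmath>
#include <time.h>

// Hypothetical helper, for illustration only: mirrors the arithmetic of
// Timer::set_elapsed() in the RAJA_USE_GETTIME branch.
struct IntervalAccumulator
{
    long double stime_elapsed  = 0.0L;  // sum of whole-second differences
    long double nstime_elapsed = 0.0L;  // sum of nanosecond differences, in seconds

    void add(const timespec& tstart, const timespec& tstop)
    {
        stime_elapsed += static_cast<long double>(tstop.tv_sec - tstart.tv_sec);
        // This term can be negative when tv_nsec of the stop time is smaller;
        // the total in elapsed() is still the true interval.
        nstime_elapsed +=
            static_cast<long double>(tstop.tv_nsec - tstart.tv_nsec) / 1000000000.0L;
    }

    long double elapsed() const { return stime_elapsed + nstime_elapsed; }
};

// Usage: an interval from 1.9 s to 2.1 s contributes +1 s and -0.8 s.
inline void demo_interval_accumulator()
{
    timespec t0{};
    timespec t1{};
    t0.tv_sec  = 1;
    t0.tv_nsec = 900000000;
    t1.tv_sec  = 2;
    t1.tv_nsec = 100000000;

    IntervalAccumulator acc;
    acc.add(t0, t1);
    assert(std::fabs(static_cast<double>(acc.elapsed()) - 0.2) < 1e-9);
}
```

Keeping the two sums separate means no borrow between tv_sec and tv_nsec is needed on each stop() call; the signs cancel when the sums are combined.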
@@ -0,0 +1,545 @@
|
||||
/*
|
||||
* Copyright (c) 2003, 2007-8 Matteo Frigo
|
||||
* Copyright (c) 2003, 2007-8 Massachusetts Institute of Technology
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining
|
||||
* a copy of this software and associated documentation files (the
|
||||
* "Software"), to deal in the Software without restriction, including
|
||||
* without limitation the rights to use, copy, modify, merge, publish,
|
||||
* distribute, sublicense, and/or sell copies of the Software, and to
|
||||
* permit persons to whom the Software is furnished to do so, subject to
|
||||
* the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be
|
||||
* included in all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
||||
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
||||
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
||||
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
||||
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||
*
|
||||
*/
|
||||
|
||||
/* machine-dependent cycle counters code. Needs to be inlined. */
|
||||
|
||||
/***************************************************************************/
|
||||
/* To use the cycle counters in your code, simply #include "cycle.h" (this
   file), and then use the functions/macros:

                 ticks getticks(void);

   ticks is an opaque typedef defined below, representing the current time.
   You extract the elapsed time between two calls to getticks() via:

                 double elapsed(ticks t1, ticks t0);

   which returns a double-precision variable in arbitrary units.  You
   are not expected to convert this into human units like seconds; it
   is intended only for *comparisons* of time intervals.

   (In order to use some of the OS-dependent timer routines like
   Solaris' gethrtime, you need to paste the autoconf snippet below
   into your configure.ac file and #include "config.h" before cycle.h,
   or define the relevant macros manually if you are not using autoconf.)
*/
|
||||
|
||||
/***************************************************************************/
|
||||
/* This file uses macros like HAVE_GETHRTIME that are assumed to be
|
||||
defined according to whether the corresponding function/type/header
|
||||
is available on your system. The necessary macros are most
|
||||
conveniently defined if you are using GNU autoconf, via the tests:
|
||||
|
||||
dnl ---------------------------------------------------------------------
|
||||
|
||||
AC_C_INLINE
|
||||
AC_HEADER_TIME
|
||||
AC_CHECK_HEADERS([sys/time.h c_asm.h intrinsics.h mach/mach_time.h])
|
||||
|
||||
AC_CHECK_TYPE([hrtime_t],[AC_DEFINE(HAVE_HRTIME_T, 1, [Define to 1 if hrtime_t is
|
||||
defined in <sys/time.h>])],,[#if HAVE_SYS_TIME_H #include <sys/time.h> #endif])
|
||||
|
||||
AC_CHECK_FUNCS([gethrtime read_real_time time_base_to_time clock_gettime
|
||||
mach_absolute_time])
|
||||
|
||||
dnl Cray UNICOS _rtc() (real-time clock) intrinsic
|
||||
AC_MSG_CHECKING([for _rtc intrinsic])
|
||||
rtc_ok=yes
|
||||
AC_TRY_LINK([#ifdef HAVE_INTRINSICS_H
|
||||
#include <intrinsics.h>
|
||||
#endif], [_rtc()], [AC_DEFINE(HAVE__RTC,1,[Define if you have the UNICOS _rtc()
|
||||
intrinsic.])], [rtc_ok=no]) AC_MSG_RESULT($rtc_ok)
|
||||
|
||||
dnl ---------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/***************************************************************************/
|
||||
|
||||
#if TIME_WITH_SYS_TIME
|
||||
# include <sys/time.h>
|
||||
# include <time.h>
|
||||
#else
|
||||
# if HAVE_SYS_TIME_H
|
||||
# include <sys/time.h>
|
||||
# else
|
||||
# include <time.h>
|
||||
# endif
|
||||
#endif
|
||||
|
||||
#define INLINE_ELAPSED(INL) \
|
||||
static INL double elapsed(ticks t1, ticks t0) { return (double) t1 - (double) t0; }
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/* Solaris */
|
||||
#if defined(HAVE_GETHRTIME) && defined(HAVE_HRTIME_T) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef hrtime_t ticks;
|
||||
|
||||
# define getticks gethrtime
|
||||
|
||||
INLINE_ELAPSED(inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/* AIX v. 4+ routines to read the real-time clock or time-base register */
|
||||
#if defined(HAVE_READ_REAL_TIME) && defined(HAVE_TIME_BASE_TO_TIME) && \
|
||||
!defined(HAVE_TICK_COUNTER)
|
||||
typedef timebasestruct_t ticks;
|
||||
|
||||
static __inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks t;
|
||||
read_real_time(&t, TIMEBASE_SZ);
|
||||
return t;
|
||||
}
|
||||
|
||||
static __inline double
|
||||
elapsed(ticks t1, ticks t0) /* time in nanoseconds */
|
||||
{
|
||||
time_base_to_time(&t1, TIMEBASE_SZ);
|
||||
time_base_to_time(&t0, TIMEBASE_SZ);
|
||||
return (((double) t1.tb_high - (double) t0.tb_high) * 1.0e9 +
|
||||
((double) t1.tb_low - (double) t0.tb_low));
|
||||
}
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/*
|
||||
* PowerPC ``cycle'' counter using the time base register.
|
||||
*/
|
||||
#if((((defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))) || \
|
||||
(defined(__MWERKS__) && defined(macintosh)))) || \
|
||||
(defined(__IBM_GCC_ASM) && (defined(__powerpc__) || defined(__ppc__)))) && \
|
||||
!defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long long ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
unsigned int tbl, tbu0, tbu1;
|
||||
|
||||
do
|
||||
{
|
||||
__asm__ __volatile__("mftbu %0" : "=r"(tbu0));
|
||||
__asm__ __volatile__("mftb %0" : "=r"(tbl));
|
||||
__asm__ __volatile__("mftbu %0" : "=r"(tbu1));
|
||||
} while(tbu0 != tbu1);
|
||||
|
||||
return (((unsigned long long) tbu0) << 32) | tbl;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/* MacOS/Mach (Darwin) time-base register interface (unlike UpTime,
|
||||
from Carbon, requires no additional libraries to be linked). */
|
||||
#if defined(HAVE_MACH_ABSOLUTE_TIME) && defined(HAVE_MACH_MACH_TIME_H) && \
|
||||
!defined(HAVE_TICK_COUNTER)
|
||||
# include <mach/mach_time.h>
|
||||
typedef uint64_t ticks;
|
||||
# define getticks mach_absolute_time
|
||||
INLINE_ELAPSED(__inline__)
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/*
|
||||
* Pentium cycle counter
|
||||
*/
|
||||
#if(defined(__GNUC__) || defined(__ICC)) && defined(__i386__) && \
|
||||
!defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long long ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks ret;
|
||||
|
||||
__asm__ __volatile__("rdtsc" : "=A"(ret));
|
||||
/* no input, nothing else clobbered */
|
||||
return ret;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
# define TIME_MIN 5000.0 /* unreliable pentium IV cycle counter */
|
||||
#endif
|
||||
|
||||
/* Visual C++ -- thanks to Morten Nissov for his help with this */
|
||||
#if _MSC_VER >= 1200 && _M_IX86 >= 500 && !defined(HAVE_TICK_COUNTER)
|
||||
# include <windows.h>
|
||||
typedef LARGE_INTEGER ticks;
|
||||
# define RDTSC __asm __emit 0fh __asm __emit 031h /* hack for VC++ 5.0 */
|
||||
|
||||
static __inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks retval;
|
||||
|
||||
__asm {
|
||||
RDTSC
|
||||
mov retval.HighPart, edx
|
||||
mov retval.LowPart, eax
|
||||
}
|
||||
return retval;
|
||||
}
|
||||
|
||||
static __inline double
|
||||
elapsed(ticks t1, ticks t0)
|
||||
{
|
||||
return (double) t1.QuadPart - (double) t0.QuadPart;
|
||||
}
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
# define TIME_MIN 5000.0 /* unreliable pentium IV cycle counter */
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/*
|
||||
* X86-64 cycle counter
|
||||
*/
|
||||
#if(defined(__GNUC__) || defined(__ICC) || defined(__SUNPRO_C)) && \
|
||||
defined(__x86_64__) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long long ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
unsigned a, d;
|
||||
__asm__ volatile("rdtsc" : "=a"(a), "=d"(d));
|
||||
return ((ticks) a) | (((ticks) d) << 32);
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/* PGI compiler, courtesy Cristiano Calonaci, Andrea Tarsi, & Roberto Gori.
|
||||
NOTE: this code will fail to link unless you use the -Masmkeyword compiler
|
||||
option (grrr). */
|
||||
#if defined(__PGI) && defined(__x86_64__) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long long ticks;
|
||||
static ticks
|
||||
getticks(void)
|
||||
{
|
||||
asm(" rdtsc; shl $0x20,%rdx; mov %eax,%eax; or %rdx,%rax; ");
|
||||
}
|
||||
INLINE_ELAPSED(__inline__)
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/* Visual C++, courtesy of Dirk Michaelis */
|
||||
#if _MSC_VER >= 1400 && (defined(_M_AMD64) || defined(_M_X64)) && \
|
||||
!defined(HAVE_TICK_COUNTER)
|
||||
|
||||
# include <intrin.h>
|
||||
# pragma intrinsic(__rdtsc)
|
||||
typedef unsigned __int64 ticks;
|
||||
# define getticks __rdtsc
|
||||
INLINE_ELAPSED(__inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/*
|
||||
* IA64 cycle counter
|
||||
*/
|
||||
|
||||
/* intel's icc/ecc compiler */
|
||||
#if(defined(__EDG_VERSION) || defined(__ECC)) && defined(__ia64__) && \
|
||||
!defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long ticks;
|
||||
# include <ia64intrin.h>
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
return __getReg(_IA64_REG_AR_ITC);
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/* gcc */
|
||||
#if defined(__GNUC__) && defined(__ia64__) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks ret;
|
||||
|
||||
__asm__ __volatile__("mov %0=ar.itc" : "=r"(ret));
|
||||
return ret;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/* HP/UX IA64 compiler, courtesy Teresa L. Johnson: */
|
||||
#if defined(__hpux) && defined(__ia64) && !defined(HAVE_TICK_COUNTER)
|
||||
# include <machine/sys/inline.h>
|
||||
typedef unsigned long ticks;
|
||||
|
||||
static inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks ret;
|
||||
|
||||
ret = _Asm_mov_from_ar(_AREG_ITC);
|
||||
return ret;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/* Microsoft Visual C++ */
|
||||
#if defined(_MSC_VER) && defined(_M_IA64) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned __int64 ticks;
|
||||
|
||||
# ifdef __cplusplus
|
||||
extern "C"
|
||||
# endif
|
||||
ticks
|
||||
__getReg(int whichReg);
|
||||
# pragma intrinsic(__getReg)
|
||||
|
||||
static __inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
volatile ticks temp;
|
||||
temp = __getReg(3116);
|
||||
return temp;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/*
|
||||
* PA-RISC cycle counter
|
||||
*/
|
||||
#if(defined(__hppa__) || defined(__hppa)) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long ticks;
|
||||
|
||||
# ifdef __GNUC__
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks ret;
|
||||
|
||||
__asm__ __volatile__("mfctl 16, %0" : "=r"(ret));
|
||||
/* no input, nothing else clobbered */
|
||||
return ret;
|
||||
}
|
||||
# else
|
||||
# include <machine/inline.h>
|
||||
static inline unsigned long
|
||||
getticks(void)
|
||||
{
|
||||
register ticks ret;
|
||||
_MFCTL(16, ret);
|
||||
return ret;
|
||||
}
|
||||
# endif
|
||||
|
||||
INLINE_ELAPSED(inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/* S390, courtesy of James Treacy */
|
||||
#if defined(__GNUC__) && defined(__s390__) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long long ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks cycles;
|
||||
__asm__("stck 0(%0)" : : "a"(&(cycles)) : "memory", "cc");
|
||||
return cycles;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
/*----------------------------------------------------------------*/
|
||||
#if defined(__GNUC__) && defined(__alpha__) && !defined(HAVE_TICK_COUNTER)
|
||||
/*
|
||||
* The 32-bit cycle counter on alpha overflows pretty quickly,
|
||||
* unfortunately. A 1GHz machine overflows in 4 seconds.
|
||||
*/
|
||||
typedef unsigned int ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
unsigned long cc;
|
||||
__asm__ __volatile__("rpcc %0" : "=r"(cc));
|
||||
return (cc & 0xFFFFFFFF);
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
#if defined(__GNUC__) && defined(__sparc_v9__) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef unsigned long ticks;
|
||||
|
||||
static __inline__ ticks
|
||||
getticks(void)
|
||||
{
|
||||
ticks ret;
|
||||
__asm__ __volatile__("rd %%tick, %0" : "=r"(ret));
|
||||
return ret;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline__)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
#if(defined(__DECC) || defined(__DECCXX)) && defined(__alpha) && \
|
||||
defined(HAVE_C_ASM_H) && !defined(HAVE_TICK_COUNTER)
|
||||
# include <c_asm.h>
|
||||
typedef unsigned int ticks;
|
||||
|
||||
static __inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
unsigned long cc;
|
||||
cc = asm("rpcc %v0");
|
||||
return (cc & 0xFFFFFFFF);
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(__inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
/*----------------------------------------------------------------*/
|
||||
/* SGI/Irix */
|
||||
#if defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_SGI_CYCLE) && !defined(HAVE_TICK_COUNTER)
|
||||
typedef struct timespec ticks;
|
||||
|
||||
static inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
struct timespec t;
|
||||
clock_gettime(CLOCK_SGI_CYCLE, &t);
|
||||
return t;
|
||||
}
|
||||
|
||||
static inline double
|
||||
elapsed(ticks t1, ticks t0)
|
||||
{
|
||||
return ((double) t1.tv_sec - (double) t0.tv_sec) * 1.0E9 +
|
||||
((double) t1.tv_nsec - (double) t0.tv_nsec);
|
||||
}
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/* Cray UNICOS _rtc() intrinsic function */
|
||||
#if defined(HAVE__RTC) && !defined(HAVE_TICK_COUNTER)
|
||||
# ifdef HAVE_INTRINSICS_H
|
||||
# include <intrinsics.h>
|
||||
# endif
|
||||
|
||||
typedef long long ticks;
|
||||
|
||||
# define getticks _rtc
|
||||
|
||||
INLINE_ELAPSED(inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
#endif
|
||||
|
||||
/*----------------------------------------------------------------*/
|
||||
/* MIPS ZBus */
|
||||
#if HAVE_MIPS_ZBUS_TIMER
|
||||
# if defined(__mips__) && !defined(HAVE_TICK_COUNTER)
|
||||
# include <fcntl.h>
|
||||
# include <sys/mman.h>
|
||||
# include <unistd.h>
|
||||
|
||||
typedef uint64_t ticks;
|
||||
|
||||
static inline ticks
|
||||
getticks(void)
|
||||
{
|
||||
static uint64_t* addr = 0;
|
||||
|
||||
if(addr == 0)
|
||||
{
|
||||
uint32_t rq_addr = 0x10030000;
|
||||
int fd;
|
||||
int pgsize;
|
||||
|
||||
pgsize = getpagesize();
|
||||
fd = open("/dev/mem", O_RDONLY | O_SYNC, 0);
|
||||
if(fd < 0)
|
||||
{
|
||||
perror("open");
|
||||
return NULL;
|
||||
}
|
||||
addr = mmap(0, pgsize, PROT_READ, MAP_SHARED, fd, rq_addr);
|
||||
close(fd);
|
||||
if(addr == (uint64_t*) -1)
|
||||
{
|
||||
perror("mmap");
|
||||
return NULL;
|
||||
}
|
||||
}
|
||||
|
||||
return *addr;
|
||||
}
|
||||
|
||||
INLINE_ELAPSED(inline)
|
||||
|
||||
# define HAVE_TICK_COUNTER
|
||||
# endif
|
||||
#endif /* HAVE_MIPS_ZBUS_TIMER */
|
||||
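On x86-64, the getticks() in the header above reads the counter as two 32-bit halves and joins them, and the default elapsed() is a plain difference of the two readings converted to double. The sketch below shows just those two steps with the inline assembly left out; `combine_halves`, `elapsed_ticks` and `demo_ticks` are hypothetical names used only here and are not part of cycle.h.

```cpp
#include <cassert>
#include <cstdint>

using ticks = std::uint64_t;

// Join the low (EAX) and high (EDX) 32-bit halves into one 64-bit tick count,
// as the x86-64 getticks() does after executing rdtsc.
inline ticks combine_halves(std::uint32_t lo, std::uint32_t hi)
{
    return static_cast<ticks>(lo) | (static_cast<ticks>(hi) << 32);
}

// Same arithmetic as INLINE_ELAPSED: the result is in arbitrary units and is
// meant only for comparing intervals, not for conversion to seconds.
inline double elapsed_ticks(ticks t1, ticks t0)
{
    return static_cast<double>(t1) - static_cast<double>(t0);
}

// Usage: two readings that share the same high half and differ by 16 ticks.
inline void demo_ticks()
{
    const ticks t0 = combine_halves(0x00000001u, 0x00000002u);
    const ticks t1 = combine_halves(0x00000011u, 0x00000002u);
    assert(t0 == 0x0000000200000001ULL);
    assert(elapsed_ticks(t1, t0) == 16.0);
}
```

Widening each half to 64 bits before the shift matters: shifting a 32-bit value left by 32 would be undefined behavior.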
File diff is too large
@@ -0,0 +1,886 @@
|
||||
#include <math.h>
|
||||
#if USE_MPI
|
||||
# include <mpi.h>
|
||||
#endif
|
||||
#if _OPENMP
|
||||
# include <omp.h>
|
||||
#endif
|
||||
#include "lulesh.h"
|
||||
#include <cstdlib>
|
||||
#include <limits.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
static KOKKOS_INLINE_FUNCTION Real_t
|
||||
CalcElemVolume(const Real_t x0, const Real_t x1, const Real_t x2, const Real_t x3,
|
||||
const Real_t x4, const Real_t x5, const Real_t x6, const Real_t x7,
|
||||
const Real_t y0, const Real_t y1, const Real_t y2, const Real_t y3,
|
||||
const Real_t y4, const Real_t y5, const Real_t y6, const Real_t y7,
|
||||
const Real_t z0, const Real_t z1, const Real_t z2, const Real_t z3,
|
||||
const Real_t z4, const Real_t z5, const Real_t z6, const Real_t z7)
|
||||
{
|
||||
Real_t twelveth = Real_t(1.0) / Real_t(12.0);
|
||||
|
||||
Real_t dx61 = x6 - x1;
|
||||
Real_t dy61 = y6 - y1;
|
||||
Real_t dz61 = z6 - z1;
|
||||
|
||||
Real_t dx70 = x7 - x0;
|
||||
Real_t dy70 = y7 - y0;
|
||||
Real_t dz70 = z7 - z0;
|
||||
|
||||
Real_t dx63 = x6 - x3;
|
||||
Real_t dy63 = y6 - y3;
|
||||
Real_t dz63 = z6 - z3;
|
||||
|
||||
Real_t dx20 = x2 - x0;
|
||||
Real_t dy20 = y2 - y0;
|
||||
Real_t dz20 = z2 - z0;
|
||||
|
||||
Real_t dx50 = x5 - x0;
|
||||
Real_t dy50 = y5 - y0;
|
||||
Real_t dz50 = z5 - z0;
|
||||
|
||||
Real_t dx64 = x6 - x4;
|
||||
Real_t dy64 = y6 - y4;
|
||||
Real_t dz64 = z6 - z4;
|
||||
|
||||
Real_t dx31 = x3 - x1;
|
||||
Real_t dy31 = y3 - y1;
|
||||
Real_t dz31 = z3 - z1;
|
||||
|
||||
Real_t dx72 = x7 - x2;
|
||||
Real_t dy72 = y7 - y2;
|
||||
Real_t dz72 = z7 - z2;
|
||||
|
||||
Real_t dx43 = x4 - x3;
|
||||
Real_t dy43 = y4 - y3;
|
||||
Real_t dz43 = z4 - z3;
|
||||
|
||||
Real_t dx57 = x5 - x7;
|
||||
Real_t dy57 = y5 - y7;
|
||||
Real_t dz57 = z5 - z7;
|
||||
|
||||
Real_t dx14 = x1 - x4;
|
||||
Real_t dy14 = y1 - y4;
|
||||
Real_t dz14 = z1 - z4;
|
||||
|
||||
Real_t dx25 = x2 - x5;
|
||||
Real_t dy25 = y2 - y5;
|
||||
Real_t dz25 = z2 - z5;
|
||||
|
||||
#define TRIPLE_PRODUCT(x1, y1, z1, x2, y2, z2, x3, y3, z3) \
|
||||
((x1) * ((y2) * (z3) - (z2) * (y3)) + (x2) * ((z1) * (y3) - (y1) * (z3)) + \
|
||||
(x3) * ((y1) * (z2) - (z1) * (y2)))
|
||||
|
||||
Real_t volume = TRIPLE_PRODUCT(dx31 + dx72, dx63, dx20, dy31 + dy72, dy63, dy20,
|
||||
dz31 + dz72, dz63, dz20) +
|
||||
TRIPLE_PRODUCT(dx43 + dx57, dx64, dx70, dy43 + dy57, dy64, dy70,
|
||||
dz43 + dz57, dz64, dz70) +
|
||||
TRIPLE_PRODUCT(dx14 + dx25, dx61, dx50, dy14 + dy25, dy61, dy50,
|
||||
dz14 + dz25, dz61, dz50);
|
||||
|
||||
#undef TRIPLE_PRODUCT
|
||||
|
||||
volume *= twelveth;
|
||||
|
||||
return volume;
|
||||
}
|
||||
|
||||
/******************************************/
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Real_t
|
||||
CalcElemVolume(const Real_t x[8], const Real_t y[8], const Real_t z[8])
|
||||
{
|
||||
return CalcElemVolume(x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], y[0], y[1],
|
||||
y[2], y[3], y[4], y[5], y[6], y[7], z[0], z[1], z[2], z[3],
|
||||
z[4], z[5], z[6], z[7]);
|
||||
}
|
||||
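CalcElemVolume above sums three scalar triple products built from node-to-node differences and divides the result by 12. Here is a compact sketch of the same arithmetic under a few assumptions: `double` stands in for `Real_t`, `hex_volume` and `triple_product` are hypothetical names, and the nodes follow the usual LULESH ordering (0 to 3 around the bottom face, 4 to 7 directly above them). Under that ordering a unit cube should come out with volume 1.

```cpp
#include <cassert>
#include <cmath>

// Scalar triple product, same argument order as the TRIPLE_PRODUCT macro.
inline double triple_product(double x1, double y1, double z1, double x2, double y2,
                             double z2, double x3, double y3, double z3)
{
    return x1 * (y2 * z3 - z2 * y3) + x2 * (z1 * y3 - y1 * z3) +
           x3 * (y1 * z2 - z1 * y2);
}

// Hypothetical stand-in for CalcElemVolume(x[8], y[8], z[8]).
inline double hex_volume(const double x[8], const double y[8], const double z[8])
{
    // d(c, a, b) = c[a] - c[b], e.g. d(x, 6, 1) is dx61 in the original.
    auto d = [](const double* c, int a, int b) { return c[a] - c[b]; };

    const double sum =
        triple_product(d(x, 3, 1) + d(x, 7, 2), d(x, 6, 3), d(x, 2, 0),
                       d(y, 3, 1) + d(y, 7, 2), d(y, 6, 3), d(y, 2, 0),
                       d(z, 3, 1) + d(z, 7, 2), d(z, 6, 3), d(z, 2, 0)) +
        triple_product(d(x, 4, 3) + d(x, 5, 7), d(x, 6, 4), d(x, 7, 0),
                       d(y, 4, 3) + d(y, 5, 7), d(y, 6, 4), d(y, 7, 0),
                       d(z, 4, 3) + d(z, 5, 7), d(z, 6, 4), d(z, 7, 0)) +
        triple_product(d(x, 1, 4) + d(x, 2, 5), d(x, 6, 1), d(x, 5, 0),
                       d(y, 1, 4) + d(y, 2, 5), d(y, 6, 1), d(y, 5, 0),
                       d(z, 1, 4) + d(z, 2, 5), d(z, 6, 1), d(z, 5, 0));
    return sum / 12.0;
}

// Usage: a unit cube; each of the three triple products evaluates to 4.
inline void demo_hex_volume()
{
    const double x[8] = { 0, 1, 1, 0, 0, 1, 1, 0 };
    const double y[8] = { 0, 0, 1, 1, 0, 0, 1, 1 };
    const double z[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };
    assert(std::fabs(hex_volume(x, y, z) - 1.0) < 1e-12);
}
```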
|
||||
/////////////////////////////////////////////////////////////////////
|
||||
Domain::Domain(Int_t numRanks, Index_t colLoc, Index_t rowLoc, Index_t planeLoc,
|
||||
Index_t nx, int tp, int nr, int balance, Int_t cost)
|
||||
: m_e_cut(Real_t(1.0e-7))
|
||||
, m_p_cut(Real_t(1.0e-7))
|
||||
, m_q_cut(Real_t(1.0e-7))
|
||||
, m_v_cut(Real_t(1.0e-10))
|
||||
, m_u_cut(Real_t(1.0e-7))
|
||||
, m_hgcoef(Real_t(3.0))
|
||||
, m_ss4o3(Real_t(4.0) / Real_t(3.0))
|
||||
, m_qstop(Real_t(1.0e+12))
|
||||
, m_monoq_max_slope(Real_t(1.0))
|
||||
, m_monoq_limiter_mult(Real_t(2.0))
|
||||
, m_qlc_monoq(Real_t(0.5))
|
||||
, m_qqc_monoq(Real_t(2.0) / Real_t(3.0))
|
||||
, m_qqc(Real_t(2.0))
|
||||
, m_eosvmax(Real_t(1.0e+9))
|
||||
, m_eosvmin(Real_t(1.0e-9))
|
||||
, m_pmin(Real_t(0.))
|
||||
, m_emin(Real_t(-1.0e+15))
|
||||
, m_dvovmax(Real_t(0.1))
|
||||
, m_refdens(Real_t(1.0))
|
||||
,
|
||||
//
|
||||
// set pointers to (potentially) "new'd" arrays to null to
|
||||
// simplify deallocation.
|
||||
//
|
||||
m_regNumList(0)
|
||||
, m_nodeElemStart(0)
|
||||
, m_nodeElemCornerList(0)
|
||||
, m_regElemSize(0)
|
||||
, m_regElemlist(0)
|
||||
#if USE_MPI
|
||||
, commDataSend(0)
|
||||
, commDataRecv(0)
|
||||
#endif
|
||||
{
|
||||
Index_t edgeElems = nx;
|
||||
Index_t edgeNodes = edgeElems + 1;
|
||||
this->cost() = cost;
|
||||
|
||||
m_tp = tp;
|
||||
m_numRanks = numRanks;
|
||||
|
||||
///////////////////////////////
|
||||
// Initialize Sedov Mesh
|
||||
///////////////////////////////
|
||||
|
||||
// construct a uniform box for this processor
|
||||
|
||||
m_colLoc = colLoc;
|
||||
m_rowLoc = rowLoc;
|
||||
m_planeLoc = planeLoc;
|
||||
|
||||
m_sizeX = edgeElems;
|
||||
m_sizeY = edgeElems;
|
||||
m_sizeZ = edgeElems;
|
||||
m_numElem = edgeElems * edgeElems * edgeElems;
|
||||
|
||||
m_numNode = edgeNodes * edgeNodes * edgeNodes;
|
||||
|
||||
m_regNumList = Allocate<Index_t>(numElem()); // material indexset
|
||||
|
||||
// Elem-centered
|
||||
AllocateElemPersistent(numElem());
|
||||
|
||||
// Node-centered
|
||||
AllocateNodePersistent(numNode());
|
||||
|
||||
SetupCommBuffers(edgeNodes);
|
||||
|
||||
// Basic Field Initialization
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
e(i) = Real_t(0.0);
|
||||
p(i) = Real_t(0.0);
|
||||
q(i) = Real_t(0.0);
|
||||
ss(i) = Real_t(0.0);
|
||||
}
|
||||
|
||||
// Note - v initializes to 1.0, not 0.0!
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
v(i) = Real_t(1.0);
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < numNode(); ++i)
|
||||
{
|
||||
xd(i) = Real_t(0.0);
|
||||
yd(i) = Real_t(0.0);
|
||||
zd(i) = Real_t(0.0);
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < numNode(); ++i)
|
||||
{
|
||||
xdd(i) = Real_t(0.0);
|
||||
ydd(i) = Real_t(0.0);
|
||||
zdd(i) = Real_t(0.0);
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < numNode(); ++i)
|
||||
{
|
||||
nodalMass(i) = Real_t(0.0);
|
||||
}
|
||||
|
||||
BuildMesh(nx, edgeNodes, edgeElems);
|
||||
|
||||
#if _OPENMP
|
||||
SetupThreadSupportStructures();
|
||||
#else
|
||||
// These arrays are not used if we're not threaded
|
||||
m_nodeElemStart = NULL;
|
||||
m_nodeElemCornerList = NULL;
|
||||
#endif
|
||||
|
||||
// Setup region index sets. For now, these are constant sized
|
||||
// throughout the run, but could be changed every cycle to
|
||||
// simulate effects of ALE on the lagrange solver
|
||||
CreateRegionIndexSets(nr, balance);
|
||||
|
||||
// Setup symmetry nodesets
|
||||
SetupSymmetryPlanes(edgeNodes);
|
||||
|
||||
// Setup element connectivities
|
||||
SetupElementConnectivities(edgeElems);
|
||||
|
||||
// Setup symmetry planes and free surface boundary arrays
|
||||
SetupBoundaryConditions(edgeElems);
|
||||
|
||||
// Setup defaults
|
||||
|
||||
// These can be changed (requires recompile) if you want to run
|
||||
// with a fixed timestep, or to a different end time, but it's
|
||||
// probably easier/better to just run a fixed number of timesteps
|
||||
// using the -i flag in 2.x
|
||||
|
||||
dtfixed() = Real_t(-1.0e-6); // Negative means use courant condition
|
||||
stoptime() = Real_t(1.0e-2); // *Real_t(edgeElems*tp/45.0) ;
|
||||
|
||||
// Initial conditions
|
||||
deltatimemultlb() = Real_t(1.1);
|
||||
deltatimemultub() = Real_t(1.2);
|
||||
dtcourant() = Real_t(1.0e+20);
|
||||
dthydro() = Real_t(1.0e+20);
|
||||
dtmax() = Real_t(1.0e-2);
|
||||
time() = Real_t(0.);
|
||||
cycle() = Int_t(0);
|
||||
|
||||
// initialize field data
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
Real_t x_local[8], y_local[8], z_local[8];
|
||||
Index_t* elemToNode = nodelist(i);
|
||||
for(Index_t lnode = 0; lnode < 8; ++lnode)
|
||||
{
|
||||
Index_t gnode = elemToNode[lnode];
|
||||
x_local[lnode] = x(gnode);
|
||||
y_local[lnode] = y(gnode);
|
||||
z_local[lnode] = z(gnode);
|
||||
}
|
||||
|
||||
// volume calculations
|
||||
Real_t volume = CalcElemVolume(x_local, y_local, z_local);
|
||||
volo(i) = volume;
|
||||
elemMass(i) = volume;
|
||||
for(Index_t j = 0; j < 8; ++j)
|
||||
{
|
||||
Index_t idx = elemToNode[j];
|
||||
nodalMass(idx) += volume / Real_t(8.0);
|
||||
}
|
||||
}
|
||||
|
||||
// deposit initial energy
|
||||
// An energy of 3.948746e+7 is correct for a problem with
|
||||
// 45 zones along a side - we need to scale it
|
||||
const Real_t ebase = Real_t(3.948746e+7);
|
||||
Real_t scale = (nx * m_tp) / Real_t(45.0);
|
||||
Real_t einit = ebase * scale * scale * scale;
|
||||
if(m_rowLoc + m_colLoc + m_planeLoc == 0)
|
||||
{
|
||||
// Dump into the first zone (which we know is in the corner)
|
||||
// of the domain that sits at the origin
|
||||
e(0) = einit;
|
||||
}
|
||||
// set initial deltatime based on analytic CFL calculation
|
||||
deltatime() = (Real_t(.5) * cbrt(volo(0))) / sqrt(Real_t(2.0) * einit);
|
||||
|
||||
} // End constructor
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
Domain::~Domain()
|
||||
{
|
||||
/* Release(&m_regNumList);
|
||||
Release(&m_nodeElemStart);
|
||||
Release(&m_nodeElemCornerList);
|
||||
Release(&m_regElemSize);
|
||||
for (Index_t i=0 ; i<numReg() ; ++i) {
|
||||
Release(&m_regElemlist[i]);
|
||||
}
|
||||
Release(&m_regElemlist);
|
||||
|
||||
#if USE_MPI
|
||||
Release(&commDataSend);
|
||||
Release(&commDataRecv);
|
||||
#endif
|
||||
*/
|
||||
} // End destructor
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::BuildMesh(Int_t nx, Int_t edgeNodes, Int_t edgeElems)
|
||||
{
|
||||
Index_t meshEdgeElems = m_tp * nx;
|
||||
|
||||
// initialize nodal coordinates
|
||||
Index_t nidx = 0;
|
||||
Real_t tz = Real_t(1.125) * Real_t(m_planeLoc * nx) / Real_t(meshEdgeElems);
|
||||
for(Index_t plane = 0; plane < edgeNodes; ++plane)
|
||||
{
|
||||
Real_t ty = Real_t(1.125) * Real_t(m_rowLoc * nx) / Real_t(meshEdgeElems);
|
||||
for(Index_t row = 0; row < edgeNodes; ++row)
|
||||
{
|
||||
Real_t tx = Real_t(1.125) * Real_t(m_colLoc * nx) / Real_t(meshEdgeElems);
|
||||
for(Index_t col = 0; col < edgeNodes; ++col)
|
||||
{
|
||||
x(nidx) = tx;
|
||||
y(nidx) = ty;
|
||||
z(nidx) = tz;
|
||||
++nidx;
|
||||
// tx += ds ; // may accumulate roundoff...
|
||||
tx = Real_t(1.125) * Real_t(m_colLoc * nx + col + 1) /
|
||||
Real_t(meshEdgeElems);
|
||||
}
|
||||
// ty += ds ; // may accumulate roundoff...
|
||||
ty = Real_t(1.125) * Real_t(m_rowLoc * nx + row + 1) / Real_t(meshEdgeElems);
|
||||
}
|
||||
// tz += ds ; // may accumulate roundoff...
|
||||
tz = Real_t(1.125) * Real_t(m_planeLoc * nx + plane + 1) / Real_t(meshEdgeElems);
|
||||
}
|
||||
|
||||
// embed hexahedral elements in nodal point lattice
|
||||
Index_t zidx = 0;
|
||||
nidx = 0;
|
||||
for(Index_t plane = 0; plane < edgeElems; ++plane)
|
||||
{
|
||||
for(Index_t row = 0; row < edgeElems; ++row)
|
||||
{
|
||||
for(Index_t col = 0; col < edgeElems; ++col)
|
||||
{
|
||||
Index_t* localNode = nodelist(zidx);
|
||||
localNode[0] = nidx;
|
||||
localNode[1] = nidx + 1;
|
||||
localNode[2] = nidx + edgeNodes + 1;
|
||||
localNode[3] = nidx + edgeNodes;
|
||||
localNode[4] = nidx + edgeNodes * edgeNodes;
|
||||
localNode[5] = nidx + edgeNodes * edgeNodes + 1;
|
||||
localNode[6] = nidx + edgeNodes * edgeNodes + edgeNodes + 1;
|
||||
localNode[7] = nidx + edgeNodes * edgeNodes + edgeNodes;
|
||||
++zidx;
|
||||
++nidx;
|
||||
}
|
||||
++nidx;
|
||||
}
|
||||
nidx += edgeNodes;
|
||||
}
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::SetupThreadSupportStructures()
|
||||
{
|
||||
// set up node-centered indexing of elements
|
||||
Index_t* nodeElemCount = Allocate<Index_t>(numNode());
|
||||
|
||||
for(Index_t i = 0; i < numNode(); ++i)
|
||||
{
|
||||
nodeElemCount[i] = 0;
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
Index_t* nl = nodelist(i);
|
||||
for(Index_t j = 0; j < 8; ++j)
|
||||
{
|
||||
++(nodeElemCount[nl[j]]);
|
||||
}
|
||||
}
|
||||
|
||||
m_nodeElemStart = Allocate<Index_t>(numNode() + 1);
|
||||
|
||||
m_nodeElemStart[0] = 0;
|
||||
|
||||
for(Index_t i = 1; i <= numNode(); ++i)
|
||||
{
|
||||
m_nodeElemStart[i] = m_nodeElemStart[i - 1] + nodeElemCount[i - 1];
|
||||
}
|
||||
|
||||
m_nodeElemCornerList = Allocate<Index_t>(m_nodeElemStart[numNode()]);
|
||||
|
||||
for(Index_t i = 0; i < numNode(); ++i)
|
||||
{
|
||||
nodeElemCount[i] = 0;
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
Index_t* nl = nodelist(i);
|
||||
for(Index_t j = 0; j < 8; ++j)
|
||||
{
|
||||
Index_t m = nl[j];
|
||||
Index_t k = i * 8 + j;
|
||||
Index_t offset = m_nodeElemStart[m] + nodeElemCount[m];
|
||||
m_nodeElemCornerList[offset] = k;
|
||||
++(nodeElemCount[m]);
|
||||
}
|
||||
}
|
||||
|
||||
Index_t clSize = m_nodeElemStart[numNode()];
|
||||
for(Index_t i = 0; i < clSize; ++i)
|
||||
{
|
||||
Index_t clv = m_nodeElemCornerList[i];
|
||||
if((clv < 0) || (clv >= numElem() * 8))
|
||||
{
|
||||
fprintf(stderr, "SetupThreadSupportStructures(): nodeElemCornerList entry out of range!\n");
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, -1);
|
||||
#else
|
||||
exit(-1);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
Release<Index_t>(&nodeElemCount);
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::SetupCommBuffers(Int_t edgeNodes)
|
||||
{
|
||||
// allocate a buffer large enough for nodal ghost data
|
||||
Index_t maxEdgeSize = MAX(this->sizeX(), MAX(this->sizeY(), this->sizeZ())) + 1;
|
||||
m_maxPlaneSize = CACHE_ALIGN_REAL(maxEdgeSize * maxEdgeSize);
|
||||
m_maxEdgeSize = CACHE_ALIGN_REAL(maxEdgeSize);
|
||||
|
||||
// assume communication to 6 neighbors by default
|
||||
m_rowMin = (m_rowLoc == 0) ? 0 : 1;
|
||||
m_rowMax = (m_rowLoc == m_tp - 1) ? 0 : 1;
|
||||
m_colMin = (m_colLoc == 0) ? 0 : 1;
|
||||
m_colMax = (m_colLoc == m_tp - 1) ? 0 : 1;
|
||||
m_planeMin = (m_planeLoc == 0) ? 0 : 1;
|
||||
m_planeMax = (m_planeLoc == m_tp - 1) ? 0 : 1;
|
||||
|
||||
#if USE_MPI
|
||||
// account for face communication
|
||||
Index_t comBufSize =
|
||||
(m_rowMin + m_rowMax + m_colMin + m_colMax + m_planeMin + m_planeMax) *
|
||||
m_maxPlaneSize * MAX_FIELDS_PER_MPI_COMM;
|
||||
|
||||
// account for edge communication
|
||||
comBufSize +=
|
||||
((m_rowMin & m_colMin) + (m_rowMin & m_planeMin) + (m_colMin & m_planeMin) +
|
||||
(m_rowMax & m_colMax) + (m_rowMax & m_planeMax) + (m_colMax & m_planeMax) +
|
||||
(m_rowMax & m_colMin) + (m_rowMin & m_planeMax) + (m_colMin & m_planeMax) +
|
||||
(m_rowMin & m_colMax) + (m_rowMax & m_planeMin) + (m_colMax & m_planeMin)) *
|
||||
m_maxEdgeSize * MAX_FIELDS_PER_MPI_COMM;
|
||||
|
||||
// account for corner communication
|
||||
// factor of 16 is so each buffer has its own cache line
|
||||
comBufSize +=
|
||||
((m_rowMin & m_colMin & m_planeMin) + (m_rowMin & m_colMin & m_planeMax) +
|
||||
(m_rowMin & m_colMax & m_planeMin) + (m_rowMin & m_colMax & m_planeMax) +
|
||||
(m_rowMax & m_colMin & m_planeMin) + (m_rowMax & m_colMin & m_planeMax) +
|
||||
(m_rowMax & m_colMax & m_planeMin) + (m_rowMax & m_colMax & m_planeMax)) *
|
||||
CACHE_COHERENCE_PAD_REAL;
|
||||
|
||||
this->commDataSend = Allocate<Real_t>(comBufSize);
|
||||
this->commDataRecv = Allocate<Real_t>(comBufSize);
|
||||
// prevent floating point exceptions
|
||||
memset(this->commDataSend, 0, comBufSize * sizeof(Real_t));
|
||||
memset(this->commDataRecv, 0, comBufSize * sizeof(Real_t));
|
||||
#endif
|
||||
|
||||
// Boundary nodesets
|
||||
if(m_colLoc == 0)
|
||||
m_symmX.resize(edgeNodes * edgeNodes);
|
||||
if(m_rowLoc == 0)
|
||||
m_symmY.resize(edgeNodes * edgeNodes);
|
||||
if(m_planeLoc == 0)
|
||||
m_symmZ.resize(edgeNodes * edgeNodes);
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::CreateRegionIndexSets(Int_t nr, Int_t balance)
|
||||
{
|
||||
#if USE_MPI
|
||||
Index_t myRank;
|
||||
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
|
||||
srand(myRank);
|
||||
#else
|
||||
srand(0);
|
||||
Index_t myRank = 0;
|
||||
#endif
|
||||
this->numReg() = nr;
|
||||
m_regElemSize = Allocate<Index_t>(numReg());
|
||||
m_regElemlist = Allocate<Index_t*>(numReg());
|
||||
Index_t nextIndex = 0;
|
||||
// if we only have one region just fill it
|
||||
// Fill out the regNumList with material numbers, which are always
|
||||
// the region index plus one
|
||||
if(numReg() == 1)
|
||||
{
|
||||
while(nextIndex < numElem())
|
||||
{
|
||||
this->regNumList(nextIndex) = 1;
|
||||
nextIndex++;
|
||||
}
|
||||
regElemSize(0) = 0;
|
||||
}
|
||||
// If we have more than one region distribute the elements.
|
||||
else
|
||||
{
|
||||
Int_t regionNum;
|
||||
Int_t regionVar;
|
||||
Int_t lastReg = -1;
|
||||
Int_t binSize;
|
||||
Index_t elements;
|
||||
Index_t runto = 0;
|
||||
Int_t costDenominator = 0;
|
||||
Int_t* regBinEnd = Allocate<Int_t>(numReg());
|
||||
// Determine the relative weights of all the regions. This is based on the -b
|
||||
// flag. Balance is the value passed to -b.
|
||||
for(Index_t i = 0; i < numReg(); ++i)
|
||||
{
|
||||
regElemSize(i) = 0;
|
||||
costDenominator += pow((i + 1), balance); // Total sum of all regions weights
|
||||
regBinEnd[i] =
|
||||
costDenominator; // Chance of hitting a given region is (regBinEnd[i] -
|
||||
// regBinEnd[i-1])/costDenominator
|
||||
}
|
||||
// Until all elements are assigned
|
||||
while(nextIndex < numElem())
|
||||
{
|
||||
// pick the region
|
||||
regionVar = rand() % costDenominator;
|
||||
Index_t i = 0;
|
||||
while(regionVar >= regBinEnd[i])
|
||||
i++;
|
||||
// rotate the regions based on MPI rank. Rotation is Rank % NumRegions; this
|
||||
// makes each domain have a different region with the highest representation
|
||||
regionNum = ((i + myRank) % numReg()) + 1;
|
||||
// make sure we don't pick the same region twice in a row
|
||||
while(regionNum == lastReg)
|
||||
{
|
||||
regionVar = rand() % costDenominator;
|
||||
i = 0;
|
||||
while(regionVar >= regBinEnd[i])
|
||||
i++;
|
||||
regionNum = ((i + myRank) % numReg()) + 1;
|
||||
}
|
||||
// Pick the bin size of the region and determine the number of elements.
|
||||
binSize = rand() % 1000;
|
||||
if(binSize < 773)
|
||||
{
|
||||
elements = rand() % 15 + 1;
|
||||
}
|
||||
else if(binSize < 937)
|
||||
{
|
||||
elements = rand() % 16 + 16;
|
||||
}
|
||||
else if(binSize < 970)
|
||||
{
|
||||
elements = rand() % 32 + 32;
|
||||
}
|
||||
else if(binSize < 974)
|
||||
{
|
||||
elements = rand() % 64 + 64;
|
||||
}
|
||||
else if(binSize < 978)
|
||||
{
|
||||
elements = rand() % 128 + 128;
|
||||
}
|
||||
else if(binSize < 981)
|
||||
{
|
||||
elements = rand() % 256 + 256;
|
||||
}
|
||||
else
|
||||
elements = rand() % 1537 + 512;
|
||||
runto = elements + nextIndex;
|
||||
// Store the elements. If we hit the end before we run out of elements then
|
||||
// just stop.
|
||||
while(nextIndex < runto && nextIndex < numElem())
|
||||
{
|
||||
this->regNumList(nextIndex) = regionNum;
|
||||
nextIndex++;
|
||||
}
|
||||
lastReg = regionNum;
|
||||
}
|
||||
}
|
||||
// Convert regNumList to region index sets
|
||||
// First, count size of each region
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
int r = this->regNumList(i) - 1; // region index == regnum-1
|
||||
regElemSize(r)++;
|
||||
}
|
||||
// Second, allocate each region index set
|
||||
for(Index_t i = 0; i < numReg(); ++i)
|
||||
{
|
||||
m_regElemlist[i] = Allocate<Index_t>(regElemSize(i));
|
||||
regElemSize(i) = 0;
|
||||
}
|
||||
// Third, fill index sets
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
Index_t r = regNumList(i) - 1; // region index == regnum-1
|
||||
Index_t regndx = regElemSize(r)++; // Note increment
|
||||
regElemlist(r, regndx) = i;
|
||||
}
|
||||
}
|
||||
|
||||
/////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::SetupSymmetryPlanes(Int_t edgeNodes)
|
||||
{
|
||||
Index_t nidx = 0;
|
||||
for(Index_t i = 0; i < edgeNodes; ++i)
|
||||
{
|
||||
Index_t planeInc = i * edgeNodes * edgeNodes;
|
||||
Index_t rowInc = i * edgeNodes;
|
||||
for(Index_t j = 0; j < edgeNodes; ++j)
|
||||
{
|
||||
if(m_planeLoc == 0)
|
||||
{
|
||||
m_symmZ[nidx] = rowInc + j;
|
||||
}
|
||||
if(m_rowLoc == 0)
|
||||
{
|
||||
m_symmY[nidx] = planeInc + j;
|
||||
}
|
||||
if(m_colLoc == 0)
|
||||
{
|
||||
m_symmX[nidx] = planeInc + j * edgeNodes;
|
||||
}
|
||||
++nidx;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::SetupElementConnectivities(Int_t edgeElems)
|
||||
{
|
||||
lxim(0) = 0;
|
||||
for(Index_t i = 1; i < numElem(); ++i)
|
||||
{
|
||||
lxim(i) = i - 1;
|
||||
lxip(i - 1) = i;
|
||||
}
|
||||
lxip(numElem() - 1) = numElem() - 1;
|
||||
|
||||
for(Index_t i = 0; i < edgeElems; ++i)
|
||||
{
|
||||
letam(i) = i;
|
||||
letap(numElem() - edgeElems + i) = numElem() - edgeElems + i;
|
||||
}
|
||||
for(Index_t i = edgeElems; i < numElem(); ++i)
|
||||
{
|
||||
letam(i) = i - edgeElems;
|
||||
letap(i - edgeElems) = i;
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < edgeElems * edgeElems; ++i)
|
||||
{
|
||||
lzetam(i) = i;
|
||||
lzetap(numElem() - edgeElems * edgeElems + i) =
|
||||
numElem() - edgeElems * edgeElems + i;
|
||||
}
|
||||
for(Index_t i = edgeElems * edgeElems; i < numElem(); ++i)
|
||||
{
|
||||
lzetam(i) = i - edgeElems * edgeElems;
|
||||
lzetap(i - edgeElems * edgeElems) = i;
|
||||
}
|
||||
}
|
||||
|
||||
/////////////////////////////////////////////////////////////
|
||||
void
|
||||
Domain::SetupBoundaryConditions(Int_t edgeElems)
|
||||
{
|
||||
Index_t ghostIdx[6]; // offsets to ghost locations
|
||||
|
||||
// set up boundary condition information
|
||||
for(Index_t i = 0; i < numElem(); ++i)
|
||||
{
|
||||
elemBC(i) = Int_t(0);
|
||||
}
|
||||
|
||||
for(Index_t i = 0; i < 6; ++i)
|
||||
{
|
||||
ghostIdx[i] = INT_MIN;
|
||||
}
|
||||
|
||||
Int_t pidx = numElem();
|
||||
if(m_planeMin != 0)
|
||||
{
|
||||
ghostIdx[0] = pidx;
|
||||
pidx += sizeX() * sizeY();
|
||||
}
|
||||
|
||||
if(m_planeMax != 0)
|
||||
{
|
||||
ghostIdx[1] = pidx;
|
||||
pidx += sizeX() * sizeY();
|
||||
}
|
||||
|
||||
if(m_rowMin != 0)
|
||||
{
|
||||
ghostIdx[2] = pidx;
|
||||
pidx += sizeX() * sizeZ();
|
||||
}
|
||||
|
||||
if(m_rowMax != 0)
|
||||
{
|
||||
ghostIdx[3] = pidx;
|
||||
pidx += sizeX() * sizeZ();
|
||||
}
|
||||
|
||||
if(m_colMin != 0)
|
||||
{
|
||||
ghostIdx[4] = pidx;
|
||||
pidx += sizeY() * sizeZ();
|
||||
}
|
||||
|
||||
if(m_colMax != 0)
|
||||
{
|
||||
ghostIdx[5] = pidx;
|
||||
}
|
||||
|
||||
// symmetry plane or free surface BCs
|
||||
for(Index_t i = 0; i < edgeElems; ++i)
|
||||
{
|
||||
Index_t planeInc = i * edgeElems * edgeElems;
|
||||
Index_t rowInc = i * edgeElems;
|
||||
for(Index_t j = 0; j < edgeElems; ++j)
|
||||
{
|
||||
if(m_planeLoc == 0)
|
||||
{
|
||||
elemBC(rowInc + j) |= ZETA_M_SYMM;
|
||||
}
|
||||
else
|
||||
{
|
||||
elemBC(rowInc + j) |= ZETA_M_COMM;
|
||||
lzetam(rowInc + j) = ghostIdx[0] + rowInc + j;
|
||||
}
|
||||
|
||||
if(m_planeLoc == m_tp - 1)
|
||||
{
|
||||
elemBC(rowInc + j + numElem() - edgeElems * edgeElems) |= ZETA_P_FREE;
|
||||
}
|
||||
else
|
||||
{
|
||||
elemBC(rowInc + j + numElem() - edgeElems * edgeElems) |= ZETA_P_COMM;
|
||||
lzetap(rowInc + j + numElem() - edgeElems * edgeElems) =
|
||||
ghostIdx[1] + rowInc + j;
|
||||
}
|
||||
|
||||
if(m_rowLoc == 0)
|
||||
{
|
||||
elemBC(planeInc + j) |= ETA_M_SYMM;
|
||||
}
|
||||
else
|
||||
{
|
||||
elemBC(planeInc + j) |= ETA_M_COMM;
|
||||
letam(planeInc + j) = ghostIdx[2] + rowInc + j;
|
||||
}
|
||||
|
||||
if(m_rowLoc == m_tp - 1)
|
||||
{
|
||||
elemBC(planeInc + j + edgeElems * edgeElems - edgeElems) |= ETA_P_FREE;
|
||||
}
|
||||
else
|
||||
{
|
||||
elemBC(planeInc + j + edgeElems * edgeElems - edgeElems) |= ETA_P_COMM;
|
||||
letap(planeInc + j + edgeElems * edgeElems - edgeElems) =
|
||||
ghostIdx[3] + rowInc + j;
|
||||
}
|
||||
|
||||
if(m_colLoc == 0)
|
||||
{
|
||||
elemBC(planeInc + j * edgeElems) |= XI_M_SYMM;
|
||||
}
|
||||
else
|
||||
{
|
||||
elemBC(planeInc + j * edgeElems) |= XI_M_COMM;
|
||||
lxim(planeInc + j * edgeElems) = ghostIdx[4] + rowInc + j;
|
||||
}
|
||||
|
||||
if(m_colLoc == m_tp - 1)
|
||||
{
|
||||
elemBC(planeInc + j * edgeElems + edgeElems - 1) |= XI_P_FREE;
|
||||
}
|
||||
else
|
||||
{
|
||||
elemBC(planeInc + j * edgeElems + edgeElems - 1) |= XI_P_COMM;
|
||||
lxip(planeInc + j * edgeElems + edgeElems - 1) = ghostIdx[5] + rowInc + j;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
///////////////////////////////////////////////////////////////////////////
|
||||
void
|
||||
InitMeshDecomp(Int_t numRanks, Int_t myRank, Int_t* col, Int_t* row, Int_t* plane,
|
||||
Int_t* side)
|
||||
{
|
||||
Int_t testProcs;
|
||||
Int_t dx, dy, dz;
|
||||
Int_t myDom;
|
||||
|
||||
// Assume cube processor layout for now
|
||||
testProcs = Int_t(cbrt(Real_t(numRanks)) + 0.5);
|
||||
if(testProcs * testProcs * testProcs != numRanks)
|
||||
{
|
||||
printf("Num processors must be a cube of an integer (1, 8, 27, ...)\n");
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, -1);
|
||||
#else
|
||||
exit(-1);
|
||||
#endif
|
||||
}
|
||||
if(sizeof(Real_t) != 4 && sizeof(Real_t) != 8)
|
||||
{
|
||||
printf("MPI operations only support float and double right now...\n");
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, -1);
|
||||
#else
|
||||
exit(-1);
|
||||
#endif
|
||||
}
|
||||
if(MAX_FIELDS_PER_MPI_COMM > CACHE_COHERENCE_PAD_REAL)
|
||||
{
|
||||
printf("corner element comm buffers too small. Fix code.\n");
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, -1);
|
||||
#else
|
||||
exit(-1);
|
||||
#endif
|
||||
}
|
||||
|
||||
dx = testProcs;
|
||||
dy = testProcs;
|
||||
dz = testProcs;
|
||||
|
||||
// temporary test
|
||||
if(dx * dy * dz != numRanks)
|
||||
{
|
||||
printf("error -- must have as many domains as procs\n");
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, -1);
|
||||
#else
|
||||
exit(-1);
|
||||
#endif
|
||||
}
|
||||
Int_t remainder = dx * dy * dz % numRanks;
|
||||
if(myRank < remainder)
|
||||
{
|
||||
myDom = myRank * (1 + (dx * dy * dz / numRanks));
|
||||
}
|
||||
else
|
||||
{
|
||||
myDom = remainder * (1 + (dx * dy * dz / numRanks)) +
|
||||
(myRank - remainder) * (dx * dy * dz / numRanks);
|
||||
}
|
||||
|
||||
*col = myDom % dx;
|
||||
*row = (myDom / dx) % dy;
|
||||
*plane = myDom / (dx * dy);
|
||||
*side = testProcs;
|
||||
|
||||
return;
|
||||
}
|
||||
@@ -0,0 +1,273 @@
|
||||
#include <ctype.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#if USE_MPI
|
||||
# include <mpi.h>
|
||||
#endif
|
||||
#include "lulesh.h"
|
||||
|
||||
/* Helper function for converting strings to ints, with error checking */
|
||||
int
|
||||
StrToInt(const char* token, int* retVal)
|
||||
{
|
||||
const char* c;
|
||||
char* endptr;
|
||||
const int decimal_base = 10;
|
||||
|
||||
if(token == NULL)
|
||||
return 0;
|
||||
|
||||
c = token;
|
||||
*retVal = (int) strtol(c, &endptr, decimal_base);
|
||||
if((endptr != c) && ((*endptr == ' ') || (*endptr == '\0')))
|
||||
return 1;
|
||||
else
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void
|
||||
PrintCommandLineOptions(char* execname, int myRank)
|
||||
{
|
||||
if(myRank == 0)
|
||||
{
|
||||
printf("Usage: %s [opts]\n", execname);
|
||||
printf(" where [opts] is one or more of:\n");
|
||||
printf(" -q : quiet mode - suppress all stdout\n");
|
||||
printf(" -i <iterations> : number of cycles to run\n");
|
||||
printf(" -s <size> : length of cube mesh along side\n");
|
||||
printf(" -r <numregions> : Number of distinct regions (def: 11)\n");
|
||||
printf(" -b <balance> : Load balance between regions of a domain (def: 1)\n");
|
||||
printf(" -c <cost> : Extra cost of more expensive regions (def: 1)\n");
|
||||
printf(" -f <numfiles> : Number of files to split viz dump into (def: "
|
||||
"(np+10)/9)\n");
|
||||
printf(" -p : Print out progress\n");
|
||||
printf(" -v : Output viz file (requires compiling with -DVIZ_MESH)\n");
|
||||
printf(" -h : This message\n");
|
||||
printf("\n\n");
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
ParseError(const char* message, int myRank)
|
||||
{
|
||||
if(myRank == 0)
|
||||
{
|
||||
printf("%s\n", message);
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, -1);
|
||||
#else
|
||||
exit(-1);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
ParseCommandLineOptions(int argc, char* argv[], int myRank, struct cmdLineOpts* opts)
|
||||
{
|
||||
if(argc > 1)
|
||||
{
|
||||
int i = 1;
|
||||
|
||||
while(i < argc)
|
||||
{
|
||||
int ok;
|
||||
/* -i <iterations> */
|
||||
if(strcmp(argv[i], "-i") == 0)
|
||||
{
|
||||
if(i + 1 >= argc)
|
||||
{
|
||||
ParseError("Missing integer argument to -i", myRank);
|
||||
}
|
||||
ok = StrToInt(argv[i + 1], &(opts->its));
|
||||
if(!ok)
|
||||
{
|
||||
ParseError("Parse Error on option -i integer value required after "
|
||||
"argument\n",
|
||||
myRank);
|
||||
}
|
||||
i += 2;
|
||||
}
|
||||
/* -s <size, sidelength> */
|
||||
else if(strcmp(argv[i], "-s") == 0)
|
||||
{
|
||||
if(i + 1 >= argc)
|
||||
{
|
||||
ParseError("Missing integer argument to -s\n", myRank);
|
||||
}
|
||||
ok = StrToInt(argv[i + 1], &(opts->nx));
|
||||
if(!ok)
|
||||
{
|
||||
ParseError("Parse Error on option -s integer value required after "
|
||||
"argument\n",
|
||||
myRank);
|
||||
}
|
||||
i += 2;
|
||||
}
|
||||
/* -r <numregions> */
|
||||
else if(strcmp(argv[i], "-r") == 0)
|
||||
{
|
||||
if(i + 1 >= argc)
|
||||
{
|
||||
ParseError("Missing integer argument to -r\n", myRank);
|
||||
}
|
||||
ok = StrToInt(argv[i + 1], &(opts->numReg));
|
||||
if(!ok)
|
||||
{
|
||||
ParseError("Parse Error on option -r integer value required after "
|
||||
"argument\n",
|
||||
myRank);
|
||||
}
|
||||
i += 2;
|
||||
}
|
||||
/* -f <numfilepieces> */
|
||||
else if(strcmp(argv[i], "-f") == 0)
|
||||
{
|
||||
if(i + 1 >= argc)
|
||||
{
|
||||
ParseError("Missing integer argument to -f\n", myRank);
|
||||
}
|
||||
ok = StrToInt(argv[i + 1], &(opts->numFiles));
|
||||
if(!ok)
|
||||
{
|
||||
ParseError("Parse Error on option -f integer value required after "
|
||||
"argument\n",
|
||||
myRank);
|
||||
}
|
||||
i += 2;
|
||||
}
|
||||
/* -p */
|
||||
else if(strcmp(argv[i], "-p") == 0)
|
||||
{
|
||||
opts->showProg = 1;
|
||||
i++;
|
||||
}
|
||||
/* -q */
|
||||
else if(strcmp(argv[i], "-q") == 0)
|
||||
{
|
||||
opts->quiet = 1;
|
||||
i++;
|
||||
}
|
||||
/* -a */
|
||||
else if(strcmp(argv[i], "-a") == 0)
|
||||
{
|
||||
opts->do_atomic = 1;
|
||||
i++;
|
||||
}
|
||||
else if(strcmp(argv[i], "-b") == 0)
|
||||
{
|
||||
if(i + 1 >= argc)
|
||||
{
|
||||
ParseError("Missing integer argument to -b\n", myRank);
|
||||
}
|
||||
ok = StrToInt(argv[i + 1], &(opts->balance));
|
||||
if(!ok)
|
||||
{
|
||||
ParseError("Parse Error on option -b integer value required after "
|
||||
"argument\n",
|
||||
myRank);
|
||||
}
|
||||
i += 2;
|
||||
}
|
||||
else if(strcmp(argv[i], "-c") == 0)
|
||||
{
|
||||
if(i + 1 >= argc)
|
||||
{
|
||||
ParseError("Missing integer argument to -c\n", myRank);
|
||||
}
|
||||
ok = StrToInt(argv[i + 1], &(opts->cost));
|
||||
if(!ok)
|
||||
{
|
||||
ParseError("Parse Error on option -c integer value required after "
|
||||
"argument\n",
|
||||
myRank);
|
||||
}
|
||||
i += 2;
|
||||
}
|
||||
/* -v */
|
||||
else if(strcmp(argv[i], "-v") == 0)
|
||||
{
|
||||
#if VIZ_MESH
|
||||
opts->viz = 1;
|
||||
#else
|
||||
ParseError("Use of -v requires compiling with -DVIZ_MESH\n", myRank);
|
||||
#endif
|
||||
i++;
|
||||
}
|
||||
/* -h */
|
||||
else if(strcmp(argv[i], "-h") == 0)
|
||||
{
|
||||
PrintCommandLineOptions(argv[0], myRank);
|
||||
#if USE_MPI
|
||||
MPI_Abort(MPI_COMM_WORLD, 0);
|
||||
#else
|
||||
exit(0);
|
||||
#endif
|
||||
}
|
||||
else
|
||||
{
|
||||
char msg[80];
|
||||
PrintCommandLineOptions(argv[0], myRank);
|
||||
snprintf(msg, sizeof(msg), "ERROR: Unknown command line argument: %s\n", argv[i]);
|
||||
ParseError(msg, myRank);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/////////////////////////////////////////////////////////////////////
|
||||
|
||||
void
|
||||
VerifyAndWriteFinalOutput(Real_t elapsed_time, Domain& locDom, Int_t nx, Int_t numRanks)
|
||||
{
|
||||
// GrindTime1 only takes a single domain into account, and is thus a good way to
|
||||
// measure processor speed independent of MPI parallelism. GrindTime2 takes into
|
||||
// account speedups from MPI parallelism
|
||||
Real_t grindTime1 = ((elapsed_time * 1e6) / locDom.cycle()) / (nx * nx * nx);
|
||||
Real_t grindTime2 =
|
||||
((elapsed_time * 1e6) / locDom.cycle()) / (nx * nx * nx * numRanks);
|
||||
|
||||
Index_t ElemId = 0;
|
||||
printf("Run completed: \n");
|
||||
printf(" Problem size = %i \n", nx);
|
||||
printf(" MPI tasks = %i \n", numRanks);
|
||||
printf(" Iteration count = %i \n", locDom.cycle());
|
||||
printf(" Final Origin Energy = %12.6e \n", locDom.e(ElemId));
|
||||
|
||||
Real_t MaxAbsDiff = Real_t(0.0);
|
||||
Real_t TotalAbsDiff = Real_t(0.0);
|
||||
Real_t MaxRelDiff = Real_t(0.0);
|
||||
|
||||
for(Index_t j = 0; j < nx; ++j)
|
||||
{
|
||||
for(Index_t k = j + 1; k < nx; ++k)
|
||||
{
|
||||
Real_t AbsDiff = FABS(locDom.e(j * nx + k) - locDom.e(k * nx + j));
|
||||
TotalAbsDiff += AbsDiff;
|
||||
|
||||
if(MaxAbsDiff < AbsDiff)
|
||||
MaxAbsDiff = AbsDiff;
|
||||
|
||||
Real_t RelDiff = AbsDiff / locDom.e(k * nx + j);
|
||||
|
||||
if(MaxRelDiff < RelDiff)
|
||||
MaxRelDiff = RelDiff;
|
||||
}
|
||||
}
|
||||
|
||||
// Quick symmetry check
|
||||
printf(" Testing Plane 0 of Energy Array on rank 0:\n");
|
||||
printf(" MaxAbsDiff = %12.6e\n", MaxAbsDiff);
|
||||
printf(" TotalAbsDiff = %12.6e\n", TotalAbsDiff);
|
||||
printf(" MaxRelDiff = %12.6e\n\n", MaxRelDiff);
|
||||
|
||||
// Timing information
|
||||
printf("\nElapsed time = %10.2f (s)\n", elapsed_time);
|
||||
printf("Grind time (us/z/c) = %10.8g (per dom) (%10.8g overall)\n", grindTime1,
|
||||
grindTime2);
|
||||
printf("FOM = %10.8g (z/s)\n\n",
|
||||
1000.0 / grindTime2); // zones per second
|
||||
|
||||
return;
|
||||
}
|
||||
@@ -0,0 +1,422 @@
|
||||
#include "lulesh.h"
|
||||
#include <math.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
#ifdef VIZ_MESH
|
||||
|
||||
# ifdef __cplusplus
|
||||
extern "C"
|
||||
{
|
||||
# endif
|
||||
# include "silo.h"
|
||||
# if USE_MPI
|
||||
# include "pmpio.h"
|
||||
# endif
|
||||
# ifdef __cplusplus
|
||||
}
|
||||
# endif
|
||||
|
||||
// Function prototypes
|
||||
static void
|
||||
DumpDomainToVisit(DBfile* db, Domain& domain, int myRank);
|
||||
static
|
||||
|
||||
# if USE_MPI
|
||||
// For some reason, earlier versions of g++ (e.g. 4.2) won't let me
|
||||
// put the 'static' qualifier on this prototype, even if it's done
|
||||
// consistently in the prototype and definition
|
||||
void
|
||||
DumpMultiblockObjects(DBfile* db, PMPIO_baton_t* bat, char basename[], int numRanks);
|
||||
|
||||
// Callback prototypes for PMPIO interface (only useful if we're
|
||||
// running parallel)
|
||||
static void*
|
||||
LULESH_PMPIO_Create(const char* fname, const char* dname, void* udata);
|
||||
static void*
|
||||
LULESH_PMPIO_Open(const char* fname, const char* dname, PMPIO_iomode_t ioMode,
|
||||
void* udata);
|
||||
static void
|
||||
LULESH_PMPIO_Close(void* file, void* udata);
|
||||
|
||||
# else
|
||||
void
|
||||
DumpMultiblockObjects(DBfile* db, char basename[], int numRanks);
|
||||
# endif
|
||||
|
||||
/**********************************************************************/
|
||||
void
|
||||
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks)
|
||||
{
|
||||
char subdirName[32];
|
||||
char basename[32];
|
||||
DBfile* db;
|
||||
|
||||
sprintf(basename, "lulesh_plot_c%d", domain.cycle());
|
||||
sprintf(subdirName, "data_%d", myRank);
|
||||
|
||||
# if USE_MPI
|
||||
|
||||
PMPIO_baton_t* bat =
|
||||
PMPIO_Init(numFiles, PMPIO_WRITE, MPI_COMM_WORLD, 10101, LULESH_PMPIO_Create,
|
||||
LULESH_PMPIO_Open, LULESH_PMPIO_Close, NULL);
|
||||
|
||||
int myiorank = PMPIO_GroupRank(bat, myRank);
|
||||
|
||||
char fileName[64];
|
||||
|
||||
if(myiorank == 0)
|
||||
strcpy(fileName, basename);
|
||||
else
|
||||
sprintf(fileName, "%s.%03d", basename, myiorank);
|
||||
|
||||
db = (DBfile*) PMPIO_WaitForBaton(bat, fileName, subdirName);
|
||||
|
||||
DumpDomainToVisit(db, domain, myRank);
|
||||
|
||||
// Processor 0 writes out a bit of extra data to its file that
|
||||
// describes how to stitch all the pieces together
|
||||
if(myRank == 0)
|
||||
{
|
||||
DumpMultiblockObjects(db, bat, basename, numRanks);
|
||||
}
|
||||
|
||||
PMPIO_HandOffBaton(bat, db);
|
||||
|
||||
PMPIO_Finish(bat);
|
||||
# else
|
||||
|
||||
db = (DBfile*) DBCreate(basename, DB_CLOBBER, DB_LOCAL, NULL, DB_HDF5X);
|
||||
|
||||
if(db)
|
||||
{
|
||||
DBMkDir(db, subdirName);
|
||||
DBSetDir(db, subdirName);
|
||||
DumpDomainToVisit(db, domain, myRank);
|
||||
DumpMultiblockObjects(db, basename, numRanks);
|
||||
}
|
||||
else
|
||||
{
|
||||
printf("Error writing out viz file - rank %d\n", myRank);
|
||||
}
|
||||
|
||||
# endif
|
||||
}
|
||||
|
||||
/**********************************************************************/
|
||||
|
||||
static void
|
||||
DumpDomainToVisit(DBfile* db, Domain& domain, int myRank)
|
||||
{
|
||||
int ok = 0;
|
||||
|
||||
/* Create an option list that will give some hints to VisIt for
|
||||
* printing out the cycle and time in the annotations */
|
||||
DBoptlist* optlist;
|
||||
|
||||
/* Write out the mesh connectivity in fully unstructured format */
|
||||
int shapetype[1] = { DB_ZONETYPE_HEX };
|
||||
int shapesize[1] = { 8 };
|
||||
int shapecnt[1] = { domain.numElem() };
|
||||
int* conn = Allocate<int>(domain.numElem() * 8);
|
||||
int ci = 0;
|
||||
for(int ei = 0; ei < domain.numElem(); ++ei)
|
||||
{
|
||||
Index_t* elemToNode = domain.nodelist(ei);
|
||||
for(int ni = 0; ni < 8; ++ni)
|
||||
{
|
||||
conn[ci++] = elemToNode[ni];
|
||||
}
|
||||
}
|
||||
ok += DBPutZonelist2(db, "connectivity", domain.numElem(), 3, conn,
|
||||
domain.numElem() * 8, 0, 0, 0, /* Not carrying ghost zones */
|
||||
shapetype, shapesize, shapecnt, 1, NULL);
|
||||
Release<int>(&conn);
|
||||
|
||||
/* Write out the mesh coordinates associated with the mesh */
|
||||
const char* coordnames[3] = { "X", "Y", "Z" };
|
||||
float* coords[3];
|
||||
coords[0] = Allocate<float>(domain.numNode());
|
||||
coords[1] = Allocate<float>(domain.numNode());
|
||||
coords[2] = Allocate<float>(domain.numNode());
|
||||
for(int ni = 0; ni < domain.numNode(); ++ni)
|
||||
{
|
||||
coords[0][ni] = float(domain.x(ni));
|
||||
coords[1][ni] = float(domain.y(ni));
|
||||
coords[2][ni] = float(domain.z(ni));
|
||||
}
|
||||
optlist = DBMakeOptlist(2);
|
||||
ok += DBAddOption(optlist, DBOPT_DTIME, &domain.time());
|
||||
ok += DBAddOption(optlist, DBOPT_CYCLE, &domain.cycle());
|
||||
ok += DBPutUcdmesh(db, "mesh", 3, (char**) &coordnames[0], (float**) coords,
|
||||
domain.numNode(), domain.numElem(), "connectivity", 0, DB_FLOAT,
|
||||
optlist);
|
||||
ok += DBFreeOptlist(optlist);
|
||||
Release<float>(&coords[2]);
|
||||
Release<float>(&coords[1]);
|
||||
Release<float>(&coords[0]);
|
||||
|
||||
/* Write out the materials */
|
||||
int* matnums = Allocate<int>(domain.numReg());
|
||||
int dims[1] = { domain.numElem() }; // No mixed elements
|
||||
for(int i = 0; i < domain.numReg(); ++i)
|
||||
matnums[i] = i + 1;
|
||||
|
||||
ok += DBPutMaterial(db, "regions", "mesh", domain.numReg(), matnums,
|
||||
domain.regNumList(), dims, 1, NULL, NULL, NULL, NULL, 0, DB_FLOAT,
|
||||
NULL);
|
||||
Release<int>(&matnums);
|
||||
|
||||
/* Write out pressure, energy, relvol, q */
|
||||
|
||||
float* e = Allocate<float>(domain.numElem());
|
||||
for(int ei = 0; ei < domain.numElem(); ++ei)
|
||||
{
|
||||
e[ei] = float(domain.e(ei));
|
||||
}
|
||||
ok += DBPutUcdvar1(db, "e", "mesh", e, domain.numElem(), NULL, 0, DB_FLOAT,
|
||||
DB_ZONECENT, NULL);
|
||||
Release<float>(&e);
|
||||
|
||||
float* p = Allocate<float>(domain.numElem());
|
||||
for(int ei = 0; ei < domain.numElem(); ++ei)
|
||||
{
|
||||
p[ei] = float(domain.p(ei));
|
||||
}
|
||||
ok += DBPutUcdvar1(db, "p", "mesh", p, domain.numElem(), NULL, 0, DB_FLOAT,
|
||||
DB_ZONECENT, NULL);
|
||||
Release<float>(&p);
|
||||
|
||||
float* v = Allocate<float>(domain.numElem());
|
||||
for(int ei = 0; ei < domain.numElem(); ++ei)
|
||||
{
|
||||
v[ei] = float(domain.v(ei));
|
||||
}
|
||||
ok += DBPutUcdvar1(db, "v", "mesh", v, domain.numElem(), NULL, 0, DB_FLOAT,
|
||||
DB_ZONECENT, NULL);
|
||||
Release<float>(&v);
|
||||
|
||||
float* q = Allocate<float>(domain.numElem());
|
||||
for(int ei = 0; ei < domain.numElem(); ++ei)
|
||||
{
|
||||
q[ei] = float(domain.q(ei));
|
||||
}
|
||||
ok += DBPutUcdvar1(db, "q", "mesh", q, domain.numElem(), NULL, 0, DB_FLOAT,
|
||||
DB_ZONECENT, NULL);
|
||||
Release<float>(&q);
|
||||
|
||||
/* Write out nodal speed, velocities */
|
||||
float* zd = Allocate<float>(domain.numNode());
|
||||
float* yd = Allocate<float>(domain.numNode());
|
||||
float* xd = Allocate<float>(domain.numNode());
|
||||
float* speed = Allocate<float>(domain.numNode());
|
||||
for(int ni = 0; ni < domain.numNode(); ++ni)
|
||||
{
|
||||
xd[ni] = float(domain.xd(ni));
|
||||
yd[ni] = float(domain.yd(ni));
|
||||
zd[ni] = float(domain.zd(ni));
|
||||
speed[ni] =
|
||||
float(sqrt((xd[ni] * xd[ni]) + (yd[ni] * yd[ni]) + (zd[ni] * zd[ni])));
|
||||
}
|
||||
|
||||
ok += DBPutUcdvar1(db, "speed", "mesh", speed, domain.numNode(), NULL, 0, DB_FLOAT,
|
||||
DB_NODECENT, NULL);
|
||||
Release<float>(&speed);
|
||||
|
||||
ok += DBPutUcdvar1(db, "xd", "mesh", xd, domain.numNode(), NULL, 0, DB_FLOAT,
|
||||
DB_NODECENT, NULL);
|
||||
Release<float>(&xd);
|
||||
|
||||
ok += DBPutUcdvar1(db, "yd", "mesh", yd, domain.numNode(), NULL, 0, DB_FLOAT,
|
||||
DB_NODECENT, NULL);
|
||||
Release<float>(&yd);
|
||||
|
||||
ok += DBPutUcdvar1(db, "zd", "mesh", zd, domain.numNode(), NULL, 0, DB_FLOAT,
|
||||
DB_NODECENT, NULL);
|
||||
Release<float>(&zd);
|
||||
|
||||
if(ok != 0)
|
||||
{
|
||||
printf("Error writing out viz file - rank %d\n", myRank);
|
||||
}
|
||||
}
|
||||
|
||||
/**********************************************************************/
|
||||
|
||||
# if USE_MPI
|
||||
void
|
||||
DumpMultiblockObjects(DBfile* db, PMPIO_baton_t* bat, char basename[], int numRanks)
|
||||
# else
|
||||
void
|
||||
DumpMultiblockObjects(DBfile* db, char basename[], int numRanks)
|
||||
# endif
|
||||
{
|
||||
/* MULTIBLOCK objects to tie together multiple files */
|
||||
char** multimeshObjs;
|
||||
char** multimatObjs;
|
||||
char*** multivarObjs;
|
||||
int* blockTypes;
|
||||
int* varTypes;
|
||||
int ok = 0;
|
||||
// Make sure this list matches what's written out above
|
||||
char vars[][10] = { "p", "e", "v", "q", "speed", "xd", "yd", "zd" };
|
||||
int numvars = sizeof(vars) / sizeof(vars[0]);
|
||||
|
||||
// Reset to the root directory of the silo file
|
||||
DBSetDir(db, "/");
|
||||
|
||||
// Allocate a bunch of space for building up the string names
|
||||
multimeshObjs = Allocate<char*>(numRanks);
|
||||
multimatObjs = Allocate<char*>(numRanks);
|
||||
multivarObjs = Allocate<char**>(numvars);
|
||||
blockTypes = Allocate<int>(numRanks);
|
||||
varTypes = Allocate<int>(numRanks);
|
||||
|
||||
for(int v = 0; v < numvars; ++v)
|
||||
{
|
||||
multivarObjs[v] = Allocate<char*>(numRanks);
|
||||
}
|
||||
|
||||
for(int i = 0; i < numRanks; ++i)
|
||||
{
|
||||
multimeshObjs[i] = Allocate<char>(64);
|
||||
multimatObjs[i] = Allocate<char>(64);
|
||||
for(int v = 0; v < numvars; ++v)
|
||||
{
|
||||
multivarObjs[v][i] = Allocate<char>(64);
|
||||
}
|
||||
blockTypes[i] = DB_UCDMESH;
|
||||
varTypes[i] = DB_UCDVAR;
|
||||
}
|
||||
|
||||
// Build up the multiobject names
|
||||
for(int i = 0; i < numRanks; ++i)
|
||||
{
|
||||
# if USE_MPI
|
||||
int iorank = PMPIO_GroupRank(bat, i);
|
||||
# else
|
||||
int iorank = 0;
|
||||
# endif
|
||||
|
||||
// delete multivarObjs[i];
|
||||
if(iorank == 0)
|
||||
{
|
||||
snprintf(multimeshObjs[i], 64, "/data_%d/mesh", i);
|
||||
snprintf(multimatObjs[i], 64, "/data_%d/regions", i);
|
||||
for(int v = 0; v < numvars; ++v)
|
||||
{
|
||||
snprintf(multivarObjs[v][i], 64, "/data_%d/%s", i, vars[v]);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
snprintf(multimeshObjs[i], 64, "%s.%03d:/data_%d/mesh", basename, iorank, i);
|
||||
snprintf(multimatObjs[i], 64, "%s.%03d:/data_%d/regions", basename, iorank,
|
||||
i);
|
||||
for(int v = 0; v < numvars; ++v)
|
||||
{
|
||||
snprintf(multivarObjs[v][i], 64, "%s.%03d:/data_%d/%s", basename, iorank,
|
||||
i, vars[v]);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Now write out the objects
|
||||
ok += DBPutMultimesh(db, "mesh", numRanks, (char**) multimeshObjs, blockTypes, NULL);
|
||||
ok += DBPutMultimat(db, "regions", numRanks, (char**) multimatObjs, NULL);
|
||||
for(int v = 0; v < numvars; ++v)
|
||||
{
|
||||
ok += DBPutMultivar(db, vars[v], numRanks, (char**) multivarObjs[v], varTypes,
|
||||
NULL);
|
||||
}
|
||||
|
||||
for(int v = 0; v < numvars; ++v)
|
||||
{
|
||||
for(int i = 0; i < numRanks; i++)
|
||||
{
|
||||
Release<char>(&multivarObjs[v][i]);
|
||||
}
|
||||
Release<char*>(&multivarObjs[v]);
|
||||
}
|
||||
|
||||
// Clean up
|
||||
for(int i = 0; i < numRanks; i++)
|
||||
{
|
||||
Release<char>(&multimeshObjs[i]);
|
||||
Release<char>(&multimatObjs[i]);
|
||||
}
|
||||
Release<char*>(&multimeshObjs);
|
||||
Release<char*>(&multimatObjs);
|
||||
Release<char**>(&multivarObjs);
|
||||
Release<int>(&blockTypes);
|
||||
Release<int>(&varTypes);
|
||||
|
||||
if(ok != 0)
|
||||
{
|
||||
printf("Error writing out multiXXX objs to viz file - rank 0\n");
|
||||
}
|
||||
}
|
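// Illustrative sketch, not part of the original file: the naming scheme used
// above for multiblock objects, pulled out into a small helper. The namespace
// lulesh_sketch, the helper BlockName, and any basename passed to it are
// assumptions made for this example only.
#include <cassert>
#include <cstdio>
#include <string>

namespace lulesh_sketch
{
// Blocks whose I/O rank is 0 use a path local to the root file. All other
// blocks are prefixed with "<basename>.<iorank>:" to name the file that
// holds them.
inline std::string
BlockName(const char* basename, int iorank, int block, const char* object)
{
    char buf[64];  // same fixed capacity as the buffers allocated above
    int  n = 0;
    if(iorank == 0)
        n = std::snprintf(buf, sizeof(buf), "/data_%d/%s", block, object);
    else
        n = std::snprintf(buf, sizeof(buf), "%s.%03d:/data_%d/%s", basename, iorank,
                          block, object);
    // snprintf reports the untruncated length; a value >= 64 means truncation
    assert(n >= 0 && n < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}
}  // namespace lulesh_sketch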
||||
|
||||
# if USE_MPI
|
||||
|
||||
/**********************************************************************/
|
||||
|
||||
static void*
|
||||
LULESH_PMPIO_Create(const char* fname, const char* dname, void* udata)
|
||||
{
|
||||
/* Create the file */
|
||||
DBfile* db = DBCreate(fname, DB_CLOBBER, DB_LOCAL, NULL, DB_HDF5X);
|
||||
|
||||
/* Put the data in a subdirectory, so VisIt only sees the multimesh
|
||||
* objects we write out in the base file */
|
||||
if(db)
|
||||
{
|
||||
DBMkDir(db, dname);
|
||||
DBSetDir(db, dname);
|
||||
}
|
||||
return (void*) db;
|
||||
}
|
||||
|
||||
/**********************************************************************/
|
||||
|
||||
static void*
|
||||
LULESH_PMPIO_Open(const char* fname, const char* dname, PMPIO_iomode_t ioMode,
|
||||
void* udata)
|
||||
{
|
||||
/* Open the file */
|
||||
DBfile* db = DBOpen(fname, DB_UNKNOWN, DB_APPEND);
|
||||
|
||||
/* Put the data in a subdirectory, so VisIt only sees the multimesh
|
||||
* objects we write out in the base file */
|
||||
if(db)
|
||||
{
|
||||
DBMkDir(db, dname);
|
||||
DBSetDir(db, dname);
|
||||
}
|
||||
return (void*) db;
|
||||
}
|
||||
|
||||
/**********************************************************************/
|
||||
|
||||
static void
|
||||
LULESH_PMPIO_Close(void* file, void* udata)
|
||||
{
|
||||
DBfile* db = (DBfile*) file;
|
||||
if(db)
|
||||
DBClose(db);
|
||||
}
|
||||
# endif
|
||||
|
||||
#else
|
||||
|
||||
void
|
||||
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks)
|
||||
{
|
||||
if(myRank == 0)
|
||||
{
|
||||
printf("Must enable -DVIZ_MESH at compile time to call DumpToVisit\n");
|
||||
}
|
||||
}
|
||||
|
||||
#endif
|
||||
@@ -0,0 +1,836 @@
|
||||
|
||||
#if !defined(USE_MPI)
|
||||
# error "You should specify USE_MPI=0 or USE_MPI=1 on the compile line"
|
||||
#endif
|
||||
|
||||
// OpenMP will be compiled in if this flag is set to 1 AND the compiler being
|
||||
// used supports it (i.e. the _OPENMP symbol is defined)
|
||||
#define USE_OMP 1
|
||||
|
||||
#if USE_MPI
|
||||
# include <mpi.h>
|
||||
|
||||
/*
|
||||
define one of these three symbols:
|
||||
|
||||
SEDOV_SYNC_POS_VEL_NONE
|
||||
SEDOV_SYNC_POS_VEL_EARLY
|
||||
SEDOV_SYNC_POS_VEL_LATE
|
||||
*/
|
||||
|
||||
# define SEDOV_SYNC_POS_VEL_EARLY 1
|
||||
#endif
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
#include <Kokkos_Vector.hpp>
|
||||
|
||||
#include <math.h>
|
||||
#include <vector>
|
||||
|
||||
//**************************************************
|
||||
// Allow flexibility for arithmetic representations
|
||||
//**************************************************
|
||||
|
||||
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
|
||||
|
||||
// Precision specification
|
||||
typedef float real4;
|
||||
typedef double real8;
|
||||
typedef long double real10; // 10 bytes on x86
|
||||
|
||||
typedef int Index_t; // array subscript and loop index
|
||||
typedef real8 Real_t; // floating point representation
|
||||
typedef int Int_t; // integer representation
|
||||
|
||||
enum
|
||||
{
|
||||
VolumeError = -1,
|
||||
QStopError = -2
|
||||
};
|
||||
|
||||
inline real4
|
||||
SQRT(real4 arg)
|
||||
{
|
||||
return sqrtf(arg);
|
||||
}
|
||||
inline real8
|
||||
SQRT(real8 arg)
|
||||
{
|
||||
return sqrt(arg);
|
||||
}
|
||||
inline real10
|
||||
SQRT(real10 arg)
|
||||
{
|
||||
return sqrtl(arg);
|
||||
}
|
||||
|
||||
inline real4
|
||||
CBRT(real4 arg)
|
||||
{
|
||||
return cbrtf(arg);
|
||||
}
|
||||
inline real8
|
||||
CBRT(real8 arg)
|
||||
{
|
||||
return cbrt(arg);
|
||||
}
|
||||
inline real10
|
||||
CBRT(real10 arg)
|
||||
{
|
||||
return cbrtl(arg);
|
||||
}
|
||||
|
||||
inline real4
|
||||
FABS(real4 arg)
|
||||
{
|
||||
return fabsf(arg);
|
||||
}
|
||||
inline real8
|
||||
FABS(real8 arg)
|
||||
{
|
||||
return fabs(arg);
|
||||
}
|
||||
inline real10
|
||||
FABS(real10 arg)
|
||||
{
|
||||
return fabsl(arg);
|
||||
}
|
||||
|
||||
// Stuff needed for boundary conditions
|
||||
// 3 BCs on each of 6 hexahedral faces (18 bits)
|
||||
#define XI_M 0x00007
|
||||
#define XI_M_SYMM 0x00001
|
||||
#define XI_M_FREE 0x00002
|
||||
#define XI_M_COMM 0x00004
|
||||
|
||||
#define XI_P 0x00038
|
||||
#define XI_P_SYMM 0x00008
|
||||
#define XI_P_FREE 0x00010
|
||||
#define XI_P_COMM 0x00020
|
||||
|
||||
#define ETA_M 0x001c0
|
||||
#define ETA_M_SYMM 0x00040
|
||||
#define ETA_M_FREE 0x00080
|
||||
#define ETA_M_COMM 0x00100
|
||||
|
||||
#define ETA_P 0x00e00
|
||||
#define ETA_P_SYMM 0x00200
|
||||
#define ETA_P_FREE 0x00400
|
||||
#define ETA_P_COMM 0x00800
|
||||
|
||||
#define ZETA_M 0x07000
|
||||
#define ZETA_M_SYMM 0x01000
|
||||
#define ZETA_M_FREE 0x02000
|
||||
#define ZETA_M_COMM 0x04000
|
||||
|
||||
#define ZETA_P 0x38000
|
||||
#define ZETA_P_SYMM 0x08000
|
||||
#define ZETA_P_FREE 0x10000
|
||||
#define ZETA_P_COMM 0x20000
|
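// Illustrative sketch, not part of the original header: reading one face's
// boundary condition out of a packed elemBC word. The constants and the
// helper in namespace lulesh_sketch are assumptions that mirror the XI_M*
// values defined above; the other five faces would follow the same pattern
// with their own masks.
namespace lulesh_sketch
{
constexpr int kXiM     = 0x00007;  // all three bits of the xi-minus face
constexpr int kXiMSymm = 0x00001;  // symmetry plane
constexpr int kXiMFree = 0x00002;  // free surface
constexpr int kXiMComm = 0x00004;  // neighbor lives on another rank

// Masks away every other face and returns only the xi-minus bits
// (0 when no condition is set on that face).
constexpr int
XiMinusCondition(int elemBC)
{
    return elemBC & kXiM;
}
}  // namespace lulesh_sketch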
||||
|
||||
// MPI Message Tags
|
||||
#define MSG_COMM_SBN 1024
|
||||
#define MSG_SYNC_POS_VEL 2048
|
||||
#define MSG_MONOQ 3072
|
||||
|
||||
#define MAX_FIELDS_PER_MPI_COMM 6
|
||||
|
||||
// Assume 128 byte coherence
|
||||
// Assume Real_t is an "integral power of 2" bytes wide
|
||||
#define CACHE_COHERENCE_PAD_REAL (128 / sizeof(Real_t))
|
||||
|
||||
#define CACHE_ALIGN_REAL(n) \
|
||||
(((n) + (CACHE_COHERENCE_PAD_REAL - 1)) & ~(CACHE_COHERENCE_PAD_REAL - 1))
|
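// Illustrative sketch, not part of the original header: the round-up done by
// CACHE_ALIGN_REAL, written as a constexpr function. It assumes Real_t is an
// 8-byte double, which gives a pad of 16 elements per 128 bytes. The names in
// namespace lulesh_sketch are hypothetical.
#include <cstddef>

namespace lulesh_sketch
{
constexpr std::size_t kPadReal = 128 / sizeof(double);

// The bit trick below only rounds correctly when the pad is a power of two.
static_assert((kPadReal & (kPadReal - 1)) == 0, "pad must be a power of two");

// Adding (pad - 1) and then clearing the low bits rounds n up to the next
// multiple of pad; values that are already a multiple are left unchanged.
constexpr std::size_t
CacheAlignReal(std::size_t n)
{
    return (n + (kPadReal - 1)) & ~(kPadReal - 1);
}
}  // namespace lulesh_sketch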
||||
|
||||
//////////////////////////////////////////////////////
|
||||
// Primary data structure
|
||||
//////////////////////////////////////////////////////
|
||||
|
||||
/*
|
||||
* The implementation of the data abstraction used for lulesh
|
||||
* resides entirely in the Domain class below. You can change
|
||||
* grouping and interleaving of fields here to maximize data layout
|
||||
* efficiency for your underlying architecture or compiler.
|
||||
*
|
||||
* For example, fields can be implemented as STL objects or
|
||||
* raw array pointers. As another example, individual fields
|
||||
* m_x, m_y, m_z could be bundled into
|
||||
*
|
||||
* struct { Real_t x, y, z ; } *m_coord ;
|
||||
*
|
||||
* allowing accessor functions such as
|
||||
*
|
||||
* "Real_t &x(Index_t idx) { return m_coord[idx].x ; }"
|
||||
* "Real_t &y(Index_t idx) { return m_coord[idx].y ; }"
|
||||
* "Real_t &z(Index_t idx) { return m_coord[idx].z ; }"
|
||||
*/
|
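// Illustrative sketch, not part of the original header: the bundled layout
// described in the comment above, using std::vector as a stand-in container.
// CoordStore and Coord are hypothetical names; the Domain class below keeps
// separate per-field arrays instead.
#include <cstddef>
#include <vector>

namespace lulesh_sketch
{
struct Coord
{
    double x, y, z;
};

class CoordStore
{
public:
    explicit CoordStore(std::size_t numNode)
    : m_coord(numNode)
    {}

    // Same accessor shape as in the comment: callers index by node and never
    // see whether x, y and z are stored together or apart.
    double& x(int idx) { return m_coord[idx].x; }
    double& y(int idx) { return m_coord[idx].y; }
    double& z(int idx) { return m_coord[idx].z; }

private:
    std::vector<Coord> m_coord;  // value-initialized, so all fields start at 0
};
}  // namespace lulesh_sketch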
||||
|
||||
class Domain
|
||||
{
|
||||
public:
|
||||
// Constructor
|
||||
Domain(Int_t numRanks, Index_t colLoc, Index_t rowLoc, Index_t planeLoc, Index_t nx,
|
||||
Int_t tp, Int_t nr, Int_t balance, Int_t cost);
|
||||
|
||||
// Destructor
|
||||
~Domain();
|
||||
|
||||
//
|
||||
// ALLOCATION
|
||||
//
|
||||
|
||||
void AllocateNodePersistent(Int_t numNode) // Node-centered
|
||||
{
|
||||
m_x.resize(numNode); // coordinates
|
||||
m_y.resize(numNode);
|
||||
m_z.resize(numNode);
|
||||
|
||||
m_xd.resize(numNode); // velocities
|
||||
m_yd.resize(numNode);
|
||||
m_zd.resize(numNode);
|
||||
|
||||
m_xdd.resize(numNode); // accelerations
|
||||
m_ydd.resize(numNode);
|
||||
m_zdd.resize(numNode);
|
||||
|
||||
m_fx.resize(numNode); // forces
|
||||
m_fy.resize(numNode);
|
||||
m_fz.resize(numNode);
|
||||
|
||||
m_nodalMass.resize(numNode); // mass
|
||||
|
||||
m_c_x = m_x.d_view;
|
||||
m_c_y = m_y.d_view;
|
||||
m_c_z = m_z.d_view;
|
||||
m_c_xd = m_xd.d_view;
|
||||
m_c_yd = m_yd.d_view;
|
||||
m_c_zd = m_zd.d_view;
|
||||
}
|
||||
|
||||
void AllocateElemPersistent(Int_t numElem) // Elem-centered
|
||||
{
|
||||
m_nodelist.resize(8 * numElem);
|
||||
|
||||
// elem connectivities through face
|
||||
m_lxim.resize(numElem);
|
||||
m_lxip.resize(numElem);
|
||||
m_letam.resize(numElem);
|
||||
m_letap.resize(numElem);
|
||||
m_lzetam.resize(numElem);
|
||||
m_lzetap.resize(numElem);
|
||||
|
||||
m_elemBC.resize(numElem);
|
||||
|
||||
m_e.resize(numElem);
|
||||
m_p.resize(numElem);
|
||||
|
||||
m_q.resize(numElem);
|
||||
m_ql.resize(numElem);
|
||||
m_qq.resize(numElem);
|
||||
|
||||
m_v.resize(numElem);
|
||||
|
||||
m_volo.resize(numElem);
|
||||
m_delv.resize(numElem);
|
||||
m_vdov.resize(numElem);
|
||||
|
||||
m_arealg.resize(numElem);
|
||||
|
||||
m_ss.resize(numElem);
|
||||
|
||||
m_elemMass.resize(numElem);
|
||||
|
||||
m_vnew.resize(numElem);
|
||||
|
||||
m_c_e = m_e.d_view;
|
||||
m_c_p = m_p.d_view;
|
||||
m_c_q = m_q.d_view;
|
||||
m_c_ql = m_ql.d_view;
|
||||
m_c_qq = m_qq.d_view;
|
||||
m_c_delv = m_delv.d_view;
|
||||
}
|
||||
|
||||
void AllocateGradients(Int_t numElem, Int_t allElem)
|
||||
{
|
||||
// Position gradients
|
||||
m_delx_xi.resize(numElem);
|
||||
m_delx_eta.resize(numElem);
|
||||
m_delx_zeta.resize(numElem);
|
||||
|
||||
// Velocity gradients
|
||||
m_delv_xi.resize(allElem);
|
||||
m_delv_eta.resize(allElem);
|
||||
m_delv_zeta.resize(allElem);
|
||||
}
|
||||
|
||||
void DeallocateGradients()
|
||||
{
|
||||
m_delx_zeta.clear();
|
||||
m_delx_eta.clear();
|
||||
m_delx_xi.clear();
|
||||
|
||||
m_delv_zeta.clear();
|
||||
m_delv_eta.clear();
|
||||
m_delv_xi.clear();
|
||||
}
|
||||
|
||||
void AllocateStrains(Int_t numElem)
|
||||
{
|
||||
m_dxx.resize(numElem);
|
||||
m_dyy.resize(numElem);
|
||||
m_dzz.resize(numElem);
|
||||
}
|
||||
|
||||
void DeallocateStrains()
|
||||
{
|
||||
m_dzz.clear();
|
||||
m_dyy.clear();
|
||||
m_dxx.clear();
|
||||
}
|
||||
|
||||
//
|
||||
// ACCESSORS
|
||||
//
|
||||
|
||||
// Node-centered
|
||||
|
||||
// Nodal coordinates
|
||||
KOKKOS_INLINE_FUNCTION Real_t& x(const Index_t idx) const { return m_x[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& y(const Index_t idx) const { return m_y[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& z(const Index_t idx) const { return m_z[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_x(const Index_t idx) const { return m_c_x[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_y(const Index_t idx) const { return m_c_y[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_z(const Index_t idx) const { return m_c_z[idx]; }
|
||||
|
||||
// Nodal velocities
|
||||
KOKKOS_INLINE_FUNCTION Real_t& xd(const Index_t idx) const { return m_xd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& yd(const Index_t idx) const { return m_yd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& zd(const Index_t idx) const { return m_zd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_xd(const Index_t idx) const { return m_c_xd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_yd(const Index_t idx) const { return m_c_yd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_zd(const Index_t idx) const { return m_c_zd[idx]; }
|
||||
|
||||
// Nodal accelerations
|
||||
KOKKOS_INLINE_FUNCTION Real_t& xdd(const Index_t idx) const { return m_xdd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& ydd(const Index_t idx) const { return m_ydd[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& zdd(const Index_t idx) const { return m_zdd[idx]; }
|
||||
|
||||
// Nodal forces
|
||||
KOKKOS_INLINE_FUNCTION Real_t& fx(const Index_t idx) const { return m_fx[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& fy(const Index_t idx) const { return m_fy[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& fz(const Index_t idx) const { return m_fz[idx]; }
|
||||
|
||||
// Nodal mass
|
||||
KOKKOS_INLINE_FUNCTION Real_t& nodalMass(const Index_t idx) const
|
||||
{
|
||||
return m_nodalMass[idx];
|
||||
}
|
||||
|
||||
// Nodes on symmetry planes
|
||||
Index_t symmX(const Index_t idx) const { return m_symmX[idx]; }
|
||||
Index_t symmY(const Index_t idx) const { return m_symmY[idx]; }
|
||||
Index_t symmZ(const Index_t idx) const { return m_symmZ[idx]; }
|
||||
bool symmXempty() { return m_symmX.empty(); }
|
||||
bool symmYempty() { return m_symmY.empty(); }
|
||||
bool symmZempty() { return m_symmZ.empty(); }
|
||||
|
||||
//
|
||||
// Element-centered
|
||||
//
|
||||
Index_t& regElemSize(Index_t idx) { return m_regElemSize[idx]; }
|
||||
Index_t& regNumList(Index_t idx) { return m_regNumList[idx]; }
|
||||
Index_t* regNumList() { return &m_regNumList[0]; }
|
||||
Index_t* regElemlist(Int_t r) { return m_regElemlist[r]; }
|
||||
Index_t& regElemlist(const Int_t r, Index_t idx) const
|
||||
{
|
||||
return m_regElemlist[r][idx];
|
||||
}
|
||||
|
||||
Index_t* nodelist(Index_t idx) const { return &m_nodelist[Index_t(8) * idx]; }
|
||||
|
||||
// elem connectivities through face
|
||||
Index_t& lxim(const Index_t idx) const { return m_lxim[idx]; }
|
||||
Index_t& lxip(const Index_t idx) const { return m_lxip[idx]; }
|
||||
Index_t& letam(const Index_t idx) const { return m_letam[idx]; }
|
||||
Index_t& letap(const Index_t idx) const { return m_letap[idx]; }
|
||||
Index_t& lzetam(const Index_t idx) const { return m_lzetam[idx]; }
|
||||
Index_t& lzetap(const Index_t idx) const { return m_lzetap[idx]; }
|
||||
|
||||
// elem face symm/free-surface flag
|
||||
Int_t& elemBC(const Index_t idx) const { return m_elemBC[idx]; }
|
||||
|
||||
// Principal strains - temporary
|
||||
KOKKOS_INLINE_FUNCTION Real_t& dxx(const Index_t idx) const { return m_dxx[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& dyy(const Index_t idx) const { return m_dyy[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& dzz(const Index_t idx) const { return m_dzz[idx]; }
|
||||
|
||||
// New relative volume - temporary
|
||||
KOKKOS_INLINE_FUNCTION Real_t& vnew(const Index_t idx) const { return m_vnew[idx]; }
|
||||
|
||||
// Velocity gradient - temporary
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delv_xi(const Index_t idx) const
|
||||
{
|
||||
return m_delv_xi[idx];
|
||||
}
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delv_eta(const Index_t idx) const
|
||||
{
|
||||
return m_delv_eta[idx];
|
||||
}
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delv_zeta(const Index_t idx) const
|
||||
{
|
||||
return m_delv_zeta[idx];
|
||||
}
|
||||
|
||||
// Position gradient - temporary
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delx_xi(const Index_t idx) const
|
||||
{
|
||||
return m_delx_xi[idx];
|
||||
}
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delx_eta(const Index_t idx) const
|
||||
{
|
||||
return m_delx_eta[idx];
|
||||
}
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delx_zeta(const Index_t idx) const
|
||||
{
|
||||
return m_delx_zeta[idx];
|
||||
}
|
||||
// Energy
|
||||
KOKKOS_INLINE_FUNCTION Real_t& e(const Index_t idx) const { return m_e[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_e(const Index_t idx) const { return m_c_e[idx]; }
|
||||
|
||||
// Pressure
|
||||
KOKKOS_INLINE_FUNCTION Real_t& p(const Index_t idx) const { return m_p[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_p(const Index_t idx) const { return m_c_p[idx]; }
|
||||
|
||||
// Artificial viscosity
|
||||
KOKKOS_INLINE_FUNCTION Real_t& q(const Index_t idx) const { return m_q[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_q(const Index_t idx) const { return m_c_q[idx]; }
|
||||
|
||||
// Linear term for q
|
||||
KOKKOS_INLINE_FUNCTION Real_t& ql(const Index_t idx) const { return m_ql[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_ql(const Index_t idx) const { return m_c_ql[idx]; }
|
||||
// Quadratic term for q
|
||||
KOKKOS_INLINE_FUNCTION Real_t& qq(const Index_t idx) const { return m_qq[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_qq(const Index_t idx) const { return m_c_qq[idx]; }
|
||||
|
||||
// Relative volume
|
||||
KOKKOS_INLINE_FUNCTION Real_t& v(const Index_t idx) const { return m_v[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t& delv(const Index_t idx) const { return m_delv[idx]; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t c_delv(const Index_t idx) const
|
||||
{
|
||||
return m_c_delv[idx];
|
||||
}
|
||||
|
||||
// Reference volume
|
||||
KOKKOS_INLINE_FUNCTION Real_t& volo(Index_t idx) const { return m_volo[idx]; }
|
||||
|
||||
// volume derivative over volume
|
||||
KOKKOS_INLINE_FUNCTION Real_t& vdov(Index_t idx) const { return m_vdov[idx]; }
|
||||
|
||||
// Element characteristic length
|
||||
KOKKOS_INLINE_FUNCTION Real_t& arealg(Index_t idx) const { return m_arealg[idx]; }
|
||||
|
||||
// Sound speed
|
||||
KOKKOS_INLINE_FUNCTION Real_t& ss(const Index_t idx) const { return m_ss[idx]; }
|
||||
|
||||
// Element mass
|
||||
KOKKOS_INLINE_FUNCTION Real_t& elemMass(const Index_t idx) const
|
||||
{
|
||||
return m_elemMass[idx];
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION Index_t nodeElemCount(Index_t idx) const
|
||||
{
|
||||
return m_nodeElemStart[idx + 1] - m_nodeElemStart[idx];
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION Index_t* nodeElemCornerList(Index_t idx) const
|
||||
{
|
||||
return &m_nodeElemCornerList[m_nodeElemStart[idx]];
|
||||
}
|
||||
|
||||
// Parameters
|
||||
|
||||
// Cutoffs
|
||||
KOKKOS_INLINE_FUNCTION Real_t u_cut() const { return m_u_cut; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t e_cut() const { return m_e_cut; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t p_cut() const { return m_p_cut; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t q_cut() const { return m_q_cut; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t v_cut() const { return m_v_cut; }
|
||||
|
||||
// Other constants (usually are settable via input file in real codes)
|
||||
KOKKOS_INLINE_FUNCTION Real_t hgcoef() const { return m_hgcoef; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t qstop() const { return m_qstop; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t monoq_max_slope() const { return m_monoq_max_slope; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t monoq_limiter_mult() const
|
||||
{
|
||||
return m_monoq_limiter_mult;
|
||||
}
|
||||
KOKKOS_INLINE_FUNCTION Real_t ss4o3() const { return m_ss4o3; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t qlc_monoq() const { return m_qlc_monoq; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t qqc_monoq() const { return m_qqc_monoq; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t qqc() const { return m_qqc; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION Real_t eosvmax() const { return m_eosvmax; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t eosvmin() const { return m_eosvmin; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t pmin() const { return m_pmin; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t emin() const { return m_emin; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t dvovmax() const { return m_dvovmax; }
|
||||
KOKKOS_INLINE_FUNCTION Real_t refdens() const { return m_refdens; }
|
||||
|
||||
// Timestep controls, etc...
|
||||
Real_t& time() { return m_time; }
|
||||
Real_t& deltatime() { return m_deltatime; }
|
||||
Real_t& deltatimemultlb() { return m_deltatimemultlb; }
|
||||
Real_t& deltatimemultub() { return m_deltatimemultub; }
|
||||
Real_t& stoptime() { return m_stoptime; }
|
||||
Real_t& dtcourant() { return m_dtcourant; }
|
||||
Real_t& dthydro() { return m_dthydro; }
|
||||
Real_t& dtmax() { return m_dtmax; }
|
||||
Real_t& dtfixed() { return m_dtfixed; }
|
||||
|
||||
Int_t& cycle() { return m_cycle; }
|
||||
Index_t& numRanks() { return m_numRanks; }
|
||||
|
||||
Index_t& colLoc() { return m_colLoc; }
|
||||
Index_t& rowLoc() { return m_rowLoc; }
|
||||
Index_t& planeLoc() { return m_planeLoc; }
|
||||
Index_t& tp() { return m_tp; }
|
||||
|
||||
Index_t& sizeX() { return m_sizeX; }
|
||||
Index_t& sizeY() { return m_sizeY; }
|
||||
Index_t& sizeZ() { return m_sizeZ; }
|
||||
Index_t& numReg() { return m_numReg; }
|
||||
Int_t& cost() { return m_cost; }
|
||||
Index_t& numElem() { return m_numElem; }
|
||||
Index_t& numNode() { return m_numNode; }
|
||||
|
||||
Index_t& maxPlaneSize() { return m_maxPlaneSize; }
|
||||
Index_t& maxEdgeSize() { return m_maxEdgeSize; }
|
||||
|
||||
//
|
||||
// MPI-Related additional data
|
||||
//
|
||||
|
||||
#if USE_MPI
|
||||
// Communication Work space
|
||||
Real_t* commDataSend;
|
||||
Real_t* commDataRecv;
|
||||
|
||||
// Maximum number of block neighbors
|
||||
MPI_Request recvRequest[26]; // 6 faces + 12 edges + 8 corners
|
||||
MPI_Request sendRequest[26]; // 6 faces + 12 edges + 8 corners
|
||||
#endif
|
||||
|
||||
private:
|
||||
void BuildMesh(Int_t nx, Int_t edgeNodes, Int_t edgeElems);
|
||||
void SetupThreadSupportStructures();
|
||||
void CreateRegionIndexSets(Int_t nreg, Int_t balance);
|
||||
void SetupCommBuffers(Int_t edgeNodes);
|
||||
void SetupSymmetryPlanes(Int_t edgeNodes);
|
||||
void SetupElementConnectivities(Int_t edgeElems);
|
||||
void SetupBoundaryConditions(Int_t edgeElems);
|
||||
|
||||
//
|
||||
// IMPLEMENTATION
|
||||
//
|
||||
|
||||
/* Node-centered */
|
||||
Kokkos::vector<Real_t> m_x; /* coordinates */
|
||||
Kokkos::vector<Real_t> m_y;
|
||||
Kokkos::vector<Real_t> m_z;
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_x; /* coordinates */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_y; /* coordinates */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_z; /* coordinates */
|
||||
|
||||
Kokkos::vector<Real_t> m_xd; /* velocities */
|
||||
Kokkos::vector<Real_t> m_yd;
|
||||
Kokkos::vector<Real_t> m_zd;
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_xd; /* velocities */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_yd; /* velocities */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_zd; /* velocities */
|
||||
|
||||
Kokkos::vector<Real_t> m_xdd; /* accelerations */
|
||||
Kokkos::vector<Real_t> m_ydd;
|
||||
Kokkos::vector<Real_t> m_zdd;
|
||||
|
||||
Kokkos::vector<Real_t> m_fx; /* forces */
|
||||
Kokkos::vector<Real_t> m_fy;
|
||||
Kokkos::vector<Real_t> m_fz;
|
||||
|
||||
Kokkos::vector<Real_t> m_nodalMass; /* mass */
|
||||
|
||||
Kokkos::vector<Index_t> m_symmX; /* symmetry plane nodesets */
|
||||
Kokkos::vector<Index_t> m_symmY;
|
||||
Kokkos::vector<Index_t> m_symmZ;
|
||||
|
||||
// Element-centered
|
||||
|
||||
// Region information
|
||||
Int_t m_numReg;
|
||||
Int_t m_cost; // imbalance cost
|
||||
Index_t* m_regElemSize; // Size of region sets
|
||||
Index_t* m_regNumList; // Region number per domain element
|
||||
Index_t** m_regElemlist; // region indexset
|
||||
|
||||
Kokkos::vector<Index_t> m_nodelist; /* elemToNode connectivity */
|
||||
|
||||
Kokkos::vector<Index_t> m_lxim; /* element connectivity across each face */
|
||||
Kokkos::vector<Index_t> m_lxip;
|
||||
Kokkos::vector<Index_t> m_letam;
|
||||
Kokkos::vector<Index_t> m_letap;
|
||||
Kokkos::vector<Index_t> m_lzetam;
|
||||
Kokkos::vector<Index_t> m_lzetap;
|
||||
|
||||
Kokkos::vector<Int_t> m_elemBC; /* symmetry/free-surface flags for each elem face */
|
||||
|
||||
Kokkos::vector<Real_t> m_dxx; /* principal strains -- temporary */
|
||||
Kokkos::vector<Real_t> m_dyy;
|
||||
Kokkos::vector<Real_t> m_dzz;
|
||||
|
||||
Kokkos::vector<Real_t> m_delv_xi; /* velocity gradient -- temporary */
|
||||
Kokkos::vector<Real_t> m_delv_eta;
|
||||
Kokkos::vector<Real_t> m_delv_zeta;
|
||||
|
||||
Kokkos::vector<Real_t> m_delx_xi; /* coordinate gradient -- temporary */
|
||||
Kokkos::vector<Real_t> m_delx_eta;
|
||||
Kokkos::vector<Real_t> m_delx_zeta;
|
||||
|
||||
Kokkos::vector<Real_t> m_e; /* energy */
|
||||
|
||||
Kokkos::vector<Real_t> m_p; /* pressure */
|
||||
Kokkos::vector<Real_t> m_q; /* q */
|
||||
Kokkos::vector<Real_t> m_ql; /* linear term for q */
|
||||
Kokkos::vector<Real_t> m_qq; /* quadratic term for q */
|
||||
|
||||
Kokkos::vector<Real_t> m_v; /* relative volume */
|
||||
Kokkos::vector<Real_t> m_volo; /* reference volume */
|
||||
Kokkos::vector<Real_t> m_vnew; /* new relative volume -- temporary */
|
||||
Kokkos::vector<Real_t> m_delv; /* m_vnew - m_v */
|
||||
Kokkos::vector<Real_t> m_vdov; /* volume derivative over volume */
|
||||
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_e; /* energy */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_p; /* pressure */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_q; /* q */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_ql; /* linear term for q */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_qq; /* quadratic term for q */
|
||||
Kokkos::View<const Real_t*, Kokkos::MemoryTraits<Kokkos::RandomAccess>>
|
||||
m_c_delv; /* m_vnew - m_v */
|
||||
|
||||
Kokkos::vector<Real_t> m_arealg; /* characteristic length of an element */
|
||||
|
||||
Kokkos::vector<Real_t> m_ss; /* "sound speed" */
|
||||
|
||||
Kokkos::vector<Real_t> m_elemMass; /* mass */
|
||||
|
||||
// Cutoffs (treat as constants)
|
||||
const Real_t m_e_cut; // energy tolerance
|
||||
const Real_t m_p_cut; // pressure tolerance
|
||||
const Real_t m_q_cut; // q tolerance
|
||||
const Real_t m_v_cut; // relative volume tolerance
|
||||
const Real_t m_u_cut; // velocity tolerance
|
||||
|
||||
// Other constants (usually settable, but hardcoded in this proxy app)
|
||||
|
||||
const Real_t m_hgcoef; // hourglass control
|
||||
const Real_t m_ss4o3;
|
||||
const Real_t m_qstop; // excessive q indicator
|
||||
const Real_t m_monoq_max_slope;
|
||||
const Real_t m_monoq_limiter_mult;
|
||||
const Real_t m_qlc_monoq; // linear term coef for q
|
||||
const Real_t m_qqc_monoq; // quadratic term coef for q
|
||||
const Real_t m_qqc;
|
||||
const Real_t m_eosvmax;
|
||||
const Real_t m_eosvmin;
|
||||
const Real_t m_pmin; // pressure floor
|
||||
const Real_t m_emin; // energy floor
|
||||
const Real_t m_dvovmax; // maximum allowable volume change
|
||||
const Real_t m_refdens; // reference density
|
||||
|
||||
// Variables to keep track of timestep, simulation time, and cycle
|
||||
Real_t m_dtcourant; // courant constraint
|
||||
Real_t m_dthydro; // volume change constraint
|
||||
Int_t m_cycle; // iteration count for simulation
|
||||
Real_t m_dtfixed; // fixed time increment
|
||||
Real_t m_time; // current time
|
||||
Real_t m_deltatime; // variable time increment
|
||||
Real_t m_deltatimemultlb;
|
||||
Real_t m_deltatimemultub;
|
||||
Real_t m_dtmax; // maximum allowable time increment
|
||||
Real_t m_stoptime; // end time for simulation
|
||||
|
||||
Int_t m_numRanks;
|
||||
|
||||
Index_t m_colLoc;
|
||||
Index_t m_rowLoc;
|
||||
Index_t m_planeLoc;
|
||||
Index_t m_tp;
|
||||
|
||||
Index_t m_sizeX;
|
||||
Index_t m_sizeY;
|
||||
Index_t m_sizeZ;
|
||||
Index_t m_numElem;
|
||||
Index_t m_numNode;
|
||||
|
||||
Index_t m_maxPlaneSize;
|
||||
Index_t m_maxEdgeSize;
|
||||
|
||||
// OMP hack
|
||||
Index_t* m_nodeElemStart;
|
||||
Index_t* m_nodeElemCornerList;
|
||||
|
||||
// Used in setup
|
||||
Index_t m_rowMin, m_rowMax;
|
||||
Index_t m_colMin, m_colMax;
|
||||
Index_t m_planeMin, m_planeMax;
|
||||
};
|
||||
typedef Real_t& (Domain::*Domain_member)(Index_t) const;
|
||||
|
||||
struct cmdLineOpts
|
||||
{
|
||||
Int_t its; // -i
|
||||
Int_t nx; // -s
|
||||
Int_t numReg; // -r
|
||||
Int_t numFiles; // -f
|
||||
Int_t showProg; // -p
|
||||
Int_t quiet; // -q
|
||||
Int_t viz; // -v
|
||||
Int_t cost; // -c
|
||||
Int_t balance; // -b
|
||||
Int_t do_atomic; // -a
|
||||
};
|
||||
|
||||
// Function Prototypes
|
||||
|
||||
// lulesh-par
|
||||
/*Real_t CalcElemVolume( const Real_t x[8],
|
||||
const Real_t y[8],
|
||||
const Real_t z[8]);*/
|
||||
|
||||
// lulesh-util
|
||||
void
|
||||
ParseCommandLineOptions(int argc, char* argv[], Int_t myRank, struct cmdLineOpts* opts);
|
||||
void
|
||||
VerifyAndWriteFinalOutput(Real_t elapsed_time, Domain& locDom, Int_t nx, Int_t numRanks);
|
||||
|
||||
// lulesh-viz
|
||||
void
|
||||
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks);
|
||||
|
||||
// lulesh-comm
|
||||
void
|
||||
CommRecv(Domain& domain, Int_t msgType, Index_t xferFields, Index_t dx, Index_t dy,
|
||||
Index_t dz, bool doRecv, bool planeOnly);
|
||||
void
|
||||
CommSend(Domain& domain, Int_t msgType, Index_t xferFields, Domain_member* fieldData,
|
||||
Index_t dx, Index_t dy, Index_t dz, bool doSend, bool planeOnly);
|
||||
void
|
||||
CommSBN(Domain& domain, Int_t xferFields, Domain_member* fieldData);
|
||||
void
|
||||
CommSyncPosVel(Domain& domain);
|
||||
void
|
||||
CommMonoQ(Domain& domain);
|
||||
|
||||
// lulesh-init
|
||||
void
|
||||
InitMeshDecomp(Int_t numRanks, Int_t myRank, Int_t* col, Int_t* row, Int_t* plane,
|
||||
Int_t* side);
|
||||
|
||||
/*********************************/
|
||||
/* Data structure implementation */
|
||||
/*********************************/
|
||||
|
||||
/* might want to add access methods so that memory can be */
|
||||
/* better managed, as in luleshFT */
|
||||
|
||||
template <typename T>
T*
Allocate(size_t size)
{
    return static_cast<T*>(Kokkos::kokkos_malloc<>(sizeof(T) * size));
}

template <typename T>
void
Release(T** ptr)
{
    if(*ptr != NULL)
    {
        Kokkos::kokkos_free<>(*ptr);
        *ptr = NULL;
    }
}
|
||||
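The Allocate/Release pair above can be sketched on the host without Kokkos. This is only an illustration: `allocate_host` and `release_host` are hypothetical names, and `std::malloc`/`std::free` are assumed stand-ins for `Kokkos::kokkos_malloc` and `Kokkos::kokkos_free`. It shows why `Release` takes a pointer to the pointer: the caller's pointer is reset, so a repeated release does nothing.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical host-only stand-in for Allocate<T>: std::malloc replaces
// Kokkos::kokkos_malloc, so no Kokkos runtime is required.
template <typename T>
T*
allocate_host(std::size_t count)
{
    return static_cast<T*>(std::malloc(sizeof(T) * count));
}

// Hypothetical stand-in for Release<T>: frees the buffer and nulls the
// caller's pointer, which turns a second call into a no-op.
template <typename T>
void
release_host(T** ptr)
{
    if(*ptr != nullptr)
    {
        std::free(*ptr);
        *ptr = nullptr;
    }
}
```

A caller would write `double* buf = allocate_host<double>(n);` and later `release_host(&buf);`, after which `buf` is null.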
|
||||
struct MinFinder
{
    Real_t val;
    int i;
    KOKKOS_INLINE_FUNCTION

    MinFinder()
    : val(100000000000000000000.0000)
    , i(-1)
    {}

    KOKKOS_INLINE_FUNCTION
    MinFinder(const double& val_, const int& i_)
    : val(val_)
    , i(i_)
    {}

    KOKKOS_INLINE_FUNCTION
    MinFinder(const MinFinder& src)
    : val(src.val)
    , i(src.i)
    {}

    // overloading += operator to do the min assignment
    KOKKOS_INLINE_FUNCTION
    void operator+=(MinFinder& src)
    {
        if(src.val < val)
        {
            val = src.val;
            i = src.i;
        }
    }
    KOKKOS_INLINE_FUNCTION
    void operator+=(const volatile MinFinder& src) volatile
    {
        if(src.val < val)
        {
            val = src.val;
            i = src.i;
        }
    }
};
|
||||
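MinFinder reuses `operator+=` as the join step of a min-with-index reduction. The following is a minimal serial sketch of that idea in plain C++, not the Kokkos code path itself: `MinLoc` and `find_min` are hypothetical names, and the loop stands in for whatever parallel reduction the application actually runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ analogue of MinFinder (no Kokkos macros).
struct MinLoc
{
    double val = 1.0e20;  // large sentinel so any real value replaces it
    int    i   = -1;      // -1 means "no element seen yet"

    // Join step: keep whichever candidate has the smaller value.
    void operator+=(const MinLoc& src)
    {
        if(src.val < val)
        {
            val = src.val;
            i   = src.i;
        }
    }
};

// Serial stand-in for a parallel reduction: fold every element into one MinLoc.
inline MinLoc
find_min(const std::vector<double>& data)
{
    MinLoc result;
    for(std::size_t k = 0; k < data.size(); ++k)
    {
        MinLoc candidate;
        candidate.val = data[k];
        candidate.i   = static_cast<int>(k);
        result += candidate;
    }
    return result;
}
```

Because the comparison is a strict `<`, ties keep the first index that was folded in.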
|
||||
struct reduce_double3
|
||||
{
|
||||
double x, y, z;
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
reduce_double3()
|
||||
{
|
||||
x = 0.0;
|
||||
y = 0.0;
|
||||
z = 0.0;
|
||||
}
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator+=(const reduce_double3& src)
|
||||
{
|
||||
x += src.x;
|
||||
y += src.y;
|
||||
z += src.z;
|
||||
}
|
||||
};
|
||||
@@ -0,0 +1,651 @@
|
||||
#if !defined(USE_MPI)
# error "You should specify USE_MPI=0 or USE_MPI=1 on the compile line"
#endif

// OpenMP will be compiled in if this flag is set to 1 AND the compiler being
// used supports it (i.e. the _OPENMP symbol is defined)
#define USE_OMP 1

#if USE_MPI
# include <mpi.h>
#endif

#include <mpi.h>
|
||||
/*
|
||||
define one of these three symbols:
|
||||
|
||||
SEDOV_SYNC_POS_VEL_NONE
|
||||
SEDOV_SYNC_POS_VEL_EARLY
|
||||
SEDOV_SYNC_POS_VEL_LATE
|
||||
*/
|
||||
|
||||
#define SEDOV_SYNC_POS_VEL_EARLY 1
|
||||
|
||||
#include <math.h>
|
||||
#include <vector>
|
||||
|
||||
//**************************************************
|
||||
// Allow flexibility for arithmetic representations
|
||||
//**************************************************
|
||||
|
||||
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
|
||||
|
||||
// Precision specification
|
||||
typedef float real4;
|
||||
typedef double real8;
|
||||
typedef long double real10; // 10 bytes on x86
|
||||
|
||||
typedef int Index_t; // array subscript and loop index
|
||||
typedef real8 Real_t; // floating point representation
|
||||
typedef int Int_t; // integer representation
|
||||
|
||||
enum
|
||||
{
|
||||
VolumeError = -1,
|
||||
QStopError = -2
|
||||
};
|
||||
|
||||
inline real4
|
||||
SQRT(real4 arg)
|
||||
{
|
||||
return sqrtf(arg);
|
||||
}
|
||||
inline real8
|
||||
SQRT(real8 arg)
|
||||
{
|
||||
return sqrt(arg);
|
||||
}
|
||||
inline real10
|
||||
SQRT(real10 arg)
|
||||
{
|
||||
return sqrtl(arg);
|
||||
}
|
||||
|
||||
inline real4
|
||||
CBRT(real4 arg)
|
||||
{
|
||||
return cbrtf(arg);
|
||||
}
|
||||
inline real8
|
||||
CBRT(real8 arg)
|
||||
{
|
||||
return cbrt(arg);
|
||||
}
|
||||
inline real10
|
||||
CBRT(real10 arg)
|
||||
{
|
||||
return cbrtl(arg);
|
||||
}
|
||||
|
||||
inline real4
|
||||
FABS(real4 arg)
|
||||
{
|
||||
return fabsf(arg);
|
||||
}
|
||||
inline real8
|
||||
FABS(real8 arg)
|
||||
{
|
||||
return fabs(arg);
|
||||
}
|
||||
inline real10
|
||||
FABS(real10 arg)
|
||||
{
|
||||
return fabsl(arg);
|
||||
}
|
||||
|
||||
// Stuff needed for boundary conditions
// 2 BCs on each of 6 hexahedral faces (12 bits)
#define XI_M 0x00007
#define XI_M_SYMM 0x00001
#define XI_M_FREE 0x00002
#define XI_M_COMM 0x00004

#define XI_P 0x00038
#define XI_P_SYMM 0x00008
#define XI_P_FREE 0x00010
#define XI_P_COMM 0x00020

#define ETA_M 0x001c0
#define ETA_M_SYMM 0x00040
#define ETA_M_FREE 0x00080
#define ETA_M_COMM 0x00100

#define ETA_P 0x00e00
#define ETA_P_SYMM 0x00200
#define ETA_P_FREE 0x00400
#define ETA_P_COMM 0x00800

#define ZETA_M 0x07000
#define ZETA_M_SYMM 0x01000
#define ZETA_M_FREE 0x02000
#define ZETA_M_COMM 0x04000

#define ZETA_P 0x38000
#define ZETA_P_SYMM 0x08000
#define ZETA_P_FREE 0x10000
#define ZETA_P_COMM 0x20000
|
||||
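Each face owns a three-bit group in the flag word, and the unsuffixed macro (for example `XI_M`) is the mask for that group. A minimal sketch of decoding one face is shown below. The `k`-prefixed constants are hypothetical constexpr copies of the macro values, and the returned labels, including "none" for a zero group, are assumptions chosen for illustration.

```cpp
#include <string>

// Values copied from the XI_M* and ETA_P_FREE definitions above; the
// k-prefixed names are hypothetical constexpr equivalents of the macros.
constexpr int kXiM      = 0x00007;
constexpr int kXiMSymm  = 0x00001;
constexpr int kXiMFree  = 0x00002;
constexpr int kXiMComm  = 0x00004;
constexpr int kEtaPFree = 0x00400;

// Mask out the three xi-minus bits, then compare against each single flag.
inline std::string
describe_xi_minus(int elemBC)
{
    switch(elemBC & kXiM)
    {
        case 0: return "none";
        case kXiMSymm: return "symmetry";
        case kXiMFree: return "free";
        case kXiMComm: return "comm";
        default: return "invalid";  // more than one bit set for this face
    }
}
```

Flags for other faces do not disturb the result, so `describe_xi_minus(kXiMSymm | kEtaPFree)` still reports the xi-minus face as a symmetry boundary.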
|
||||
// MPI Message Tags
|
||||
#define MSG_COMM_SBN 1024
|
||||
#define MSG_SYNC_POS_VEL 2048
|
||||
#define MSG_MONOQ 3072
|
||||
|
||||
#define MAX_FIELDS_PER_MPI_COMM 6
|
||||
|
||||
// Assume 128 byte coherence
// Assume Real_t is an "integral power of 2" bytes wide
#define CACHE_COHERENCE_PAD_REAL (128 / sizeof(Real_t))

#define CACHE_ALIGN_REAL(n) \
    (((n) + (CACHE_COHERENCE_PAD_REAL - 1)) & ~(CACHE_COHERENCE_PAD_REAL - 1))
|
||||
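The alignment macro rounds a count of `Real_t` values up to a whole number of 128-byte units using the usual add-then-mask trick, which only works because the pad is a power of two. A minimal constexpr sketch of the same arithmetic follows; `kPadReal` and `cache_align_real` are hypothetical names, and `Real_t` is assumed to be `double` as in the typedef earlier in this header.

```cpp
#include <cstddef>

using Real_t = double;  // assumed to match "typedef real8 Real_t"

// Number of Real_t values in one assumed 128-byte coherence unit
// (16 when sizeof(double) is 8).
constexpr std::size_t kPadReal = 128 / sizeof(Real_t);

// Round n up to the next multiple of kPadReal. Adding (pad - 1) pushes any
// partial unit over the boundary; the mask then clears the low bits.
constexpr std::size_t
cache_align_real(std::size_t n)
{
    return (n + (kPadReal - 1)) & ~(kPadReal - 1);
}

static_assert((kPadReal & (kPadReal - 1)) == 0, "pad must be a power of two");
```

With an 8-byte `Real_t`, a request for 17 values is padded to 32, while a request for exactly 16 stays at 16.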
|
||||
//////////////////////////////////////////////////////
|
||||
// Primary data structure
|
||||
//////////////////////////////////////////////////////
|
||||
|
||||
/*
 * The implementation of the data abstraction used for lulesh
 * resides entirely in the Domain class below. You can change
 * grouping and interleaving of fields here to maximize data layout
 * efficiency for your underlying architecture or compiler.
 *
 * For example, fields can be implemented as STL objects or
 * raw array pointers. As another example, individual fields
 * m_x, m_y, m_z could be bundled into
 *
 * struct { Real_t x, y, z ; } *m_coord ;
 *
 * allowing accessor functions such as
 *
 * "Real_t &x(Index_t idx) { return m_coord[idx].x ; }"
 * "Real_t &y(Index_t idx) { return m_coord[idx].y ; }"
 * "Real_t &z(Index_t idx) { return m_coord[idx].z ; }"
 */
|
||||
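The bundled layout described in the comment above can be sketched in isolation. The class below is a hypothetical miniature, not the real Domain: `MiniDomain` is an invented name and `std::vector` is assumed as a stand-in for `Kokkos::vector`. Callers only see `x(idx)`, `y(idx)` and `z(idx)`, so the storage could change without touching them.

```cpp
#include <cstddef>
#include <vector>

using Real_t  = double;
using Index_t = int;

// One struct per node: x, y and z of the same node are adjacent in memory.
struct Tuple3
{
    Real_t x, y, z;
};

// Hypothetical miniature of Domain holding only the coordinate field.
class MiniDomain
{
public:
    explicit MiniDomain(Index_t numNode)
    : m_coord(static_cast<std::size_t>(numNode), Tuple3{ 0.0, 0.0, 0.0 })
    {}

    // Accessors return references, so callers can read and write in place.
    Real_t& x(Index_t idx) { return m_coord[idx].x; }
    Real_t& y(Index_t idx) { return m_coord[idx].y; }
    Real_t& z(Index_t idx) { return m_coord[idx].z; }

private:
    std::vector<Tuple3> m_coord;  // std::vector stands in for Kokkos::vector
};
```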
|
||||
class Domain
|
||||
{
|
||||
public:
|
||||
// Constructor
|
||||
Domain(Int_t numRanks, Index_t colLoc, Index_t rowLoc, Index_t planeLoc, Index_t nx,
|
||||
Int_t tp, Int_t nr, Int_t balance, Int_t cost);
|
||||
|
||||
//
|
||||
// ALLOCATION
|
||||
//
|
||||
|
||||
void AllocateNodePersistent(Int_t numNode) // Node-centered
|
||||
{
|
||||
m_coord.resize(numNode); // coordinates
|
||||
|
||||
m_vel.resize(numNode); // velocities
|
||||
|
||||
m_acc.resize(numNode); // accelerations
|
||||
|
||||
m_force.resize(numNode); // forces
|
||||
|
||||
m_nodalMass.resize(numNode); // mass
|
||||
}
|
||||
|
||||
void AllocateElemPersistent(Int_t numElem) // Elem-centered
|
||||
{
|
||||
m_nodelist.resize(8 * numElem);
|
||||
|
||||
// elem connectivities through face
|
||||
m_faceToElem.resize(numElem);
|
||||
|
||||
m_elemBC.resize(numElem);
|
||||
|
||||
m_e.resize(numElem);
|
||||
|
||||
m_pq.resize(numElem);
|
||||
|
||||
m_qlqq.resize(numElem);
|
||||
|
||||
m_vol.resize(numElem);
|
||||
|
||||
m_delv.resize(numElem);
|
||||
m_vdov.resize(numElem);
|
||||
|
||||
m_arealg.resize(numElem);
|
||||
|
||||
m_ss.resize(numElem);
|
||||
|
||||
m_elemMass.resize(numElem);
|
||||
}
|
||||
|
||||
void AllocateGradients(Int_t numElem, Int_t allElem)
|
||||
{
|
||||
// Position gradients
|
||||
m_delx_xi.resize(numElem);
|
||||
m_delx_eta.resize(numElem);
|
||||
m_delx_zeta.resize(numElem);
|
||||
|
||||
// Velocity gradients
|
||||
m_delv_xi.resize(allElem);
|
||||
m_delv_eta.resize(allElem);
|
||||
m_delv_zeta.resize(allElem);
|
||||
}
|
||||
|
||||
void DeallocateGradients()
|
||||
{
|
||||
m_delx_zeta.clear();
|
||||
m_delx_eta.clear();
|
||||
m_delx_xi.clear();
|
||||
|
||||
m_delv_zeta.clear();
|
||||
m_delv_eta.clear();
|
||||
m_delv_xi.clear();
|
||||
}
|
||||
|
||||
void AllocateStrains(Int_t numElem)
|
||||
{
|
||||
m_dxx.resize(numElem);
|
||||
m_dyy.resize(numElem);
|
||||
m_dzz.resize(numElem);
|
||||
}
|
||||
|
||||
void DeallocateStrains()
|
||||
{
|
||||
m_dzz.clear();
|
||||
m_dyy.clear();
|
||||
m_dxx.clear();
|
||||
}
|
||||
|
||||
//
|
||||
// ACCESSORS
|
||||
//
|
||||
|
||||
// Node-centered
|
||||
|
||||
// Nodal coordinates
|
||||
Real_t& x(Index_t idx) { return m_coord[idx].x; }
|
||||
Real_t& y(Index_t idx) { return m_coord[idx].y; }
|
||||
Real_t& z(Index_t idx) { return m_coord[idx].z; }
|
||||
|
||||
// Nodal velocities
|
||||
Real_t& xd(Index_t idx) { return m_vel[idx].x; }
|
||||
Real_t& yd(Index_t idx) { return m_vel[idx].y; }
|
||||
Real_t& zd(Index_t idx) { return m_vel[idx].z; }
|
||||
|
||||
// Nodal accelerations
|
||||
Real_t& xdd(Index_t idx) { return m_acc[idx].x; }
|
||||
Real_t& ydd(Index_t idx) { return m_acc[idx].y; }
|
||||
Real_t& zdd(Index_t idx) { return m_acc[idx].z; }
|
||||
|
||||
// Nodal forces
|
||||
Real_t& fx(Index_t idx) { return m_force[idx].x; }
|
||||
Real_t& fy(Index_t idx) { return m_force[idx].y; }
|
||||
Real_t& fz(Index_t idx) { return m_force[idx].z; }
|
||||
|
||||
// Nodal mass
|
||||
Real_t& nodalMass(Index_t idx) { return m_nodalMass[idx]; }
|
||||
|
||||
// Nodes on symmetry planes
|
||||
Index_t symmX(Index_t idx) { return m_symmX[idx]; }
|
||||
Index_t symmY(Index_t idx) { return m_symmY[idx]; }
|
||||
Index_t symmZ(Index_t idx) { return m_symmZ[idx]; }
|
||||
bool symmXempty() { return m_symmX.empty(); }
|
||||
bool symmYempty() { return m_symmY.empty(); }
|
||||
bool symmZempty() { return m_symmZ.empty(); }
|
||||
|
||||
//
|
||||
// Element-centered
|
||||
//
|
||||
Index_t& regElemSize(Index_t idx) { return m_regElemSize[idx]; }
|
||||
Index_t& regNumList(Index_t idx) { return m_regNumList[idx]; }
|
||||
Index_t* regNumList() { return &m_regNumList[0]; }
|
||||
Index_t* regElemlist(Int_t r) { return m_regElemlist[r]; }
|
||||
Index_t& regElemlist(Int_t r, Index_t idx) { return m_regElemlist[r][idx]; }
|
||||
|
||||
Index_t* nodelist(Index_t idx) { return &m_nodelist[Index_t(8) * idx]; }
|
||||
|
||||
// elem connectivities through face
|
||||
Index_t& lxim(Index_t idx) { return m_faceToElem[idx].lxim; }
|
||||
Index_t& lxip(Index_t idx) { return m_faceToElem[idx].lxip; }
|
||||
Index_t& letam(Index_t idx) { return m_faceToElem[idx].letam; }
|
||||
Index_t& letap(Index_t idx) { return m_faceToElem[idx].letap; }
|
||||
Index_t& lzetam(Index_t idx) { return m_faceToElem[idx].lzetam; }
|
||||
Index_t& lzetap(Index_t idx) { return m_faceToElem[idx].lzetap; }
|
||||
|
||||
// elem face symm/free-surface flag
|
||||
Int_t& elemBC(Index_t idx) { return m_elemBC[idx]; }
|
||||
|
||||
// Principal strains - temporary
|
||||
Real_t& dxx(Index_t idx) { return m_dxx[idx]; }
|
||||
Real_t& dyy(Index_t idx) { return m_dyy[idx]; }
|
||||
Real_t& dzz(Index_t idx) { return m_dzz[idx]; }
|
||||
|
||||
// Velocity gradient - temporary
|
||||
Real_t& delv_xi(Index_t idx) { return m_delv_xi[idx]; }
|
||||
Real_t& delv_eta(Index_t idx) { return m_delv_eta[idx]; }
|
||||
Real_t& delv_zeta(Index_t idx) { return m_delv_zeta[idx]; }
|
||||
|
||||
// Position gradient - temporary
|
||||
Real_t& delx_xi(Index_t idx) { return m_delx_xi[idx]; }
|
||||
Real_t& delx_eta(Index_t idx) { return m_delx_eta[idx]; }
|
||||
Real_t& delx_zeta(Index_t idx) { return m_delx_zeta[idx]; }
|
||||
|
||||
// Energy
|
||||
Real_t& e(Index_t idx) { return m_e[idx]; }
|
||||
|
||||
// Pressure
|
||||
Real_t& p(Index_t idx) { return m_pq[idx].p; }
|
||||
|
||||
// Artificial viscosity
|
||||
Real_t& q(Index_t idx) { return m_pq[idx].q; }
|
||||
|
||||
// Linear term for q
|
||||
Real_t& ql(Index_t idx) { return m_qlqq[idx].ql; }
|
||||
// Quadratic term for q
|
||||
Real_t& qq(Index_t idx) { return m_qlqq[idx].qq; }
|
||||
|
||||
Real_t& delv(Index_t idx) { return m_delv[idx]; }
|
||||
|
||||
// Relative volume
|
||||
Real_t& v(Index_t idx) { return m_vol[idx].v; }
|
||||
// Reference volume
|
||||
Real_t& volo(Index_t idx) { return m_vol[idx].volo; }
|
||||
|
||||
// volume derivative over volume
|
||||
Real_t& vdov(Index_t idx) { return m_vdov[idx]; }
|
||||
|
||||
// Element characteristic length
|
||||
Real_t& arealg(Index_t idx) { return m_arealg[idx]; }
|
||||
|
||||
// Sound speed
|
||||
Real_t& ss(Index_t idx) { return m_ss[idx]; }
|
||||
|
||||
// Element mass
|
||||
Real_t& elemMass(Index_t idx) { return m_elemMass[idx]; }
|
||||
|
||||
Index_t nodeElemCount(Index_t idx)
|
||||
{
|
||||
return m_nodeElemStart[idx + 1] - m_nodeElemStart[idx];
|
||||
}
|
||||
|
||||
Index_t* nodeElemCornerList(Index_t idx)
|
||||
{
|
||||
return &m_nodeElemCornerList[m_nodeElemStart[idx]];
|
||||
}
|
||||
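The two accessors above read a compressed offset/list pair: `m_nodeElemStart` holds one offset per node plus a final end offset, and `m_nodeElemCornerList` holds the per-node entries back to back. A minimal sketch of that lookup, using a hypothetical `NodeElemMap` struct and `std::vector` storage in place of the raw pointers, might look like this:

```cpp
#include <vector>

using Index_t = int;

// Hypothetical container for the offset/list pair behind the two accessors.
struct NodeElemMap
{
    std::vector<Index_t> start;       // numNode + 1 offsets into cornerList
    std::vector<Index_t> cornerList;  // concatenated per-node corner indices

    // Number of entries for a node is the gap between consecutive offsets.
    Index_t count(Index_t node) const { return start[node + 1] - start[node]; }

    // Pointer to the first entry of this node's slice.
    const Index_t* corners(Index_t node) const
    {
        return cornerList.data() + start[node];
    }
};

// Sum the corner indices attached to one node, walking only that node's slice.
inline Index_t
sum_corners(const NodeElemMap& map, Index_t node)
{
    Index_t        total = 0;
    const Index_t* list  = map.corners(node);
    for(Index_t k = 0; k < map.count(node); ++k)
        total += list[k];
    return total;
}
```

A node with no attached corners simply has two equal consecutive offsets, so its loop body never runs.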
|
||||
// Parameters
|
||||
|
||||
// Cutoffs
|
||||
Real_t u_cut() const { return m_u_cut; }
|
||||
Real_t e_cut() const { return m_e_cut; }
|
||||
Real_t p_cut() const { return m_p_cut; }
|
||||
Real_t q_cut() const { return m_q_cut; }
|
||||
Real_t v_cut() const { return m_v_cut; }
|
||||
|
||||
// Other constants (usually are settable via input file in real codes)
|
||||
Real_t hgcoef() const { return m_hgcoef; }
|
||||
Real_t qstop() const { return m_qstop; }
|
||||
Real_t monoq_max_slope() const { return m_monoq_max_slope; }
|
||||
Real_t monoq_limiter_mult() const { return m_monoq_limiter_mult; }
|
||||
Real_t ss4o3() const { return m_ss4o3; }
|
||||
Real_t qlc_monoq() const { return m_qlc_monoq; }
|
||||
Real_t qqc_monoq() const { return m_qqc_monoq; }
|
||||
Real_t qqc() const { return m_qqc; }
|
||||
|
||||
Real_t eosvmax() const { return m_eosvmax; }
|
||||
Real_t eosvmin() const { return m_eosvmin; }
|
||||
Real_t pmin() const { return m_pmin; }
|
||||
Real_t emin() const { return m_emin; }
|
||||
Real_t dvovmax() const { return m_dvovmax; }
|
||||
Real_t refdens() const { return m_refdens; }
|
||||
|
||||
// Timestep controls, etc...
|
||||
Real_t& time() { return m_time; }
|
||||
Real_t& deltatime() { return m_deltatime; }
|
||||
Real_t& deltatimemultlb() { return m_deltatimemultlb; }
|
||||
Real_t& deltatimemultub() { return m_deltatimemultub; }
|
||||
Real_t& stoptime() { return m_stoptime; }
|
||||
Real_t& dtcourant() { return m_dtcourant; }
|
||||
Real_t& dthydro() { return m_dthydro; }
|
||||
Real_t& dtmax() { return m_dtmax; }
|
||||
Real_t& dtfixed() { return m_dtfixed; }
|
||||
|
||||
Int_t& cycle() { return m_cycle; }
|
||||
Index_t& numRanks() { return m_numRanks; }
|
||||
|
||||
Index_t& colLoc() { return m_colLoc; }
|
||||
Index_t& rowLoc() { return m_rowLoc; }
|
||||
Index_t& planeLoc() { return m_planeLoc; }
|
||||
Index_t& tp() { return m_tp; }
|
||||
|
||||
Index_t& sizeX() { return m_sizeX; }
|
||||
Index_t& sizeY() { return m_sizeY; }
|
||||
Index_t& sizeZ() { return m_sizeZ; }
|
||||
Index_t& numReg() { return m_numReg; }
|
||||
Int_t& cost() { return m_cost; }
|
||||
Index_t& numElem() { return m_numElem; }
|
||||
Index_t& numNode() { return m_numNode; }
|
||||
|
||||
Index_t& maxPlaneSize() { return m_maxPlaneSize; }
|
||||
Index_t& maxEdgeSize() { return m_maxEdgeSize; }
|
||||
|
||||
//
|
||||
// MPI-Related additional data
|
||||
//
|
||||
|
||||
#if USE_MPI
|
||||
// Communication Work space
|
||||
Real_t* commDataSend;
|
||||
Real_t* commDataRecv;
|
||||
|
||||
// Maximum number of block neighbors
|
||||
MPI_Request recvRequest[26]; // 6 faces + 12 edges + 8 corners
|
||||
MPI_Request sendRequest[26]; // 6 faces + 12 edges + 8 corners
|
||||
#endif
|
||||
|
||||
private:
|
||||
void BuildMesh(Int_t nx, Int_t edgeNodes, Int_t edgeElems);
|
||||
void SetupThreadSupportStructures();
|
||||
void CreateRegionIndexSets(Int_t nreg, Int_t balance);
|
||||
void SetupCommBuffers(Int_t edgeNodes);
|
||||
void SetupSymmetryPlanes(Int_t edgeNodes);
|
||||
void SetupElementConnectivities(Int_t edgeElems);
|
||||
void SetupBoundaryConditions(Int_t edgeElems);
|
||||
|
||||
//
|
||||
// IMPLEMENTATION
|
||||
//
|
||||
|
||||
/* Node-centered */
|
||||
|
||||
struct Tuple3
|
||||
{
|
||||
Real_t x, y, z;
|
||||
};
|
||||
|
||||
Kokkos::vector<Tuple3> m_coord; /* coordinates */
|
||||
|
||||
Kokkos::vector<Tuple3> m_vel; /* velocities */
|
||||
|
||||
Kokkos::vector<Tuple3> m_acc; /* accelerations */
|
||||
|
||||
Kokkos::vector<Tuple3> m_force; /* forces */
|
||||
|
||||
Kokkos::vector<Real_t> m_nodalMass; /* mass */
|
||||
|
||||
Kokkos::vector<Index_t> m_symmX; /* symmetry plane nodesets */
|
||||
Kokkos::vector<Index_t> m_symmY;
|
||||
Kokkos::vector<Index_t> m_symmZ;
|
||||
|
||||
// Element-centered
|
||||
|
||||
// Region information
|
||||
Int_t m_numReg;
|
||||
Int_t m_cost; // imbalance cost
|
||||
Index_t* m_regElemSize; // Size of region sets
|
||||
Index_t* m_regNumList; // Region number per domain element
|
||||
Index_t** m_regElemlist; // region indexset
|
||||
|
||||
Kokkos::vector<Index_t> m_nodelist; /* elemToNode connectivity */
|
||||
|
||||
struct FaceElemConn
|
||||
{
|
||||
Index_t lxim, lxip, letam, letap, lzetam, lzetap;
|
||||
};
|
||||
|
||||
Kokkos::vector<FaceElemConn> m_faceToElem; /* element conn across faces */
|
||||
|
||||
Kokkos::vector<Int_t> m_elemBC; /* symmetry/free-surface flags for each elem face */
|
||||
|
||||
Kokkos::vector<Real_t> m_dxx; /* principal strains -- temporary */
|
||||
Kokkos::vector<Real_t> m_dyy;
|
||||
Kokkos::vector<Real_t> m_dzz;
|
||||
|
||||
Kokkos::vector<Real_t> m_delv_xi; /* velocity gradient -- temporary */
|
||||
Kokkos::vector<Real_t> m_delv_eta;
|
||||
Kokkos::vector<Real_t> m_delv_zeta;
|
||||
|
||||
Kokkos::vector<Real_t> m_delx_xi; /* coordinate gradient -- temporary */
|
||||
Kokkos::vector<Real_t> m_delx_eta;
|
||||
Kokkos::vector<Real_t> m_delx_zeta;
|
||||
|
||||
Kokkos::vector<Real_t> m_e; /* energy */
|
||||
|
||||
struct Pcomponents
|
||||
{
|
||||
Real_t p, q;
|
||||
};
|
||||
|
||||
Kokkos::vector<Pcomponents> m_pq; /* pressure and artificial viscosity */
|
||||
|
||||
struct Qcomponents
|
||||
{
|
||||
Real_t ql, qq;
|
||||
};
|
||||
|
||||
Kokkos::vector<Qcomponents> m_qlqq; /* linear and quadratic terms for q */
|
||||
|
||||
struct Volume
|
||||
{
|
||||
Real_t v, volo;
|
||||
};
|
||||
|
||||
Kokkos::vector<Volume> m_vol; /* relative and reference volume */
|
||||
|
||||
Kokkos::vector<Real_t> m_vnew; /* new relative volume -- temporary */
|
||||
Kokkos::vector<Real_t> m_delv; /* m_vnew - m_v */
|
||||
Kokkos::vector<Real_t> m_vdov; /* volume derivative over volume */
|
||||
|
||||
Kokkos::vector<Real_t> m_arealg; /* characteristic length of an element */
|
||||
|
||||
Kokkos::vector<Real_t> m_ss; /* "sound speed" */
|
||||
|
||||
Kokkos::vector<Real_t> m_elemMass; /* mass */
|
||||
|
||||
// Cutoffs (treat as constants)
|
||||
const Real_t m_e_cut; // energy tolerance
|
||||
const Real_t m_p_cut; // pressure tolerance
|
||||
const Real_t m_q_cut; // q tolerance
|
||||
const Real_t m_v_cut; // relative volume tolerance
|
||||
const Real_t m_u_cut; // velocity tolerance
|
||||
|
||||
// Other constants (usually settable, but hardcoded in this proxy app)
|
||||
|
||||
const Real_t m_hgcoef; // hourglass control
|
||||
const Real_t m_ss4o3;
|
||||
const Real_t m_qstop; // excessive q indicator
|
||||
const Real_t m_monoq_max_slope;
|
||||
const Real_t m_monoq_limiter_mult;
|
||||
const Real_t m_qlc_monoq; // linear term coef for q
|
||||
const Real_t m_qqc_monoq; // quadratic term coef for q
|
||||
const Real_t m_qqc;
|
||||
const Real_t m_eosvmax;
|
||||
const Real_t m_eosvmin;
|
||||
const Real_t m_pmin; // pressure floor
|
||||
const Real_t m_emin; // energy floor
|
||||
const Real_t m_dvovmax; // maximum allowable volume change
|
||||
const Real_t m_refdens; // reference density
|
||||
|
||||
// Variables to keep track of timestep, simulation time, and cycle
|
||||
Real_t m_dtcourant; // courant constraint
|
||||
Real_t m_dthydro; // volume change constraint
|
||||
Int_t m_cycle; // iteration count for simulation
|
||||
Real_t m_dtfixed; // fixed time increment
|
||||
Real_t m_time; // current time
|
||||
Real_t m_deltatime; // variable time increment
|
||||
Real_t m_deltatimemultlb;
|
||||
Real_t m_deltatimemultub;
|
||||
Real_t m_dtmax; // maximum allowable time increment
|
||||
Real_t m_stoptime; // end time for simulation
|
||||
|
||||
Int_t m_numRanks;
|
||||
|
||||
Index_t m_colLoc;
|
||||
Index_t m_rowLoc;
|
||||
Index_t m_planeLoc;
|
||||
Index_t m_tp;
|
||||
|
||||
Index_t m_sizeX;
|
||||
Index_t m_sizeY;
|
||||
Index_t m_sizeZ;
|
||||
Index_t m_numElem;
|
||||
Index_t m_numNode;
|
||||
|
||||
Index_t m_maxPlaneSize;
|
||||
Index_t m_maxEdgeSize;
|
||||
|
||||
// OMP hack
|
||||
Index_t* m_nodeElemStart;
|
||||
Index_t* m_nodeElemCornerList;
|
||||
|
||||
// Used in setup
|
||||
Index_t m_rowMin, m_rowMax;
|
||||
Index_t m_colMin, m_colMax;
|
||||
Index_t m_planeMin, m_planeMax;
|
||||
};
|
||||
|
||||
typedef Real_t& (Domain::*Domain_member)(Index_t);
|
||||
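`Domain_member` is a pointer to an accessor member function, which lets the communication routines take a table of fields (`Domain_member* fieldData`) instead of one function per field. The sketch below shows the calling syntax on a hypothetical two-field class; `FieldDomain`, `FieldDomain_member` and `pack_fields` are invented for illustration and are not part of the commit.

```cpp
#include <cstddef>
#include <vector>

using Real_t  = double;
using Index_t = int;

// Hypothetical two-field class; the real Domain has many more accessors.
class FieldDomain
{
public:
    explicit FieldDomain(Index_t n)
    : m_fx(static_cast<std::size_t>(n), 0.0)
    , m_fy(static_cast<std::size_t>(n), 0.0)
    {}

    Real_t& fx(Index_t idx) { return m_fx[idx]; }
    Real_t& fy(Index_t idx) { return m_fy[idx]; }

private:
    std::vector<Real_t> m_fx;
    std::vector<Real_t> m_fy;
};

// Same shape as the Domain_member typedef, bound to the sketch class.
typedef Real_t& (FieldDomain::*FieldDomain_member)(Index_t);

// Pack the selected fields of one node into a flat buffer, in table order.
inline std::vector<Real_t>
pack_fields(FieldDomain& dom, const FieldDomain_member* fields, int numFields,
            Index_t idx)
{
    std::vector<Real_t> buffer;
    for(int f = 0; f < numFields; ++f)
        buffer.push_back((dom.*fields[f])(idx));  // call accessor via pointer
    return buffer;
}
```

The parentheses around `dom.*fields[f]` are required, because the function-call operator binds more tightly than `.*`.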
|
||||
struct cmdLineOpts
|
||||
{
|
||||
Int_t its; // -i
|
||||
Int_t nx; // -s
|
||||
Int_t numReg; // -r
|
||||
Int_t numFiles; // -f
|
||||
Int_t showProg; // -p
|
||||
Int_t quiet; // -q
|
||||
Int_t viz; // -v
|
||||
Int_t cost; // -c
|
||||
Int_t balance; // -b
|
||||
};
|
||||
|
||||
// Function Prototypes
|
||||
|
||||
// lulesh-par
|
||||
Real_t
|
||||
CalcElemVolume(const Real_t x[8], const Real_t y[8], const Real_t z[8]);
|
||||
|
||||
// lulesh-util
|
||||
void
|
||||
ParseCommandLineOptions(int argc, char* argv[], Int_t myRank, struct cmdLineOpts* opts);
|
||||
void
|
||||
VerifyAndWriteFinalOutput(Real_t elapsed_time, Domain& locDom, Int_t nx, Int_t numRanks);
|
||||
|
||||
// lulesh-viz
|
||||
void
|
||||
DumpToVisit(Domain& domain, int numFiles, int myRank, int numRanks);
|
||||
|
||||
// lulesh-comm
|
||||
void
|
||||
CommRecv(Domain& domain, Int_t msgType, Index_t xferFields, Index_t dx, Index_t dy,
|
||||
Index_t dz, bool doRecv, bool planeOnly);
|
||||
void
|
||||
CommSend(Domain& domain, Int_t msgType, Index_t xferFields, Domain_member* fieldData,
|
||||
Index_t dx, Index_t dy, Index_t dz, bool doSend, bool planeOnly);
|
||||
void
|
||||
CommSBN(Domain& domain, Int_t xferFields, Domain_member* fieldData);
|
||||
void
|
||||
CommSyncPosVel(Domain& domain);
|
||||
void
|
||||
CommMonoQ(Domain& domain);
|
||||
|
||||
// lulesh-init
|
||||
void
|
||||
InitMeshDecomp(Int_t numRanks, Int_t myRank, Int_t* col, Int_t* row, Int_t* plane,
|
||||
Int_t* side);
|
||||
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.13 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
 
 project(omnitrace-parallel-overhead LANGUAGES CXX)
 
@@ -36,6 +36,9 @@ main(int argc, char** argv)
|
||||
if(argc > 2) nthread = atol(argv[2]);
|
||||
if(argc > 3) nitr = atol(argv[3]);
|
||||
|
||||
printf("[%s] Threads: %zu\n[%s] Iterations: %zu\n[%s] fibonacci(%li)...\n", argv[0],
|
||||
nthread, argv[0], nitr, argv[0], nfib);
|
||||
|
||||
std::vector<std::thread> threads{};
|
||||
for(size_t i = 0; i < nthread; ++i)
|
||||
{
|
||||
@@ -43,10 +46,11 @@ main(int argc, char** argv)
|
||||
threads.emplace_back(&run, _nitr, nfib);
|
||||
}
|
||||
|
||||
run(nitr - 0.25 * nitr, nfib - 0.1 * nfib);
|
||||
for(auto& itr : threads)
|
||||
itr.join();
|
||||
|
||||
printf("fibonacci(%li) x %lu = %li\n", nfib, nthread, total.load());
|
||||
printf("[%s] fibonacci(%li) x %lu = %li\n", argv[0], nfib, nthread, total.load());
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.13 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
 
 project(omnitrace-transpose LANGUAGES CXX)
 
@@ -21,6 +21,7 @@ THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "hip/hip_runtime.h"
|
||||
|
||||
#include <cfloat>
|
||||
#include <chrono>
|
||||
#include <cmath>
|
||||
@@ -29,14 +30,19 @@ THE SOFTWARE.
|
||||
#include <fstream>
|
||||
#include <iomanip>
|
||||
#include <iostream>
|
||||
#include <mutex>
|
||||
#include <thread>
|
||||
#include <vector>
|
||||
|
||||
static std::mutex print_lock{};
|
||||
using auto_lock_t = std::unique_lock<std::mutex>;
|
||||
|
||||
#define HIP_API_CALL(CALL) \
|
||||
{ \
|
||||
hipError_t error_ = (CALL); \
|
||||
if(error_ != hipSuccess) \
|
||||
{ \
|
||||
auto_lock_t _lk{ print_lock }; \
|
||||
fprintf(stderr, "%s:%d :: HIP error : %s\n", __FILE__, __LINE__, \
|
||||
hipGetErrorString(error_)); \
|
||||
exit(EXIT_FAILURE); \
|
||||
@@ -49,6 +55,7 @@ check_hip_error(void)
|
||||
hipError_t err = hipGetLastError();
|
||||
if(err != hipSuccess)
|
||||
{
|
||||
auto_lock_t _lk{ print_lock };
|
||||
std::cerr << "Error: " << hipGetErrorString(err) << std::endl;
|
||||
exit(err);
|
||||
}
|
||||
@@ -63,6 +70,7 @@ verify(int* in, int* out, int M, int N)
|
||||
int col = rand() % N;
|
||||
if(in[row * N + col] != out[col * M + row])
|
||||
{
|
||||
auto_lock_t _lk{ print_lock };
|
||||
std::cout << "mismatch: " << row << ", " << col << " : " << in[row * N + col]
|
||||
<< " | " << out[col * M + row] << "\n";
|
||||
}
|
||||
@@ -85,19 +93,23 @@ transpose_a(int* in, int* out, int M, int N)
|
||||
}
|
||||
|
||||
void
|
||||
run(int rank, int argc, char** argv)
|
||||
run(int rank, int tid, hipStream_t stream, int argc, char** argv)
|
||||
{
|
||||
(void) argc;
|
||||
(void) argv;
|
||||
unsigned int M = 4960 * 2;
|
||||
unsigned int N = 4960 * 2;
|
||||
size_t nitr = 5000;
|
||||
unsigned int M = 4960 * 2;
|
||||
unsigned int N = 4960 * 2;
|
||||
if(argc > 2) nitr = atoll(argv[2]);
|
||||
|
||||
auto_lock_t _lk{ print_lock };
|
||||
std::cout << "[" << rank << "][" << tid << "] M: " << M << " N: " << N << std::endl;
|
||||
_lk.unlock();
|
||||
|
||||
std::cout << "[" << rank << "] M: " << M << " N: " << N << std::endl;
|
||||
size_t size = sizeof(int) * M * N;
|
||||
int* matrix = (int*) malloc(size);
|
||||
int* matrix = new int[size];
|
||||
for(size_t i = 0; i < M * N; i++)
|
||||
matrix[i] = rand() % 1002;
|
||||
int *in, *out;
|
||||
int* in = nullptr;
|
||||
int* out = nullptr;
|
||||
|
||||
std::chrono::high_resolution_clock::time_point t1, t2;
|
||||
|
||||
@@ -106,37 +118,36 @@ run(int rank, int argc, char** argv)
|
||||
HIP_API_CALL(hipMemset(in, 0, size));
|
||||
HIP_API_CALL(hipMemset(out, 0, size));
|
||||
HIP_API_CALL(hipMemcpy(in, matrix, size, hipMemcpyHostToDevice));
|
||||
HIP_API_CALL(hipDeviceSynchronize());
|
||||
|
||||
hipDeviceProp_t props;
|
||||
HIP_API_CALL(hipGetDeviceProperties(&props, 0));
|
||||
|
||||
dim3 grid(M / 32, N / 32, 1);
|
||||
dim3 block(32, 32, 1); // transpose_a
|
||||
|
||||
t1 = std::chrono::high_resolution_clock::now();
|
||||
const unsigned times = 10000;
|
||||
auto _func = [&](hipStream_t stream) {
|
||||
for(size_t i = 0; i < times / 2; i++)
|
||||
{
|
||||
transpose_a<<<grid, block, 0, stream>>>(in, out, M, N);
|
||||
check_hip_error();
|
||||
}
|
||||
HIP_API_CALL(hipStreamSynchronize(stream));
|
||||
};
|
||||
hipStream_t _stream{};
|
||||
HIP_API_CALL(hipStreamCreate(&_stream));
|
||||
std::thread _t{ _func, _stream };
|
||||
_t.join();
|
||||
_func(0);
|
||||
HIP_API_CALL(hipDeviceSynchronize());
|
||||
t1 = std::chrono::high_resolution_clock::now();
|
||||
for(size_t i = 0; i < nitr; i++)
|
||||
{
|
||||
transpose_a<<<grid, block, 0, stream>>>(in, out, M, N);
|
||||
check_hip_error();
|
||||
}
|
||||
HIP_API_CALL(hipStreamSynchronize(stream));
|
||||
t2 = std::chrono::high_resolution_clock::now();
|
||||
double time =
|
||||
std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count();
|
||||
float GB = (float) size * times * 2 / (1 << 30);
|
||||
std::cout << "[" << rank << "] Runtime of transpose is " << time << " sec\n"
|
||||
float GB = (float) size * nitr * 2 / (1 << 30);
|
||||
|
||||
print_lock.lock();
|
||||
std::cout << "[" << rank << "][" << tid << "] Runtime of transpose is " << time
|
||||
<< " sec\n"
|
||||
<< "The average performance of transpose is " << GB / time << " GBytes/sec"
|
||||
<< std::endl;
|
||||
print_lock.unlock();
|
||||
|
||||
int* out_matrix = (int*) malloc(size);
|
||||
HIP_API_CALL(hipDeviceSynchronize());
|
||||
|
||||
int* out_matrix = new int[size];
|
||||
HIP_API_CALL(hipMemcpy(out_matrix, out, size, hipMemcpyDeviceToHost));
|
||||
|
||||
// cpu_transpose(matrix, out_matrix, M, N);
|
||||
@@ -145,8 +156,8 @@ run(int rank, int argc, char** argv)
|
||||
HIP_API_CALL(hipFree(in));
|
||||
HIP_API_CALL(hipFree(out));
|
||||
|
||||
free(matrix);
|
||||
free(out_matrix);
|
||||
delete[] matrix;
|
||||
delete[] out_matrix;
|
||||
}
|
||||
|
||||
#if defined(USE_MPI)
|
||||
@@ -174,12 +185,16 @@ main(int argc, char** argv)
|
||||
int rank = 0;
|
||||
int size = 1;
|
||||
int nthreads = 2;
|
||||
int nitr = 5000;
|
||||
if(argc > 1) nthreads = atoi(argv[1]);
|
||||
if(argc > 2) nitr = atoi(argv[2]);
|
||||
|
||||
#if defined(USE_MPI)
|
||||
MPI_Init(&argc, &argv);
|
||||
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
|
||||
MPI_Comm_size(MPI_COMM_WORLD, &size);
|
||||
#else
|
||||
(void) size;
|
||||
#endif
|
||||
// this is a temporary workaround in omnitrace when HIP + MPI is enabled
|
||||
int ndevice = 0;
|
||||
@@ -193,16 +208,24 @@ main(int argc, char** argv)
     if(rank == devid && rank < ndevice)
     {
         std::vector<std::thread> _threads{};
+        std::vector<hipStream_t> _streams(nthreads);
+        for(int i = 0; i < nthreads; ++i)
+            HIP_API_CALL(hipStreamCreate(&_streams.at(i)));
         for(int i = 1; i < nthreads; ++i)
-            _threads.emplace_back(run, rank, argc, argv);
-        run(rank, argc, argv);
+            _threads.emplace_back(run, rank, i, _streams.at(i), argc, argv);
+        run(rank, 0, _streams.at(0), argc, argv);
         for(auto& itr : _threads)
             itr.join();
+        for(int i = 0; i < nthreads; ++i)
+            HIP_API_CALL(hipStreamDestroy(_streams.at(i)));
     }
 #if defined(USE_MPI)
     MPI_Barrier(MPI_COMM_WORLD);
     do_a2a(rank);
     MPI_Finalize();
 #endif
+    HIP_API_CALL(hipDeviceSynchronize());
+    HIP_API_CALL(hipDeviceReset());
+
     return 0;
 }
|
||||
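The change to `main` above gives every thread its own stream: all streams are created first, workers 1..n-1 each receive one, the calling thread uses stream 0, and the streams are destroyed only after every worker has been joined. The ordering can be sketched without HIP as follows. `FakeStream`, `run_on_stream` and `run_all` are hypothetical stand-ins for `hipStream_t`, `run` and the body of `main`, so this illustrates the pattern only and makes no GPU calls.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for hipStream_t so the sketch builds without HIP.
struct FakeStream
{
    int id       = -1;
    int launches = 0;
};

// Stand-in for run(rank, tid, stream, ...): touches only its own stream.
inline void
run_on_stream(int tid, FakeStream& stream, int nitr)
{
    stream.id = tid;
    for(int i = 0; i < nitr; ++i)
        ++stream.launches;
}

// Create every stream up front, hand workers 1..n-1 their own stream, run
// stream 0 on the calling thread, then join before the streams are released.
// Assumes nthreads >= 1.
inline std::vector<FakeStream>
run_all(int nthreads, int nitr)
{
    std::vector<FakeStream>  streams(static_cast<std::size_t>(nthreads));
    std::vector<std::thread> threads;
    for(int i = 1; i < nthreads; ++i)
        threads.emplace_back(run_on_stream, i, std::ref(streams.at(i)), nitr);
    run_on_stream(0, streams.at(0), nitr);
    for(auto& t : threads)
        t.join();
    return streams;
}
```

Since no two threads share a stream object, the sketch needs no lock around the per-stream work; the real code still serializes its console output through `print_lock`.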
Submodule external/PTL updated: dd1b67829c...61f873cf79
Submodule external/dyninst updated: 076d8bdef4...82b10fdcf5
Submodule external/timemory updated: c040fe7022...335abea0c5
@@ -0,0 +1,337 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2020, The Regents of the University of California,
|
||||
// through Lawrence Berkeley National Laboratory (subject to receipt of any
|
||||
// required approvals from the U.S. Dept. of Energy). All rights reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to
|
||||
// deal in the Software without restriction, including without limitation the
|
||||
// rights to use, copy, modify, merge, publish, distribute, sublicense, and
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in
|
||||
// all copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
// IN THE SOFTWARE.
|
||||
|
||||
/** \file timemory/tools/available.hpp
|
||||
* \headerfile tools/available.hpp "tools/available.hpp"
|
||||
* Handles serializing the settings
|
||||
*
|
||||
*/
|
||||
|
||||
#pragma once
|
||||
|
||||
#define TIMEMORY_DISABLE_BANNER
|
||||
#define TIMEMORY_DISABLE_COMPONENT_STORAGE_INIT
|
||||
|
||||
#include "timemory/settings/macros.hpp"
|
||||
#include "timemory/tpls/cereal/archives.hpp"
|
||||
#include "timemory/tpls/cereal/cereal/external/base64.hpp"
|
||||
#include "timemory/utility/demangle.hpp"
|
||||
|
||||
#include <algorithm>
#include <array>
#include <cctype>
#include <cstdint>
#include <functional>
#include <iomanip>
#include <map>
#include <set>
#include <sstream>
#include <stack>
#include <string>
#include <tuple>
#include <utility>
#include <vector>
|
||||
|
||||
#if !defined(TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME)
|
||||
# define TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME epilogue
|
||||
#endif
|
||||
|
||||
#if !defined(TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME)
|
||||
# define TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME prologue
|
||||
#endif
|
||||
|
||||
//======================================================================================//
|
||||
|
||||
namespace tim
|
||||
{
|
||||
namespace cereal
|
||||
{
|
||||
class SettingsTextArchive
|
||||
: public OutputArchive<SettingsTextArchive>
|
||||
, public traits::TextArchive
|
||||
{
|
||||
public:
|
||||
using width_type = std::vector<uint64_t>;
|
||||
using value_type = std::string;
|
||||
using entry_type = std::map<std::string, value_type>;
|
||||
using array_type = std::vector<entry_type>;
|
||||
using unique_set = std::set<std::string>;
|
||||
using int_stack = std::stack<uint32_t>;
|
||||
|
||||
public:
|
||||
//! Construct, outputting to the provided stream
|
||||
/// \param stream The array of output data
/// \param exclude Names of entries to skip when serializing
|
||||
SettingsTextArchive(array_type& stream, unique_set exclude)
|
||||
: OutputArchive<SettingsTextArchive>(this)
|
||||
, output_stream(&stream)
|
||||
, exclude_stream(std::move(exclude))
|
||||
{
|
||||
name_counter.push(0);
|
||||
}
|
||||
|
||||
~SettingsTextArchive() override = default;
|
||||
|
||||
void saveBinaryValue(const void* data, size_t size, const char* name = nullptr)
|
||||
{
|
||||
setNextName(name);
|
||||
writeName();
|
||||
|
||||
auto base64string =
|
||||
base64::encode(reinterpret_cast<const unsigned char*>(data), size);
|
||||
saveValue(base64string);
|
||||
}
|
||||
|
||||
void startNode() { name_counter.push(0); }
|
||||
|
||||
void finishNode() { name_counter.pop(); }
|
||||
|
||||
//! Sets the name for the next node created with startNode
|
||||
void setNextName(const char* name)
|
||||
{
|
||||
if(name == nullptr || exclude_stream.count(name) > 0) return;
|
||||
|
||||
if((current_entry != nullptr) && value_keys.count(name) > 0)
|
||||
{
|
||||
current_entry->insert({ name, "" });
|
||||
current_value = &((*current_entry)[name]);
|
||||
return;
|
||||
}
|
||||
if(value_keys.count(name) > 0)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
current_value = nullptr;
|
||||
output_stream->push_back(entry_type{});
|
||||
current_entry = &(output_stream->back());
|
||||
|
||||
current_entry->insert({ "identifier", name });
|
||||
std::string func = name;
|
||||
const std::string prefix = TIMEMORY_SETTINGS_PREFIX;
|
||||
func = func.erase(0, prefix.length());
|
||||
std::transform(func.begin(), func.end(), func.begin(),
|
||||
[](char& c) { return tolower(c); });
|
||||
{
|
||||
std::stringstream ss;
|
||||
ss << "settings::" << func << "()";
|
||||
current_entry->insert({ "static_accessor", ss.str() });
|
||||
}
|
||||
{
|
||||
std::stringstream ss;
|
||||
ss << "settings::instance()->get_" << func << "()";
|
||||
current_entry->insert({ "member_accessor", ss.str() });
|
||||
}
|
||||
{
|
||||
std::stringstream ss;
|
||||
ss << "settings." << func;
|
||||
current_entry->insert({ "python_accessor", ss.str() });
|
||||
}
|
||||
}
|
||||
|
||||
void setNextType(const char*) {}
|
||||
|
||||
public:
|
||||
template <typename Tp>
|
||||
inline void saveValue(Tp _val)
|
||||
{
|
||||
std::stringstream ssval;
|
||||
ssval << std::boolalpha << _val;
|
||||
if(current_value)
|
||||
{
|
||||
*current_value = ssval.str();
|
||||
}
|
||||
}
|
||||
|
||||
void writeName() {}
|
||||
|
||||
void makeArray() {}
|
||||
|
||||
private:
|
||||
value_type* current_value = nullptr;
|
||||
entry_type* current_entry = nullptr;
|
||||
array_type* output_stream = nullptr;
|
||||
unique_set exclude_stream = {};
|
||||
int_stack name_counter;
|
||||
unique_set value_keys = { "name", "value", "description", "count",
|
||||
"environ", "max_count", "cmdline", "data_type",
|
||||
"initial", "categories" };
|
||||
};
|
||||
|
||||
//======================================================================================//
|
||||
//
|
||||
// prologue and epilogue functions
|
||||
//
|
||||
//======================================================================================//
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Prologue for NVPs for settings archive
|
||||
/*! NVPs do not start or finish nodes - they just set up the names */
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive&, const NameValuePair<T>&)
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Epilogue for NVPs for settings archive
|
||||
/*! NVPs do not start or finish nodes - they just set up the names */
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const NameValuePair<T>&)
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Prologue for deferred data for settings archive
|
||||
/*! Do nothing for the defer wrapper */
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive&, const DeferredData<T>&)
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Epilogue for deferred data for settings archive
/*! Do nothing for the defer wrapper */
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const DeferredData<T>&)
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Prologue for SizeTags for settings archive
|
||||
/*! SizeTags are ignored */
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive& ar, const SizeTag<T>&)
|
||||
{
|
||||
ar.makeArray();
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Epilogue for SizeTags for settings archive
|
||||
/*! SizeTags are ignored */
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const SizeTag<T>&)
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Prologue for all other types for settings archive
|
||||
/*! Starts a new node, named either automatically or by some NVP,
|
||||
that may be given data by the type about to be archived*/
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive& ar, const T&)
|
||||
{
|
||||
ar.startNode();
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Epilogue for all other types for settings archive
|
||||
/*! Finishes the node created in the prologue*/
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive& ar, const T&)
|
||||
{
|
||||
ar.finishNode();
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Prologue for nullptr for settings archive
|
||||
inline void
|
||||
TIMEMORY_CEREAL_PROLOGUE_FUNCTION_NAME(SettingsTextArchive&, const std::nullptr_t&)
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Epilogue for nullptr for settings archive
|
||||
inline void
|
||||
TIMEMORY_CEREAL_EPILOGUE_FUNCTION_NAME(SettingsTextArchive&, const std::nullptr_t&)
|
||||
{}
|
||||
|
||||
//======================================================================================//
|
||||
//
|
||||
// Common serialization functions
|
||||
//
|
||||
//======================================================================================//
|
||||
|
||||
//! Serializing NVP types
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive& ar, const NameValuePair<T>& t)
|
||||
{
|
||||
ar.setNextName(t.name);
|
||||
if(std::is_same<T, std::string>::value)
|
||||
{
|
||||
ar.setNextType("string");
|
||||
}
|
||||
else
|
||||
{
|
||||
ar.setNextType(tim::demangle<T>().c_str());
|
||||
}
|
||||
ar(t.value);
|
||||
}
|
||||
|
||||
template <typename CharT, typename Traits, typename Alloc>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(
|
||||
SettingsTextArchive& ar,
|
||||
const NameValuePair<std::basic_string<CharT, Traits, Alloc>>& t)
|
||||
{
|
||||
ar.setNextName(t.name);
|
||||
ar.setNextType("string");
|
||||
ar(t.value);
|
||||
}
|
||||
|
||||
//! Saving for nullptr
|
||||
inline void
|
||||
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive&, const std::nullptr_t&)
|
||||
{}
|
||||
|
||||
//! Saving for arithmetic
|
||||
template <typename T, traits::EnableIf<std::is_arithmetic<T>::value> = traits::sfinae>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive& ar, const T& t)
|
||||
{
|
||||
if(std::is_same<T, std::string>::value) ar.setNextType("string");
|
||||
ar.saveValue(t);
|
||||
}
|
||||
|
||||
//! saving string
|
||||
template <typename CharT, typename Traits, typename Alloc>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive& ar,
|
||||
const std::basic_string<CharT, Traits, Alloc>& str)
|
||||
{
|
||||
ar.setNextType("string");
|
||||
ar.saveValue(str);
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//! Saving SizeTags
|
||||
template <typename T>
|
||||
inline void
|
||||
TIMEMORY_CEREAL_SAVE_FUNCTION_NAME(SettingsTextArchive&, const SizeTag<T>&)
|
||||
{
|
||||
// nothing to do here, we don't explicitly save the size
|
||||
}
|
||||
|
||||
} // namespace cereal
|
||||
} // namespace tim
|
||||
|
||||
// register archives for polymorphic support
|
||||
TIMEMORY_CEREAL_REGISTER_ARCHIVE(SettingsTextArchive)
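`setNextName` derives three accessor strings from each setting identifier: it strips the settings prefix, lowercases the remainder, and formats the result. A standalone sketch of that derivation follows. The names `accessor_names` and `make_accessor_names` are hypothetical, and the default prefix `"TIMEMORY_"` is an assumption about the value of `TIMEMORY_SETTINGS_PREFIX`. Unlike the archive, this sketch strips the prefix only when the identifier actually starts with it.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical container for the three derived accessor strings.
struct accessor_names
{
    std::string static_accessor;
    std::string member_accessor;
    std::string python_accessor;
};

// Derive accessor strings from an identifier such as "TIMEMORY_FLAT_PROFILE".
// The default prefix is an assumption, not taken from the header above.
inline accessor_names
make_accessor_names(std::string name, const std::string& prefix = "TIMEMORY_")
{
    assert(!name.empty());
    // remove the prefix only if the identifier really begins with it
    if(name.compare(0, prefix.length(), prefix) == 0) name.erase(0, prefix.length());
    // lowercase via unsigned char to avoid undefined behavior on negative values
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return { "settings::" + name + "()",
             "settings::instance()->get_" + name + "()",
             "settings." + name };
}
```

For the identifier `TIMEMORY_FLAT_PROFILE` the static accessor becomes `settings::flat_profile()`, with the member and Python forms built from the same lowercased stem.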
|
||||
@@ -34,10 +34,10 @@
|
||||
// clang-format on
|
||||
|
||||
#include "library/timemory.hpp"
|
||||
#include "library/roctracer.hpp"
|
||||
#include "library/components/roctracer.hpp"
|
||||
#include "library/api.hpp"
|
||||
#include "library/fork_gotcha.hpp"
|
||||
#include "library/mpi_gotcha.hpp"
|
||||
#include "library/components/fork_gotcha.hpp"
|
||||
#include "library/components/mpi_gotcha.hpp"
|
||||
#include "library/api.hpp"
|
||||
#include "library/common.hpp"
|
||||
#include "library/state.hpp"
|
||||
@@ -51,6 +51,8 @@
|
||||
|
||||
#include <mutex>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
template <critical_trace::Device DevID, critical_trace::Phase PhaseID,
|
||||
bool UpdateStack = true>
|
||||
inline void
|
||||
@@ -74,7 +76,7 @@ add_critical_trace(int64_t _tid, size_t _cpu_cid, size_t _gpu_cid, size_t _paren
|
||||
if constexpr(PhaseID != critical_trace::Phase::NONE)
|
||||
{
|
||||
// unique lock per thread
|
||||
-auto& _mtx = type_mutex<critical_insert, omnitrace, num_mutexes>(_tid);
+auto& _mtx = type_mutex<critical_insert, api::omnitrace, num_mutexes>(_tid);
|
||||
auto_lock_t _lk{ _mtx };
|
||||
|
||||
auto& _critical_trace = critical_trace::get(_tid);
|
||||
@@ -86,7 +88,7 @@ add_critical_trace(int64_t _tid, size_t _cpu_cid, size_t _gpu_cid, size_t _paren
|
||||
if constexpr(UpdateStack)
|
||||
{
|
||||
// unique lock per thread
|
||||
-auto& _mtx = type_mutex<cpu_cid_stack, omnitrace, num_mutexes>(_tid);
+auto& _mtx = type_mutex<cpu_cid_stack, api::omnitrace, num_mutexes>(_tid);
|
||||
|
||||
if constexpr(PhaseID == critical_trace::Phase::NONE)
|
||||
{
|
||||
@@ -110,3 +112,4 @@ add_critical_trace(int64_t _tid, size_t _cpu_cid, size_t _gpu_cid, size_t _paren
|
||||
tim::consume_parameters(_tid, _cpu_cid, _gpu_cid, _parent_cid, _ts_beg, _ts_val,
|
||||
_hash, _depth, _prio);
|
||||
}
|
||||
} // namespace omnitrace
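The `type_mutex<Tag, Api, N>(_tid)` calls above select a mutex by a pair of types plus a thread index. A minimal sketch of such a type-keyed mutex pool follows. This is not timemory's implementation: `pooled_mutex` and the tag types are hypothetical, and wrapping the index with a modulo is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical tag types: they carry no data and only select a pool.
struct critical_insert_tag
{};
struct cid_stack_tag
{};
struct api_tag
{};

// One static pool of N mutexes exists per (Tag, Api) pair; idx picks a slot.
template <typename Tag, typename Api, std::size_t N>
std::mutex&
pooled_mutex(std::int64_t idx)
{
    static_assert(N > 0, "pool must not be empty");
    assert(idx >= 0);
    // function-local static: initialized once, thread-safe since C++11
    static std::array<std::mutex, N> pool{};
    return pool[static_cast<std::size_t>(idx) % N];
}
```

Two call sites that use different tag types never contend with each other, even for the same index. Indices that differ by a multiple of `N` share a slot, so the pool size bounds how many threads get a truly private lock.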
|
||||
|
||||
@@ -28,6 +28,8 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <timemory/compat/macros.h>
|
||||
|
||||
// forward decl of the API
|
||||
|
||||
@@ -28,6 +28,8 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <timemory/api.hpp>
|
||||
#include <timemory/backends/dmp.hpp>
|
||||
#include <timemory/backends/process.hpp>
|
||||
@@ -45,6 +47,10 @@
|
||||
#include <utility>
|
||||
#include <vector>
|
||||
|
||||
// timemory api struct
|
||||
struct omnitrace : tim::concepts::api
|
||||
{};
|
||||
TIMEMORY_DEFINE_NS_API(api, omnitrace)
|
||||
TIMEMORY_DEFINE_NS_API(api, sampling)
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace api = tim::api; // NOLINT
|
||||
}
|
||||
|
||||
@@ -0,0 +1,125 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// with the Software without restriction, including without limitation the
|
||||
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
// sell copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// * Redistributions of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimers.
|
||||
//
|
||||
// * Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimers in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this Software without specific prior written permission.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/common.hpp"
|
||||
#include "library/components/fwd.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
#include <timemory/components/base.hpp>
|
||||
#include <timemory/components/papi/papi_array.hpp>
|
||||
#include <timemory/macros/language.hpp>
|
||||
#include <timemory/mpl/concepts.hpp>
|
||||
#include <timemory/sampling/sampler.hpp>
|
||||
#include <timemory/variadic/types.hpp>
|
||||
|
||||
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
struct backtrace
|
||||
: tim::component::empty_base
|
||||
, tim::concepts::component
|
||||
{
|
||||
static constexpr size_t num_hw_counters = 8;
|
||||
|
||||
using data_t = std::array<char[512], 128>;
|
||||
using clock_type = std::chrono::steady_clock;
|
||||
using time_point_type = typename clock_type::time_point;
|
||||
using value_type = void;
|
||||
using hw_counters = tim::component::papi_array<num_hw_counters>;
|
||||
using hw_counter_data_t = typename hw_counters::value_type;
|
||||
using system_clock = std::chrono::system_clock;
|
||||
using system_time_point = typename system_clock::time_point;
|
||||
|
||||
static void preinit();
|
||||
static std::string label();
|
||||
static std::string description();
|
||||
|
||||
backtrace() = default;
|
||||
~backtrace() = default;
|
||||
backtrace(backtrace&&) = default;
|
||||
backtrace(const backtrace&) = default;
|
||||
|
||||
backtrace& operator=(const backtrace&) = default;
|
||||
backtrace& operator=(backtrace&&) = default;
|
||||
|
||||
bool operator<(const backtrace& rhs) const;
|
||||
|
||||
static std::set<int> configure(bool, int64_t _tid = threading::get_id());
|
||||
static void post_process(int64_t _tid = threading::get_id());
|
||||
static hw_counter_data_t& get_last_hwcounters();
|
||||
|
||||
static void start();
|
||||
static void stop();
|
||||
void sample(int = -1);
|
||||
bool empty() const;
|
||||
size_t size() const;
|
||||
std::vector<std::string> get() const;
|
||||
time_point_type get_timestamp() const;
|
||||
int64_t get_thread_cpu_timestamp() const;
|
||||
|
||||
private:
|
||||
int64_t m_tid = 0;
|
||||
int64_t m_thr_cpu_ts = 0;
|
||||
size_t m_size = 0;
|
||||
time_point_type m_ts = {};
|
||||
data_t m_data = {};
|
||||
hw_counter_data_t m_hw_counter = {};
|
||||
};
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
|
||||
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
|
||||
|
||||
# include <timemory/operations.hpp>
|
||||
|
||||
TIMEMORY_DECLARE_EXTERN_COMPONENT(
|
||||
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
|
||||
double)
|
||||
|
||||
TIMEMORY_DECLARE_EXTERN_COMPONENT(
|
||||
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
|
||||
double)
|
||||
|
||||
TIMEMORY_DECLARE_EXTERN_COMPONENT(
|
||||
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
|
||||
double)
|
||||
|
||||
#endif
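The `backtrace` component above stores its frames in a fixed `std::array<char[512], 128>` and timestamps samples with `std::chrono::steady_clock`. A simplified sketch of a fixed-capacity sample record follows. `sample_record` and `push` are hypothetical names, and taking the timestamp on each push is a choice made for this sketch only. Avoiding heap allocation while recording is presumably the motivation for the fixed buffer, but that is an assumption.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-capacity record: recording a frame never allocates.
struct sample_record
{
    static constexpr std::size_t max_frames = 128;
    static constexpr std::size_t max_length = 512;
    using clock_type = std::chrono::steady_clock;  // monotonic clock

    // copy one frame label into the next free slot; returns false when full
    bool push(const char* label)
    {
        assert(label != nullptr);
        if(m_size >= max_frames) return false;
        std::strncpy(m_data[m_size], label, max_length - 1);
        m_data[m_size][max_length - 1] = '\0';  // always terminate
        ++m_size;
        m_ts = clock_type::now();
        return true;
    }

    bool                   empty() const { return m_size == 0; }
    std::size_t            size() const { return m_size; }
    clock_type::time_point timestamp() const { return m_ts; }

    // convert to strings later, outside the recording path
    std::vector<std::string> get() const
    {
        std::vector<std::string> out{};
        out.reserve(m_size);
        for(std::size_t i = 0; i < m_size; ++i)
            out.emplace_back(m_data[i]);
        return out;
    }

private:
    std::size_t                                m_size = 0;
    clock_type::time_point                     m_ts   = {};
    std::array<char[max_length], max_frames>   m_data = {};
};
```

Each record occupies 64 KiB regardless of how many frames it holds. That is the cost of never allocating while a sample is taken.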
|
||||
@@ -29,8 +29,11 @@
|
||||
#pragma once
|
||||
|
||||
#include "library/common.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
// this is used to wrap fork()
|
||||
struct fork_gotcha : comp::base<fork_gotcha, void>
|
||||
{
|
||||
@@ -38,11 +41,18 @@ struct fork_gotcha : comp::base<fork_gotcha, void>
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(fork_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "fork_gotcha"; }
|
||||
|
||||
// generate the gotcha wrappers
|
||||
static void configure();
|
||||
|
||||
// this will get called right before fork
|
||||
-void audit(const gotcha_data_t& _data, audit::incoming);
+static void audit(const gotcha_data_t& _data, audit::incoming);
|
||||
|
||||
// this will get called right after fork with the return value
|
||||
-void audit(const gotcha_data_t& _data, audit::outgoing, pid_t _pid);
+static void audit(const gotcha_data_t& _data, audit::outgoing, pid_t _pid);
|
||||
};
|
||||
|
||||
-using fork_gotcha_t = comp::gotcha<4, tim::component_tuple<fork_gotcha>, omnitrace>;
+using fork_gotcha_t = comp::gotcha<4, tim::component_tuple<fork_gotcha>, api::omnitrace>;
|
||||
} // namespace omnitrace
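The two `audit` overloads in `fork_gotcha` are distinguished only by an empty tag argument: `audit::incoming` for the call about to happen, `audit::outgoing` for the returned value. A minimal sketch of that tag-dispatch idea follows. `audit_tags`, `call_logger`, and `wrapped_call` are hypothetical, and the sketch does not reproduce how timemory's gotcha invokes its components.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical empty tag types used only for overload selection.
namespace audit_tags
{
struct incoming
{};
struct outgoing
{};
}  // namespace audit_tags

struct call_logger
{
    // chosen right before the wrapped function is invoked
    void audit(const std::string& fn, audit_tags::incoming)
    {
        log.push_back("enter " + fn);
    }

    // chosen right after it returns; receives the return value
    void audit(const std::string& fn, audit_tags::outgoing, int ret)
    {
        log.push_back("exit " + fn + " -> " + std::to_string(ret));
    }

    std::vector<std::string> log;
};

// Hypothetical wrapper: audits before and after calling func.
template <typename Func>
int
wrapped_call(call_logger& logger, const std::string& name, Func&& func)
{
    assert(!name.empty());
    logger.audit(name, audit_tags::incoming{});
    int ret = func();
    logger.audit(name, audit_tags::outgoing{}, ret);
    return ret;
}
```

Because the tags are empty types, the overload is chosen at compile time and the tag arguments cost nothing at run time.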
|
||||
@@ -0,0 +1,121 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// with the Software without restriction, including without limitation the
|
||||
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
// sell copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// * Redistributions of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimers.
|
||||
//
|
||||
// * Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimers in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this Software without specific prior written permission.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/defines.hpp"
|
||||
#include "timemory/components/user_bundle/types.hpp"
|
||||
|
||||
#include <timemory/components/data_tracker/types.hpp>
|
||||
#include <timemory/components/macros.hpp>
|
||||
#include <timemory/enum.h>
|
||||
|
||||
TIMEMORY_DECLARE_COMPONENT(roctracer)
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
template <typename... Tp>
|
||||
using data_tracker = tim::component::data_tracker<Tp...>;
|
||||
|
||||
struct omnitrace;
|
||||
struct backtrace;
|
||||
struct backtrace_wall_clock
|
||||
{};
|
||||
struct backtrace_cpu_clock
|
||||
{};
|
||||
struct backtrace_fraction
|
||||
{};
|
||||
using sampling_wall_clock = data_tracker<double, backtrace_wall_clock>;
|
||||
using sampling_cpu_clock = data_tracker<double, backtrace_cpu_clock>;
|
||||
using sampling_percent = data_tracker<double, backtrace_fraction>;
|
||||
using roctracer = tim::component::roctracer;
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
#if !defined(OMNITRACE_USE_ROCTRACER)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer, false_type)
|
||||
#endif
|
||||
|
||||
#if !defined(TIMEMORY_USE_LIBUNWIND)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::api::sampling, false_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::backtrace, false_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_wall_clock,
|
||||
false_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_cpu_clock,
|
||||
false_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_percent,
|
||||
false_type)
|
||||
#endif
|
||||
|
||||
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::omnitrace, OMNITRACE_COMPONENT,
|
||||
"omnitrace", "omnitrace_component")
|
||||
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::roctracer, OMNITRACE_ROCTRACER,
|
||||
"roctracer", "omnitrace_roctracer")
|
||||
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_wall_clock,
|
||||
OMNITRACE_SAMPLING_WALL_CLOCK, "sampling_wall_clock", "")
|
||||
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_cpu_clock,
|
||||
OMNITRACE_SAMPLING_CPU_CLOCK, "sampling_cpu_clock", "")
|
||||
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_percent,
|
||||
OMNITRACE_SAMPLING_PERCENT, "sampling_percent", "")
|
||||
|
||||
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::sampling_wall_clock,
|
||||
"sampling_wall_clock", "Wall-clock timing",
|
||||
"derived from statistical sampling")
|
||||
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::sampling_cpu_clock,
|
||||
"sampling_cpu_clock", "CPU-clock timing",
|
||||
"derived from statistical sampling")
|
||||
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::sampling_percent,
|
||||
"sampling_percent",
|
||||
"Fraction of wall-clock time spent in functions",
|
||||
"derived from statistical sampling")
|
||||
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::roctracer, "roctracer",
|
||||
"High-precision ROCm API and kernel tracing", "")
|
||||
|
||||
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_wall_clock, double)
|
||||
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_cpu_clock, double)
|
||||
|
||||
// enable timing units
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category,
|
||||
omnitrace::component::sampling_wall_clock, true_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units,
|
||||
omnitrace::component::sampling_wall_clock, true_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category,
|
||||
omnitrace::component::sampling_cpu_clock, true_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units,
|
||||
omnitrace::component::sampling_cpu_clock, true_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, omnitrace::component::sampling_percent,
|
||||
true_type)
|
||||
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_mean, omnitrace::component::sampling_percent,
|
||||
false_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_units, omnitrace::component::sampling_percent,
|
||||
false_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_statistics, omnitrace::component::sampling_percent,
|
||||
false_type)
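The `TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, ..., false_type)` lines above switch components off at compile time when a dependency such as libunwind is missing. The mechanism can be sketched with a plain trait specialization. The `is_available` template, the two sampler types, and `describe` below are hypothetical and independent of timemory's macros.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical component types.
struct wall_sampler
{};
struct cpu_sampler
{};

// Primary template: every type is available unless stated otherwise.
template <typename T>
struct is_available : std::true_type
{};

// Explicit specialization marks one component as unavailable.
template <>
struct is_available<cpu_sampler> : std::false_type
{};

// Code for an unavailable component is discarded at compile time.
template <typename T>
std::string
describe(const char* name)
{
    assert(name != nullptr);
    if constexpr(is_available<T>::value)
        return std::string{ name } + ": enabled";
    else
        return std::string{ name } + ": unavailable";
}
```

Guarding the specialization with a preprocessor check, as the header does, lets a single build flag remove a whole feature without touching the call sites.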
|
||||
@@ -29,8 +29,11 @@
|
||||
#pragma once
|
||||
|
||||
#include "library/common.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
// this is used to wrap MPI_Init and MPI_Init_thread
|
||||
struct mpi_gotcha : comp::base<mpi_gotcha, void>
|
||||
{
|
||||
@@ -39,6 +42,12 @@ struct mpi_gotcha : comp::base<mpi_gotcha, void>
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(mpi_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "mpi_gotcha"; }
|
||||
|
||||
// generate the gotcha wrappers
|
||||
static void configure();
|
||||
|
||||
// called right before MPI_Init with that function's arguments
|
||||
static void audit(const gotcha_data_t& _data, audit::incoming, int*, char***);
|
||||
|
||||
@@ -56,9 +65,10 @@ struct mpi_gotcha : comp::base<mpi_gotcha, void>
|
||||
void audit(const gotcha_data_t& _data, audit::outgoing, int _retval);
|
||||
|
||||
private:
|
||||
-comm_t m_comm = tim::mpi::comm_world_v;
-int* m_rank = nullptr;
-int* m_size = nullptr;
+void* m_comm = nullptr;
+int* m_rank = nullptr;
+int* m_size = nullptr;
|
||||
};
|
||||
|
||||
-using mpi_gotcha_t = comp::gotcha<5, tim::component_tuple<mpi_gotcha>, omnitrace>;
+using mpi_gotcha_t = comp::gotcha<5, tim::component_tuple<mpi_gotcha>, api::omnitrace>;
|
||||
} // namespace omnitrace
|
||||
@@ -28,16 +28,29 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/defines.hpp"
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
// timemory component which calls omnitrace functions
|
||||
// (used in gotcha wrappers)
|
||||
-struct omnitrace_component : comp::base<omnitrace_component, void>
+struct omnitrace : comp::base<omnitrace, void>
{
-void start();
-void stop();
-void set_prefix(const char*);
+static std::string label() { return "omnitrace"; }
+void start();
+void stop();
+void set_prefix(const char*);
|
||||
|
||||
private:
|
||||
const char* m_prefix = nullptr;
|
||||
};
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
TIMEMORY_METADATA_SPECIALIZATION(
|
||||
omnitrace::component::omnitrace, "omnitrace",
|
||||
"Invokes instrumentation functions 'omnitrace_push_trace' and 'omnitrace_pop_trace'",
|
||||
"Used by gotcha wrappers")
|
||||
@@ -0,0 +1,75 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
//  * Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//
//  * Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in the
//    documentation and/or other materials provided with the distribution.
//
//  * Neither the names of Advanced Micro Devices, Inc. nor the names of its
//    contributors may be used to endorse or promote products derived from
//    this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.

#pragma once

#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/timemory.hpp"

#include <future>

namespace omnitrace
{
struct pthread_gotcha : tim::component::base<pthread_gotcha, void>
{
    struct wrapper
    {
        using routine_t = void* (*) (void*);
        using promise_t = std::promise<void>;

        wrapper(routine_t _routine, void* _arg, bool, promise_t*);
        void* operator()() const;

        static void* wrap(void* _arg);

    private:
        bool       m_enable_sampling = false;
        routine_t  m_routine         = nullptr;
        void*      m_arg             = nullptr;
        promise_t* m_promise         = nullptr;
    };

    TIMEMORY_DEFAULT_OBJECT(pthread_gotcha)

    // string id for component
    static std::string label() { return "pthread_gotcha"; }

    // generate the gotcha wrappers
    static void configure();

    // threads can set this to avoid starting sampling on child threads
    static bool& enable_sampling_on_child_threads();

    // pthread_create
    int operator()(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg) const;
};

using pthread_gotcha_t = tim::component::gotcha<2, std::tuple<>, pthread_gotcha>;
}  // namespace omnitrace
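
The header above only declares the wrapper, so the following is a minimal sketch of how a pthread_create interposer of this shape could work, not the implementation from this commit. The names `sketch::start_wrapper` and `sketch::create_thread` are hypothetical, the sampling flag is omitted, and the promise is held through a `std::shared_ptr` (an assumption made here so the child thread keeps it alive) rather than the raw `promise_t*` declared in the header.

```cpp
#include <pthread.h>

#include <future>
#include <memory>
#include <utility>

namespace sketch
{
// Hypothetical stand-in for pthread_gotcha::wrapper: bundles the user routine
// and its argument so that extra work can run on the child thread first.
struct start_wrapper
{
    using routine_t = void* (*) (void*);
    using promise_t = std::promise<void>;

    start_wrapper(routine_t _routine, void* _arg, std::shared_ptr<promise_t> _promise)
    : m_routine{ _routine }
    , m_arg{ _arg }
    , m_promise{ std::move(_promise) }
    {}

    // runs on the child thread: notify the parent, then call the user routine
    void* operator()() const
    {
        if(m_promise) m_promise->set_value();
        return m_routine(m_arg);
    }

    // trampoline with the signature pthread_create expects; takes ownership
    static void* wrap(void* _arg)
    {
        std::unique_ptr<start_wrapper> _self{ static_cast<start_wrapper*>(_arg) };
        return (*_self)();
    }

private:
    routine_t                  m_routine = nullptr;
    void*                      m_arg     = nullptr;
    std::shared_ptr<promise_t> m_promise = {};
};

// Hypothetical replacement for a direct pthread_create call.
inline int
create_thread(pthread_t* _thread, const pthread_attr_t* _attr,
              start_wrapper::routine_t _routine, void* _arg)
{
    auto _promise = std::make_shared<start_wrapper::promise_t>();
    auto _started = _promise->get_future();
    auto _wrapper = std::make_unique<start_wrapper>(_routine, _arg, _promise);

    int _ret = pthread_create(_thread, _attr, &start_wrapper::wrap, _wrapper.get());
    if(_ret != 0) return _ret;  // _wrapper is still owned here and freed on return

    (void) _wrapper.release();  // the child thread owns the wrapper now
    _started.wait();            // block until the child has started running
    return _ret;
}
}  // namespace sketch
```

In a sketch like this, the point where the child fulfils the promise is where per-thread setup (for example, starting a sampler) would presumably go, so that the parent does not continue until that setup has run.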
|
||||
@@ -28,20 +28,17 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "timemory/api.hpp"
|
||||
#include "timemory/components/base.hpp"
|
||||
#include "timemory/components/data_tracker/components.hpp"
|
||||
#include "timemory/components/macros.hpp"
|
||||
#include "timemory/enum.h"
|
||||
#include "timemory/macros/os.hpp"
|
||||
#include "timemory/mpl/type_traits.hpp"
|
||||
#include "timemory/mpl/types.hpp"
|
||||
#include "library/components/fwd.hpp"
|
||||
#include "library/defines.hpp"
|
||||
|
||||
TIMEMORY_DECLARE_COMPONENT(roctracer)
|
||||
|
||||
#if !defined(OMNITRACE_USE_ROCTRACER)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer, false_type)
|
||||
#endif
|
||||
#include <timemory/api.hpp>
|
||||
#include <timemory/components/base.hpp>
|
||||
#include <timemory/components/data_tracker/components.hpp>
|
||||
#include <timemory/components/macros.hpp>
|
||||
#include <timemory/enum.h>
|
||||
#include <timemory/macros/os.hpp>
|
||||
#include <timemory/mpl/type_traits.hpp>
|
||||
#include <timemory/mpl/types.hpp>
|
||||
|
||||
namespace tim
|
||||
{
|
||||
@@ -86,7 +83,12 @@ TIMEMORY_SET_COMPONENT_API(component::roctracer_data, project::timemory, categor
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, component::roctracer_data, true_type)
|
||||
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::roctracer_data, true_type)
|
||||
|
||||
#include "timemory/operations.hpp"
|
||||
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
|
||||
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
|
||||
|
||||
# include <timemory/operations.hpp>
|
||||
|
||||
TIMEMORY_DECLARE_EXTERN_COMPONENT(roctracer, false, void)
|
||||
TIMEMORY_DECLARE_EXTERN_COMPONENT(roctracer_data, true, double)
|
||||
|
||||
#endif
|
||||
+11
-7
@@ -28,12 +28,12 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/components/roctracer.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/dynamic_library.hpp"
|
||||
#include "library/perfetto.hpp"
|
||||
#include "library/ptl.hpp"
|
||||
#include "library/roctracer.hpp"
|
||||
|
||||
#include <roctracer.h>
|
||||
#include <roctracer_ext.h>
|
||||
@@ -58,12 +58,15 @@
|
||||
} \
|
||||
} while(0)
|
||||
|
||||
using hsa_timer_t = hsa_rt_utils::Timer;
|
||||
using timestamp_t = hsa_timer_t::timestamp_t;
|
||||
using roctracer_bundle_t = tim::component_bundle<omnitrace, comp::roctracer_data,
|
||||
comp::wall_clock, quirk::explicit_pop>;
|
||||
using roctracer_hsa_bundle_t = tim::component_bundle<omnitrace, comp::roctracer_data>;
|
||||
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
|
||||
namespace omnitrace
|
||||
{
|
||||
using hsa_timer_t = hsa_rt_utils::Timer;
|
||||
using timestamp_t = hsa_timer_t::timestamp_t;
|
||||
using roctracer_bundle_t =
|
||||
tim::component_bundle<api::omnitrace, comp::roctracer_data, comp::wall_clock>;
|
||||
using roctracer_hsa_bundle_t =
|
||||
tim::component_bundle<api::omnitrace, comp::roctracer_data>;
|
||||
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
|
||||
|
||||
std::unique_ptr<hsa_timer_t>&
|
||||
get_hsa_timer();
|
||||
@@ -94,3 +97,4 @@ roctracer_setup_routines();
|
||||
|
||||
roctracer_functions_t&
|
||||
roctracer_tear_down_routines();
|
||||
} // namespace omnitrace
|
||||
@@ -30,9 +30,11 @@
|
||||
|
||||
#include "library/api.hpp"
|
||||
#include "library/common.hpp"
|
||||
#include "library/fork_gotcha.hpp"
|
||||
#include "library/mpi_gotcha.hpp"
|
||||
#include "library/roctracer.hpp"
|
||||
#include "library/components/fork_gotcha.hpp"
|
||||
#include "library/components/mpi_gotcha.hpp"
|
||||
#include "library/components/pthread_gotcha.hpp"
|
||||
#include "library/components/roctracer.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/state.hpp"
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
@@ -40,37 +42,42 @@
|
||||
|
||||
#include <string_view>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
// bundle of components around omnitrace_init and omnitrace_finalize
|
||||
using main_bundle_t =
|
||||
tim::lightweight_tuple<comp::wall_clock, comp::peak_rss, comp::cpu_clock,
|
||||
comp::cpu_util, comp::roctracer, papi_tot_ins,
|
||||
comp::user_global_bundle, fork_gotcha_t, mpi_gotcha_t>;
|
||||
comp::cpu_util, comp::roctracer, comp::user_global_bundle,
|
||||
fork_gotcha_t, mpi_gotcha_t, pthread_gotcha_t>;
|
||||
|
||||
// bundle of components used in instrumentation
|
||||
using instrumentation_bundle_t =
|
||||
tim::component_bundle<omnitrace, comp::wall_clock*, comp::user_global_bundle*>;
|
||||
tim::component_bundle<api::omnitrace, comp::wall_clock*, comp::user_global_bundle*>;
|
||||
|
||||
// allocator for instrumentation_bundle_t
|
||||
using bundle_allocator_t = tim::data::ring_buffer_allocator<instrumentation_bundle_t>;
|
||||
|
||||
// bundle of components around each thread
|
||||
#if defined(TIMEMORY_RUSAGE_THREAD) && TIMEMORY_RUSAGE_THREAD > 0
|
||||
using omnitrace_thread_bundle_t =
|
||||
tim::lightweight_tuple<comp::wall_clock, comp::thread_cpu_clock,
|
||||
comp::thread_cpu_util,
|
||||
#if defined(TIMEMORY_RUSAGE_THREAD) && TIMEMORY_RUSAGE_THREAD > 0
|
||||
comp::peak_rss,
|
||||
comp::thread_cpu_util, comp::peak_rss>;
|
||||
#else
|
||||
using omnitrace_thread_bundle_t =
|
||||
tim::lightweight_tuple<comp::wall_clock, comp::thread_cpu_clock,
|
||||
comp::thread_cpu_util>;
|
||||
#endif
|
||||
papi_tot_ins>;
|
||||
|
||||
//
|
||||
// Initialization routines
|
||||
//
|
||||
void
|
||||
configure_settings();
|
||||
configure_settings() TIMEMORY_VISIBILITY("default");
|
||||
|
||||
void
|
||||
print_config_settings(std::ostream& _os,
|
||||
std::function<bool(const std::string_view&)>&& _filter);
|
||||
print_config_settings(
|
||||
std::ostream& _os,
|
||||
std::function<bool(const std::string_view&, const std::set<std::string>&)>&& _filter);
|
||||
|
||||
std::string&
|
||||
get_exe_name();
|
||||
@@ -81,24 +88,39 @@ get_exe_name();
|
||||
std::string
|
||||
get_config_file();
|
||||
|
||||
bool
|
||||
get_debug_env();
|
||||
|
||||
bool
|
||||
get_debug();
|
||||
|
||||
bool
|
||||
bool&
|
||||
get_use_perfetto();
|
||||
|
||||
bool
|
||||
bool&
|
||||
get_use_timemory();
|
||||
|
||||
bool&
|
||||
get_use_roctracer();
|
||||
|
||||
bool&
|
||||
get_use_sampling();
|
||||
|
||||
bool&
|
||||
get_use_pid();
|
||||
|
||||
bool
|
||||
bool&
|
||||
get_use_mpip();
|
||||
|
||||
bool
|
||||
bool&
|
||||
get_use_critical_trace();
|
||||
|
||||
bool
|
||||
get_timeline_sampling();
|
||||
|
||||
bool
|
||||
get_flat_sampling();
|
||||
|
||||
bool
|
||||
get_roctracer_timeline_profile();
|
||||
|
||||
@@ -135,14 +157,20 @@ get_trace_hsa_api_types();
|
||||
std::string&
|
||||
get_backend();
|
||||
|
||||
std::string
|
||||
std::string&
|
||||
get_perfetto_output_filename();
|
||||
|
||||
int64_t
|
||||
get_critical_trace_count();
|
||||
|
||||
size_t&
|
||||
get_sample_rate();
|
||||
get_instrumentation_interval();
|
||||
|
||||
double&
|
||||
get_sampling_freq();
|
||||
|
||||
double&
|
||||
get_sampling_delay();
|
||||
|
||||
int64_t
|
||||
get_critical_trace_per_row();
|
||||
@@ -161,3 +189,4 @@ get_cpu_cid();
|
||||
|
||||
std::unique_ptr<std::vector<uint64_t>>&
|
||||
get_cpu_cid_stack(int64_t _tid = threading::get_id());
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -29,8 +29,10 @@
|
||||
#pragma once
|
||||
|
||||
#include "library/config.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
#include "timemory/tpls/cereal/cereal/cereal.hpp"
|
||||
|
||||
#include <timemory/tpls/cereal/cereal/cereal.hpp>
|
||||
|
||||
#include <cstdint>
|
||||
#include <cstdlib>
|
||||
@@ -38,6 +40,8 @@
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace critical_trace
|
||||
{
|
||||
enum class Device : short
|
||||
@@ -207,3 +211,4 @@ struct id
|
||||
{};
|
||||
|
||||
} // namespace critical_trace
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -28,17 +28,23 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <cstdio>
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <timemory/api.hpp>
|
||||
#include <timemory/backends/dmp.hpp>
|
||||
#include <timemory/backends/process.hpp>
|
||||
#include <timemory/utility/utility.hpp>
|
||||
|
||||
#include <cstdio>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
bool
|
||||
get_debug();
|
||||
|
||||
bool
|
||||
get_critical_trace_debug();
|
||||
} // namespace omnitrace
|
||||
|
||||
#if defined(TIMEMORY_USE_MPI)
|
||||
# define OMNITRACE_CONDITIONAL_PRINT(COND, ...) \
|
||||
@@ -74,7 +80,8 @@ get_critical_trace_debug();
|
||||
         fflush(stderr); \
     }
 
-#define OMNITRACE_DEBUG(...) OMNITRACE_CONDITIONAL_PRINT(get_debug(), __VA_ARGS__)
+#define OMNITRACE_DEBUG(...) \
+    OMNITRACE_CONDITIONAL_PRINT(::omnitrace::get_debug(), __VA_ARGS__)
 #define OMNITRACE_PRINT(...) OMNITRACE_CONDITIONAL_PRINT(true, __VA_ARGS__)
 #define OMNITRACE_CT_DEBUG(...) \
-    OMNITRACE_CONDITIONAL_PRINT(get_critical_trace_debug(), __VA_ARGS__)
+    OMNITRACE_CONDITIONAL_PRINT(::omnitrace::get_critical_trace_debug(), __VA_ARGS__)
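
Once `get_debug()` lives inside `namespace omnitrace`, a macro that calls it unqualified depends on the namespace where the macro is expanded. The sketch below illustrates why the fully qualified `::omnitrace::get_debug()` is the safer spelling. The `SKETCH_*` macros, `debug_setting()`, `print_count()` and `namespace other` are hypothetical names for illustration and are not part of omnitrace.

```cpp
#include <cstdio>

namespace omnitrace
{
// hypothetical stand-ins for the library's debug setting and a print counter
inline bool& debug_setting()
{
    static bool _v = false;
    return _v;
}

inline int& print_count()
{
    static int _v = 0;
    return _v;
}

inline bool get_debug() { return debug_setting(); }
}  // namespace omnitrace

// prints (and counts) only when COND evaluates to true
#define SKETCH_CONDITIONAL_PRINT(COND, ...)                                    \
    do                                                                         \
    {                                                                          \
        if(COND)                                                               \
        {                                                                      \
            std::fprintf(stderr, "[omnitrace] ");                              \
            std::fprintf(stderr, __VA_ARGS__);                                 \
            std::fflush(stderr);                                               \
            ++::omnitrace::print_count();                                      \
        }                                                                      \
    } while(0)

// fully qualified, so the expansion site's namespace does not matter
#define SKETCH_DEBUG(...)                                                      \
    SKETCH_CONDITIONAL_PRINT(::omnitrace::get_debug(), __VA_ARGS__)

namespace other
{
// unrelated function that happens to share the name; an unqualified
// get_debug() expanded inside this namespace would resolve to this one
inline bool get_debug() { return true; }

inline void report(int _value) { SKETCH_DEBUG("value = %i\n", _value); }
}  // namespace other
```

With the qualified call, `other::report` stays silent until the omnitrace setting is enabled, regardless of what `other::get_debug()` returns.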
|
||||
|
||||
@@ -33,10 +33,20 @@
|
||||
#define OMNITRACE_HIP_VERSION_MAJOR @HIP_VERSION_MAJOR@
|
||||
#define OMNITRACE_HIP_VERSION_MINOR @HIP_VERSION_MINOR@
|
||||
#define OMNITRACE_HIP_VERSION_PATCH @HIP_VERSION_PATCH@
|
||||
// clang-format on
|
||||
|
||||
#if defined(OMNITRACE_USE_ROCTRACER)
|
||||
# define OMNITRACE_ROCTRACER_LIBKFDWRAPPER "@roctracer_kfdwrapper_LIBRARY@"
|
||||
#else
|
||||
# define OMNITRACE_ROCTRACER_LIBKFDWRAPPER "/opt/rocm/roctracer/lib/libkfdwrapper64.so"
|
||||
#endif
|
||||
// clang-format on
|
||||
|
||||
#define TIMEMORY_USER_COMPONENT_ENUM \
|
||||
OMNITRACE_SAMPLING_WALL_CLOCK_idx, OMNITRACE_SAMPLING_CPU_CLOCK_idx, \
|
||||
OMNITRACE_SAMPLING_PERCENT_idx, OMNITRACE_COMPONENT_idx, OMNITRACE_ROCTRACER_idx,
|
||||
|
||||
#define OMNITRACE_COMPONENT OMNITRACE_COMPONENT_idx
|
||||
#define OMNITRACE_ROCTRACER OMNITRACE_ROCTRACER_idx
|
||||
#define OMNITRACE_SAMPLING_WALL_CLOCK OMNITRACE_SAMPLING_WALL_CLOCK_idx
|
||||
#define OMNITRACE_SAMPLING_CPU_CLOCK OMNITRACE_SAMPLING_CPU_CLOCK_idx
|
||||
#define OMNITRACE_SAMPLING_PERCENT OMNITRACE_SAMPLING_PERCENT_idx
|
||||
|
||||
@@ -29,11 +29,15 @@
|
||||
#pragma once
|
||||
|
||||
#include "library/debug.hpp"
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <timemory/environment.hpp>
|
||||
|
||||
#include <dlfcn.h>
|
||||
#include <string>
|
||||
#include <timemory/environment.hpp>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
struct dynamic_library
|
||||
{
|
||||
dynamic_library() = delete;
|
||||
@@ -69,3 +73,4 @@ struct dynamic_library
|
||||
int flags = 0;
|
||||
void* handle = nullptr;
|
||||
};
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -0,0 +1,32 @@
|
||||
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#pragma once

namespace omnitrace
{
namespace gpu
{
int
device_count();
}
}  // namespace omnitrace
|
||||
@@ -28,6 +28,8 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#if defined(PERFETTO_CATEGORIES)
|
||||
# error "PERFETTO_CATEGORIES is already defined. Please include \"" __FILE__ "\" before including any timemory files"
|
||||
#endif
|
||||
@@ -58,6 +60,8 @@
|
||||
PERFETTO_DEFINE_CATEGORIES(PERFETTO_CATEGORIES);
|
||||
#endif
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
#if defined(CUSTOM_DATA_SOURCE)
|
||||
class CustomDataSource : public perfetto::DataSource<CustomDataSource>
|
||||
{
|
||||
@@ -89,3 +93,4 @@ public:
|
||||
|
||||
PERFETTO_DECLARE_DATA_SOURCE_STATIC_MEMBERS(CustomDataSource);
|
||||
#endif
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -28,11 +28,14 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "PTL/PTL.hh"
|
||||
#include "timemory/macros/attributes.hpp"
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <PTL/PTL.hh>
|
||||
|
||||
#include <mutex>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace tasking
|
||||
{
|
||||
std::mutex&
|
||||
@@ -53,3 +56,4 @@ get_critical_trace_thread_pool();
|
||||
PTL::TaskGroup<void>&
|
||||
get_critical_trace_task_group();
|
||||
} // namespace tasking
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -0,0 +1,76 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
//  * Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//
//  * Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in the
//    documentation and/or other materials provided with the distribution.
//
//  * Neither the names of Advanced Micro Devices, Inc. nor the names of its
//    contributors may be used to endorse or promote products derived from
//    this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.

#pragma once

#include "library/common.hpp"
#include "library/components/backtrace.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "library/timemory.hpp"

#include <timemory/macros/language.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/variadic/types.hpp>

#include <cstdint>
#include <memory>
#include <set>

namespace omnitrace
{
namespace sampling
{
using component::backtrace;
using component::backtrace_cpu_clock;   // NOLINT
using component::backtrace_fraction;    // NOLINT
using component::backtrace_wall_clock;  // NOLINT
using component::sampling_cpu_clock;
using component::sampling_percent;
using component::sampling_wall_clock;

std::set<int>
setup();

std::set<int>
shutdown();

void block_signals(std::set<int> = {});

void unblock_signals(std::set<int> = {});

using bundle_t          = tim::lightweight_tuple<backtrace>;
using sampler_t         = tim::sampling::sampler<bundle_t, tim::sampling::dynamic>;
using sampler_instances = omnitrace_thread_data<sampler_t, api::sampling>;

std::unique_ptr<sampler_t>&
get_sampler(int64_t _tid = threading::get_id());

}  // namespace sampling
}  // namespace omnitrace
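
The header declares `block_signals` and `unblock_signals` but the definitions are not part of this section. A minimal sketch of how per-thread signal blocking is typically done with POSIX `pthread_sigmask` is shown below. Everything in `namespace sketch` is hypothetical, and the default set of `SIGALRM` and `SIGPROF` (the signals delivered by `ITIMER_REAL` and `ITIMER_PROF` interval timers) is an assumption, since the header's default argument is an empty set whose meaning is not shown here.

```cpp
#include <pthread.h>
#include <signal.h>

#include <set>

namespace sketch
{
// build a sigset_t from a set of signal numbers
inline sigset_t
make_sigset(const std::set<int>& _signals)
{
    sigset_t _v;
    sigemptyset(&_v);
    for(int itr : _signals)
        sigaddset(&_v, itr);
    return _v;
}

// block the given signals on the calling thread only (assumed defaults)
inline bool
block_signals(const std::set<int>& _signals = { SIGALRM, SIGPROF })
{
    sigset_t _v = make_sigset(_signals);
    return pthread_sigmask(SIG_BLOCK, &_v, nullptr) == 0;
}

// unblock the given signals on the calling thread only (assumed defaults)
inline bool
unblock_signals(const std::set<int>& _signals = { SIGALRM, SIGPROF })
{
    sigset_t _v = make_sigset(_signals);
    return pthread_sigmask(SIG_UNBLOCK, &_v, nullptr) == 0;
}

// query the calling thread's mask; a null set leaves the mask unchanged
inline bool
is_blocked(int _signal)
{
    sigset_t _cur;
    sigemptyset(&_cur);
    pthread_sigmask(SIG_BLOCK, nullptr, &_cur);
    return sigismember(&_cur, _signal) == 1;
}
}  // namespace sketch
```

Because the mask is per-thread, a thread that must not be interrupted by the sampler (for example, one doing internal bookkeeping) can block the timer signals around that work and unblock them afterwards.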
|
||||
@@ -28,6 +28,10 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/defines.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
// used for specifying the state of omnitrace
|
||||
enum class State : unsigned short
|
||||
{
|
||||
@@ -36,3 +40,4 @@ enum class State : unsigned short
|
||||
Active,
|
||||
Finalized
|
||||
};
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -29,6 +29,7 @@
|
||||
#pragma once
|
||||
|
||||
#include "library/config.hpp"
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <array>
|
||||
#include <cstdint>
|
||||
@@ -40,6 +41,8 @@
|
||||
# define OMNITRACE_MAX_THREADS 1024
|
||||
#endif
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
static constexpr size_t max_supported_threads = OMNITRACE_MAX_THREADS;
|
||||
|
||||
template <typename Tp, typename Tag = void, size_t MaxThreads = max_supported_threads>
|
||||
@@ -63,8 +66,10 @@ template <typename... Args>
|
||||
 void
 omnitrace_thread_data<Tp, Tag, MaxThreads>::construct(Args&&... _args)
 {
-    static thread_local bool _v = [&_args...]() {
-        instances().at(threading::get_id()) =
+    // construct outside of lambda to prevent data-race
+    static auto&             _instances = instances();
+    static thread_local bool _v         = [&_args...]() {
+        _instances.at(threading::get_id()) =
             std::make_unique<Tp>(std::forward<Args>(_args)...);
         return true;
     }();
|
||||
@@ -124,3 +129,4 @@ struct instrumentation_bundles
|
||||
|
||||
static instance_array_t& instances();
|
||||
};
|
||||
} // namespace omnitrace
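
The `construct` change in this file combines two idioms: a function-local static reference to the instance array, initialized once before any thread-local initializer runs, and a `static thread_local` flag whose initializer fills the calling thread's slot exactly once. A self-contained sketch of that pattern follows. `sketch::thread_data`, `sketch::get_thread_index` and the limit of 128 slots are hypothetical stand-ins for `omnitrace_thread_data`, `threading::get_id()` and `OMNITRACE_MAX_THREADS`.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

namespace sketch
{
// hypothetical replacement for threading::get_id(): 0, 1, 2, ... per thread
inline int64_t
get_thread_index()
{
    static std::atomic<int64_t> _counter{ 0 };
    static thread_local int64_t _id = _counter++;
    return _id;
}

template <typename Tp, size_t MaxThreads = 128>
struct thread_data
{
    using instance_array_t = std::array<std::unique_ptr<Tp>, MaxThreads>;

    static instance_array_t& instances()
    {
        static instance_array_t _v{};
        return _v;
    }

    template <typename... Args>
    static void construct(Args&&... _args)
    {
        // resolve the shared array once, outside the per-thread initializer
        static auto& _instances = instances();
        // the initializer runs once per thread, so later calls are no-ops
        static thread_local bool _v = [&_args...]() {
            _instances.at(get_thread_index()) =
                std::make_unique<Tp>(std::forward<Args>(_args)...);
            return true;
        }();
        (void) _v;
    }

    // slot belonging to the calling thread (null until construct is called)
    static std::unique_ptr<Tp>& instance()
    {
        return instances().at(get_thread_index());
    }
};
}  // namespace sketch
```

Each thread writes only its own slot, and `at()` throws `std::out_of_range` if more threads than `MaxThreads` show up, which is preferable to writing past the end of the array.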
|
||||
|
||||
@@ -28,36 +28,39 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "library/components/fwd.hpp"
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <timemory/api.hpp>
|
||||
#include <timemory/backends/mpi.hpp>
|
||||
#include <timemory/backends/process.hpp>
|
||||
#include <timemory/backends/threading.hpp>
|
||||
#include <timemory/components.hpp>
|
||||
#include <timemory/components/gotcha/mpip.hpp>
|
||||
#include <timemory/components/papi/papi_tuple.hpp>
|
||||
#include <timemory/config.hpp>
|
||||
#include <timemory/environment.hpp>
|
||||
#include <timemory/manager.hpp>
|
||||
#include <timemory/mpl/apply.hpp>
|
||||
#include <timemory/mpl.hpp>
|
||||
#include <timemory/operations.hpp>
|
||||
#include <timemory/runtime.hpp>
|
||||
#include <timemory/settings.hpp>
|
||||
#include <timemory/storage.hpp>
|
||||
#include <timemory/variadic.hpp>
|
||||
|
||||
namespace audit = tim::audit;
|
||||
namespace comp = tim::component;
|
||||
namespace quirk = tim::quirk;
|
||||
namespace threading = tim::threading;
|
||||
namespace scope = tim::scope;
|
||||
namespace dmp = tim::dmp;
|
||||
namespace process = tim::process;
|
||||
namespace units = tim::units;
|
||||
namespace trait = tim::trait;
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace audit = tim::audit; // NOLINT
|
||||
namespace comp = tim::component; // NOLINT
|
||||
namespace quirk = tim::quirk; // NOLINT
|
||||
namespace threading = tim::threading; // NOLINT
|
||||
namespace scope = tim::scope; // NOLINT
|
||||
namespace dmp = tim::dmp; // NOLINT
|
||||
namespace process = tim::process; // NOLINT
|
||||
namespace units = tim::units; // NOLINT
|
||||
namespace trait = tim::trait; // NOLINT
|
||||
|
||||
// same sort of functionality as python's " ".join([...])
|
||||
#if !defined(JOIN)
|
||||
# define JOIN(...) tim::mpl::apply<std::string>::join(__VA_ARGS__)
|
||||
#endif
|
||||
|
||||
using papi_tot_ins = comp::papi_tuple<PAPI_TOT_INS>;
|
||||
} // namespace omnitrace
|
||||
|
||||
+67
-63
@@ -32,7 +32,7 @@
|
||||
#include "timemory/environment.hpp"
|
||||
#include "timemory/mpl/apply.hpp"
|
||||
#include "timemory/utility/argparse.hpp"
|
||||
#include "timemory/utility/macros.hpp"
|
||||
#include "timemory/utility/demangle.hpp"
|
||||
#include "timemory/utility/popen.hpp"
|
||||
#include "timemory/variadic/macros.hpp"
|
||||
|
||||
@@ -49,9 +49,11 @@
|
||||
|
||||
#include <cstring>
|
||||
#include <limits>
|
||||
#include <memory>
|
||||
#include <numeric>
|
||||
#include <regex>
|
||||
#include <set>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
//
|
||||
@@ -121,23 +123,15 @@ omnitrace_prefork_callback(thread_t* parent, thread_t* child);
|
||||
//
|
||||
// boolean settings
|
||||
//
|
||||
static bool binary_rewrite = 0;
|
||||
static bool loop_level_instr = false;
|
||||
static bool werror = false;
|
||||
static bool stl_func_instr = false;
|
||||
static bool use_mpi = false;
|
||||
static bool is_static_exe = false;
|
||||
static bool use_return_info = false;
|
||||
static bool use_args_info = false;
|
||||
static bool use_file_info = false;
|
||||
static bool use_line_info = false;
|
||||
static bool use_return_info = false;
|
||||
static bool use_args_info = false;
|
||||
static bool use_file_info = false;
|
||||
static bool use_line_info = false;
|
||||
//
|
||||
// integral settings
|
||||
//
|
||||
static bool debug_print = false;
|
||||
static int expect_error = NO_ERROR;
|
||||
static int error_print = 0;
|
||||
static int verbose_level = tim::get_env<int>("TIMEMORY_RUN_VERBOSE", 0);
|
||||
extern bool debug_print;
|
||||
extern int verbose_level;
|
||||
//
|
||||
// string settings
|
||||
//
|
||||
@@ -150,7 +144,6 @@ static string_t prefer_library = {};
|
||||
// global variables
|
||||
//
|
||||
static patch_pointer_t bpatch = {};
|
||||
static call_expr_t* initialize_expr = nullptr;
|
||||
static call_expr_t* terminate_expr = nullptr;
|
||||
static snippet_vec_t init_names = {};
|
||||
static snippet_vec_t fini_names = {};
|
||||
@@ -161,18 +154,18 @@ static regexvec_t func_include = {};
|
||||
static regexvec_t func_exclude = {};
|
||||
static regexvec_t file_include = {};
|
||||
static regexvec_t file_exclude = {};
|
||||
static auto regex_opts = std::regex_constants::egrep | std::regex_constants::optimize;
|
||||
//
|
||||
//======================================================================================//
|
||||
|
||||
// control debug printf statements
|
||||
#define dprintf(...) \
|
||||
if(debug_print || verbose_level > 0) fprintf(stderr, __VA_ARGS__); \
|
||||
if(debug_print || verbose_level > 0) \
|
||||
fprintf(stderr, "[omnitrace][exe] " __VA_ARGS__); \
|
||||
fflush(stderr);
|
||||
|
||||
// control verbose printf statements
|
||||
#define verbprintf(LEVEL, ...) \
|
||||
if(verbose_level >= LEVEL) fprintf(stdout, __VA_ARGS__); \
|
||||
if(verbose_level >= LEVEL) fprintf(stdout, "[omnitrace][exe] " __VA_ARGS__); \
|
||||
fflush(stdout);
|
||||
|
||||
//======================================================================================//
|
||||
@@ -195,6 +188,9 @@ extern "C"
|
||||
|
||||
//======================================================================================//
|
||||
|
||||
strset_t
|
||||
get_whole_function_names();
|
||||
|
||||
function_signature
|
||||
get_func_file_line_info(module_t* mutatee_module, procedure_t* f);
|
||||
|
||||
@@ -217,7 +213,7 @@ void
|
||||
errorFunc(error_level_t level, int num, const char** params);
|
||||
|
||||
procedure_t*
|
||||
find_function(image_t* appImage, const string_t& functionName, strset_t = {});
|
||||
find_function(image_t* appImage, const string_t& functionName, const strset_t& = {});
|
||||
|
||||
void
|
||||
error_func_real(error_level_t level, int num, const char* const* params);
|
||||
@@ -242,15 +238,15 @@ get_absolute_path(const char* fname)
|
||||
|
||||
if(!(p = strrchr((char*) fname, '/')))
|
||||
{
|
||||
auto ret = getcwd(abs_exe_path, sizeof(abs_exe_path));
|
||||
auto* ret = getcwd(abs_exe_path, sizeof(abs_exe_path));
|
||||
consume_parameters(ret);
|
||||
}
|
||||
else
|
||||
{
|
||||
auto rets = getcwd(path_save, sizeof(path_save));
|
||||
auto retf = chdir(fname);
|
||||
auto reta = getcwd(abs_exe_path, sizeof(abs_exe_path));
|
||||
auto retp = chdir(path_save);
|
||||
auto* rets = getcwd(path_save, sizeof(path_save));
|
||||
auto retf = chdir(fname);
|
||||
auto* reta = getcwd(abs_exe_path, sizeof(abs_exe_path));
|
||||
auto retp = chdir(path_save);
|
||||
consume_parameters(rets, retf, reta, retp);
|
||||
}
|
||||
return string_t(abs_exe_path);
|
||||
@@ -285,34 +281,32 @@ struct function_signature
|
||||
|
||||
     TIMEMORY_DEFAULT_OBJECT(function_signature)
 
-    function_signature(string_t _ret, string_t _name, string_t _file,
+    function_signature(string_t _ret, const string_t& _name, string_t _file,
                        location_t _row = { 0, 0 }, location_t _col = { 0, 0 },
                        bool _loop = false, bool _info_beg = false, bool _info_end = false)
     : m_loop(_loop)
     , m_info_beg(_info_beg)
     , m_info_end(_info_end)
-    , m_row(_row)
-    , m_col(_col)
-    , m_return(_ret)
+    , m_row(std::move(_row))
+    , m_col(std::move(_col))
+    , m_return(std::move(_ret))
     , m_name(tim::demangle(_name))
-    , m_file(_file)
+    , m_file(std::move(_file))
     {
         if(m_file.find('/') != string_t::npos)
             m_file = m_file.substr(m_file.find_last_of('/') + 1);
     }
 
-    function_signature(string_t _ret, string_t _name, string_t _file,
-                       std::vector<string_t> _params, location_t _row = { 0, 0 },
-                       location_t _col = { 0, 0 }, bool _loop = false,
+    function_signature(const string_t& _ret, const string_t& _name, const string_t& _file,
+                       const std::vector<string_t>& _params, location_t&& _row = { 0, 0 },
+                       location_t&& _col = { 0, 0 }, bool _loop = false,
                        bool _info_beg = false, bool _info_end = false)
     : function_signature(_ret, _name, _file, _row, _col, _loop, _info_beg, _info_end)
     {
-        std::stringstream ss;
-        ss << "(";
-        for(auto& itr : _params)
-            ss << itr << ", ";
-        m_params = ss.str();
-        m_params = m_params.substr(0, m_params.length() - 2);
+        m_params = "(";
+        for(const auto& itr : _params)
+            m_params.append(itr + ", ");
+        if(!_params.empty()) m_params = m_params.substr(0, m_params.length() - 2);
         m_params += ")";
     }
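
The parameter-list formatting in the second constructor can be read in isolation. The sketch below extracts it into a free function; `join_params` is a hypothetical name used only for illustration. The `empty()` guard matters because trimming the trailing ", " from a bare "(" would compute `length() - 2` on a one-character string, which wraps around in unsigned arithmetic instead of being an explicit no-op.

```cpp
#include <string>
#include <vector>

// Format a list of parameter types as "(a, b, c)"; hypothetical helper that
// mirrors the string-building logic in the constructor.
inline std::string
join_params(const std::vector<std::string>& _params)
{
    std::string _out = "(";
    for(const auto& itr : _params)
        _out.append(itr + ", ");
    // drop the trailing ", " only if at least one parameter was appended
    if(!_params.empty()) _out = _out.substr(0, _out.length() - 2);
    _out += ")";
    return _out;
}
```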
|
||||
|
||||
@@ -373,11 +367,11 @@ struct module_function
|
||||
get_width()[2] = std::max<size_t>(get_width()[2], rhs.signature.get().length());
|
||||
}
|
||||
|
||||
module_function(const string_t& _module, const string_t& _func,
|
||||
const function_signature& _sign, procedure_t* proc)
|
||||
: module(_module)
|
||||
, function(_func)
|
||||
, signature(_sign)
|
||||
module_function(string_t _module, string_t _func, function_signature _sign,
|
||||
procedure_t* proc)
|
||||
: module(std::move(_module))
|
||||
, function(std::move(_func))
|
||||
, signature(std::move(_sign))
|
||||
{
|
||||
if(proc)
|
||||
{
|
||||
@@ -482,35 +476,43 @@ dump_info(std::ostream& _os, const fmodset_t& _data)
|
||||
}
|
||||
//
|
||||
static inline void
|
||||
dump_info(const string_t& _oname, const fmodset_t& _data, int _level)
|
||||
dump_info(const string_t& _oname, const fmodset_t& _data, int _level, bool _fail)
|
||||
{
|
||||
if(!debug_print && verbose_level < _level) return;
|
||||
|
||||
std::ofstream ofs(_oname);
|
||||
std::ofstream ofs{ _oname };
|
||||
if(ofs)
|
||||
{
|
||||
verbprintf(_level, "Dumping '%s'... ", _oname.c_str());
|
||||
dump_info(ofs, _data);
|
||||
verbprintf(_level, "Done\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
std::stringstream _msg{};
|
||||
_msg << "[" << __FUNCTION__ << "] Error opening '" << _oname << "' for output";
|
||||
verbprintf(_level, "%s\n", _msg.str().c_str());
|
||||
if(_fail) throw std::runtime_error(_msg.str());
|
||||
}
|
||||
ofs.close();
|
||||
}
|
||||
//
|
||||
//======================================================================================//
|
||||
//
|
||||
template <typename Tp>
|
||||
template <typename Tp, std::enable_if_t<!std::is_same<Tp, std::string>::value, int> = 0>
|
||||
snippet_pointer_t
|
||||
get_snippet(Tp arg)
|
||||
{
|
||||
return snippet_pointer_t(new const_expr_t(arg));
|
||||
return std::make_shared<snippet_t>(const_expr_t{ arg });
|
||||
}
|
||||
//
|
||||
//======================================================================================//
|
||||
//
|
||||
inline snippet_pointer_t
|
||||
get_snippet(string_t arg)
|
||||
template <typename Tp, std::enable_if_t<std::is_same<Tp, std::string>::value, int> = 0>
|
||||
snippet_pointer_t
|
||||
get_snippet(const Tp& arg)
|
||||
{
|
||||
return snippet_pointer_t(new const_expr_t(arg.c_str()));
|
||||
return std::make_shared<snippet_t>(const_expr_t{ arg.c_str() });
|
||||
}
|
||||
//
|
||||
//======================================================================================//
|
||||
@@ -519,7 +521,7 @@ template <typename... Args>
|
||||
snippet_pointer_vec_t
|
||||
get_snippets(Args&&... args)
|
||||
{
|
||||
snippet_pointer_vec_t _tmp;
|
||||
snippet_pointer_vec_t _tmp{};
|
||||
TIMEMORY_FOLD_EXPRESSION(_tmp.push_back(get_snippet(std::forward<Args>(args))));
|
||||
return _tmp;
|
||||
}
|
||||
@@ -587,8 +589,8 @@ private:
|
||||
//======================================================================================//
|
||||
//
|
||||
static inline address_space_t*
|
||||
omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
|
||||
bool _rewrite, int _pid = -1, string_t _name = {})
|
||||
omnitrace_get_address_space(patch_pointer_t& _bpatch, int _cmdc, char** _cmdv,
|
||||
bool _rewrite, int _pid = -1, const string_t& _name = {})
|
||||
{
|
||||
address_space_t* mutatee = nullptr;
|
||||
|
||||
@@ -599,7 +601,8 @@ omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
|
||||
if(!_name.empty()) mutatee = _bpatch->openBinary(_name.c_str(), false);
|
||||
if(!mutatee)
|
||||
{
|
||||
fprintf(stderr, "[omnitrace]> Failed to open binary '%s'\n", _name.c_str());
|
||||
fprintf(stderr, "[omnitrace][exe] Failed to open binary '%s'\n",
|
||||
_name.c_str());
|
||||
throw std::runtime_error("Failed to open binary");
|
||||
}
|
||||
verbprintf(1, "Done\n");
|
||||
@@ -612,7 +615,8 @@ omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
|
||||
mutatee = _bpatch->processAttach(_cmdv0, _pid);
|
||||
if(!mutatee)
|
||||
{
|
||||
fprintf(stderr, "[omnitrace]> Failed to connect to process %i\n", (int) _pid);
|
||||
fprintf(stderr, "[omnitrace][exe] Failed to connect to process %i\n",
|
||||
(int) _pid);
|
||||
throw std::runtime_error("Failed to attach to process");
|
||||
}
|
||||
verbprintf(1, "Done\n");
|
||||
@@ -630,7 +634,7 @@ omnitrace_get_address_space(patch_pointer_t _bpatch, int _cmdc, char** _cmdv,
|
||||
if(!_cmdv[i]) continue;
|
||||
ss << _cmdv[i] << " ";
|
||||
}
|
||||
fprintf(stderr, "[omnitrace]> Failed to create process: '%s'\n",
|
||||
fprintf(stderr, "[omnitrace][exe] Failed to create process: '%s'\n",
|
||||
ss.str().c_str());
|
||||
throw std::runtime_error("Failed to create process");
|
||||
}
|
||||
@@ -651,7 +655,7 @@ omnitrace_thread_exit(thread_t* thread, BPatch_exitType exit_type)
|
||||
|
||||
if(!terminate_expr)
|
||||
{
|
||||
fprintf(stderr, "[omnitrace]> continuing execution\n");
|
||||
fprintf(stderr, "[omnitrace][exe] continuing execution\n");
|
||||
app->continueExecution();
|
||||
return;
|
||||
}
|
||||
@@ -660,18 +664,18 @@ omnitrace_thread_exit(thread_t* thread, BPatch_exitType exit_type)
|
||||
{
|
||||
case ExitedNormally:
|
||||
{
|
||||
fprintf(stderr, "[omnitrace]> Thread exited normally\n");
|
||||
fprintf(stderr, "[omnitrace][exe] Thread exited normally\n");
|
||||
break;
|
||||
}
|
||||
case ExitedViaSignal:
|
||||
{
|
||||
fprintf(stderr, "[omnitrace]> Thread terminated unexpectedly\n");
|
||||
fprintf(stderr, "[omnitrace][exe] Thread terminated unexpectedly\n");
|
||||
break;
|
||||
}
|
||||
case NoExit:
|
||||
default:
|
||||
{
|
||||
fprintf(stderr, "[omnitrace]> %s invoked with NoExit\n", __FUNCTION__);
|
||||
fprintf(stderr, "[omnitrace][exe] %s invoked with NoExit\n", __FUNCTION__);
|
||||
break;
|
||||
}
|
||||
}
|
||||
@@ -679,7 +683,7 @@ omnitrace_thread_exit(thread_t* thread, BPatch_exitType exit_type)
|
||||
// terminate_expr = nullptr;
|
||||
thread->oneTimeCode(*terminate_expr);
|
||||
|
||||
fprintf(stderr, "[omnitrace]> continuing execution\n");
|
||||
fprintf(stderr, "[omnitrace][exe] continuing execution\n");
|
||||
app->continueExecution();
|
||||
}
|
||||
//
|
||||
@@ -703,7 +707,7 @@ omnitrace_fork_callback(thread_t* parent, thread_t* child)
|
||||
|
||||
if(parent)
|
||||
{
|
||||
auto app = parent->getProcess();
|
||||
auto* app = parent->getProcess();
|
||||
if(app)
|
||||
{
|
||||
verbprintf(4, "Continuing execution on parent after fork callback...\n");
|
||||
|
||||
+11
-17
@@ -1,29 +1,23 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// with the Software without restriction, including without limitation the
|
||||
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
// sell copies of the Software, and to permit persons to whom the Software is
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// * Redistributions of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimers.
|
||||
//
|
||||
// * Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimers in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this Software without specific prior written permission.
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
@@ -1,15 +1,68 @@
|
||||
#!/bin/bash
|
||||
#!/bin/bash -e
|
||||
|
||||
: ${EXTRA_ARGS:=""}
|
||||
: ${EXTRA_TAGS:=""}
|
||||
: ${VERSION:=0.0.3}
|
||||
: ${ROCM_VERSION:=4.3.0}
|
||||
: ${NJOBS:=8}
|
||||
|
||||
STANDARD_ARGS="-DCPACK_GENERATOR=STGZ -DCMAKE_BUILD_TYPE=Release -DOMNITRACE_BUILD_DYNINST=ON -DTIMEMORY_BUILD_PORTABLE=ON"
|
||||
STANDARD_ARGS="-DCPACK_GENERATOR=STGZ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF -DOMNITRACE_MAX_THREADS=2048 -DOMNITRACE_BUILD_TESTING=OFF -DTIMEMORY_USE_LIBUNWIND=ON -DTIMEMORY_BUILD_LIBUNWIND=ON -DTIMEMORY_BUILD_PORTABLE=ON"
|
||||
STANDARD_ARGS="${STANDARD_ARGS} -DOMNITRACE_BUILD_DYNINST=ON $(echo -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON)"
|
||||
if [ -n "${EXTRA_ARGS}" ]; then
|
||||
STANDARD_ARGS="${STANDARD_ARGS} ${EXTRA_ARGS}"
|
||||
fi
|
||||
|
||||
cmake -B build-release/core ${STANDARD_ARGS} -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON -DDYNINST_USE_OpenMP=OFF -DOMNITRACE_USE_MPI_HEADERS=ON -DOMNITRACE_USE_ROCTRACER=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF -DOMNITRACE_MAX_THREADS=2048 .
|
||||
cmake --build build-release/core --target package --parallel 8
|
||||
cp build-release/core/omnitrace-${VERSION}-Linux.sh build-release/omnitrace-${VERSION}-Linux.sh
|
||||
PACKAGE_BASE_TAG=omnitrace-${VERSION}-Linux
|
||||
if [ -n "${EXTRA_TAGS}" ]; then
|
||||
PACKAGE_BASE_TAG="${PACKAGE_BASE_TAG}-${EXTRA_TAGS}"
|
||||
fi
|
||||
|
||||
cmake -B build-release/rocm-mpi ${STANDARD_ARGS} -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON -DDYNINST_USE_OpenMP=ON -DOMNITRACE_USE_MPI_HEADERS=ON -DOMNITRACE_USE_ROCTRACER=ON -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF -DOMNITRACE_MAX_THREADS=2048 .
|
||||
cmake --build build-release/rocm-mpi --target package --parallel 8
|
||||
cp build-release/rocm-mpi/omnitrace-${VERSION}-Linux.sh build-release/omnitrace-${VERSION}-Linux-ROCm-${ROCM_VERSION}.sh
|
||||
SCRIPT_DIR=$(realpath $(dirname ${BASH_SOURCE[0]}))
|
||||
cd $(dirname ${SCRIPT_DIR})
|
||||
echo -e "Working directory: $(pwd)"
|
||||
|
||||
umask 000
|
||||
|
||||
if [ ! -f build-release/${PACKAGE_BASE_TAG}.sh ]; then
|
||||
cmake -B build-release/core ${STANDARD_ARGS} -DCMAKE_INSTALL_PREFIX=build-release/core/install-release -DDYNINST_USE_OpenMP=OFF -DOMNITRACE_USE_MPI_HEADERS=OFF -DOMNITRACE_USE_ROCTRACER=OFF .
|
||||
cmake --build build-release/core --target package --parallel ${NJOBS}
|
||||
cp build-release/core/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}.sh
|
||||
fi
|
||||
|
||||
apt-get install -y libmpich-dev mpich
|
||||
|
||||
STANDARD_ARGS="${STANDARD_ARGS} -DOMNITRACE_USE_ROCTRACER=ON -DOMNITRACE_USE_MPI_HEADERS=ON -DDYNINST_USE_OpenMP=ON"
|
||||
|
||||
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}.sh ]; then
|
||||
cmake -B build-release/rocm-${ROCM_VERSION} -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}/install-release ${STANDARD_ARGS} .
|
||||
cmake --build build-release/rocm-${ROCM_VERSION} --target package --parallel ${NJOBS}
|
||||
cp build-release/rocm-${ROCM_VERSION}/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}.sh
|
||||
fi
|
||||
|
||||
apt-get install -y libpapi-dev libpfm4-dev
|
||||
|
||||
STANDARD_ARGS="${STANDARD_ARGS} -DTIMEMORY_USE_PAPI=ON"
|
||||
|
||||
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI.sh ]; then
|
||||
cmake -B build-release/rocm-${ROCM_VERSION}-papi -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}-papi/install-release ${STANDARD_ARGS} .
|
||||
cmake --build build-release/rocm-${ROCM_VERSION}-papi --target package --parallel ${NJOBS}
|
||||
cp build-release/rocm-${ROCM_VERSION}-papi/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI.sh
|
||||
fi
|
||||
|
||||
STANDARD_ARGS="${STANDARD_ARGS} -DOMNITRACE_USE_MPI=ON"
|
||||
apt-get install -y libmpich-dev mpich
|
||||
|
||||
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-MPICH.sh ]; then
|
||||
cmake -B build-release/rocm-${ROCM_VERSION}-papi-mpich -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}-papi-mpich/install-release ${STANDARD_ARGS} .
|
||||
cmake --build build-release/rocm-${ROCM_VERSION}-papi-mpich --target package --parallel ${NJOBS}
|
||||
cp build-release/rocm-${ROCM_VERSION}-papi-mpich/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-MPICH.sh
|
||||
fi
|
||||
|
||||
apt-get purge -y libmpich-dev mpich
|
||||
apt-get install -y libopenmpi-dev openmpi-bin
|
||||
|
||||
if [ ! -f build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-OpenMPI.sh ]; then
|
||||
cmake -B build-release/rocm-${ROCM_VERSION}-papi-openmpi -DCMAKE_INSTALL_PREFIX=build-release/rocm-${ROCM_VERSION}-papi-openmpi/install-release ${STANDARD_ARGS} .
|
||||
cmake --build build-release/rocm-${ROCM_VERSION}-papi-openmpi --target package --parallel ${NJOBS}
|
||||
cp build-release/rocm-${ROCM_VERSION}-papi-openmpi/omnitrace-${VERSION}-Linux.sh build-release/${PACKAGE_BASE_TAG}-ROCm-${ROCM_VERSION}-PAPI-OpenMPI.sh
|
||||
fi
|
||||
|
||||
+1389
File diff is too large
+209
-82
@@ -27,15 +27,26 @@
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include "library.hpp"
|
||||
#include "library/components/fork_gotcha.hpp"
|
||||
#include "library/components/mpi_gotcha.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/critical_trace.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/gpu.hpp"
|
||||
#include "library/sampling.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
#include <mutex>
|
||||
#include <string_view>
|
||||
|
||||
using namespace omnitrace;
|
||||
|
||||
namespace
|
||||
{
|
||||
std::vector<bool>&
|
||||
get_sample_data()
|
||||
get_interval_data()
|
||||
{
|
||||
static thread_local auto _v = std::vector<bool>{};
|
||||
return _v;
|
||||
@@ -48,33 +59,29 @@ setup_gotchas()
|
||||
if(_initialized) return;
|
||||
_initialized = true;
|
||||
|
||||
OMNITRACE_DEBUG(
|
||||
OMNITRACE_CONDITIONAL_PRINT(
|
||||
get_debug_env(),
|
||||
"[%s] Configuring gotcha wrapper around fork, MPI_Init, and MPI_Init_thread\n",
|
||||
__FUNCTION__);
|
||||
|
||||
fork_gotcha_t::get_initializer() = []() {
|
||||
TIMEMORY_C_GOTCHA(fork_gotcha_t, 0, fork);
|
||||
};
|
||||
|
||||
mpi_gotcha_t::get_initializer() = []() {
|
||||
mpi_gotcha_t::template configure<0, int, int*, char***>("MPI_Init");
|
||||
mpi_gotcha_t::template configure<1, int, int*, char***, int, int*>(
|
||||
"MPI_Init_thread");
|
||||
mpi_gotcha_t::template configure<2, int>("MPI_Finalize");
|
||||
// mpi_gotcha_t::template configure<3, int, tim::mpi::comm_t,
|
||||
// int*>("MPI_Comm_rank");
|
||||
// mpi_gotcha_t::template configure<4, int, tim::mpi::comm_t,
|
||||
// int*>("MPI_Comm_size");
|
||||
};
|
||||
mpi_gotcha::configure();
|
||||
fork_gotcha::configure();
|
||||
pthread_gotcha::configure();
|
||||
}
|
||||
|
||||
auto
|
||||
ensure_finalization(bool _static_init = false)
|
||||
{
|
||||
auto _main_tid = threading::get_id();
|
||||
(void) _main_tid;
|
||||
if(!_static_init)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
|
||||
}
|
||||
else
|
||||
{
|
||||
OMNITRACE_CONDITIONAL_PRINT(get_debug_env(), "[%s]\n", __FUNCTION__);
|
||||
}
|
||||
return scope::destructor{ []() { omnitrace_trace_finalize(); } };
|
||||
}
|
||||
|
||||
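Note on `ensure_finalization` above: it returns a `scope::destructor` holding a lambda, so the finalizer runs when the returned object is destroyed. The following is a minimal sketch of that scope-guard idiom, assuming nothing about timemory's actual implementation. `scope_guard`, `finalize_count`, and `ensure_finalization_sketch` are hypothetical names used only for illustration.

```cpp
#include <functional>
#include <utility>

// Hypothetical minimal scope guard: invokes the stored callable on destruction.
class scope_guard
{
public:
    explicit scope_guard(std::function<void()> fn)
    : m_fn{ std::move(fn) }
    {}

    ~scope_guard()
    {
        if(m_fn) m_fn();
    }

    // Moving transfers the responsibility; the moved-from guard does nothing.
    scope_guard(scope_guard&& rhs) noexcept
    : m_fn{ std::move(rhs.m_fn) }
    {
        rhs.m_fn = nullptr;
    }

    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
    scope_guard& operator=(scope_guard&&) = delete;

private:
    std::function<void()> m_fn;
};

// Counts how many times the hypothetical finalizer has run.
inline int&
finalize_count()
{
    static int _n = 0;
    return _n;
}

// Mirrors the shape of ensure_finalization(): hand the caller an object whose
// destructor performs the cleanup.
inline scope_guard
ensure_finalization_sketch()
{
    return scope_guard{ []() { ++finalize_count(); } };
}
```

Storing the returned guard in a `static` variable, as the diff does with `_ensure_finalization`, defers the cleanup to static destruction at program exit.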
@@ -127,9 +134,52 @@ omnitrace_init_tooling()
|
||||
if(get_state() != State::PreInit || _once) return false;
|
||||
_once = true;
|
||||
|
||||
auto _tid = threading::get_id();
|
||||
(void) _tid;
|
||||
|
||||
auto _mode = tim::get_env<std::string>("OMNITRACE_MODE", "");
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(true, "Instrumentation mode: %s\n", _mode.c_str());
|
||||
|
||||
// configure the settings
|
||||
configure_settings();
|
||||
|
||||
if(gpu::device_count() == 0)
|
||||
{
|
||||
OMNITRACE_DEBUG("No HIP devices were found: disabling roctracer...\n");
|
||||
get_use_roctracer() = false;
|
||||
}
|
||||
|
||||
if(_mode == "sampling")
|
||||
{
|
||||
OMNITRACE_PRINT(
|
||||
"Disabling perfetto, timemory, and critical trace in %s mode...\n",
|
||||
_mode.c_str());
|
||||
get_use_sampling() = true;
|
||||
get_use_timemory() = false;
|
||||
get_use_perfetto() = false;
|
||||
get_use_roctracer() = false;
|
||||
get_use_critical_trace() = false;
|
||||
}
|
||||
|
||||
auto _dtor = scope::destructor{ []() {
|
||||
if(get_use_sampling())
|
||||
{
|
||||
pthread_gotcha::enable_sampling_on_child_threads() = false;
|
||||
sampling::setup();
|
||||
pthread_gotcha::enable_sampling_on_child_threads() = true;
|
||||
sampling::unblock_signals();
|
||||
}
|
||||
} };
|
||||
|
||||
if(get_use_sampling())
|
||||
{
|
||||
pthread_gotcha::enable_sampling_on_child_threads() = false;
|
||||
sampling::block_signals();
|
||||
}
|
||||
|
||||
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
|
||||
|
||||
if(!get_use_timemory() && !get_use_perfetto())
|
||||
if(!get_use_timemory() && !get_use_perfetto() && !get_use_sampling())
|
||||
{
|
||||
get_state() = State::Finalized;
|
||||
OMNITRACE_DEBUG("[%s] Both perfetto and timemory are disabled. Setting the state "
|
||||
@@ -138,20 +188,15 @@ omnitrace_init_tooling()
|
||||
return false;
|
||||
}
|
||||
|
||||
int _threadpool_verbose = (get_debug()) ? 4 : -1;
|
||||
tasking::get_roctracer_thread_pool().set_verbose(_threadpool_verbose);
|
||||
tasking::get_critical_trace_thread_pool().set_verbose(_threadpool_verbose);
|
||||
|
||||
// below will effectively do:
|
||||
// get_cpu_cid_stack(0)->emplace_back(-1);
|
||||
// plus query some env variables
|
||||
add_critical_trace<Device::CPU, Phase::NONE>(0, -1, 0, 0, 0, 0, 0, 0);
|
||||
|
||||
// configure the settings
|
||||
configure_settings();
|
||||
tim::trait::runtime_enabled<comp::roctracer>::set(get_use_roctracer());
|
||||
|
||||
if(get_sample_rate() < 1) get_sample_rate() = 1;
|
||||
get_sample_data().reserve(512);
|
||||
if(get_instrumentation_interval() < 1) get_instrumentation_interval() = 1;
|
||||
get_interval_data().reserve(512);
|
||||
|
||||
if(get_use_timemory())
|
||||
{
|
||||
@@ -178,7 +223,7 @@ omnitrace_init_tooling()
|
||||
}
|
||||
else
|
||||
{
|
||||
tim::trait::runtime_enabled<omnitrace>::set(false);
|
||||
tim::trait::runtime_enabled<api::omnitrace>::set(false);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -186,9 +231,13 @@ omnitrace_init_tooling()
|
||||
auto& _main_bundle = get_main_bundle();
|
||||
_main_bundle->start();
|
||||
assert(_main_bundle->get<mpi_gotcha_t>()->get_is_running());
|
||||
|
||||
#if defined(OMNITRACE_USE_ROCTRACER)
|
||||
assert(_main_bundle->get<comp::roctracer>() != nullptr);
|
||||
assert(_main_bundle->get<comp::roctracer>()->get_is_running());
|
||||
if(get_use_roctracer())
|
||||
{
|
||||
assert(_main_bundle->get<comp::roctracer>() != nullptr);
|
||||
assert(_main_bundle->get<comp::roctracer>()->get_is_running());
|
||||
}
|
||||
#endif
|
||||
|
||||
perfetto::TracingInitArgs args{};
|
||||
@@ -227,7 +276,13 @@ omnitrace_init_tooling()
|
||||
omnitrace_thread_data<omnitrace_thread_bundle_t>::construct(
|
||||
TIMEMORY_JOIN("", _exe, "/thread-", threading::get_id()),
|
||||
quirk::config<quirk::auto_start>{});
|
||||
if(get_use_sampling())
|
||||
{
|
||||
static thread_local auto _once = std::once_flag{};
|
||||
std::call_once(_once, sampling::setup);
|
||||
}
|
||||
static thread_local auto _dtor = scope::destructor{ []() {
|
||||
if(get_use_sampling()) sampling::shutdown();
|
||||
omnitrace_thread_data<omnitrace_thread_bundle_t>::instance()->stop();
|
||||
} };
|
||||
(void) _dtor;
|
||||
@@ -247,15 +302,7 @@ omnitrace_init_tooling()
|
||||
|
||||
static auto _push_perfetto = [](const char* name) {
|
||||
_thread_init();
|
||||
TRACE_EVENT_BEGIN("host", perfetto::StaticString(name),
|
||||
[&](perfetto::EventContext ctx) {
|
||||
// compile-time check
|
||||
IF_CONSTEXPR(trait::is_available<papi_tot_ins>::value)
|
||||
{
|
||||
ctx.event()->set_thread_instruction_count_absolute(
|
||||
papi_tot_ins::record().at(0));
|
||||
}
|
||||
});
|
||||
TRACE_EVENT_BEGIN("host", perfetto::StaticString(name));
|
||||
};
|
||||
|
||||
static auto _pop_timemory = [](const char* name) {
|
||||
@@ -272,15 +319,7 @@ omnitrace_init_tooling()
|
||||
_data.bundles.pop_back();
|
||||
};
|
||||
|
||||
static auto _pop_perfetto = [](const char*) {
|
||||
TRACE_EVENT_END("host", [&](perfetto::EventContext ctx) {
|
||||
IF_CONSTEXPR(trait::is_available<papi_tot_ins>::value)
|
||||
{
|
||||
ctx.event()->set_thread_instruction_count_absolute(
|
||||
papi_tot_ins::record().at(0));
|
||||
}
|
||||
});
|
||||
};
|
||||
static auto _pop_perfetto = [](const char*) { TRACE_EVENT_END("host"); };
|
||||
|
||||
if(get_use_perfetto() && get_use_timemory())
|
||||
{
|
||||
@@ -306,19 +345,50 @@ omnitrace_init_tooling()
|
||||
|
||||
if(dmp::rank() == 0)
|
||||
{
|
||||
static std::set<tim::string_view_t> _sample_options = {
|
||||
"OMNITRACE_SAMPLING_FREQ", "OMNITRACE_SAMPLING_DELAY",
|
||||
"OMNITRACE_FLAT_SAMPLING", "OMNITRACE_TIMELINE_SAMPLING",
|
||||
"OMNITRACE_FLAT_SAMPLING", "OMNITRACE_TIMELINE_SAMPLING",
|
||||
};
|
||||
static std::set<tim::string_view_t> _perfetto_options = {
|
||||
"OMNITRACE_OUTPUT_FILE",
|
||||
"OMNITRACE_BACKEND",
|
||||
"OMNITRACE_SHMEM_SIZE_HINT_KB",
|
||||
"OMNITRACE_BUFFER_SIZE_KB",
|
||||
};
|
||||
static std::set<tim::string_view_t> _timemory_options = {
|
||||
"OMNITRACE_ROCTRACER_FLAT_PROFILE", "OMNITRACE_ROCTRACER_TIMELINE_PROFILE"
|
||||
};
|
||||
// generic filter for filtering relevant options
|
||||
auto _is_omnitrace_option = [](const auto& _v) {
|
||||
#if !defined(OMNITRACE_USE_ROCTRACER)
|
||||
if(_v.find("OMNITRACE_ROCTRACER_") == 0) return false;
|
||||
#endif
|
||||
auto _is_omnitrace_option = [](const auto& _v, const auto& _c) {
|
||||
if(!get_use_roctracer() && _v.find("OMNITRACE_ROCTRACER_") == 0) return false;
|
||||
if(!get_use_critical_trace() && _v.find("OMNITRACE_CRITICAL_TRACE_") == 0)
|
||||
return false;
|
||||
return (_v.find("OMNITRACE_") == 0) ||
|
||||
((_v.find("TIMEMORY_") != 0) && (_v.find("SIGNAL_") != 0));
|
||||
if(!get_use_perfetto() && _perfetto_options.count(_v) > 0) return false;
|
||||
if(!get_use_timemory() && _timemory_options.count(_v) > 0) return false;
|
||||
if(!get_use_sampling() && _sample_options.count(_v) > 0) return false;
|
||||
const auto npos = std::string::npos;
|
||||
if(_v.find("WIDTH") != npos || _v.find("SEPARATOR_FREQ") != npos ||
|
||||
_v.find("AUTO_OUTPUT") != npos || _v.find("DART_OUTPUT") != npos ||
|
||||
_v.find("FILE_OUTPUT") != npos || _v.find("PLOT_OUTPUT") != npos ||
|
||||
_v.find("FLAMEGRAPH_OUTPUT") != npos)
|
||||
return false;
|
||||
if(!_c.empty())
|
||||
{
|
||||
if(_c.find("omnitrace") != _c.end()) return true;
|
||||
if(_c.find("debugging") != _c.end() && _v.find("DEBUG") != npos)
|
||||
return true;
|
||||
if(_c.find("config") != _c.end()) return true;
|
||||
if(_c.find("dart") != _c.end()) return false;
|
||||
if(_c.find("io") != _c.end() && _v.find("_OUTPUT") != npos) return true;
|
||||
if(_c.find("format") != _c.end()) return true;
|
||||
return false;
|
||||
}
|
||||
return (_v.find("OMNITRACE_") == 0);
|
||||
};
|
||||
|
||||
tim::print_env(std::cerr, [_is_omnitrace_option](const std::string& _v) {
|
||||
return _is_omnitrace_option(_v);
|
||||
return _is_omnitrace_option(_v, std::set<std::string>{});
|
||||
});
|
||||
|
||||
print_config_settings(std::cerr, _is_omnitrace_option);
|
||||
@@ -354,6 +424,7 @@ omnitrace_init_tooling()
|
||||
static auto _ensure_finalization = ensure_finalization();
|
||||
|
||||
if(dmp::rank() == 0) puts("");
|
||||
|
||||
return true;
|
||||
}
|
||||
} // namespace
|
||||
@@ -378,10 +449,10 @@ extern "C"
|
||||
OMNITRACE_DEBUG("[%s] %s\n", __FUNCTION__, name);
|
||||
}
|
||||
|
||||
static auto _sample_rate = std::max<size_t>(get_sample_rate(), 1);
|
||||
static thread_local size_t _sample_idx = 0;
|
||||
auto _enabled = (_sample_idx++ % _sample_rate == 0);
|
||||
get_sample_data().emplace_back(_enabled);
|
||||
static auto _sample_rate = std::max<size_t>(get_instrumentation_interval(), 1);
|
||||
static thread_local size_t _sample_idx = 0;
|
||||
auto _enabled = (_sample_idx++ % _sample_rate == 0);
|
||||
get_interval_data().emplace_back(_enabled);
|
||||
if(_enabled) get_functors().first(name);
|
||||
if(get_use_critical_trace())
|
||||
{
|
||||
@@ -405,24 +476,27 @@ extern "C"
|
||||
if(get_state() == State::Active)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] %s\n", __FUNCTION__, name);
|
||||
auto& _sample_data = get_sample_data();
|
||||
if(!_sample_data.empty())
|
||||
auto& _interval_data = get_interval_data();
|
||||
if(!_interval_data.empty())
|
||||
{
|
||||
if(_sample_data.back()) get_functors().second(name);
|
||||
_sample_data.pop_back();
|
||||
if(_interval_data.back()) get_functors().second(name);
|
||||
_interval_data.pop_back();
|
||||
}
|
||||
if(get_use_critical_trace())
|
||||
{
|
||||
if(get_cpu_cid_stack() && !get_cpu_cid_stack()->empty())
|
||||
{
|
||||
auto _ts = comp::wall_clock::record();
|
||||
auto _cid = get_cpu_cid_stack()->back();
|
||||
uint64_t _parent_cid = 0;
|
||||
uint16_t _depth = 0;
|
||||
std::tie(_parent_cid, _depth) = get_cpu_cid_parents().at(_cid);
|
||||
add_critical_trace<Device::CPU, Phase::END>(
|
||||
threading::get_id(), _cid, 0, _parent_cid, _ts, _ts,
|
||||
critical_trace::add_hash_id(name), _depth);
|
||||
auto _cid = get_cpu_cid_stack()->back();
|
||||
if(get_cpu_cid_parents().find(_cid) != get_cpu_cid_parents().end())
|
||||
{
|
||||
uint64_t _parent_cid = 0;
|
||||
uint16_t _depth = 0;
|
||||
auto _ts = comp::wall_clock::record();
|
||||
std::tie(_parent_cid, _depth) = get_cpu_cid_parents().at(_cid);
|
||||
add_critical_trace<Device::CPU, Phase::END>(
|
||||
threading::get_id(), _cid, 0, _parent_cid, _ts, _ts,
|
||||
critical_trace::add_hash_id(name), _depth);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
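Note on the push/pop hunks above: each push records in a thread-local `std::vector<bool>` whether that call was instrumented (every N-th call, where N is the instrumentation interval), and the matching pop consults the last entry so that begin and end events stay paired. A minimal sketch of that bookkeeping follows. All names here (`interval_data`, `event_log`, `push_region`, `pop_region`) are hypothetical, and the event log merely stands in for the real functors.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-thread record of which pushes were instrumented.
inline std::vector<bool>&
interval_data()
{
    static thread_local std::vector<bool> _v{};
    return _v;
}

// Hypothetical per-thread log that makes the effect observable.
inline std::vector<std::string>&
event_log()
{
    static thread_local std::vector<std::string> _v{};
    return _v;
}

// Region entry: instrument only every `interval`-th call on this thread.
inline void
push_region(const char* name, std::size_t interval)
{
    static thread_local std::size_t _idx = 0;
    if(interval < 1) interval = 1;  // same clamp as the diff applies
    bool _enabled = (_idx++ % interval == 0);
    interval_data().emplace_back(_enabled);
    if(_enabled) event_log().emplace_back(std::string{ "begin:" } + name);
}

// Region exit: emit the end event only if the matching push was instrumented.
inline void
pop_region(const char* name)
{
    auto& _data = interval_data();
    if(_data.empty()) return;
    if(_data.back()) event_log().emplace_back(std::string{ "end:" } + name);
    _data.pop_back();
}
```

With an interval of 2, the first and third regions entered on a thread are recorded and the second is skipped. The stack of flags is what keeps a skipped push from producing an unmatched end event.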
@@ -432,9 +506,10 @@ extern "C"
|
||||
}
|
||||
}
|
||||
|
||||
void omnitrace_trace_init(const char*, bool, const char*)
|
||||
void omnitrace_trace_init(const char* _info, bool _b, const char* _extra)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s | %s | %s\n",
|
||||
__FUNCTION__, _info, (_b) ? "y" : "n", _extra);
|
||||
omnitrace_init_tooling();
|
||||
}
|
||||
|
||||
@@ -445,13 +520,26 @@ extern "C"
|
||||
|
||||
OMNITRACE_DEBUG("[%s]\n", __FUNCTION__);
|
||||
|
||||
if(get_use_sampling())
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] Shutting down sampling...\n", __FUNCTION__);
|
||||
pthread_gotcha::enable_sampling_on_child_threads() = false;
|
||||
sampling::shutdown();
|
||||
sampling::block_signals();
|
||||
}
|
||||
|
||||
int _threadpool_verbose = (get_debug()) ? 4 : -1;
|
||||
tasking::get_roctracer_thread_pool().set_verbose(_threadpool_verbose);
|
||||
tasking::get_critical_trace_thread_pool().set_verbose(_threadpool_verbose);
|
||||
|
||||
if(dmp::rank() == 0) puts("");
|
||||
|
||||
get_state() = State::Finalized;
|
||||
|
||||
#if defined(OMNITRACE_USE_ROCTRACER)
|
||||
OMNITRACE_DEBUG("[%s] Shutting down roctracer...\n", __FUNCTION__);
|
||||
// ensure that threads running roctracer callbacks shutdown
|
||||
comp::roctracer::tear_down();
|
||||
if(get_use_roctracer()) comp::roctracer::tear_down();
|
||||
#endif
|
||||
|
||||
// join extra thread(s) used by roctracer
|
||||
@@ -459,6 +547,7 @@ extern "C"
|
||||
__FUNCTION__);
|
||||
tasking::get_roctracer_task_group().join();
|
||||
|
||||
OMNITRACE_DEBUG("[%s] Stopping main bundle...\n", __FUNCTION__);
|
||||
// stop the main bundle and report the high-level metrics
|
||||
if(get_main_bundle())
|
||||
{
|
||||
@@ -474,6 +563,7 @@ extern "C"
|
||||
// if they are still running (e.g. thread-pool still alive), the
|
||||
// thread-specific data will be wrong if we try to stop them from
|
||||
// the main thread.
|
||||
OMNITRACE_DEBUG("[%s] Destroying thread bundle data...\n", __FUNCTION__);
|
||||
for(auto& itr : omnitrace_thread_data<omnitrace_thread_bundle_t>::instances())
|
||||
{
|
||||
if(itr && itr->get<comp::wall_clock>() &&
|
||||
@@ -487,6 +577,8 @@ extern "C"
|
||||
}
|
||||
|
||||
// ensure that all the MT instances are flushed
|
||||
OMNITRACE_DEBUG("[%s] Stopping and destroying instrumentation bundles...\n",
|
||||
__FUNCTION__);
|
||||
for(auto& itr : instrumentation_bundles::instances())
|
||||
{
|
||||
while(!itr.bundles.empty())
|
||||
@@ -499,8 +591,21 @@ extern "C"
|
||||
}
|
||||
}
|
||||
|
||||
// ensure that all the MT instances are flushed
|
||||
if(get_use_sampling())
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] Post-processing the sampling backtraces...\n",
|
||||
__FUNCTION__);
|
||||
for(size_t i = 0; i < max_supported_threads; ++i)
|
||||
{
|
||||
sampling::backtrace::post_process(i);
|
||||
sampling::get_sampler(i).reset();
|
||||
}
|
||||
}
|
||||
|
||||
if(get_use_critical_trace())
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] Generating the critical trace...\n", __FUNCTION__);
|
||||
// increase the thread-pool size
|
||||
tasking::get_critical_trace_thread_pool().initialize_threadpool(
|
||||
get_critical_trace_num_threads());
|
||||
@@ -540,12 +645,16 @@ extern "C"
|
||||
bool _perfetto_output_error = false;
|
||||
if(get_use_perfetto() && !is_system_backend())
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] Flushing perfetto...\n", __FUNCTION__);
|
||||
// Make sure the last event is closed for this example.
|
||||
perfetto::TrackEvent::Flush();
|
||||
|
||||
auto& tracing_session = get_trace_session();
|
||||
OMNITRACE_DEBUG("[%s] Stopping the blocking perfetto trace sessions...\n",
|
||||
__FUNCTION__);
|
||||
tracing_session->StopBlocking();
|
||||
|
||||
OMNITRACE_DEBUG("[%s] Getting the trace data...\n", __FUNCTION__);
|
||||
std::vector<char> trace_data{ tracing_session->ReadTraceBlocking() };
|
||||
|
||||
if(trace_data.empty())
|
||||
@@ -558,7 +667,7 @@ extern "C"
|
||||
// Write the trace into a file.
|
||||
fprintf(stderr,
|
||||
"[%s]> Outputting '%s'. Trace data: %lu B (%.2f KB / %.2f MB / %.2f "
|
||||
"GB)...\n",
|
||||
"GB)... ",
|
||||
__FUNCTION__, get_perfetto_output_filename().c_str(),
|
||||
(unsigned long) trace_data.size(),
|
||||
static_cast<double>(trace_data.size()) / units::KB,
|
||||
@@ -568,23 +677,33 @@ extern "C"
|
||||
if(!tim::filepath::open(ofs, get_perfetto_output_filename(),
|
||||
std::ios::out | std::ios::binary))
|
||||
{
|
||||
fprintf(stderr, "[%s]> Error opening '%s'...\n", __FUNCTION__,
|
||||
fprintf(stderr, "\n[%s]> Error opening '%s'...\n", __FUNCTION__,
|
||||
get_perfetto_output_filename().c_str());
|
||||
_perfetto_output_error = true;
|
||||
}
|
||||
else
|
||||
{
|
||||
// Write the trace into a file.
|
||||
fprintf(stderr, "Done\n");
|
||||
ofs.write(&trace_data[0], trace_data.size());
|
||||
}
|
||||
ofs.close();
|
||||
}
|
||||
|
||||
// these should be destroyed before timemory is finalized, especially the
|
||||
// roctracer thread-pool
|
||||
OMNITRACE_DEBUG("[%s] Destroying the thread pools...\n", __FUNCTION__);
|
||||
tasking::get_roctracer_thread_pool().destroy_threadpool();
|
||||
tasking::get_critical_trace_thread_pool().destroy_threadpool();
|
||||
|
||||
OMNITRACE_DEBUG("Finalizing timemory...\n");
|
||||
if(get_use_sampling())
|
||||
static_cast<tim::tsettings<bool>*>(
|
||||
tim::settings::instance()->find("OMNITRACE_DEBUG")->second.get())
|
||||
->set(false);
|
||||
|
||||
OMNITRACE_DEBUG("[%s] Finalizing timemory...\n", __FUNCTION__);
|
||||
tim::timemory_finalize();
|
||||
OMNITRACE_DEBUG("Finalizing timemory... Done\n");
|
||||
OMNITRACE_DEBUG("[%s] Finalizing timemory... Done\n", __FUNCTION__);
|
||||
|
||||
if(_perfetto_output_error)
|
||||
throw std::runtime_error("Unable to create perfetto output file");
|
||||
@@ -592,25 +711,32 @@ extern "C"
|
||||
|
||||
void omnitrace_trace_set_env(const char* env_name, const char* env_val)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] Setting env: %s=%s\n", __FUNCTION__, env_name, env_val);
|
||||
// just search env to avoid initializing the settings
|
||||
OMNITRACE_CONDITIONAL_PRINT(get_debug_env(), "[%s] Setting env: %s=%s\n",
|
||||
__FUNCTION__, env_name, env_val);
|
||||
|
||||
tim::set_env(env_name, env_val, 0);
|
||||
}
|
||||
|
||||
void omnitrace_trace_set_mpi(bool use, bool attached)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] use: %s, attached: %s\n", __FUNCTION__, (use) ? "y" : "n",
|
||||
(attached) ? "y" : "n");
|
||||
if(use && !attached)
|
||||
// just search env to avoid initializing the settings
|
||||
OMNITRACE_CONDITIONAL_PRINT(get_debug_env(), "[%s] use: %s, attached: %s\n",
|
||||
__FUNCTION__, (use) ? "y" : "n",
|
||||
(attached) ? "y" : "n");
|
||||
if(use && !attached &&
|
||||
(get_state() == State::PreInit || get_state() == State::DelayedInit))
|
||||
{
|
||||
auto& _main_bundle = get_main_bundle();
|
||||
_main_bundle->start();
|
||||
get_use_pid() = true;
|
||||
get_state() = State::DelayedInit;
|
||||
tim::set_env("OMNITRACE_USE_PID", "ON", 1);
|
||||
get_state() = State::DelayedInit;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
std::unique_ptr<main_bundle_t>&
|
||||
get_main_bundle()
|
||||
{
|
||||
@@ -619,6 +745,7 @@ get_main_bundle()
|
||||
"omnitrace", quirk::config<quirk::auto_start>{}));
|
||||
return _v;
|
||||
}
|
||||
} // namespace omnitrace
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
@@ -0,0 +1,590 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// with the Software without restriction, including without limitation the
|
||||
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
// sell copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// * Redistributions of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimers.
|
||||
//
|
||||
// * Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimers in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this Software without specific prior written permission.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include "library/components/fwd.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/ptl.hpp"
|
||||
#include "library/sampling.hpp"
|
||||
|
||||
#include <timemory/backends/papi.hpp>
|
||||
#include <timemory/backends/threading.hpp>
|
||||
#include <timemory/components/data_tracker/components.hpp>
|
||||
#include <timemory/components/macros.hpp>
|
||||
#include <timemory/components/papi/extern.hpp>
|
||||
#include <timemory/components/papi/papi_array.hpp>
|
||||
#include <timemory/components/papi/papi_vector.hpp>
|
||||
#include <timemory/components/timing/backends.hpp>
|
||||
#include <timemory/components/trip_count/extern.hpp>
|
||||
#include <timemory/macros.hpp>
|
||||
#include <timemory/math.hpp>
|
||||
#include <timemory/mpl.hpp>
|
||||
#include <timemory/mpl/quirks.hpp>
|
||||
#include <timemory/mpl/type_traits.hpp>
|
||||
#include <timemory/operations.hpp>
|
||||
#include <timemory/sampling/allocator.hpp>
|
||||
#include <timemory/sampling/sampler.hpp>
|
||||
#include <timemory/storage.hpp>
|
||||
#include <timemory/utility/backtrace.hpp>
|
||||
#include <timemory/utility/demangle.hpp>
|
||||
#include <timemory/utility/types.hpp>
|
||||
#include <timemory/variadic.hpp>
|
||||
|
||||
#include <array>
|
||||
#include <cstring>
|
||||
#include <ctime>
|
||||
#include <initializer_list>
|
||||
#include <mutex>
|
||||
#include <regex>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
#include <type_traits>
|
||||
|
||||
#include <pthread.h>
|
||||
#include <signal.h>
|
||||
|
||||
namespace
|
||||
{
|
||||
template <typename... Tp>
|
||||
struct ensure_storage
|
||||
{
|
||||
TIMEMORY_DEFAULT_OBJECT(ensure_storage)
|
||||
|
||||
void operator()() const { TIMEMORY_FOLD_EXPRESSION((*this)(tim::type_list<Tp>{})); }
|
||||
|
||||
private:
|
||||
template <typename Up, std::enable_if_t<tim::trait::is_available<Up>::value, int> = 0>
|
||||
void operator()(tim::type_list<Up>) const
|
||||
{
|
||||
using namespace tim;
|
||||
static thread_local auto _storage = operation::get_storage<Up>{}();
|
||||
static thread_local auto _tid = threading::get_id();
|
||||
static thread_local auto _dtor =
|
||||
scope::destructor{ []() { operation::set_storage<Up>{}(nullptr, _tid); } };
|
||||
|
||||
tim::operation::set_storage<Up>{}(_storage, _tid);
|
||||
if(_tid == 0 && !_storage) tim::trait::runtime_enabled<Up>::set(false);
|
||||
}
|
||||
|
||||
template <typename Up,
|
||||
std::enable_if_t<!tim::trait::is_available<Up>::value, long> = 0>
|
||||
void operator()(tim::type_list<Up>) const
|
||||
{
|
||||
tim::trait::runtime_enabled<Up>::set(false);
|
||||
}
|
||||
};
|
||||
} // namespace
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
using signal_type_instances = omnitrace_thread_data<std::set<int>, api::sampling>;
|
||||
using backtrace_init_instances = omnitrace_thread_data<backtrace, api::sampling>;
|
||||
using sampler_running_instances = omnitrace_thread_data<bool, api::sampling>;
|
||||
using papi_vector_instances = omnitrace_thread_data<comp::papi_vector, api::sampling>;
|
||||
|
||||
namespace
|
||||
{
|
||||
std::unique_ptr<comp::papi_vector>&
|
||||
get_papi_vector(int64_t _tid)
|
||||
{
|
||||
static auto& _v = papi_vector_instances::instances();
|
||||
if(_tid == threading::get_id()) papi_vector_instances::construct();
|
||||
return _v.at(_tid);
|
||||
}
|
||||
|
||||
std::unique_ptr<backtrace>&
|
||||
get_backtrace_init(int64_t _tid)
|
||||
{
|
||||
static auto& _v = backtrace_init_instances::instances();
|
||||
return _v.at(_tid);
|
||||
}
|
||||
|
||||
std::unique_ptr<bool>&
|
||||
get_sampler_running(int64_t _tid)
|
||||
{
|
||||
static auto& _v = sampler_running_instances::instances();
|
||||
return _v.at(_tid);
|
||||
}
|
||||
|
||||
std::unique_ptr<std::set<int>>&
|
||||
get_signal_types(int64_t _tid)
|
||||
{
|
||||
static auto& _v = signal_type_instances::instances();
|
||||
// on the main thread, use both SIGALRM and SIGPROF.
|
||||
// on secondary threads, only use SIGPROF.
|
||||
signal_type_instances::construct((_tid == 0) ? std::set<int>{ SIGALRM, SIGPROF }
|
||||
: std::set<int>{ SIGPROF });
|
||||
return _v.at(_tid);
|
||||
}
|
||||
} // namespace
|
||||
|
||||
bool
|
||||
backtrace::operator<(const backtrace& rhs) const
|
||||
{
|
||||
return (m_ts == rhs.m_ts) ? (m_tid < rhs.m_tid) : (m_ts < rhs.m_ts);
|
||||
}
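The comparison above orders samples by timestamp and uses the thread id only as a tie-breaker. A minimal standalone sketch of the same ordering follows; `sample_key`, `ts`, and `tid` are hypothetical names for illustration (not omnitrace types), and `std::tie` is swapped in for the ternary, which yields the same result.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for the two fields compared by backtrace::operator<.
struct sample_key
{
    int64_t ts;   // timestamp (arbitrary units in this sketch)
    int64_t tid;  // thread id
};

// Order by timestamp; fall back to thread id when timestamps are equal.
inline bool
operator<(const sample_key& lhs, const sample_key& rhs)
{
    return std::tie(lhs.ts, lhs.tid) < std::tie(rhs.ts, rhs.tid);
}
```

With this ordering, samples from different threads that share a timestamp still sort deterministically.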
|
||||
|
||||
std::vector<std::string>
|
||||
backtrace::get() const
|
||||
{
|
||||
std::vector<std::string> _v{};
|
||||
_v.reserve(m_size);
|
||||
for(size_t i = 0; i < m_size; ++i)
|
||||
_v.emplace_back(m_data.at(i));
|
||||
return _v;
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::preinit()
|
||||
{
|
||||
sampling_wall_clock::label() = "sampling_wall_clock";
|
||||
sampling_wall_clock::description() = "Wall clock time (via sampling)";
|
||||
|
||||
sampling_cpu_clock::label() = "sampling_cpu_clock";
|
||||
sampling_cpu_clock::description() = "CPU clock time (via sampling)";
|
||||
|
||||
sampling_percent::label() = "sampling_percent";
|
||||
sampling_percent::description() = "Percentage of samples";
|
||||
sampling_percent::display_unit() = "%";
|
||||
}
|
||||
|
||||
std::string
|
||||
backtrace::label()
|
||||
{
|
||||
return "backtrace";
|
||||
}
|
||||
|
||||
std::string
|
||||
backtrace::description()
|
||||
{
|
||||
return "Records backtrace data";
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::start()
|
||||
{}
|
||||
|
||||
void
|
||||
backtrace::stop()
|
||||
{}
|
||||
|
||||
bool
|
||||
backtrace::empty() const
|
||||
{
|
||||
return (m_size == 0);
|
||||
}
|
||||
|
||||
size_t
|
||||
backtrace::size() const
|
||||
{
|
||||
return m_size;
|
||||
}
|
||||
|
||||
backtrace::time_point_type
|
||||
backtrace::get_timestamp() const
|
||||
{
|
||||
return m_ts;
|
||||
}
|
||||
|
||||
int64_t
|
||||
backtrace::get_thread_cpu_timestamp() const
|
||||
{
|
||||
return m_thr_cpu_ts;
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::sample(int signum)
|
||||
{
|
||||
static bool _debug = tim::get_env<bool>("OMNITRACE_DEBUG_SAMPLING", get_debug());
|
||||
if(_debug)
|
||||
{
|
||||
static auto _timestamp_str = [](const auto& _tp) {
|
||||
char _repr[64];
|
||||
std::memset(_repr, '\0', sizeof(_repr));
|
||||
std::time_t _value = system_clock::to_time_t(_tp);
|
||||
// alternative: "%c %Z"
|
||||
if(std::strftime(_repr, sizeof(_repr), "%a %b %d %T %Y %Z",
|
||||
std::localtime(&_value)) > 0)
|
||||
return std::string{ _repr };
|
||||
return std::string{};
|
||||
};
|
||||
|
||||
static thread_local size_t _tot = 0;
|
||||
static thread_local auto _last = system_clock::now();
|
||||
auto _now = system_clock::now();
|
||||
auto _diff = (_now - _last).count();
|
||||
_last = _now;
|
||||
_tot += _diff;
|
||||
|
||||
OMNITRACE_PRINT(
|
||||
"Sample on signal %i taken at %s after interval %zu :: total %zu\n", signum,
|
||||
_timestamp_str(_now).c_str(), _diff, _tot);
|
||||
}
|
||||
|
||||
m_size = 0;
|
||||
m_tid = threading::get_id();
|
||||
m_ts = clock_type::now();
|
||||
m_thr_cpu_ts = tim::get_clock_thread_now<int64_t, std::nano>();
|
||||
m_data = tim::get_unw_backtrace<128, 4, false>();
|
||||
auto* itr = m_data.begin();
|
||||
for(; itr != m_data.end(); ++itr, ++m_size)
|
||||
{
|
||||
if(strlen(*itr) == 0) break;
|
||||
}
|
||||
std::reverse(m_data.begin(), itr);
|
||||
if(!get_debug())
|
||||
{
|
||||
bool _ignore = false;
|
||||
for(auto& itr : m_data)
|
||||
{
|
||||
if(strlen(itr) == 0) break;
|
||||
if(strncmp(itr, "funlockfile", 11) == 0) _ignore = true;
|
||||
if(_ignore && strlen(itr) > 0)
|
||||
{
|
||||
OMNITRACE_DEBUG("Discarding sample: '%s'...\n", itr);
|
||||
itr[0] = '\0';
|
||||
--m_size;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if constexpr(tim::trait::is_available<comp::papi_vector>::value)
|
||||
{
|
||||
assert(get_papi_vector(m_tid).get() != nullptr);
|
||||
static thread_local auto& _pv = get_papi_vector(m_tid);
|
||||
auto _hw_counter = _pv->record();
|
||||
for(size_t i = 0; i < std::min<size_t>(_hw_counter.size(), num_hw_counters); ++i)
|
||||
{
|
||||
auto& _last = get_last_hwcounters().at(i);
|
||||
auto itr = _hw_counter.at(i);
|
||||
m_hw_counter[i] = itr - _last;
|
||||
_last = itr;
|
||||
}
|
||||
}
|
||||
}
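The hardware-counter loop at the end of `backtrace::sample` stores, per counter, the difference between the current cumulative reading and the previous one, then remembers the current reading. A minimal sketch of that delta bookkeeping, assuming plain integer counters; `num_counters`, `counter_array_t`, and `compute_deltas` are hypothetical names and no PAPI call is involved.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical capacity; stands in for num_hw_counters in the diff.
constexpr std::size_t num_counters = 4;

using counter_array_t = std::array<long long, num_counters>;

// Convert cumulative readings into per-sample deltas, updating `last` in place.
// Readings beyond num_counters are ignored; missing readings yield a delta of 0.
inline counter_array_t
compute_deltas(const std::vector<long long>& current, counter_array_t& last)
{
    counter_array_t   delta{};  // zero-initialized
    const std::size_t n = std::min<std::size_t>(current.size(), num_counters);
    for(std::size_t i = 0; i < n; ++i)
    {
        delta[i] = current[i] - last[i];
        last[i]  = current[i];
    }
    return delta;
}
```

In the diff, the "last" values live in a `thread_local` array (`get_last_hwcounters()`), so each thread tracks its own baseline.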
|
||||
|
||||
std::set<int>
|
||||
backtrace::configure(bool _setup, int64_t _tid)
|
||||
{
|
||||
auto& _sampler = sampling::get_sampler(_tid);
|
||||
auto& _running = get_sampler_running(_tid);
|
||||
bool _is_running = (!_running) ? false : *_running;
|
||||
auto& _signal_types = get_signal_types(_tid);
|
||||
|
||||
ensure_storage<comp::trip_count, sampling_wall_clock, sampling_cpu_clock, hw_counters,
|
||||
sampling_percent>{}();
|
||||
|
||||
if(_setup && !_sampler && !_is_running)
|
||||
{
|
||||
assert(_tid == threading::get_id());
|
||||
sampling::block_signals(*_signal_types);
|
||||
if constexpr(tim::trait::is_available<comp::papi_vector>::value)
|
||||
{
|
||||
OMNITRACE_DEBUG("HW COUNTER: starting...\n");
|
||||
if(get_papi_vector(_tid)) get_papi_vector(_tid)->start();
|
||||
}
|
||||
|
||||
auto _alrm_freq = 1.0 / std::min<double>(get_sampling_freq(), 10.0);
|
||||
auto _prof_freq = 1.0 / get_sampling_freq();
|
||||
auto _delay = std::max<double>(1.0e-3, get_sampling_delay());
|
||||
|
||||
OMNITRACE_DEBUG("Configuring sampler for thread %lu...\n", _tid);
|
||||
sampler_running_instances::construct(true);
|
||||
backtrace_init_instances::construct();
|
||||
sampling::sampler_instances::construct("omnitrace", _tid, *_signal_types);
|
||||
_sampler->set_signals(*_signal_types);
|
||||
_sampler->set_flags(SA_RESTART);
|
||||
_sampler->set_delay(_delay);
|
||||
_sampler->set_frequency(_prof_freq, { SIGPROF });
|
||||
_sampler->set_frequency(_alrm_freq, { SIGALRM });
|
||||
|
||||
OMNITRACE_DEBUG("Sampler for thread %lu will be triggered %5.1fx per second "
|
||||
"(every %5.2e seconds)...\n",
|
||||
_tid, _sampler->get_frequency(units::sec),
|
||||
_sampler->get_rate(units::sec));
|
||||
|
||||
(void) sampling::sampler_t::get_samplers(_tid);
|
||||
get_backtrace_init(_tid)->sample();
|
||||
_sampler->configure(false);
|
||||
_sampler->start();
|
||||
}
|
||||
else if(!_setup && _sampler && _is_running)
|
||||
{
|
||||
OMNITRACE_DEBUG("Destroying sampler for thread %lu...\n", _tid);
|
||||
*_running = false;
|
||||
|
||||
if(_tid == threading::get_id())
|
||||
{
|
||||
sampling::block_signals(*_signal_types);
|
||||
}
|
||||
|
||||
// this propagates to all threads
|
||||
if(_tid == 0) _sampler->ignore(*_signal_types);
|
||||
|
||||
_sampler->stop();
|
||||
_sampler->swap_data();
|
||||
if constexpr(tim::trait::is_available<comp::papi_vector>::value)
|
||||
{
|
||||
if(_tid == threading::get_id())
|
||||
{
|
||||
if(get_papi_vector(_tid)) get_papi_vector(_tid)->stop();
|
||||
OMNITRACE_DEBUG("HW COUNTER: stopped...\n");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return (_signal_types) ? *_signal_types : std::set<int>{};
|
||||
}
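`backtrace::configure` blocks the sampling signals before the sampler is set up or torn down. The sketch below shows the general POSIX technique of blocking and unblocking a set of signals with `sigprocmask`; it is an illustration under stated assumptions, not the timemory or omnitrace implementation. `set_signals_blocked` and `is_blocked` are hypothetical helpers, and multi-threaded code would normally use `pthread_sigmask` in the same way.

```cpp
#include <set>

#include <signal.h>

// Block or unblock the given signals for the caller (single-threaded sketch).
// Returns true on success.
inline bool
set_signals_blocked(const std::set<int>& signals, bool block)
{
    sigset_t mask;
    sigemptyset(&mask);
    for(int sig : signals)
        sigaddset(&mask, sig);
    return sigprocmask(block ? SIG_BLOCK : SIG_UNBLOCK, &mask, nullptr) == 0;
}

// Query whether a signal is currently blocked.
inline bool
is_blocked(int sig)
{
    sigset_t current;
    sigemptyset(&current);
    // With a null `set` argument the mask is left unchanged and only queried.
    if(sigprocmask(SIG_BLOCK, nullptr, &current) != 0) return false;
    return sigismember(&current, sig) == 1;
}
```

Blocking a signal defers its delivery; it stays pending until the signal is unblocked again.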
|
||||
|
||||
backtrace::hw_counter_data_t&
|
||||
backtrace::get_last_hwcounters()
|
||||
{
|
||||
static thread_local auto _v = hw_counter_data_t{ 0 };
|
||||
return _v;
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::post_process(int64_t _tid)
|
||||
{
|
||||
configure(false, _tid);
|
||||
|
||||
auto& _sampler = sampling::sampler_instances::instances().at(_tid);
|
||||
if(!_sampler)
|
||||
{
|
||||
// this should be relatively common
|
||||
OMNITRACE_DEBUG(
|
||||
"Post-processing sampling entries for thread %lu skipped (no sampler)\n",
|
||||
_tid);
|
||||
return;
|
||||
}
|
||||
|
||||
auto& _init = backtrace_init_instances::instances().at(_tid);
|
||||
if(!_init)
|
||||
{
|
||||
// this is not common
|
||||
OMNITRACE_PRINT(
|
||||
"Post-processing sampling entries for thread %lu skipped (not initialized)\n",
|
||||
_tid);
|
||||
return;
|
||||
}
|
||||
|
||||
// check whether the call-stack entry should be used: -1 means break, 0 means skip, 1 means use
|
||||
auto _use_label = [](const std::string& _lbl, bool _check_internal) -> short {
|
||||
// debugging feature
|
||||
static bool _keep_internal =
|
||||
tim::get_env<bool>("OMNITRACE_SAMPLING_KEEP_INTERNAL", get_debug());
|
||||
if(_keep_internal) return 1;
|
||||
const auto _npos = std::string::npos;
|
||||
if(_lbl.find("omnitrace_init_tooling") != _npos) return -1;
|
||||
if(_check_internal)
|
||||
{
|
||||
if(std::regex_search(
|
||||
_lbl, std::regex("(14pthread_gotcha7wrapper|default_error_condition)",
|
||||
std::regex_constants::optimize)))
|
||||
return 0;
|
||||
else if(std::regex_search(
|
||||
_lbl, std::regex("(8sampling9backtrace9configure|"
|
||||
"8sampling15unblock_signals|pthread_sigmask)",
|
||||
std::regex_constants::optimize)))
|
||||
return 0;
|
||||
}
|
||||
return 1;
|
||||
};
|
||||
|
||||
// in the dyninst binary rewrite runtime, instrumented functions are appended with
|
||||
// "_dyninst", e.g. "main" will show up as "main_dyninst" in the backtrace.
|
||||
auto _patch_label = [](std::string _lbl) -> std::string {
|
||||
// debugging feature
|
||||
static bool _keep_suffix =
|
||||
tim::get_env<bool>("OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX", get_debug());
|
||||
if(_keep_suffix) return _lbl;
|
||||
const std::string _dyninst{ "_dyninst" };
|
||||
auto _pos = _lbl.find(_dyninst);
|
||||
if(_pos == std::string::npos) return _lbl;
|
||||
return _lbl.replace(_pos, _dyninst.length(), "");
|
||||
};
|
||||
|
||||
auto _data = _sampler->get_allocator().get_data();
|
||||
// single sample that is useless (backtrace to unblocking signals)
|
||||
if(_data.size() == 1 && _data.front().size() <= 1) _data.clear();
|
||||
OMNITRACE_DEBUG("Post-processing %zu sampling entries for thread %lu...\n",
|
||||
_data.size(), _tid);
|
||||
|
||||
std::map<int64_t, std::map<int64_t, int64_t>> _depth_sum = {};
|
||||
auto _scope = tim::scope::config{};
|
||||
if(get_timeline_sampling()) _scope += scope::timeline{};
|
||||
if(get_flat_sampling()) _scope += scope::flat{};
|
||||
|
||||
time_point_type _last_wall_ts = _init->get_timestamp();
|
||||
int64_t _last_cpu_ts = _init->get_thread_cpu_timestamp();
|
||||
for(auto& ditr : _data)
|
||||
{
|
||||
using bundle_t = tim::lightweight_tuple<comp::trip_count, sampling_wall_clock,
|
||||
sampling_cpu_clock, hw_counters>;
|
||||
|
||||
for(auto& ritr : ditr)
|
||||
{
|
||||
auto* _bt = ritr.get<backtrace>();
|
||||
if(!_bt)
|
||||
{
|
||||
OMNITRACE_PRINT(
|
||||
"Warning! Nullptr to backtrace instance for thread %lu...\n", _tid);
|
||||
continue;
|
||||
}
|
||||
|
||||
if(_bt->empty()) continue;
|
||||
|
||||
double _elapsed_wc = (_bt->m_ts - _last_wall_ts).count();
|
||||
double _elapsed_cc = (_bt->m_thr_cpu_ts - _last_cpu_ts);
|
||||
|
||||
std::vector<bundle_t> _tc{};
|
||||
_tc.reserve(_bt->size());
|
||||
|
||||
// generate the instances of the tuple of components and start them
|
||||
for(const auto& itr : _bt->get())
|
||||
{
|
||||
auto _lbl = _patch_label(itr);
|
||||
auto _use = _use_label(_lbl, !_tc.empty() &&
|
||||
(_tc.back().key() == "start_thread" ||
|
||||
_tc.back().key() == "clone"));
|
||||
if(_use == -1) break;
|
||||
if(_use == 0) continue;
|
||||
_tc.emplace_back(tim::string_view_t{ _lbl }, _scope);
|
||||
_tc.back().push(_bt->m_tid);
|
||||
_tc.back().start();
|
||||
}
|
||||
|
||||
// stop the instances and update the values as needed
|
||||
for(size_t i = 0; i < _tc.size(); ++i)
|
||||
{
|
||||
auto& itr = _tc.at(_tc.size() - i - 1);
|
||||
size_t _depth = 0;
|
||||
_depth_sum[_bt->m_tid][_depth] += 1;
|
||||
itr.stop();
|
||||
if constexpr(tim::trait::is_available<sampling_wall_clock>::value)
|
||||
{
|
||||
auto* _sc = itr.get<sampling_wall_clock>();
|
||||
if(_sc)
|
||||
{
|
||||
auto _value = _elapsed_wc / sampling_wall_clock::get_unit();
|
||||
_sc->set_value(_value);
|
||||
_sc->set_accum(_value);
|
||||
}
|
||||
}
|
||||
if constexpr(tim::trait::is_available<sampling_cpu_clock>::value)
|
||||
{
|
||||
auto* _cc = itr.get<sampling_cpu_clock>();
|
||||
if(_cc)
|
||||
{
|
||||
_cc->set_value(_elapsed_cc / sampling_cpu_clock::get_unit());
|
||||
_cc->set_accum(_elapsed_cc / sampling_cpu_clock::get_unit());
|
||||
}
|
||||
}
|
||||
if constexpr(tim::trait::is_available<hw_counters>::value)
|
||||
{
|
||||
auto* _hw_counter = itr.get<hw_counters>();
|
||||
if(_hw_counter)
|
||||
{
|
||||
_hw_counter->set_value(_bt->m_hw_counter);
|
||||
_hw_counter->set_accum(_bt->m_hw_counter);
|
||||
}
|
||||
}
|
||||
itr.pop();
|
||||
}
|
||||
_last_wall_ts = _bt->m_ts;
|
||||
_last_cpu_ts = _bt->m_thr_cpu_ts;
|
||||
}
|
||||
}
|
||||
|
||||
namespace quirk = tim::quirk;
|
||||
|
||||
for(auto&& ditr : _data)
|
||||
{
|
||||
using bundle_t =
|
||||
tim::lightweight_tuple<sampling_percent, quirk::config<quirk::tree_scope>>;
|
||||
|
||||
for(auto& ritr : ditr)
|
||||
{
|
||||
auto* _bt = ritr.get<backtrace>();
|
||||
if(!_bt)
|
||||
{
|
||||
OMNITRACE_PRINT(
|
||||
"Warning! Nullptr to backtrace instance for thread %lu...\n", _tid);
|
||||
continue;
|
||||
}
|
||||
|
||||
if(_bt->empty()) continue;
|
||||
|
||||
std::vector<bundle_t> _tc{};
|
||||
_tc.reserve(_bt->size());
|
||||
|
||||
// generate the instances of the tuple of components and start them
|
||||
for(const auto& itr : _bt->get())
|
||||
{
|
||||
auto _lbl = _patch_label(itr);
|
||||
auto _use =
|
||||
_use_label(_lbl, !_tc.empty() && _tc.back().key() == "start_thread");
|
||||
if(_use == -1) break;
|
||||
if(_use == 0) continue;
|
||||
_tc.emplace_back(tim::string_view_t{ _lbl });
|
||||
_tc.back().push(_bt->m_tid);
|
||||
_tc.back().start();
|
||||
}
|
||||
|
||||
// stop the instances and update the values as needed
|
||||
for(size_t i = 0; i < _tc.size(); ++i)
|
||||
{
|
||||
auto& itr = _tc.at(_tc.size() - i - 1);
|
||||
size_t _depth = 0;
|
||||
double _value = (1.0 / _depth_sum[_bt->m_tid][_depth]) * 100.0;
|
||||
itr.store(std::plus<double>{}, _value);
|
||||
itr.stop();
|
||||
itr.pop();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
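The `_patch_label` lambda in `post_process` removes the `_dyninst` marker that the binary-rewrite runtime adds to instrumented function names. A standalone sketch of that string handling follows; `strip_dyninst_suffix` is a hypothetical name, and the environment-variable switch from the diff is left out.

```cpp
#include <string>

// Remove the first occurrence of "_dyninst" from a label, if present.
inline std::string
strip_dyninst_suffix(std::string label)
{
    static const std::string marker{ "_dyninst" };
    const auto               pos = label.find(marker);
    if(pos == std::string::npos) return label;
    label.erase(pos, marker.length());
    return label;
}
```

Like the lambda in the diff, this removes only the first occurrence and does not require the marker to sit at the very end of the label.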
|
||||
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
|
||||
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
|
||||
double)
|
||||
|
||||
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
|
||||
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
|
||||
double)
|
||||
|
||||
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
|
||||
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
|
||||
double)
|
||||
|
||||
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::backtrace)
|
||||
@@ -26,21 +26,38 @@
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include "library/fork_gotcha.hpp"
|
||||
#include "library/components/fork_gotcha.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/debug.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
void
|
||||
fork_gotcha::configure()
|
||||
{
|
||||
fork_gotcha_t::get_initializer() = []() {
|
||||
TIMEMORY_C_GOTCHA(fork_gotcha_t, 0, fork);
|
||||
};
|
||||
|
||||
pthread_gotcha_t::get_initializer() = []() {
|
||||
TIMEMORY_C_GOTCHA(pthread_gotcha_t, 0, pthread_create);
|
||||
};
|
||||
}
|
||||
|
||||
void
|
||||
fork_gotcha::audit(const gotcha_data_t&, audit::incoming)
|
||||
{
|
||||
OMNITRACE_DEBUG(
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(
|
||||
get_debug_env(),
|
||||
"Warning! Calling fork() within an OpenMPI application using libfabric "
|
||||
"may result in a segmentation fault\n");
|
||||
TIMEMORY_CONDITIONAL_DEMANGLED_BACKTRACE(get_debug(), 16);
|
||||
TIMEMORY_CONDITIONAL_DEMANGLED_BACKTRACE(get_debug_env(), 16);
|
||||
}
|
||||
|
||||
void
|
||||
fork_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, pid_t _pid)
|
||||
{
|
||||
OMNITRACE_DEBUG("%s() return PID %i\n", _data.tool_id.c_str(), (int) _pid);
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "%s() return PID %i\n",
|
||||
_data.tool_id.c_str(), (int) _pid);
|
||||
}
|
||||
} // namespace omnitrace
|
||||
@@ -26,11 +26,14 @@
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include "library/mpi_gotcha.hpp"
|
||||
#include "library/components/mpi_gotcha.hpp"
|
||||
#include "library/components/omnitrace.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/omnitrace_component.hpp"
|
||||
#include "timemory/backends/mpi.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace
|
||||
{
|
||||
uint64_t mpip_index = std::numeric_limits<uint64_t>::max();
|
||||
@@ -45,10 +48,11 @@ omnitrace_mpi_set_attr()
|
||||
return MPI_SUCCESS;
|
||||
};
|
||||
static auto _mpi_fini = [](MPI_Comm, int, void*, void*) {
|
||||
OMNITRACE_DEBUG("MPI Comm attribute finalize\n");
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(),
|
||||
"MPI Comm attribute finalize\n");
|
||||
if(mpip_index != std::numeric_limits<uint64_t>::max())
|
||||
comp::deactivate_mpip<tim::component_tuple<omnitrace_component>, omnitrace>(
|
||||
mpip_index);
|
||||
comp::deactivate_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
|
||||
api::omnitrace>(mpip_index);
|
||||
if(!mpi_init_string.empty()) omnitrace_pop_trace(mpi_init_string.c_str());
|
||||
mpi_init_string = {};
|
||||
omnitrace_trace_finalize();
|
||||
@@ -65,46 +69,75 @@ omnitrace_mpi_set_attr()
|
||||
}
|
||||
} // namespace
|
||||
|
||||
void
|
||||
mpi_gotcha::configure()
|
||||
{
|
||||
mpi_gotcha_t::get_initializer() = []() {
|
||||
mpi_gotcha_t::template configure<0, int, int*, char***>("MPI_Init");
|
||||
mpi_gotcha_t::template configure<1, int, int*, char***, int, int*>(
|
||||
"MPI_Init_thread");
|
||||
mpi_gotcha_t::template configure<2, int>("MPI_Finalize");
|
||||
mpi_gotcha_t::template configure<3, int, tim::mpi::comm_t, int*>("MPI_Comm_rank");
|
||||
mpi_gotcha_t::template configure<4, int, tim::mpi::comm_t, int*>("MPI_Comm_size");
|
||||
};
|
||||
}
|
||||
|
||||
void
|
||||
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, int*, char***)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] %s(int*, char***)\n", __FUNCTION__, _data.tool_id.c_str());
|
||||
if(get_state() == ::State::DelayedInit)
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s(int*, char***)\n",
|
||||
__FUNCTION__, _data.tool_id.c_str());
|
||||
if(get_state() == ::omnitrace::State::DelayedInit)
|
||||
{
|
||||
get_state() = ::State::PreInit;
|
||||
get_state() = ::omnitrace::State::PreInit;
|
||||
mpi_init_string = _data.tool_id;
|
||||
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
|
||||
tim::mpi::is_initialized_callback() = []() { return true; };
|
||||
tim::mpi::is_finalized() = false;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, int*, char***, int, int*)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] %s(int*, char***, int, int*)\n", __FUNCTION__,
|
||||
_data.tool_id.c_str());
|
||||
if(get_state() == ::State::DelayedInit)
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(),
|
||||
"[%s] %s(int*, char***, int, int*)\n", __FUNCTION__,
|
||||
_data.tool_id.c_str());
|
||||
if(get_state() == ::omnitrace::State::DelayedInit)
|
||||
{
|
||||
get_state() = ::State::PreInit;
|
||||
get_state() = ::omnitrace::State::PreInit;
|
||||
mpi_init_string = _data.tool_id;
|
||||
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
|
||||
tim::mpi::is_initialized_callback() = []() { return true; };
|
||||
tim::mpi::is_finalized() = false;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] %s()\n", __FUNCTION__, _data.tool_id.c_str());
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s()\n", __FUNCTION__,
|
||||
_data.tool_id.c_str());
|
||||
if(mpip_index != std::numeric_limits<uint64_t>::max())
|
||||
comp::deactivate_mpip<tim::component_tuple<omnitrace_component>, omnitrace>(
|
||||
mpip_index);
|
||||
comp::deactivate_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
|
||||
api::omnitrace>(mpip_index);
|
||||
if(!mpi_init_string.empty()) omnitrace_pop_trace(mpi_init_string.c_str());
|
||||
mpi_init_string = {};
|
||||
omnitrace_trace_finalize();
|
||||
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
|
||||
tim::mpi::is_initialized_callback() = []() { return false; };
|
||||
tim::mpi::is_finalized() = true;
|
||||
#endif
|
||||
}
|
||||
|
||||
void
|
||||
mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, comm_t _comm, int* _val)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] %s()\n", __FUNCTION__, _data.tool_id.c_str());
|
||||
m_comm = _comm;
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s()\n", __FUNCTION__,
|
||||
_data.tool_id.c_str());
|
||||
m_comm = &_comm;
|
||||
if(_data.tool_id == "MPI_Comm_rank")
|
||||
{
|
||||
m_rank = _val;
|
||||
@@ -123,9 +156,9 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, comm_t _comm, int
|
||||
void
|
||||
mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] %s() returned %i\n", __FUNCTION__, _data.tool_id.c_str(),
|
||||
(int) _retval);
|
||||
if(_retval == tim::mpi::success_v && get_state() == ::State::PreInit &&
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(get_debug_env(), "[%s] %s() returned %i\n",
|
||||
__FUNCTION__, _data.tool_id.c_str(), (int) _retval);
|
||||
if(_retval == tim::mpi::success_v && get_state() == ::omnitrace::State::PreInit &&
|
||||
_data.tool_id.find("MPI_Init") == 0)
|
||||
{
|
||||
omnitrace_mpi_set_attr();
|
||||
@@ -136,19 +169,27 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
|
||||
// were excluded via a regex expression)
|
||||
if(get_use_mpip())
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] Activating MPI wrappers...\n", __FUNCTION__);
|
||||
comp::configure_mpip<tim::component_tuple<omnitrace_component>, omnitrace>();
|
||||
mpip_index = comp::activate_mpip<tim::component_tuple<omnitrace_component>,
|
||||
omnitrace>();
|
||||
OMNITRACE_CONDITIONAL_BASIC_PRINT(
|
||||
get_debug_env(), "[%s] Activating MPI wrappers...\n", __FUNCTION__);
|
||||
comp::configure_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
|
||||
api::omnitrace>();
|
||||
mpip_index =
|
||||
comp::activate_mpip<tim::component_tuple<omnitrace::component::omnitrace>,
|
||||
api::omnitrace>();
|
||||
}
|
||||
omnitrace_push_trace(_data.tool_id.c_str());
|
||||
}
|
||||
else if(_retval == tim::mpi::success_v && _data.tool_id.find("MPI_Comm_") == 0)
|
||||
{
|
||||
/*if(_data.tool_id == "MPI_Comm_rank")
|
||||
if(_data.tool_id == "MPI_Comm_rank")
|
||||
{
|
||||
if(m_rank)
|
||||
tim::mpi::set_rank(*m_rank, m_comm);
|
||||
{
|
||||
tim::mpi::set_rank(*m_rank, *static_cast<comm_t*>(m_comm));
|
||||
tim::settings::default_process_suffix() = *m_rank;
|
||||
get_perfetto_output_filename().clear();
|
||||
(void) get_perfetto_output_filename();
|
||||
}
|
||||
else
|
||||
{
|
||||
OMNITRACE_PRINT("[%s] %s() returned %i :: nullptr to rank\n",
|
||||
@@ -158,7 +199,7 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
|
||||
else if(_data.tool_id == "MPI_Comm_size")
|
||||
{
|
||||
if(m_size)
|
||||
tim::mpi::set_size(*m_size, m_comm);
|
||||
tim::mpi::set_size(*m_size, *static_cast<comm_t*>(m_comm));
|
||||
else
|
||||
{
|
||||
OMNITRACE_PRINT("[%s] %s() returned %i :: nullptr to size\n",
|
||||
@@ -169,8 +210,9 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
|
||||
{
|
||||
OMNITRACE_PRINT("[%s] %s() returned %i :: unexpected function wrapper\n",
|
||||
__FUNCTION__, _data.tool_id.c_str(), (int) _retval);
|
||||
}*/
|
||||
}
|
||||
}
|
||||
}
|
||||
} // namespace omnitrace
|
||||
|
||||
TIMEMORY_INITIALIZE_STORAGE(mpi_gotcha)
|
||||
TIMEMORY_INITIALIZE_STORAGE(omnitrace::mpi_gotcha)
|
||||
@@ -26,25 +26,31 @@
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include "library/omnitrace_component.hpp"
|
||||
#include "library/components/omnitrace.hpp"
|
||||
#include "library/api.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
void
|
||||
omnitrace_component::start()
|
||||
omnitrace::start()
|
||||
{
|
||||
if(m_prefix) omnitrace_push_trace(m_prefix);
|
||||
}
|
||||
|
||||
void
|
||||
omnitrace_component::stop()
|
||||
omnitrace::stop()
|
||||
{
|
||||
if(m_prefix) omnitrace_pop_trace(m_prefix);
|
||||
}
|
||||
|
||||
void
|
||||
omnitrace_component::set_prefix(const char* _prefix)
|
||||
omnitrace::set_prefix(const char* _prefix)
|
||||
{
|
||||
m_prefix = _prefix;
|
||||
}
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
TIMEMORY_INITIALIZE_STORAGE(omnitrace_component)
|
||||
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::omnitrace)
|
||||
@@ -0,0 +1,151 @@
|
||||
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// with the Software without restriction, including without limitation the
|
||||
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
// sell copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// * Redistributions of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimers.
|
||||
//
|
||||
// * Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimers in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this Software without specific prior written permission.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
|
||||
// THE SOFTWARE.
|
||||
|
||||
#include "library/components/pthread_gotcha.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/sampling.hpp"
|
||||
|
||||
#include <timemory/sampling/allocator.hpp>
|
||||
#include <timemory/utility/types.hpp>
|
||||
|
||||
#include <pthread.h>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace sampling
|
||||
{
|
||||
std::set<int>
|
||||
setup();
|
||||
std::set<int>
|
||||
shutdown();
|
||||
} // namespace sampling
|
||||
|
||||
pthread_gotcha::wrapper::wrapper(routine_t _routine, void* _arg, bool _enable_sampling,
                                 promise_t* _p)
: m_enable_sampling{ _enable_sampling }
, m_routine{ _routine }
, m_arg{ _arg }
, m_promise{ _p }
{}

void*
pthread_gotcha::wrapper::operator()() const
{
    std::set<int> _signals{};
    auto& _enable_sampling = pthread_gotcha::enable_sampling_on_child_threads();
    if(m_enable_sampling && _enable_sampling)
    {
        _enable_sampling = false;
        _signals = sampling::setup();
        _enable_sampling = true;
        sampling::unblock_signals();
    }

    if(m_promise) m_promise->set_value();

    // execute the original function
    auto* _ret = m_routine(m_arg);

    if(m_enable_sampling && _enable_sampling)
    {
        sampling::block_signals(_signals);
        sampling::shutdown();
    }

    return _ret;
}

void*
pthread_gotcha::wrapper::wrap(void* _arg)
{
    if(_arg == nullptr) return nullptr;

    // convert the argument
    wrapper* _wrapper = static_cast<wrapper*>(_arg);

    // execute the original function
    return (*_wrapper)();
}

void
pthread_gotcha::configure()
{
    pthread_gotcha_t::get_initializer() = []() {
        TIMEMORY_C_GOTCHA(pthread_gotcha_t, 0, pthread_create);
    };
}

bool&
pthread_gotcha::enable_sampling_on_child_threads()
{
    static thread_local bool _v = get_use_sampling();
    return _v;
}

// pthread_create
int
pthread_gotcha::operator()(pthread_t* thread, const pthread_attr_t* attr,
                           void* (*start_routine)(void*), void* arg) const
{
    auto _enable_sampling = enable_sampling_on_child_threads();

    if(!_enable_sampling)
    {
        auto* _obj = new wrapper(start_routine, arg, _enable_sampling, nullptr);
        // create the thread
        return pthread_create(thread, attr, &wrapper::wrap, static_cast<void*>(_obj));
    }

    // block the signals in entire process
    OMNITRACE_DEBUG("blocking signals...\n");
    tim::sampling::block_signals({ SIGALRM, SIGPROF },
                                 tim::sampling::sigmask_scope::process);

    // promise set by thread when signal handler is configured
    auto _promise = std::promise<void>{};
    auto _fut = _promise.get_future();
    auto* _obj = new wrapper(start_routine, arg, _enable_sampling, &_promise);

    // create the thread
    auto _ret = pthread_create(thread, attr, &wrapper::wrap, static_cast<void*>(_obj));

    // wait for thread to set promise
    OMNITRACE_DEBUG("waiting for child to signal it is setup...\n");
    _fut.wait();

    // unblock the signals in the entire process
    OMNITRACE_DEBUG("unblocking signals...\n");
    tim::sampling::unblock_signals({ SIGALRM, SIGPROF },
                                   tim::sampling::sigmask_scope::process);

    OMNITRACE_DEBUG("returning success...\n");
    return _ret;
}

} // namespace omnitrace
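The `pthread_create` wrapper above blocks SIGALRM and SIGPROF, starts the child through a trampoline, and waits on a promise until the child reports that it is set up. The following is a minimal standalone sketch of that handshake, not the omnitrace implementation: `launch_data`, `trampoline`, and `create_with_handshake` are hypothetical names, the mask is changed with `pthread_sigmask` on the calling thread rather than through timemory's process-wide helper, and the promise is held through a `std::shared_ptr` so it outlives whichever side finishes last (the diff passes a raw pointer to a stack promise). The sketch also skips the wait when `pthread_create` fails, which is an addition of this example.

```cpp
#include <future>
#include <memory>
#include <pthread.h>
#include <signal.h>

// Hypothetical payload handed to the new thread (not the omnitrace wrapper).
struct launch_data
{
    void* (*routine)(void*);
    void* arg;
    std::shared_ptr<std::promise<void>> ready;
};

// Fill a signal set with the two timer signals used for sampling.
inline void fill_timer_signals(sigset_t* set)
{
    sigemptyset(set);
    sigaddset(set, SIGALRM);
    sigaddset(set, SIGPROF);
}

// Trampoline with the signature pthread_create expects.
extern "C" inline void* trampoline(void* vdata)
{
    // take ownership of the heap payload and release it right away
    std::unique_ptr<launch_data> data{ static_cast<launch_data*>(vdata) };
    launch_data local = *data;
    data.reset();

    // per-thread setup (e.g. installing a sampler) would go here; afterwards
    // the child unblocks the timer signals it inherited as blocked
    sigset_t timer_signals;
    fill_timer_signals(&timer_signals);
    pthread_sigmask(SIG_UNBLOCK, &timer_signals, nullptr);

    // tell the creator that setup is complete, then run the user routine
    local.ready->set_value();
    return local.routine(local.arg);
}

// Create a thread while the timer signals are blocked in the caller and only
// restore the caller's mask after the child reports that it is set up.
inline int create_with_handshake(pthread_t* thread, void* (*routine)(void*), void* arg)
{
    sigset_t timer_signals;
    sigset_t previous;
    fill_timer_signals(&timer_signals);
    pthread_sigmask(SIG_BLOCK, &timer_signals, &previous);

    auto ready = std::make_shared<std::promise<void>>();
    auto fut = ready->get_future();
    auto* data = new launch_data{ routine, arg, ready };

    int ret = pthread_create(thread, nullptr, &trampoline, data);
    if(ret == 0)
        fut.wait();   // child has finished its setup
    else
        delete data;  // thread never started, so the payload is still ours

    pthread_sigmask(SIG_SETMASK, &previous, nullptr);
    return ret;
}
```

A caller would use `create_with_handshake` in place of `pthread_create` and join the thread as usual; the return value is whatever `pthread_create` returned.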
@@ -26,12 +26,14 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.

#include "library/roctracer.hpp"
#include "library/components/roctracer.hpp"
#include "library/components/roctracer_callbacks.hpp"
#include "library/config.hpp"
#include "library/defines.hpp"
#include "library/roctracer_callbacks.hpp"
#include "library/thread_data.hpp"

using namespace omnitrace;

namespace tim
{
namespace component
+16
-5
@@ -26,7 +26,7 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.

#include "library/roctracer_callbacks.hpp"
#include "library/components/roctracer_callbacks.hpp"
#include "library.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
@@ -36,6 +36,8 @@
#include <cstdint>

TIMEMORY_DEFINE_API(roctracer)
namespace omnitrace
{
namespace api = tim::api;

std::unordered_set<uint64_t>&
@@ -364,9 +366,17 @@ hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
    }
    if(get_use_timemory())
    {
        get_roctracer_hip_data()->emplace(
            data->correlation_id,
            roctracer_bundle_t{ op_name, quirk::config<quirk::auto_start>{} });
        auto itr = get_roctracer_hip_data()->emplace(data->correlation_id,
                                                     roctracer_bundle_t{ op_name });
        if(itr.second)
        {
            itr.first->second.start();
        }
        else if(itr.first != get_roctracer_hip_data()->end())
        {
            itr.first->second.stop();
            get_roctracer_hip_data()->erase(itr.first);
        }
    }
    if(get_use_critical_trace())
    {
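The hunk above relies on `emplace` returning an iterator/bool pair: a fresh correlation id starts a new bundle, while a repeated id stops and erases the existing one. A minimal sketch of that pattern with a plain `std::unordered_map` follows; `region`, `region_map`, and `toggle_region` are hypothetical stand-ins for this example and are not timemory or omnitrace types.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a component bundle that can be started and stopped.
struct region
{
    explicit region(std::string n)
    : name(std::move(n))
    {}

    void start() { running = true; }
    void stop() { running = false; }

    std::string name;
    bool running = false;
};

using region_map = std::unordered_map<std::uint64_t, region>;

// Returns true when a new region was opened for `correlation_id` and false
// when an already-open region with that id was closed and removed.
inline bool toggle_region(region_map& regions, std::uint64_t correlation_id,
                          const std::string& name)
{
    // emplace only inserts if the key is absent; .second reports which case hit
    auto result = regions.emplace(correlation_id, region{ name });
    if(result.second)
    {
        result.first->second.start();
        return true;
    }
    // key already present: result.first points at the existing entry
    result.first->second.stop();
    regions.erase(result.first);
    return false;
}
```

Because `emplace` always returns a valid iterator, either to the new element or to the one that blocked the insertion, the sketch does not compare it against `end()`.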
@@ -403,7 +413,7 @@ hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
    auto itr = _data->find(data->correlation_id);
    if(itr != get_roctracer_hip_data()->end())
    {
        itr->second.stop().pop();
        itr->second.stop();
        _data->erase(itr);
        return true;
    }
@@ -597,3 +607,4 @@ roctracer_tear_down_routines()
    static auto _v = roctracer_functions_t{};
    return _v;
}
} // namespace omnitrace
+198
-69
@@ -28,11 +28,16 @@

#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "timemory/backends/dmp.hpp"
#include "timemory/backends/process.hpp"
#include "timemory/settings/types.hpp"
#include "timemory/utility/argparse.hpp"

#include <timemory/backends/dmp.hpp>
#include <timemory/backends/mpi.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/environment.hpp>
#include <timemory/settings.hpp>
#include <timemory/settings/types.hpp>
#include <timemory/utility/argparse.hpp>

#include <array>
#include <cstdint>
@@ -40,9 +45,9 @@
#include <numeric>
#include <ostream>
#include <string>
#include <timemory/environment.hpp>
#include <timemory/settings.hpp>

namespace omnitrace
{
using settings = tim::settings;

namespace
@@ -55,9 +60,10 @@ get_config()
    (void) _once;
}

#define OMNITRACE_CONFIG_SETTING(TYPE, ENV_NAME, DESCRIPTION, INITIAL_VALUE)             \
    _config->insert<TYPE, TYPE>(ENV_NAME, ENV_NAME, DESCRIPTION, INITIAL_VALUE,          \
                                std::vector<std::string>{})
#define OMNITRACE_CONFIG_SETTING(TYPE, ENV_NAME, DESCRIPTION, INITIAL_VALUE, ...)        \
    _config->insert<TYPE, TYPE>(                                                         \
        ENV_NAME, ENV_NAME, DESCRIPTION, INITIAL_VALUE,                                  \
        std::set<std::string>{ "custom", "omnitrace", __VA_ARGS__ })
} // namespace

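The variadic form of `OMNITRACE_CONFIG_SETTING` appends caller-supplied categories to two fixed ones inside a `std::set<std::string>` initializer. A reduced sketch of just that expansion is shown below; `MAKE_CATEGORIES` and `perfetto_backend_categories` are hypothetical names for this example and do not exist in omnitrace.

```cpp
#include <set>
#include <string>

// Hypothetical macro mirroring the shape of the category list in the diff:
// two fixed categories followed by whatever the caller appends.
#define MAKE_CATEGORIES(...) std::set<std::string>{ "custom", "omnitrace", __VA_ARGS__ }

// Expands to std::set<std::string>{ "custom", "omnitrace", "backend", "perfetto" }
inline std::set<std::string> perfetto_backend_categories()
{
    return MAKE_CATEGORIES("backend", "perfetto");
}
```

Since the result is a set, repeating a category is harmless. Calling such a macro with no extra arguments leaves a trailing comma in the braced list, which the language accepts, but omitting the variadic arguments altogether is only strictly conforming from C++20 onward, so passing at least one category is the safer habit.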
void
@@ -81,28 +87,49 @@ configure_settings()

    OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_CONFIG_FILE",
                             "Configuration file of omnitrace and timemory settings",
                             _default_config_file);
                             _default_config_file, "config");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_DEBUG", "Enable debugging output",
                             _config->get_debug());
                             _config->get_debug(), "debugging");

    auto _omnitrace_debug = _config->get<bool>("OMNITRACE_DEBUG");
    if(_omnitrace_debug) tim::set_env("TIMEMORY_DEBUG_SETTINGS", "1", 0);

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_PERFETTO", "Enable perfetto backend",
                             _default_perfetto_v);
                             _default_perfetto_v, "backend", "perfetto");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_TIMEMORY", "Enable timemory backend",
                             !_config->get<bool>("OMNITRACE_USE_PERFETTO"));
                             !_config->get<bool>("OMNITRACE_USE_PERFETTO"), "backend",
                             "timemory");

#if defined(OMNITRACE_USE_ROCTRACER)
    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_ROCTRACER", "Enable ROCM tracing", true,
                             "backend", "roctracer");
#endif

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_USE_SAMPLING",
                             "Enable statistical sampling of call-stack", false,
                             "backend", "sampling");

    OMNITRACE_CONFIG_SETTING(
        bool, "OMNITRACE_USE_PID",
        "Enable tagging filenames with process identifier (either MPI rank or pid)",
        true);
        "Enable tagging filenames with process identifier (either MPI rank or pid)", true,
        "io");

    OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_INSTRUMENTATION_INTERVAL",
                             "Instrumentation only takes measurements once every N "
                             "function calls (not statistical)",
                             1, "instrumentation");

    OMNITRACE_CONFIG_SETTING(
        size_t, "OMNITRACE_SAMPLE_RATE",
        "Counts every function call (N), only record function if (N % <VALUE> == 0)", 1);
        double, "OMNITRACE_SAMPLING_FREQ",
        "Number of software interrupts per second when OMNITRACE_USE_SAMPLING=ON", 10.0,
        "sampling");

    OMNITRACE_CONFIG_SETTING(
        double, "OMNITRACE_SAMPLING_DELAY",
        "Number of seconds to delay activating the statistical sampling", 0.05,
        "sampling");

    auto _backend = tim::get_env_choice<std::string>(
        "OMNITRACE_BACKEND",
@@ -114,71 +141,84 @@ configure_settings()
    OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_BACKEND",
                             "Specify the perfetto backend to activate. Options are: "
                             "'inprocess', 'system', or 'all'",
                             _backend);
                             _backend, "perfetto");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_CRITICAL_TRACE",
                             "Enable generation of the critical trace", false);
                             "Enable generation of the critical trace", false, "feature");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_FLAT_SAMPLING",
                             "Ignore hierarchy in all statistical sampling entries",
                             _config->get_flat_profile(), "sampling", "data_layout");

    OMNITRACE_CONFIG_SETTING(
        bool, "OMNITRACE_ROCTRACER_TIMELINE_PROFILE",
        "Create unique entries for every kernel with timemory backend",
        _config->get_timeline_profile());
        bool, "OMNITRACE_TIMELINE_SAMPLING",
        "Create unique entries for every sample when statistical sampling is enabled",
        _config->get_timeline_profile(), "sampling", "data_layout");

    OMNITRACE_CONFIG_SETTING(
        bool, "OMNITRACE_ROCTRACER_FLAT_PROFILE",
        "Ignore hierarchy in all kernels entries with timemory backend",
        _config->get_flat_profile());
        _config->get_flat_profile(), "roctracer", "data_layout");

    OMNITRACE_CONFIG_SETTING(
        bool, "OMNITRACE_ROCTRACER_TIMELINE_PROFILE",
        "Create unique entries for every kernel with timemory backend",
        _config->get_timeline_profile(), "roctracer", "data_layout");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_ROCTRACER_HSA_ACTIVITY",
                             "Enable HSA activity tracing support", false);
                             "Enable HSA activity tracing support", false, "roctracer");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_ROCTRACER_HSA_API",
                             "Enable HSA API tracing support", false);
                             "Enable HSA API tracing support", false, "roctracer");

    OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_ROCTRACER_HSA_API_TYPES",
                             "HSA API type to collect", "");
                             "HSA API type to collect", "", "roctracer");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_CRITICAL_TRACE_DEBUG",
                             "Enable debugging for critical trace", _omnitrace_debug);
                             "Enable debugging for critical trace", _omnitrace_debug,
                             "debugging");

    OMNITRACE_CONFIG_SETTING(
        bool, "OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES",
        "Include names in serialization of critical trace (mainly for debugging)",
        _omnitrace_debug);
        _omnitrace_debug, "debugging");

    OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_SHMEM_SIZE_HINT_KB",
                             "Hint for shared-memory buffer size in perfetto (in KB)",
                             40960);
                             40960, "perfetto", "data");

    OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_BUFFER_SIZE_KB",
                             "Size of perfetto buffer (in KB)", 1024000);
                             "Size of perfetto buffer (in KB)", 1024000, "perfetto",
                             "data");

    OMNITRACE_CONFIG_SETTING(int64_t, "OMNITRACE_CRITICAL_TRACE_COUNT",
                             "Number of critical trace to export (0 == all)", 0);
                             "Number of critical trace to export (0 == all)", 0, "data");

    OMNITRACE_CONFIG_SETTING(uint64_t, "OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT",
                             "Number of critical trace records to store in thread-local "
                             "memory before submitting to shared buffer",
                             2000);
                             2000, "data");

    OMNITRACE_CONFIG_SETTING(
        uint64_t, "OMNITRACE_CRITICAL_TRACE_NUM_THREADS",
        "Number of threads to use when generating the critical trace",
        std::min<uint64_t>(8, std::thread::hardware_concurrency()));
        std::min<uint64_t>(8, std::thread::hardware_concurrency()), "parallelism");

    OMNITRACE_CONFIG_SETTING(
        int64_t, "OMNITRACE_CRITICAL_TRACE_PER_ROW",
        "How many critical traces per row in perfetto (0 == all in one row)", 0);
        "How many critical traces per row in perfetto (0 == all in one row)", 0, "io");

    OMNITRACE_CONFIG_SETTING(
        std::string, "OMNITRACE_COMPONENTS",
        "List of components to collect via timemory (see timemory-avail)", "wall_clock");
        std::string, "OMNITRACE_TIMEMORY_COMPONENTS",
        "List of components to collect via timemory (see timemory-avail)", "wall_clock",
        "timemory", "component");

    OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_OUTPUT_FILE", "Perfetto filename",
                             "");
                             "", "perfetto", "io");

    OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_SETTINGS_DESC",
                             "Provide descriptions when printing settings", false);
                             "Provide descriptions when printing settings", false,
                             "debugging");

    _config->get_flamegraph_output() = false;
    _config->get_cout_output() = false;
@@ -199,7 +239,8 @@ configure_settings()
        _config->read(itr);
    }

    _config->get_global_components() = _config->get<std::string>("OMNITRACE_COMPONENTS");
    _config->get_global_components() =
        _config->get<std::string>("OMNITRACE_TIMEMORY_COMPONENTS");

    // always initialize timemory because gotcha wrappers are always used
    auto _cmd = tim::read_command_line(process::get_id());
@@ -225,14 +266,23 @@ configure_settings()
    settings::suppress_parsing() = true;
    settings::suppress_config() = true;
    settings::use_output_suffix() = _config->get<bool>("OMNITRACE_USE_PID");
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
    if(tim::mpi::is_initialized()) settings::default_process_suffix() = tim::mpi::rank();
#endif
    OMNITRACE_CONDITIONAL_BASIC_PRINT(true, "configuration complete\n");
}

void
print_config_settings(std::ostream& _os,
                      std::function<bool(const std::string_view&)>&& _filter)
print_config_settings(
    std::ostream& _os,
    std::function<bool(const std::string_view&, const std::set<std::string>&)>&& _filter)
{
    OMNITRACE_CONDITIONAL_BASIC_PRINT(true, "configuration:\n");

    auto _flags = _os.flags();

    bool _md = tim::get_env<bool>("OMNITRACE_SETTINGS_DESC_MARKDOWN", false);

    constexpr size_t nfields = 3;
    using str_array_t = std::array<std::string, nfields>;
    std::vector<str_array_t> _data{};
@@ -240,14 +290,17 @@ print_config_settings(std::ostream& _os,
    _widths.fill(0);
    for(const auto& itr : *get_config())
    {
        if(_filter(itr.first))
        if(_filter(itr.first, itr.second->get_categories()))
        {
            auto _disp = itr.second->get_display(std::ios::boolalpha);
            _data.emplace_back(str_array_t{ _disp.at("name"), _disp.at("value"),
            _data.emplace_back(str_array_t{ _disp.at("env_name"), _disp.at("value"),
                                            _disp.at("description") });
            for(size_t i = 0; i < nfields; ++i)
                _widths.at(i) =
                    std::max<size_t>(_widths.at(i), _data.back().at(i).length());
            {
                size_t _wextra = (_md && i < 2) ? 2 : 0;
                _widths.at(i) = std::max<size_t>(_widths.at(i),
                                                 _data.back().at(i).length() + _wextra);
            }
        }
    }

@@ -261,10 +314,8 @@ print_config_settings(std::ostream& _os,
        auto _rhs_use = rhs.at(0).find("OMNITRACE_USE_");
        if(_lhs_use != _rhs_use && _lhs_use < _rhs_use) return true;
        if(_lhs_use != _rhs_use && _lhs_use > _rhs_use) return false;
        // length sort followed by alphabetical sort
        return (lhs.at(0).length() == rhs.at(0).length())
                   ? (lhs.at(0) < rhs.at(0))
                   : (lhs.at(0).length() < rhs.at(0).length());
        // alphabetical sort
        return lhs.at(0) < rhs.at(0);
    });

    bool _print_desc = get_debug() || get_config()->get<bool>("OMNITRACE_SETTINGS_DESC");
@@ -272,15 +323,20 @@ print_config_settings(std::ostream& _os,
    auto tot_width = std::accumulate(_widths.begin(), _widths.end(), 0);
    if(!_print_desc) tot_width -= _widths.back() + 4;

    size_t _spacer_extra = 9;
    if(!_md)
        _spacer_extra += 2;
    else if(_md && _print_desc)
        _spacer_extra -= 1;
    std::stringstream _spacer{};
    _spacer.fill('-');
    _spacer << "#" << std::setw(tot_width + 11) << ""
    _spacer << "#" << std::setw(tot_width + _spacer_extra) << ""
            << "#";
    _os << _spacer.str() << "\n";
    // _os << "# Omnitrace settings:" << std::setw(tot_width - 8) << "#" << "\n";
    // _os << "# api::omnitrace settings:" << std::setw(tot_width - 8) << "#" << "\n";
    for(const auto& itr : _data)
    {
        _os << "# ";
        _os << ((_md) ? "| " : "# ");
        for(size_t i = 0; i < nfields; ++i)
        {
            switch(i)
@@ -289,16 +345,28 @@ print_config_settings(std::ostream& _os,
                case 1: _os << std::left; break;
                case 2: _os << std::left; break;
            }
            _os << std::setw(_widths.at(i)) << itr.at(i) << " ";
            if(!_print_desc && i == 1) break;
            switch(i)
            if(_md)
            {
                case 0: _os << "= "; break;
                case 1: _os << "[ "; break;
                case 2: _os << "]"; break;
                std::stringstream _ss{};
                _ss.setf(_os.flags());
                std::string _extra = (i < 2) ? "`" : "";
                _ss << _extra << itr.at(i) << _extra;
                _os << std::setw(_widths.at(i)) << _ss.str() << " | ";
                if(!_print_desc && i == 1) break;
            }
            else
            {
                _os << std::setw(_widths.at(i)) << itr.at(i) << " ";
                if(!_print_desc && i == 1) break;
                switch(i)
                {
                    case 0: _os << "= "; break;
                    case 1: _os << "[ "; break;
                    case 2: _os << "]"; break;
                }
            }
        }
        _os << " #\n";
        _os << ((_md) ? "\n" : " #\n");
    }
    _os << _spacer.str() << "\n";

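With this change the filter passed to `print_config_settings` receives each setting's categories alongside its name. As a rough illustration of what a caller could pass, the sketch below builds a filter that keeps only `OMNITRACE_`-prefixed settings tagged with one category. `setting_filter_t` and `make_category_filter` are hypothetical helpers for this example, and the prefix check is an assumption of the sketch rather than something the diff requires.

```cpp
#include <functional>
#include <set>
#include <string>
#include <string_view>

// Same call signature as the new _filter parameter in the diff.
using setting_filter_t =
    std::function<bool(const std::string_view&, const std::set<std::string>&)>;

// Build a filter that accepts OMNITRACE_-prefixed settings tagged with `category`.
inline setting_filter_t make_category_filter(std::string category)
{
    return [category](const std::string_view& name,
                      const std::set<std::string>& categories) {
        // find(...) == 0 means the name starts with the prefix
        return name.find("OMNITRACE_") == 0 && categories.count(category) > 0;
    };
}
```

A call such as `print_config_settings(std::cout, make_category_filter("sampling"))` would then list only the sampling-related settings, assuming the function is reachable from the calling code.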
@@ -319,6 +387,12 @@ get_config_file()
    return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
}

bool
get_debug_env()
{
    return tim::get_env<bool>("OMNITRACE_DEBUG", false);
}

bool
get_debug()
{
@@ -326,20 +400,47 @@ get_debug()
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool
bool&
get_use_perfetto()
{
    static auto _v = get_config()->find("OMNITRACE_USE_PERFETTO");
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool
bool&
get_use_timemory()
{
    static auto _v = get_config()->find("OMNITRACE_USE_TIMEMORY");
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool&
get_use_roctracer()
{
#if defined(OMNITRACE_USE_ROCTRACER)
    static auto _v = get_config()->find("OMNITRACE_USE_ROCTRACER");
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
#else
    static auto _v = false;
    return _v;
#endif
}

bool&
get_use_sampling()
{
#if defined(TIMEMORY_USE_LIBUNWIND)
    static auto _v = get_config()->find("OMNITRACE_USE_SAMPLING");
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
#else
    static bool _v = false;
    if(_v)
        throw std::runtime_error("Error! sampling was enabled but omnitrace was not "
                                 "built with libunwind support");
    return _v;
#endif
}

bool&
get_use_pid()
{
@@ -347,14 +448,14 @@ get_use_pid()
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool
bool&
get_use_mpip()
{
    static bool _v = tim::get_env("OMNITRACE_USE_MPIP", false, false);
    return _v;
}

bool
bool&
get_use_critical_trace()
{
    static auto _v = get_config()->find("OMNITRACE_CRITICAL_TRACE");
@@ -375,6 +476,20 @@ get_critical_trace_serialize_names()
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool
get_timeline_sampling()
{
    static auto _v = get_config()->find("OMNITRACE_TIMELINE_SAMPLING");
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool
get_flat_sampling()
{
    static auto _v = get_config()->find("OMNITRACE_FLAT_SAMPLING");
    return static_cast<tim::tsettings<bool>&>(*_v->second).get();
}

bool
get_roctracer_timeline_profile()
{
@@ -456,7 +571,7 @@ get_backend()
    return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
}

std::string
std::string&
get_perfetto_output_filename()
{
    static auto _v = get_config()->find("OMNITRACE_OUTPUT_FILE");
@@ -464,9 +579,8 @@ get_perfetto_output_filename()
    if(_t.get().empty())
    {
        // default name: perfetto-trace.<pid>.proto or perfetto-trace.<rank>.proto
        auto _default_fname = settings::compose_output_filename(
            "perfetto-trace", "proto", get_use_pid(),
            (tim::dmp::is_initialized()) ? tim::dmp::rank() : process::get_id());
        auto _default_fname =
            settings::compose_output_filename("perfetto-trace", "proto", get_use_pid());
        auto _pid_patch = std::string{ "/" } + std::to_string(tim::process::get_id()) +
                          "-perfetto-trace";
        auto _dpos = _default_fname.find(_pid_patch);
@@ -483,12 +597,26 @@ get_perfetto_output_filename()
}

size_t&
get_sample_rate()
get_instrumentation_interval()
{
    static auto _v = get_config()->find("OMNITRACE_SAMPLE_RATE");
    static auto _v = get_config()->find("OMNITRACE_INSTRUMENTATION_INTERVAL");
    return static_cast<tim::tsettings<size_t>&>(*_v->second).get();
}

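The setting behind `get_instrumentation_interval()` is described as taking a measurement once every N function calls, and the description it replaces spelled the rule out as `N % <VALUE> == 0`. A tiny sketch of such a counter-based gate follows. `interval_gate` is a hypothetical type, and both the zero-based call count and the treatment of an interval of 0 as "never record" are assumptions of this example, not behaviour confirmed by the diff.

```cpp
#include <cstddef>

// Hypothetical gate: records the first call and every `interval`-th call after it.
struct interval_gate
{
    std::size_t interval;  // measure once every `interval` calls
    std::size_t count;     // number of calls seen so far

    bool should_record()
    {
        if(interval == 0) return false;  // assumption: 0 disables recording
        // post-increment: the test uses the zero-based index of this call
        return (count++ % interval) == 0;
    }
};
```

With `interval_gate gate{ 3, 0 };`, calls 0, 3, 6, and so on are recorded, and an interval of 1 records every call. Unlike signal-driven sampling, this is deterministic, which matches the "(not statistical)" note in the setting description.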
double&
get_sampling_freq()
{
    static auto _v = get_config()->find("OMNITRACE_SAMPLING_FREQ");
    return static_cast<tim::tsettings<double>&>(*_v->second).get();
}

double&
get_sampling_delay()
{
    static auto _v = get_config()->find("OMNITRACE_SAMPLING_DELAY");
    return static_cast<tim::tsettings<double>&>(*_v->second).get();
}

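`OMNITRACE_SAMPLING_FREQ` is expressed in interrupts per second and `OMNITRACE_SAMPLING_DELAY` in seconds. The diff does not show how these values reach the timer, so the following is only a sketch under the assumption that a POSIX interval timer (`setitimer`, which delivers SIGALRM or SIGPROF) is the consumer. `to_timeval` and `make_sampling_timer` are hypothetical helpers for this example.

```cpp
#include <cmath>
#include <sys/time.h>

// Convert a non-negative duration in seconds to a timeval, rounded to the
// nearest microsecond.
inline timeval to_timeval(double seconds)
{
    long long usec = std::llround(seconds * 1.0e6);
    timeval tv{};
    tv.tv_sec = static_cast<time_t>(usec / 1000000);
    tv.tv_usec = static_cast<suseconds_t>(usec % 1000000);
    return tv;
}

// Build an itimerval that first fires after `delay` seconds and then fires
// `freq` times per second. `freq` must be greater than zero.
inline itimerval make_sampling_timer(double freq, double delay)
{
    itimerval timer{};
    timer.it_value = to_timeval(delay);          // time until the first expiry
    timer.it_interval = to_timeval(1.0 / freq);  // period between expiries
    return timer;
}
```

With the defaults from the diff (10 interrupts per second, 0.05 s delay) this gives a 100 ms period and a 50 ms initial delay. Note that `setitimer` treats a zero `it_value` as "disarm the timer", so a delay of exactly 0 would need special handling in real code.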
int64_t
get_critical_trace_count()
{
@@ -526,3 +654,4 @@ get_cpu_cid_stack(int64_t _tid)
    return _v.at(_tid);
    (void) _v_check;
}
} // namespace omnitrace

@@ -27,18 +27,20 @@
// THE SOFTWARE.

#include "library/critical_trace.hpp"
#include "PTL/ThreadPool.hh"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/perfetto.hpp"
#include "library/ptl.hpp"
#include "timemory/backends/dmp.hpp"
#include "timemory/hash/types.hpp"
#include "timemory/tpls/cereal/cereal/archives/json.hpp"
#include "timemory/tpls/cereal/cereal/cereal.hpp"
#include "timemory/utility/macros.hpp"
#include "timemory/utility/types.hpp"
#include "timemory/utility/utility.hpp"

#include <PTL/ThreadPool.hh>
#include <timemory/backends/dmp.hpp>
#include <timemory/hash/types.hpp>
#include <timemory/tpls/cereal/cereal/archives/json.hpp>
#include <timemory/tpls/cereal/cereal/cereal.hpp>
#include <timemory/utility/macros.hpp>
#include <timemory/utility/types.hpp>
#include <timemory/utility/utility.hpp>

#include <cctype>
#include <cstdint>
@@ -47,6 +49,8 @@
#include <stdexcept>
#include <utility>

namespace omnitrace
{
namespace critical_trace
{
namespace
@@ -1165,7 +1169,7 @@ compute_critical_trace()
    try
    {
        PTL::ThreadPool _tp{ get_critical_trace_num_threads(), []() { copy_hash_ids(); },
                             false };
                             []() {} };
        _tp.set_verbose(-1);
        PTL::TaskGroup<void> _tg{ &_tp };

@@ -1191,7 +1195,7 @@ compute_critical_trace()
        OMNITRACE_CT_DEBUG("%s\n", JOIN("", _perf).c_str());

        _tg.run(
            [](call_chain _chain, std::string _func) {
            [](call_chain _chain, std::string _func) { // NOLINT
                save_call_chain_json(tim::settings::compose_output_filename(
                                         "call-chain", ".json", get_use_pid(),
                                         (tim::dmp::is_initialized())
@@ -1298,3 +1302,4 @@ compute_critical_trace()
}
} // namespace
} // namespace critical_trace
} // namespace omnitrace

@@ -0,0 +1,51 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#include "library/gpu.hpp"

#if defined(OMNITRACE_USE_HIP)
#    if !defined(TIMEMORY_USE_HIP)
#        define TIMEMORY_USE_HIP 1
#    endif

#    include "timemory/components/hip/backends.hpp"
#endif

namespace omnitrace
{
namespace gpu
{
#if defined(OMNITRACE_USE_HIP)
int
device_count()
{
    return ::tim::hip::device_count();
}
#else
int
device_count()
{
    return 0;
}
#endif
} // namespace gpu
} // namespace omnitrace
+26
-7
@@ -27,9 +27,32 @@
// THE SOFTWARE.

#include "library/ptl.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"

#include <PTL/ThreadPool.hh>
#include <timemory/utility/declaration.hpp>

namespace omnitrace
{
namespace tasking
{
namespace
{
auto _thread_pool_cfg = []() {
    PTL::ThreadPool::Config _v{};
    _v.init = true;
    _v.use_affinity = false;
    _v.use_tbb = false;
    _v.initializer = []() {};
    _v.finalizer = []() {};
    _v.priority = 5;
    _v.pool_size = 1;
    return _v;
}();
}

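`_thread_pool_cfg` above is built with an immediately invoked lambda, which lets a namespace-scope object be initialized through several statements while remaining a single definition. A self-contained sketch of the idiom follows; `pool_config` and `default_pool_config` are hypothetical names whose fields merely echo the diff, and the struct is not PTL's `ThreadPool::Config`.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical configuration struct; the field names echo the diff but this
// is not PTL's type.
struct pool_config
{
    bool init = true;
    bool use_affinity = true;
    int priority = 0;
    std::size_t pool_size = 0;
    std::function<void()> initializer = [] {};
};

// The trailing () invokes the lambda right away, so default_pool_config is
// fully populated when this translation unit's static initialization runs.
const pool_config default_pool_config = []() {
    pool_config v{};
    v.use_affinity = false;
    v.priority = 5;
    v.pool_size = 1;
    return v;
}();
```

One design point worth keeping in mind: objects that consume such a configuration from another translation unit during their own static initialization can hit initialization-order problems, which is one reason function-local statics (as in the `get_*_thread_pool()` accessors) are a common companion to this idiom.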
std::mutex&
get_roctracer_mutex()
{
@@ -40,7 +63,7 @@ get_roctracer_mutex()
PTL::ThreadPool&
get_roctracer_thread_pool()
{
    static auto _v = PTL::ThreadPool{ 1 };
    static auto _v = PTL::ThreadPool{ _thread_pool_cfg };
    return _v;
}

@@ -61,7 +84,7 @@ get_critical_trace_mutex()
PTL::ThreadPool&
get_critical_trace_thread_pool()
{
    static auto _v = PTL::ThreadPool{ 1 };
    static auto _v = PTL::ThreadPool{ _thread_pool_cfg };
    return _v;
}

@@ -72,9 +95,5 @@ get_critical_trace_task_group()
    return _v;
}

namespace
{
bool _ptl_initialized =
    (get_roctracer_thread_pool(), get_critical_trace_thread_pool(), true);
}
} // namespace tasking
} // namespace omnitrace

@@ -0,0 +1,219 @@
// Copyright (c) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// with the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
// sell copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in the
// documentation and/or other materials provided with the distribution.
//
// * Neither the names of Advanced Micro Devices, Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this Software without specific prior written permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
// THE SOFTWARE.

#include "library/sampling.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/ptl.hpp"

#include <timemory/backends/papi.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/components/data_tracker/components.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/components/papi/extern.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/components/papi/papi_vector.hpp>
#include <timemory/components/timing/backends.hpp>
#include <timemory/components/trip_count/extern.hpp>
#include <timemory/macros.hpp>
#include <timemory/math.hpp>
#include <timemory/mpl.hpp>
#include <timemory/mpl/quirks.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/storage.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/demangle.hpp>
#include <timemory/utility/types.hpp>
#include <timemory/variadic.hpp>

#include <array>
#include <cstring>
#include <ctime>
#include <initializer_list>
#include <mutex>
#include <regex>
#include <sstream>
#include <string>
#include <type_traits>

#include <pthread.h>
#include <signal.h>

namespace omnitrace
{
namespace sampling
{
using bundle_t = tim::lightweight_tuple<backtrace>;
using sampler_t = tim::sampling::sampler<bundle_t, tim::sampling::dynamic>;
} // namespace sampling
} // namespace omnitrace

TIMEMORY_DEFINE_CONCRETE_TRAIT(check_signals, omnitrace::sampling::sampler_t,
                               std::false_type)

TIMEMORY_DEFINE_CONCRETE_TRAIT(buffer_size, omnitrace::sampling::sampler_t,
                               TIMEMORY_ESC(std::integral_constant<size_t, 256>))

namespace omnitrace
{
namespace sampling
{
using signal_type_instances = omnitrace_thread_data<std::set<int>, api::sampling>;
using backtrace_init_instances = omnitrace_thread_data<backtrace, api::sampling>;
using sampler_running_instances = omnitrace_thread_data<bool, api::sampling>;
using papi_vector_instances = omnitrace_thread_data<comp::papi_vector, api::sampling>;

namespace
{
std::unique_ptr<comp::papi_vector>&
get_papi_vector(int64_t _tid)
{
    static auto& _v = papi_vector_instances::instances();
    if(_tid == threading::get_id()) papi_vector_instances::construct();
    return _v.at(_tid);
}

||||
std::unique_ptr<backtrace>&
|
||||
get_backtrace_init(int64_t _tid)
|
||||
{
|
||||
static auto& _v = backtrace_init_instances::instances();
|
||||
return _v.at(_tid);
|
||||
}
|
||||
|
||||
std::unique_ptr<bool>&
|
||||
get_sampler_running(int64_t _tid)
|
||||
{
|
||||
static auto& _v = sampler_running_instances::instances();
|
||||
return _v.at(_tid);
|
||||
}
|
||||
|
||||
std::unique_ptr<std::set<int>>&
|
||||
get_signal_types(int64_t _tid)
|
||||
{
|
||||
static auto& _v = signal_type_instances::instances();
|
||||
// on the main thread, use both SIGALRM and SIGPROF.
|
||||
// on secondary threads, only use SIGPROF.
|
||||
signal_type_instances::construct((_tid == 0) ? std::set<int>{ SIGALRM, SIGPROF }
|
||||
: std::set<int>{ SIGPROF });
|
||||
return _v.at(_tid);
|
||||
}
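The per-thread signal choice described in the comments above can be sketched in isolation. This is a minimal illustration, not the omnitrace implementation: `signals_for_thread` is a hypothetical name, and treating thread id 0 as the main thread is an assumption carried over from the `_tid == 0` test.

```cpp
#include <csignal>
#include <cstdint>
#include <set>

// Hypothetical helper (not part of omnitrace): returns the signals a thread
// would sample with, assuming thread id 0 denotes the main thread.
std::set<int>
signals_for_thread(int64_t tid)
{
    // main thread: both SIGALRM and SIGPROF
    if(tid == 0) return std::set<int>{ SIGALRM, SIGPROF };
    // secondary threads: SIGPROF only
    return std::set<int>{ SIGPROF };
}
```

Returning the set by value keeps the sketch free of the per-thread storage that `omnitrace_thread_data` provides in the real code.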
|
||||
|
||||
template <typename... Args>
|
||||
void
|
||||
thread_sigmask(Args... _args)
|
||||
{
|
||||
auto _err = pthread_sigmask(_args...);
|
||||
if(_err != 0)
|
||||
{
|
||||
errno = _err;
|
||||
perror("pthread_sigmask");
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
}
|
||||
|
||||
template <typename Tp>
|
||||
sigset_t
|
||||
get_signal_set(Tp&& _v)
|
||||
{
|
||||
sigset_t _sigset;
|
||||
sigemptyset(&_sigset);
|
||||
for(auto itr : _v)
|
||||
sigaddset(&_sigset, itr);
|
||||
return _sigset;
|
||||
}
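As a standalone sketch of what `get_signal_set` does, the snippet below builds a `sigset_t` from a set of signal numbers and checks membership with `sigismember`. The names `make_signal_set` and `contains_signal` are illustrative only and do not appear in the source.

```cpp
#include <set>
#include <signal.h>

// Build a sigset_t containing exactly the given signal numbers.
sigset_t
make_signal_set(const std::set<int>& sigs)
{
    sigset_t out;
    sigemptyset(&out);  // start from an empty set
    for(int sig : sigs)
        sigaddset(&out, sig);
    return out;
}

// True if `sig` is a member of `sigset` (sigismember returns 1 on membership).
bool
contains_signal(const sigset_t& sigset, int sig)
{
    return sigismember(&sigset, sig) == 1;
}
```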
|
||||
|
||||
template <typename Tp>
|
||||
std::string
|
||||
get_signal_names(Tp&& _v)
|
||||
{
|
||||
std::string _sig_names{};
|
||||
for(auto&& itr : _v)
|
||||
_sig_names += std::get<0>(tim::signal_settings::get_info(
|
||||
static_cast<tim::sys_signal>(itr))) +
|
||||
" ";
|
||||
return _sig_names.substr(0, _sig_names.length() - 1);
|
||||
}
|
||||
} // namespace
|
||||
|
||||
std::set<int>
|
||||
setup()
|
||||
{
|
||||
return backtrace::configure(true);
|
||||
}
|
||||
|
||||
std::set<int>
|
||||
shutdown()
|
||||
{
|
||||
return backtrace::configure(false);
|
||||
}
|
||||
|
||||
void
|
||||
block_signals(std::set<int> _signals)
|
||||
{
|
||||
if(_signals.empty()) _signals = *get_signal_types(threading::get_id());
|
||||
if(_signals.empty())
|
||||
{
|
||||
OMNITRACE_PRINT("No signals to block...\n");
|
||||
return;
|
||||
}
|
||||
|
||||
OMNITRACE_DEBUG("Blocking signals [%s] on thread #%lu...\n",
|
||||
get_signal_names(_signals).c_str(), threading::get_id());
|
||||
|
||||
sigset_t _v = get_signal_set(_signals);
|
||||
thread_sigmask(SIG_BLOCK, &_v, nullptr);
|
||||
}
|
||||
|
||||
void
|
||||
unblock_signals(std::set<int> _signals)
|
||||
{
|
||||
if(_signals.empty()) _signals = *get_signal_types(threading::get_id());
|
||||
if(_signals.empty())
|
||||
{
|
||||
OMNITRACE_PRINT("No signals to unblock...\n");
|
||||
return;
|
||||
}
|
||||
|
||||
OMNITRACE_DEBUG("Unblocking signals [%s] on thread #%lu...\n",
|
||||
get_signal_names(_signals).c_str(), threading::get_id());
|
||||
|
||||
sigset_t _v = get_signal_set(_signals);
|
||||
thread_sigmask(SIG_UNBLOCK, &_v, nullptr);
|
||||
}
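The block/unblock pair above relies on `pthread_sigmask`, which changes the signal mask of the calling thread only. A minimal sketch of that round trip follows; `set_blocked` and `is_blocked` are hypothetical helpers, and unlike `thread_sigmask` above they report failure to the caller instead of exiting.

```cpp
#include <pthread.h>
#include <signal.h>

// Block or unblock a single signal on the calling thread.
// Returns true on success (pthread_sigmask returns 0).
bool
set_blocked(int sig, bool block)
{
    sigset_t sigset;
    sigemptyset(&sigset);
    sigaddset(&sigset, sig);
    return pthread_sigmask(block ? SIG_BLOCK : SIG_UNBLOCK, &sigset, nullptr) == 0;
}

// Query the calling thread's current mask. Passing a null `set` leaves the
// mask unchanged and only writes the current mask to `current`.
bool
is_blocked(int sig)
{
    sigset_t current;
    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, sig) == 1;
}
```

Because the mask is per-thread, a sampler that wants a signal blocked everywhere would need to apply this on every thread or before the threads are created.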
|
||||
|
||||
std::unique_ptr<sampler_t>&
|
||||
get_sampler(int64_t _tid)
|
||||
{
|
||||
static auto& _v = sampler_instances::instances();
|
||||
return _v.at(_tid);
|
||||
}
|
||||
} // namespace sampling
|
||||
} // namespace omnitrace
|
||||
@@ -28,9 +28,12 @@
|
||||
|
||||
#include "library/thread_data.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
instrumentation_bundles::instance_array_t&
|
||||
instrumentation_bundles::instances()
|
||||
{
|
||||
static auto _v = instance_array_t{};
|
||||
return _v;
|
||||
}
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -28,4 +28,6 @@
|
||||
|
||||
#include "library/timemory.hpp"
|
||||
|
||||
using namespace omnitrace;
|
||||
|
||||
TIMEMORY_INITIALIZE_STORAGE(comp::wall_clock, comp::user_global_bundle)
|
||||
|
||||
+447
-438
File diff is too large
+489
-30
@@ -28,6 +28,465 @@
|
||||
|
||||
#include "omnitrace.hpp"
|
||||
|
||||
static int expect_error = NO_ERROR;
|
||||
static int error_print = 0;
|
||||
static auto regex_opts = std::regex_constants::egrep | std::regex_constants::optimize;
|
||||
|
||||
// set of whole function names to exclude
|
||||
strset_t
|
||||
get_whole_function_names()
|
||||
{
|
||||
return strset_t{ "a64l",
|
||||
"advance",
|
||||
"aio_return",
|
||||
"aio_return64",
|
||||
"argp_error",
|
||||
"argp_failure",
|
||||
"argp_help",
|
||||
"argp_parse",
|
||||
"argp_state_help",
|
||||
"argp_usage",
|
||||
"argz_add",
|
||||
"argz_add_sep",
|
||||
"argz_append",
|
||||
"argz_count",
|
||||
"argz_create",
|
||||
"argz_create_sep",
|
||||
"argz_delete",
|
||||
"argz_extract",
|
||||
"argz_insert",
|
||||
"argz_next",
|
||||
"argz_replace",
|
||||
"argz_stringify",
|
||||
"atexit",
|
||||
"atof",
|
||||
"atoi",
|
||||
"atol",
|
||||
"atoll",
|
||||
"atomic_flag_clear_explicit",
|
||||
"atomic_flag_test_and_set_explicit",
|
||||
"authdes_create",
|
||||
"authdes_getucred",
|
||||
"authdes_pk_create",
|
||||
"authnone_create",
|
||||
"authunix_create",
|
||||
"authunix_create_default",
|
||||
"backtrace",
|
||||
"backtrace_symbols",
|
||||
"backtrace_symbols_fd",
|
||||
"bindresvport",
|
||||
"bindtextdomain",
|
||||
"bind_textdomain_codeset",
|
||||
"bsearch",
|
||||
"btowc",
|
||||
"c16rtomb",
|
||||
"callrpc",
|
||||
"canonicalize_file_name",
|
||||
"catclose",
|
||||
"catgets",
|
||||
"catopen",
|
||||
"cfmakeraw",
|
||||
"cfsetspeed",
|
||||
"chflags",
|
||||
"clearerr",
|
||||
"clearerr_unlocked",
|
||||
"clnt_broadcast",
|
||||
"clnt_create",
|
||||
"clnt_pcreateerror",
|
||||
"clnt_perrno",
|
||||
"clnt_perror",
|
||||
"clntraw_create",
|
||||
"clnt_spcreateerror",
|
||||
"clnt_sperrno",
|
||||
"clnt_sperror",
|
||||
"clnttcp_create",
|
||||
"clntudp_bufcreate",
|
||||
"clntudp_create",
|
||||
"clntunix_create",
|
||||
"confstr",
|
||||
"daemon",
|
||||
"des_setparity",
|
||||
"div",
|
||||
"dysize",
|
||||
"endutxent",
|
||||
"envz_add",
|
||||
"envz_entry",
|
||||
"envz_get",
|
||||
"envz_merge",
|
||||
"envz_remove",
|
||||
"envz_strip",
|
||||
"ether_aton",
|
||||
"ether_hostton",
|
||||
"ether_line",
|
||||
"ether_ntoa",
|
||||
"ether_ntohost",
|
||||
"execl",
|
||||
"execle",
|
||||
"execlp",
|
||||
"execv",
|
||||
"execvp",
|
||||
"execvpe",
|
||||
"explicit_bzero",
|
||||
"fattach",
|
||||
"fclose",
|
||||
"fdetach",
|
||||
"fdopen",
|
||||
"feof_unlocked",
|
||||
"ferror_unlocked",
|
||||
"fflush",
|
||||
"fflush_unlocked",
|
||||
"fgetpos",
|
||||
"fgets",
|
||||
"fgets_unlocked",
|
||||
"fgetws",
|
||||
"fgetws_unlocked",
|
||||
"_fini",
|
||||
"fini",
|
||||
"fmemopen",
|
||||
"fopen",
|
||||
"fopen64",
|
||||
"fopencookie",
|
||||
"fork",
|
||||
"fork_alias",
|
||||
"fork_compat",
|
||||
"fputc_unlocked",
|
||||
"fputs",
|
||||
"fputs_unlocked",
|
||||
"fputwc_unlocked",
|
||||
"fputws",
|
||||
"fputws_unlocked",
|
||||
"fread",
|
||||
"fread_unlocked",
|
||||
"fsetpos",
|
||||
"fsetpos64",
|
||||
"ftell",
|
||||
"fwrite",
|
||||
"fwrite_unlocked",
|
||||
"getdelim",
|
||||
"getgrouplist",
|
||||
"gethostbyname2",
|
||||
"getmntent",
|
||||
"getmsg",
|
||||
"getnetname",
|
||||
"getopt_long",
|
||||
"getopt_long_only",
|
||||
"getpmsg",
|
||||
"getpublickey",
|
||||
"gets",
|
||||
"getsecretkey",
|
||||
"glob_pattern_p",
|
||||
"gnu_dev_major",
|
||||
"gnu_dev_makedev",
|
||||
"gnu_dev_minor",
|
||||
"gnu_get_libc_release",
|
||||
"gnu_get_libc_version",
|
||||
"group_member",
|
||||
"gtty",
|
||||
"hcreate",
|
||||
"hdestroy",
|
||||
"herror",
|
||||
"host2netname",
|
||||
"hsearch",
|
||||
"hstrerror",
|
||||
"htons",
|
||||
"iconv",
|
||||
"iconv_close",
|
||||
"iconv_open",
|
||||
"inet6_opt_append",
|
||||
"inet6_opt_find",
|
||||
"inet6_opt_finish",
|
||||
"inet6_opt_get_val",
|
||||
"inet6_opt_init",
|
||||
"inet6_option_alloc",
|
||||
"inet6_option_append",
|
||||
"inet6_option_find",
|
||||
"inet6_option_init",
|
||||
"inet6_option_next",
|
||||
"inet6_option_space",
|
||||
"inet6_opt_next",
|
||||
"inet6_opt_set_val",
|
||||
"inet6_rth_add",
|
||||
"inet6_rth_getaddr",
|
||||
"inet6_rth_init",
|
||||
"inet6_rth_reverse",
|
||||
"inet6_rth_segments",
|
||||
"inet6_rth_space",
|
||||
"inet_addr",
|
||||
"inet_aton",
|
||||
"inet_lnaof",
|
||||
"inet_makeaddr",
|
||||
"inet_netof",
|
||||
"inet_network",
|
||||
"inet_nsap_addr",
|
||||
"inet_nsap_ntoa",
|
||||
"inet_ntoa",
|
||||
"inet_ntop",
|
||||
"inet_pton",
|
||||
"_init",
|
||||
"init",
|
||||
"initgroups",
|
||||
"initstate",
|
||||
"insque",
|
||||
"iruserok",
|
||||
"iruserok_af",
|
||||
"key_decryptsession",
|
||||
"key_decryptsession_pk",
|
||||
"key_encryptsession",
|
||||
"key_encryptsession_pk",
|
||||
"key_gendes",
|
||||
"key_get_conv",
|
||||
"key_secretkey_is_set",
|
||||
"key_setnet",
|
||||
"key_setsecret",
|
||||
"l64a",
|
||||
"lchmod",
|
||||
"lckpwdf",
|
||||
"lfind",
|
||||
"llabs",
|
||||
"lldiv",
|
||||
"localeconv",
|
||||
"lockf",
|
||||
"lsearch",
|
||||
"mbrtoc16",
|
||||
"mbrtoc32",
|
||||
"mcheck",
|
||||
"mcheck_check_all",
|
||||
"mcheck_pedantic",
|
||||
"mkdtemp",
|
||||
"mkdtemp64",
|
||||
"mkostemp",
|
||||
"mkostemp64",
|
||||
"mkostemps",
|
||||
"mkostemps64",
|
||||
"mkstemp",
|
||||
"mkstemp64",
|
||||
"mkstemps",
|
||||
"mkstemps64",
|
||||
"mktemp",
|
||||
"mktemp64",
|
||||
"moncontrol",
|
||||
"monstartup",
|
||||
"mprobe",
|
||||
"mtrace",
|
||||
"muntrace",
|
||||
"nanosleep",
|
||||
"netname2host",
|
||||
"netname2user",
|
||||
"nl_langinfo",
|
||||
"nl_langinfo_l",
|
||||
"ntohs",
|
||||
"parse_printf_format",
|
||||
"passwd2des",
|
||||
"pclose",
|
||||
"perror",
|
||||
"pmap_getmaps",
|
||||
"pmap_getport",
|
||||
"pmap_rmtcall",
|
||||
"pmap_set",
|
||||
"pmap_unset",
|
||||
"popen",
|
||||
"printf_size",
|
||||
"printf_size_info",
|
||||
"psiginfo",
|
||||
"psignal",
|
||||
"putchar",
|
||||
"putchar_unlocked",
|
||||
"putc_unlocked",
|
||||
"putenv",
|
||||
"putgrent",
|
||||
"putmsg",
|
||||
"putpmsg",
|
||||
"putpwent",
|
||||
"puts",
|
||||
"putsgent",
|
||||
"putspent",
|
||||
"pututxline",
|
||||
"putw",
|
||||
"putwc",
|
||||
"putwchar",
|
||||
"putwchar_unlocked",
|
||||
"putwc_unlocked",
|
||||
"rcmd",
|
||||
"rcmd_af",
|
||||
"reallocarray",
|
||||
"realpath",
|
||||
"re_comp",
|
||||
"re_compile_fastmap",
|
||||
"re_compile_pattern",
|
||||
"re_exec",
|
||||
"regcomp",
|
||||
"regerror",
|
||||
"regexec",
|
||||
"register_printf_modifier",
|
||||
"register_printf_type",
|
||||
"registerrpc",
|
||||
"re_match",
|
||||
"re_match_2",
|
||||
"remque",
|
||||
"re_search",
|
||||
"re_search_2",
|
||||
"re_set_registers",
|
||||
"re_set_syntax",
|
||||
"revoke",
|
||||
"rexec",
|
||||
"rexec_af",
|
||||
"rpmatch",
|
||||
"rresvport",
|
||||
"rresvport_af",
|
||||
"ruserok",
|
||||
"ruserok_af",
|
||||
"ruserpass",
|
||||
"secure_getenv",
|
||||
"seed48",
|
||||
"setbuffer",
|
||||
"setstate",
|
||||
"setvbuf",
|
||||
"sgetsgent",
|
||||
"sgetspent",
|
||||
"sigcancel_handler",
|
||||
"sighandler_setxid",
|
||||
"sstk",
|
||||
"step",
|
||||
"stty",
|
||||
"svcerr_auth",
|
||||
"svcerr_decode",
|
||||
"svcerr_noproc",
|
||||
"svcerr_noprog",
|
||||
"svcerr_progvers",
|
||||
"svcerr_systemerr",
|
||||
"svcerr_weakauth",
|
||||
"svc_exit",
|
||||
"svcfd_create",
|
||||
"svc_getreq",
|
||||
"svc_getreq_common",
|
||||
"svc_getreq_poll",
|
||||
"svc_getreqset",
|
||||
"svcraw_create",
|
||||
"svc_register",
|
||||
"svc_run",
|
||||
"svc_sendreply",
|
||||
"svctcp_create",
|
||||
"svcudp_bufcreate",
|
||||
"svcudp_create",
|
||||
"svcudp_enablecache",
|
||||
"svcunix_create",
|
||||
"svcunixfd_create",
|
||||
"svc_unregister",
|
||||
"swab",
|
||||
"tcgetsid",
|
||||
"tdelete",
|
||||
"tdestroy",
|
||||
"tempnam",
|
||||
"textdomain",
|
||||
"tfind",
|
||||
"thrd_create",
|
||||
"thrd_current",
|
||||
"thrd_detach",
|
||||
"thrd_equal",
|
||||
"thrd_exit",
|
||||
"thrd_join",
|
||||
"thrd_sleep",
|
||||
"thrd_yield",
|
||||
"tmpnam",
|
||||
"tolower",
|
||||
"toupper",
|
||||
"towctrans",
|
||||
"towctrans_l",
|
||||
"tr_break",
|
||||
"tsearch",
|
||||
"tss_create",
|
||||
"tss_delete",
|
||||
"tss_get",
|
||||
"tss_set",
|
||||
"ttyslot",
|
||||
"twalk",
|
||||
"twalk_r",
|
||||
"tzset",
|
||||
"ulckpwdf",
|
||||
"ungetc",
|
||||
"ungetwc",
|
||||
"unwind_stop",
|
||||
"updwtmpx",
|
||||
"user2netname",
|
||||
"utmpname",
|
||||
"utmpxname",
|
||||
"vlimit",
|
||||
"vtimes",
|
||||
"wait",
|
||||
"wait3",
|
||||
"waitpid",
|
||||
"wordexp",
|
||||
"xdecrypt",
|
||||
"xdr_accepted_reply",
|
||||
"xdr_array",
|
||||
"xdr_authdes_cred",
|
||||
"xdr_authdes_verf",
|
||||
"xdr_authunix_parms",
|
||||
"xdr_bool",
|
||||
"xdr_bytes",
|
||||
"xdr_callhdr",
|
||||
"xdr_callmsg",
|
||||
"xdr_char",
|
||||
"xdr_cryptkeyarg",
|
||||
"xdr_cryptkeyarg2",
|
||||
"xdr_cryptkeyres",
|
||||
"xdr_des_block",
|
||||
"xdr_double",
|
||||
"xdr_enum",
|
||||
"xdr_float",
|
||||
"xdr_getcredres",
|
||||
"xdr_hyper",
|
||||
"xdr_int",
|
||||
"xdr_int16_t",
|
||||
"xdr_int64_t",
|
||||
"xdr_int8_t",
|
||||
"xdr_keybuf",
|
||||
"xdr_key_netstarg",
|
||||
"xdr_key_netstres",
|
||||
"xdr_keystatus",
|
||||
"xdr_longlong_t",
|
||||
"xdrmem_create",
|
||||
"xdr_netnamestr",
|
||||
"xdr_netobj",
|
||||
"xdr_opaque",
|
||||
"xdr_opaque_auth",
|
||||
"xdr_pmap",
|
||||
"xdr_pmaplist",
|
||||
"xdr_pointer",
|
||||
"xdr_quad_t",
|
||||
"xdrrec_create",
|
||||
"xdrrec_endofrecord",
|
||||
"xdrrec_eof",
|
||||
"xdrrec_skiprecord",
|
||||
"xdr_reference",
|
||||
"xdr_rejected_reply",
|
||||
"xdr_replymsg",
|
||||
"xdr_rmtcall_args",
|
||||
"xdr_rmtcallres",
|
||||
"xdr_short",
|
||||
"xdr_sizeof",
|
||||
"xdrstdio_create",
|
||||
"xdr_string",
|
||||
"xdr_u_char",
|
||||
"xdr_u_hyper",
|
||||
"xdr_u_int",
|
||||
"xdr_uint16_t",
|
||||
"xdr_uint64_t",
|
||||
"xdr_uint8_t",
|
||||
"xdr_u_long",
|
||||
"xdr_u_longlong_t",
|
||||
"xdr_union",
|
||||
"xdr_unixcred",
|
||||
"xdr_u_quad_t",
|
||||
"xdr_u_short",
|
||||
"xdr_vector",
|
||||
"xdr_void",
|
||||
"xdr_wrapstring",
|
||||
"xencrypt",
|
||||
"xprt_register",
|
||||
"xprt_unregister" };
|
||||
}
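A set like the one returned by `get_whole_function_names` lends itself to exact-match filtering. The sketch below assumes `strset_t` is an alias for `std::set<std::string>`; `is_excluded` is a hypothetical helper, and how the tool actually consumes the set is not shown in this diff.

```cpp
#include <set>
#include <string>

// Assumption: strset_t in the source is equivalent to this alias.
using strset_t = std::set<std::string>;

// True only when `func` matches an entry exactly; prefixes and substrings
// of an excluded name do not count.
bool
is_excluded(const strset_t& whole_names, const std::string& func)
{
    return whole_names.count(func) > 0;
}
```

Whole-name matching means that excluding `fork` leaves a user function such as `fork_child` unaffected.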
|
||||
|
||||
//======================================================================================//
|
||||
//
|
||||
// For selective instrumentation (unused)
|
||||
@@ -49,11 +508,14 @@ get_loop_file_line_info(module_t* mutatee_module, procedure_t* f, flow_graph_t*
|
||||
{
|
||||
if(!cfGraph || !loopToInstrument || !f) return function_signature{ "", "", "" };
|
||||
|
||||
- char fname[MUTNAMELEN];
- char mname[MUTNAMELEN];
+ char fname[FUNCNAMELEN + 1];
+ char mname[FUNCNAMELEN + 1];
|
||||
std::string typeName = {};
|
||||
|
||||
- mutatee_module->getName(mname, MUTNAMELEN);
+ memset(fname, '\0', FUNCNAMELEN + 1);
+ memset(mname, '\0', FUNCNAMELEN + 1);
+
+ mutatee_module->getName(mname, FUNCNAMELEN);
|
||||
|
||||
bpvector_t<point_t*>* loopStartInst =
|
||||
cfGraph->findLoopInstPoints(BPatch_locLoopStartIter, loopToInstrument);
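The buffer change in this hunk (one extra byte, zero-filled before the name is copied in) follows a common pattern for keeping a C string terminated. The sketch below illustrates that pattern under stated assumptions: `NAME_LEN` stands in for `FUNCNAMELEN`, and `fill_name` is a hypothetical stand-in for an API that may write up to `len` bytes without appending a terminator. It makes no claim about how Dyninst's `getName` behaves.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

constexpr int NAME_LEN = 8;  // stand-in for FUNCNAMELEN

// Hypothetical API: copies at most `len` bytes and never writes a terminator.
void
fill_name(char* buf, int len, const std::string& src)
{
    std::memcpy(buf, src.data(), std::min<std::size_t>(len, src.size()));
}

// The buffer holds NAME_LEN + 1 bytes and is zeroed first, so the final byte
// stays '\0' even when fill_name uses all NAME_LEN bytes.
std::string
safe_name(const std::string& src)
{
    char buf[NAME_LEN + 1];
    std::memset(buf, '\0', NAME_LEN + 1);
    fill_name(buf, NAME_LEN, src);
    return std::string{ buf };
}
```

With a buffer of exactly `NAME_LEN` bytes and no zero-fill, constructing the `std::string` from a fully used buffer would read past its end.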
|
||||
@@ -69,7 +531,7 @@ get_loop_file_line_info(module_t* mutatee_module, procedure_t* f, flow_graph_t*
|
||||
(unsigned long) loopExitInst->size(), (unsigned long) baseAddr,
|
||||
(unsigned long) lastAddr);
|
||||
|
||||
- f->getName(fname, MUTNAMELEN);
+ f->getName(fname, FUNCNAMELEN);
|
||||
|
||||
auto* returnType = f->getReturnType();
|
||||
|
||||
@@ -78,11 +540,11 @@ get_loop_file_line_info(module_t* mutatee_module, procedure_t* f, flow_graph_t*
|
||||
typeName = returnType->getName();
|
||||
}
|
||||
|
||||
- auto params = f->getParams();
+ auto* params = f->getParams();
|
||||
std::vector<string_t> _params;
|
||||
if(params)
|
||||
{
|
||||
- for(auto itr : *params)
+ for(auto* itr : *params)
|
||||
{
|
||||
string_t _name = itr->getType()->getName();
|
||||
if(_name.empty()) _name = itr->getName();
|
||||
@@ -147,18 +609,21 @@ get_func_file_line_info(module_t* mutatee_module, procedure_t* f)
|
||||
{
|
||||
bool info1, info2;
|
||||
unsigned long baseAddr, lastAddr;
|
||||
- char fname[MUTNAMELEN];
- char mname[MUTNAMELEN];
+ char fname[FUNCNAMELEN + 1];
+ char mname[FUNCNAMELEN + 1];
|
||||
int row1, col1, row2, col2;
|
||||
string_t filename = {};
|
||||
string_t typeName = {};
|
||||
|
||||
- mutatee_module->getName(mname, MUTNAMELEN);
+ memset(fname, '\0', FUNCNAMELEN + 1);
+ memset(mname, '\0', FUNCNAMELEN + 1);
+
+ mutatee_module->getName(mname, FUNCNAMELEN);
|
||||
|
||||
baseAddr = (unsigned long) (f->getBaseAddr());
|
||||
f->getAddressRange(baseAddr, lastAddr);
|
||||
bpvector_t<BPatch_statement> lines;
|
||||
- f->getName(fname, MUTNAMELEN);
+ f->getName(fname, FUNCNAMELEN);
|
||||
|
||||
auto* returnType = f->getReturnType();
|
||||
|
||||
@@ -167,11 +632,11 @@ get_func_file_line_info(module_t* mutatee_module, procedure_t* f)
|
||||
typeName = returnType->getName();
|
||||
}
|
||||
|
||||
- auto params = f->getParams();
+ auto* params = f->getParams();
|
||||
std::vector<string_t> _params;
|
||||
if(params)
|
||||
{
|
||||
- for(auto itr : *params)
+ for(auto* itr : *params)
|
||||
{
|
||||
string_t _name = itr->getType()->getName();
|
||||
if(_name.empty()) _name = itr->getName();
|
||||
@@ -238,27 +703,34 @@ errorFunc(error_level_t level, int num, const char** params)
|
||||
// For compatibility purposes
|
||||
//
|
||||
procedure_t*
|
||||
- find_function(image_t* app_image, const std::string& _name, strset_t _extra)
+ find_function(image_t* app_image, const std::string& _name, const strset_t& _extra)
|
||||
{
|
||||
if(_name.empty()) return nullptr;
|
||||
|
||||
auto _find = [app_image](const string_t& _f) -> procedure_t* {
|
||||
// Extract the vector of functions
|
||||
bpvector_t<procedure_t*> _found;
|
||||
- auto ret = app_image->findFunction(_f.c_str(), _found, false, true, true);
+ auto* ret = app_image->findFunction(_f.c_str(), _found, false, true, true);
|
||||
if(ret == nullptr || _found.empty()) return nullptr;
|
||||
return _found.at(0);
|
||||
};
|
||||
|
||||
procedure_t* _func = _find(_name);
|
||||
auto itr = _extra.begin();
|
||||
- while(!_func && itr != _extra.end())
+ while(_func == nullptr && itr != _extra.end())
|
||||
{
|
||||
_func = _find(*itr);
|
||||
++itr;
|
||||
}
|
||||
|
||||
- if(!_func) verbprintf(2, "omnitrace: Unable to find function %s\n", _name.c_str());
+ if(!_func)
+ {
+ verbprintf(1, "function: '%s' ... not found\n", _name.c_str());
+ }
+ else
+ {
+ verbprintf(1, "function: '%s' ... found\n", _name.c_str());
+ }
|
||||
|
||||
return _func;
|
||||
}
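The lookup order in `find_function` (try the primary name, then each alternate until one resolves) can be shown without Dyninst. In this sketch, `lookup_t` and `find_with_fallback` are hypothetical names, and a plain `const int*` stands in for `procedure_t*`.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>

// Hypothetical lookup callback: returns nullptr when the name is unknown.
using lookup_t = std::function<const int*(const std::string&)>;

// Try `name` first, then each entry of `extra` in set order, stopping at the
// first hit. An empty `name` short-circuits to nullptr, as in the source.
const int*
find_with_fallback(const lookup_t& lookup, const std::string& name,
                   const std::set<std::string>& extra)
{
    if(name.empty()) return nullptr;
    const int* found = lookup(name);
    for(auto itr = extra.begin(); found == nullptr && itr != extra.end(); ++itr)
        found = lookup(*itr);
    return found;
}
```

Since `std::set` iterates in sorted order, the alternates are tried alphabetically rather than in the order they were supplied.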
|
||||
@@ -271,7 +743,7 @@ error_func_real(error_level_t level, int num, const char* const* params)
|
||||
if(num == 0)
|
||||
{
|
||||
// conditional reporting of warnings and informational messages
|
||||
- if(error_print)
+ if(error_print > 0)
|
||||
{
|
||||
if(level == BPatchInfo)
|
||||
{
|
||||
@@ -440,16 +912,3 @@ c_stdlib_function_constraint(const std::string& _func)
|
||||
}
|
||||
//======================================================================================//
|
||||
//
|
||||
- inline void
- consume()
- {
- consume_parameters(initialize_expr, bpatch, use_mpi, stl_func_instr, werror,
- loop_level_instr, error_print, binary_rewrite, debug_print,
- expect_error, is_static_exe, available_module_functions,
- instrumented_module_functions);
- }
- //
- namespace
- {
- static auto _consumed = (consume(), true);
- }
|
||||
|
||||
+216
-49
@@ -3,78 +3,245 @@ if(NOT OMNITRACE_DYNINST_API_RT_DIR AND OMNITRACE_DYNINST_API_RT)
|
||||
DIRECTORY)
|
||||
endif()
|
||||
|
||||
include(ProcessorCount)
|
||||
if(NOT DEFINED NUM_PROCS_REAL)
|
||||
processorcount(NUM_PROCS_REAL)
|
||||
endif()
|
||||
|
||||
if(NOT DEFINED NUM_PROCS)
|
||||
set(NUM_PROCS 2)
|
||||
endif()
|
||||
|
||||
math(EXPR NUM_THREADS "${NUM_PROCS_REAL} + (${NUM_PROCS_REAL} / 2)")
|
||||
if(NUM_THREADS GREATER 12)
|
||||
set(NUM_THREADS 12)
|
||||
endif()
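For readers less familiar with CMake's `math(EXPR ...)`, the thread-count rule above is 1.5 times the detected processor count using integer division, capped at 12. A small C++ sketch of the same arithmetic, with `num_threads` as an illustrative name:

```cpp
// Mirrors: NUM_THREADS = NUM_PROCS_REAL + NUM_PROCS_REAL / 2, capped at 12.
// Integer division truncates, matching CMake's integer-only math(EXPR).
int
num_threads(int num_procs_real)
{
    int n = num_procs_real + (num_procs_real / 2);
    return (n > 12) ? 12 : n;
}
```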
|
||||
|
||||
if(OMNITRACE_BUILD_DYNINST)
|
||||
set(OMNITRACE_DYNINST_API_RT_DIR
|
||||
"${PROJECT_BINARY_DIR}/external/dyninst/dyninstAPI_RT:${PROJECT_BINARY_DIR}/external/dyninst/dyninstAPI"
|
||||
)
|
||||
endif()
|
||||
|
||||
set(_test_environment
|
||||
set(_base_environment
|
||||
"OMNITRACE_USE_PERFETTO=ON"
|
||||
"OMNITRACE_USE_TIMEMORY=ON"
|
||||
"OMNITRACE_USE_SAMPLING=OFF"
|
||||
"OMNITRACE_TIME_OUTPUT=OFF"
|
||||
"OMP_PROC_BIND=spread"
|
||||
"OMP_PLACES=threads"
|
||||
"OMP_NUM_THREADS=2"
|
||||
"LD_LIBRARY_PATH=${PROJECT_BINARY_DIR}:${OMNITRACE_DYNINST_API_RT_DIR}:$ENV{LD_LIBRARY_PATH}"
|
||||
)
|
||||
|
||||
if(TARGET transpose)
|
||||
if(TRANSPOSE_USE_MPI AND NUM_PROCS GREATER 0)
|
||||
set(COMMAND_PREFIX ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${NUM_PROCS})
|
||||
set(_test_environment ${_base_environment} "OMNITRACE_CRITICAL_TRACE=OFF")
|
||||
|
||||
set(_fast_environment
|
||||
"OMNITRACE_USE_PERFETTO=OFF"
|
||||
"OMNITRACE_USE_TIMEMORY=OFF"
|
||||
"OMNITRACE_USE_SAMPLING=OFF"
|
||||
"OMNITRACE_CRITICAL_TRACE=OFF"
|
||||
"OMNITRACE_TIME_OUTPUT=OFF"
|
||||
"OMP_PROC_BIND=spread"
|
||||
"OMP_PLACES=threads"
|
||||
"OMP_NUM_THREADS=2"
|
||||
"LD_LIBRARY_PATH=${PROJECT_BINARY_DIR}:${OMNITRACE_DYNINST_API_RT_DIR}:$ENV{LD_LIBRARY_PATH}"
|
||||
)
|
||||
|
||||
function(OMNITRACE_ADD_TEST)
|
||||
cmake_parse_arguments(
|
||||
TEST
|
||||
"" # options
|
||||
"NAME;TARGET;MPI;NUM_PROCS" # single value args
|
||||
"REWRITE_ARGS;RUNTIME_ARGS;RUN_ARGS;ENVIRONMENT;LABELS" # multiple value args
|
||||
${ARGN})
|
||||
|
||||
if("${TEST_MPI}" STREQUAL "")
|
||||
set(TEST_MPI OFF)
|
||||
endif()
|
||||
|
||||
add_test(
|
||||
NAME transpose-binary-rewrite
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-exe> -o $<TARGET_FILE_DIR:transpose>/transpose.inst -v
|
||||
1 -- $<TARGET_FILE:transpose>
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
if(NOT DEFINED TEST_NUM_PROCS)
|
||||
set(TEST_NUM_PROCS ${NUM_PROCS})
|
||||
endif()
|
||||
|
||||
add_test(
|
||||
NAME transpose-binary-rewrite-run
|
||||
COMMAND ${COMMAND_PREFIX} $<TARGET_FILE_DIR:transpose>/transpose.inst
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
if(NUM_PROCS EQUAL 0)
|
||||
set(TEST_NUM_PROCS 0)
|
||||
endif()
|
||||
|
||||
add_test(
|
||||
NAME transpose-runtime-instrument
|
||||
COMMAND $<TARGET_FILE:omnitrace-exe> -v 1 --label file line return args --
|
||||
$<TARGET_FILE:transpose>
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
if(NOT DEFINED TEST_ENVIRONMENT OR "${TEST_ENVIRONMENT}" STREQUAL "")
|
||||
set(TEST_ENVIRONMENT "${_test_environment}")
|
||||
endif()
|
||||
|
||||
set_tests_properties(transpose-binary-rewrite-run PROPERTIES DEPENDS
|
||||
transpose-binary-rewrite)
|
||||
if(TARGET ${TEST_TARGET})
|
||||
if(DEFINED TEST_MPI
|
||||
AND ${TEST_MPI}
|
||||
AND TEST_NUM_PROCS GREATER 0)
|
||||
if(NOT TEST_NUM_PROCS GREATER NUM_PROCS_REAL)
|
||||
set(COMMAND_PREFIX ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG}
|
||||
${TEST_NUM_PROCS})
|
||||
else()
|
||||
set(COMMAND_PREFIX ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} 1)
|
||||
endif()
|
||||
else()
|
||||
list(APPEND TEST_ENVIRONMENT "OMNITRACE_USE_PID=OFF")
|
||||
endif()
|
||||
|
||||
set_tests_properties(
|
||||
transpose-binary-rewrite transpose-binary-rewrite-run transpose-runtime-instrument
|
||||
PROPERTIES ENVIRONMENT "${_test_environment}" TIMEOUT 600)
|
||||
endif()
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-baseline
|
||||
COMMAND $<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
if(TARGET parallel-overhead)
|
||||
add_test(
|
||||
NAME parallel-overhead-binary-rewrite
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-exe> -o
|
||||
$<TARGET_FILE_DIR:parallel-overhead>/parallel-overhead.inst -v 1
|
||||
--min-address-range-loop=72 --label file line return args --
|
||||
$<TARGET_FILE:parallel-overhead>
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-binary-rewrite
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-exe> -o
|
||||
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.inst ${TEST_REWRITE_ARGS}
|
||||
-- $<TARGET_FILE:${TEST_TARGET}>
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
add_test(
|
||||
NAME parallel-overhead-binary-rewrite-run
|
||||
COMMAND $<TARGET_FILE_DIR:parallel-overhead>/parallel-overhead.inst
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-binary-rewrite-sampling
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-exe> -o
|
||||
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.samp -M sampling
|
||||
${TEST_REWRITE_ARGS} -- $<TARGET_FILE:${TEST_TARGET}>
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
add_test(
|
||||
NAME parallel-overhead-runtime-instrument
|
||||
COMMAND $<TARGET_FILE:omnitrace-exe> -v 1 -- $<TARGET_FILE:parallel-overhead>
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-binary-rewrite-run
|
||||
COMMAND ${COMMAND_PREFIX}
|
||||
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.inst ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
set_tests_properties(parallel-overhead-binary-rewrite-run
|
||||
PROPERTIES DEPENDS parallel-overhead-binary-rewrite)
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-binary-rewrite-run-sampling
|
||||
COMMAND ${COMMAND_PREFIX}
|
||||
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_TARGET}.samp ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
set_tests_properties(
|
||||
parallel-overhead-binary-rewrite parallel-overhead-binary-rewrite-run
|
||||
parallel-overhead-runtime-instrument
|
||||
PROPERTIES ENVIRONMENT "${_test_environment}" TIMEOUT 600)
|
||||
endif()
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-runtime-instrument
|
||||
COMMAND $<TARGET_FILE:omnitrace-exe> ${TEST_RUNTIME_ARGS} --
|
||||
$<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-runtime-instrument-sampling
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-exe> -M sampling --env
|
||||
OMNITRACE_OUTPUT_PREFIX=sampling- ${TEST_RUNTIME_ARGS} --
|
||||
$<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY $<TARGET_FILE_DIR:${TEST_TARGET}>)
|
||||
|
||||
set_tests_properties(${TEST_NAME}-binary-rewrite-run
|
||||
PROPERTIES DEPENDS ${TEST_NAME}-binary-rewrite)
|
||||
|
||||
set_tests_properties(${TEST_NAME}-binary-rewrite-run-sampling
|
||||
PROPERTIES DEPENDS ${TEST_NAME}-binary-rewrite-sampling)
|
||||
|
||||
foreach(
|
||||
_TEST
|
||||
baseline binary-rewrite binary-rewrite-run binary-rewrite-sampling
|
||||
binary-rewrite-run-sampling runtime-instrument runtime-instrument-sampling)
|
||||
string(REPLACE "-run-" "-" _PREFIX "${TEST_NAME}-${_TEST}-")
|
||||
set(_environ "${TEST_ENVIRONMENT}")
|
||||
list(APPEND _environ "OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PREFIX=${_PREFIX}")
|
||||
set(_LABELS "${_TEST}")
|
||||
string(REPLACE "-run" "" _LABELS "${_TEST}")
|
||||
string(REPLACE "-sampling" ";sampling" _LABELS "${_LABELS}")
|
||||
set_tests_properties(
|
||||
${TEST_NAME}-${_TEST} PROPERTIES ENVIRONMENT "${_environ}" TIMEOUT 600
|
||||
LABELS "${_LABELS};${TEST_LABELS}")
|
||||
endforeach()
|
||||
endif()
|
||||
endfunction()
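The `foreach` loop in `OMNITRACE_ADD_TEST` derives an output prefix and a label list from each test suffix through a few `string(REPLACE ...)` calls. The C++ sketch below reproduces only that string logic so the results are easier to follow; `replace_all`, `output_prefix`, and `labels` are illustrative names, and the user-supplied `TEST_LABELS` that the loop appends are left out.

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of `from` (assumed non-empty) with `to`.
std::string
replace_all(std::string s, const std::string& from, const std::string& to)
{
    for(std::size_t pos = s.find(from); pos != std::string::npos;
        pos            = s.find(from, pos + to.size()))
        s.replace(pos, from.size(), to);
    return s;
}

// "<name>-<test>-" with "-run-" collapsed to "-", so a rewrite step and its
// run step end up with the same prefix.
std::string
output_prefix(const std::string& name, const std::string& test)
{
    return replace_all(name + "-" + test + "-", "-run-", "-");
}

// Drop "-run", then turn a "-sampling" suffix into its own list entry.
std::string
labels(const std::string& test)
{
    return replace_all(replace_all(test, "-run", ""), "-sampling", ";sampling");
}
```

Note that `runtime-instrument` is left untouched by the `-run` replacement because the pattern includes the leading hyphen.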
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME transpose
|
||||
TARGET transpose
|
||||
MPI ${TRANSPOSE_USE_MPI}
|
||||
NUM_PROCS ${NUM_PROCS}
|
||||
REWRITE_ARGS -e -v 1
|
||||
RUNTIME_ARGS -e -v 1 --label file line return args
|
||||
RUN_ARGS ""
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=ON")
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME transpose-no-save-fpr
|
||||
TARGET transpose
|
||||
MPI ${TRANSPOSE_USE_MPI}
|
||||
NUM_PROCS ${NUM_PROCS}
|
||||
REWRITE_ARGS -e -v 1 --dyninst-options DelayedParsing TypeChecking
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
--dyninst-options
|
||||
DelayedParsing
|
||||
TypeChecking
|
||||
RUN_ARGS ""
|
||||
ENVIRONMENT "${_fast_environment}")
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME parallel-overhead
|
||||
TARGET parallel-overhead
|
||||
REWRITE_ARGS -e -v 1 --min-address-range-loop=64
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-address-range-loop=64
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
RUN_ARGS 10 ${NUM_THREADS}
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF")
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME parallel-overhead-no-save-fpr
|
||||
TARGET parallel-overhead
|
||||
REWRITE_ARGS -e -v 1 --min-address-range-loop=32 --dyninst-options DelayedParsing
|
||||
TypeChecking
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-address-range-loop=32
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
--dyninst-options
|
||||
DelayedParsing
|
||||
TypeChecking
|
||||
RUN_ARGS 20 ${NUM_THREADS}
|
||||
ENVIRONMENT "${_fast_environment}")
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME lulesh
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
NUM_PROCS 8
|
||||
REWRITE_ARGS -e -v 1
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-ME
|
||||
[==['lib(gomp|m-)']==]
|
||||
RUN_ARGS -i 10 -s 20 -p
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF")
|
||||
|
||||