Omnitrace sample documentation (#179)

* Documentation for omnitrace-sample

* Improve omnitrace-sample

- improve the printing of the env updates
- remove env settings when something is deactivated
- restore env settings when something is deactivated

[ROCm/rocprofiler-systems commit: 67f7471253]
This commit is contained in:
Jonathan R. Madsen
2022-10-19 03:30:00 -05:00
committed by GitHub
parent 8f8ead76b5
commit 7234e8cf45
15 changed files with 789 additions and 92 deletions
+100 -4
@@ -85,9 +85,43 @@ such as the memory usage, page-faults, and context-switches, and thread-level me
## Documentation
The full documentation for [omnitrace](https://github.com/AMDResearch/omnitrace) is available at [amdresearch.github.io/omnitrace](https://amdresearch.github.io/omnitrace/).
See the [Getting Started documentation](https://amdresearch.github.io/omnitrace/getting_started) for general tips and a detailed discussion about sampling vs. binary instrumentation.
## Quick Start
### Installation
- Visit the [Releases](https://github.com/AMDResearch/omnitrace/releases) page
- Select the appropriate installer (recommendation: `.sh` scripts do not require super-user privileges, unlike the DEB/RPM installers)
- If targeting a ROCm application, find the installer script with the matching ROCm version
- If you are unsure about your Linux distro, check `/etc/os-release`
- If no installer script matches your target OS, try one of the Ubuntu 18.04 `*.sh` installers
- This installation may be built against older library versions supported on your distro via backwards compatibility
### Setup
> NOTE: Replace `/opt/omnitrace` below with installation prefix as necessary.
- Option 1: Source `setup-env.sh` script
```bash
source /opt/omnitrace/share/omnitrace/setup-env.sh
```
- Option 2: Load modulefile
```bash
module use /opt/omnitrace/share/modulefiles
module load omnitrace
```
- Option 3: Manual
```bash
export PATH=/opt/omnitrace/bin:${PATH}
export LD_LIBRARY_PATH=/opt/omnitrace/lib:${LD_LIBRARY_PATH}
```
### Omnitrace Settings
Generate an omnitrace configuration file using `omnitrace-avail -G omnitrace.cfg`. Optionally, use `omnitrace-avail -G omnitrace.cfg --all` for
@@ -111,9 +145,23 @@ Once the configuration file is adjusted to your preferences, either export the p
or place this file in `${HOME}/.omnitrace.cfg` to ensure these values are always read as the default. If you wish to change any of these settings,
you can override them via environment variables or by specifying an alternative `OMNITRACE_CONFIG_FILE`.
### Omnitrace Executable
### Call-Stack Sampling
The `omnitrace` executable is used to instrument an existing binary.
The `omnitrace-sample` executable is used to execute call-stack sampling on a target application without binary instrumentation.
Use a double-hyphen (`--`) to separate the command-line arguments for `omnitrace-sample` from the target application and its arguments.
```shell
omnitrace-sample --help
omnitrace-sample <omnitrace-options> -- <exe> <exe-options>
omnitrace-sample -f 1000 -- ls -la
```
### Binary Instrumentation
The `omnitrace` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside
the execution of an instrumented binary, to help "fill in the gaps" between the instrumentation, by setting the `OMNITRACE_USE_SAMPLING`
configuration variable to `ON`.
Similar to `omnitrace-sample`, use a double-hyphen (`--`) to separate the command-line arguments for `omnitrace` from the target application and its arguments.
```shell
omnitrace --help
@@ -183,9 +231,57 @@ omnitrace -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app
omnitrace -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app
```
### Visualizing Perfetto Results
### Python Profiling and Tracing
Visit [ui.perfetto.dev](https://ui.perfetto.dev) in your browser and open up the `.proto` file(s) created by omnitrace.
Use the `omnitrace-python` script to profile/trace Python interpreter function calls.
Use a double-hyphen (`--`) to separate the command-line arguments for `omnitrace-python` from the target script and its arguments.
```shell
omnitrace-python --help
omnitrace-python <omnitrace-options> -- <python-script> <script-args>
omnitrace-python -- ./script.py
```
Please note, the first argument after the double-hyphen *must be a Python script*, e.g. `omnitrace-python -- ./script.py`.
If you need to specify a specific python interpreter version, use `omnitrace-python-X.Y` where `X.Y` is the Python
major and minor version:
```shell
omnitrace-python-3.8 -- ./script.py
```
If you need to specify the full path to a Python interpreter, set the `PYTHON_EXECUTABLE` environment variable:
```shell
PYTHON_EXECUTABLE=/opt/conda/bin/python omnitrace-python -- ./script.py
```
If you want to restrict the data collection to specific function(s) and their callees, pass the `-b` / `--builtin` option after decorating the
function(s) with `@profile`. Use the `@noprofile` decorator to exclude/ignore function(s) and their callees:
```python
def foo():
    pass

@noprofile
def bar():
    foo()

@profile
def spam():
    foo()
    bar()
Each time `spam` is called during profiling, the profiling results will include 1 entry for `spam` and 1 entry
for `foo` via the direct call within `spam`. There will be no entries for `bar` or the `foo` invocation within it.
### Trace Visualization
- Visit [ui.perfetto.dev](https://ui.perfetto.dev) in your web browser
- Select "Open trace file" from the panel on the left
- Locate the omnitrace perfetto output (extension: `.proto`)
![omnitrace-perfetto](source/docs/images/omnitrace-perfetto.png)
Submodule projects/rocprofiler-systems/external/timemory updated: e6305b0455...95df33c9c4
+121 -14
@@ -52,14 +52,17 @@
#endif
namespace color = tim::log::color;
using tim::log::stream;
using namespace timemory::join;
using tim::get_env;
using tim::log::colorized;
using tim::log::stream;
namespace
{
int verbose = 0;
}
int verbose = 0;
auto updated_envs = std::set<std::string_view>{};
auto original_envs = std::set<std::string>{};
} // namespace
std::string
get_command(const char* _argv0)
@@ -92,7 +95,11 @@ get_initial_environment()
{
int idx = 0;
while(environ[idx] != nullptr)
_env.emplace_back(strdup(environ[idx++]));
{
auto* _v = environ[idx++];
original_envs.emplace(_v);
_env.emplace_back(strdup(_v));
}
}
update_env(_env, "LD_PRELOAD",
@@ -106,22 +113,25 @@ get_initial_environment()
update_env(_env, "OMNITRACE_USE_SAMPLING", true);
update_env(_env, "OMNITRACE_CRITICAL_TRACE", false);
update_env(_env, "OMNITRACE_USE_PROCESS_SAMPLING", false);
// update_env(_env, "OMNITRACE_USE_PID", false);
// update_env(_env, "OMNITRACE_TIME_OUTPUT", false);
// update_env(_env, "OMNITRACE_OUTPUT_PATH", "omnitrace-output/%tag%/%launch_time%");
#if defined(OMNITRACE_USE_ROCTRACER) || defined(OMNITRACE_USE_ROCPROFILER)
update_env(_env, "HSA_TOOLS_LIB", _dl_libpath);
update_env(_env, "HSA_TOOLS_REPORT_LOAD_FAILURE", "1");
if(!getenv("HSA_TOOLS_REPORT_LOAD_FAILURE"))
update_env(_env, "HSA_TOOLS_REPORT_LOAD_FAILURE", "1");
#endif
#if defined(OMNITRACE_USE_ROCPROFILER)
update_env(_env, "ROCP_TOOL_LIB", _omni_libpath);
update_env(_env, "ROCP_HSA_INTERCEPT", "1");
if(!getenv("ROCP_HSA_INTERCEPT")) update_env(_env, "ROCP_HSA_INTERCEPT", "1");
#endif
#if defined(OMNITRACE_USE_OMPT)
update_env(_env, "OMP_TOOL_LIBRARIES", _dl_libpath);
if(!getenv("OMP_TOOL_LIBRARIES"))
update_env(_env, "OMP_TOOL_LIBRARIES", _dl_libpath, true);
#endif
free(_dl_libpath);
@@ -140,11 +150,58 @@ get_internal_libpath(const std::string& _lib)
return omnitrace::common::join("/", _dir, "..", "lib", _lib);
}
void
print_updated_environment(std::vector<char*> _env)
{
std::sort(_env.begin(), _env.end(), [](auto* _lhs, auto* _rhs) {
if(!_lhs) return false;
if(!_rhs) return true;
return std::string_view{ _lhs } < std::string_view{ _rhs };
});
std::vector<char*> _updates = {};
std::vector<char*> _general = {};
for(auto* itr : _env)
{
if(itr == nullptr) continue;
auto _is_omni = (std::string_view{ itr }.find("OMNITRACE") == 0);
auto _updated = false;
for(const auto& vitr : updated_envs)
{
if(std::string_view{ itr }.find(vitr) == 0)
{
_updated = true;
break;
}
}
if(_updated)
_updates.emplace_back(itr);
else if(verbose >= 1 && _is_omni)
_general.emplace_back(itr);
}
if(_general.size() + _updates.size() == 0 || verbose < 0) return;
std::cerr << std::endl;
for(auto& itr : _general)
stream(std::cerr, color::source()) << itr << "\n";
for(auto& itr : _updates)
stream(std::cerr, color::source()) << itr << "\n";
std::cerr << std::endl;
}
template <typename Tp>
void
update_env(std::vector<char*>& _environ, std::string_view _env_var, Tp&& _env_val,
bool _append)
{
updated_envs.emplace(_env_var);
auto _key = join("", _env_var, "=");
for(auto& itr : _environ)
{
@@ -153,11 +210,13 @@ update_env(std::vector<char*>& _environ, std::string_view _env_var, Tp&& _env_va
{
if(_append)
{
auto _val = std::string{ itr }.substr(_key.length());
free(itr);
itr = strdup(
omnitrace::common::join('=', _env_var, join(":", _env_val, _val))
.c_str());
if(std::string_view{ itr }.find(join("", _env_val)) ==
std::string_view::npos)
{
auto _val = std::string{ itr }.substr(_key.length());
free(itr);
itr = strdup(join('=', _env_var, join(":", _env_val, _val)).c_str());
}
}
else
{
@@ -171,6 +230,22 @@ update_env(std::vector<char*>& _environ, std::string_view _env_var, Tp&& _env_va
strdup(omnitrace::common::join('=', _env_var, _env_val).c_str()));
}
void
remove_env(std::vector<char*>& _environ, std::string_view _env_var)
{
auto _key = join("", _env_var, "=");
auto _match = [&_key](auto itr) { return std::string_view{ itr }.find(_key) == 0; };
_environ.erase(std::remove_if(_environ.begin(), _environ.end(), _match),
_environ.end());
for(const auto& itr : original_envs)
{
if(std::string_view{ itr }.find(_key) == 0)
_environ.emplace_back(strdup(itr.c_str()));
}
}
std::vector<char*>
parse_args(int argc, char** argv, std::vector<char*>& _env)
{
@@ -200,6 +275,11 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
exit(_pec);
};
auto* _dl_libpath =
realpath(get_internal_libpath("libomnitrace-dl.so").c_str(), nullptr);
auto* _omni_libpath =
realpath(get_internal_libpath("libomnitrace.so").c_str(), nullptr);
auto parser = parser_t(argv[0]);
parser.on_error([](parser_t&, const parser_err_t& _err) {
@@ -273,6 +353,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
.dtype("bool")
.action([&](parser_t& p) {
auto _colorized = !p.get<bool>("monochrome");
colorized() = _colorized;
p.set_use_color(_colorized);
update_env(_env, "OMNITRACE_COLORIZED_LOG", (_colorized) ? "1" : "0");
update_env(_env, "COLORIZED_LOG", (_colorized) ? "1" : "0");
@@ -599,6 +680,12 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
_update("OMNITRACE_TRACE_THREAD_LOCKS", _v.count("mutex-locks") > 0);
_update("OMNITRACE_TRACE_THREAD_RW_LOCKS", _v.count("rw-locks") > 0);
_update("OMNITRACE_TRACE_THREAD_SPIN_LOCKS", _v.count("spin-locks") > 0);
if(_v.count("all") > 0 || _v.count("ompt") > 0)
update_env(_env, "OMP_TOOL_LIBRARIES", _dl_libpath, true);
if(_v.count("all") > 0 || _v.count("kokkosp") > 0)
update_env(_env, "KOKKOS_PROFILE_LIBRARY", _omni_libpath, true);
});
parser.add_argument({ "-E", "--exclude" }, "Exclude data from these backends")
@@ -619,6 +706,25 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
_update("OMNITRACE_TRACE_THREAD_LOCKS", _v.count("mutex-locks") > 0);
_update("OMNITRACE_TRACE_THREAD_RW_LOCKS", _v.count("rw-locks") > 0);
_update("OMNITRACE_TRACE_THREAD_SPIN_LOCKS", _v.count("spin-locks") > 0);
if(_v.count("all") > 0 ||
(_v.count("roctracer") > 0 && _v.count("rocprofiler") > 0))
{
remove_env(_env, "HSA_TOOLS_LIB");
remove_env(_env, "HSA_TOOLS_REPORT_LOAD_FAILURE");
}
if(_v.count("all") > 0 || _v.count("rocprofiler") > 0)
{
remove_env(_env, "ROCP_TOOL_LIB");
remove_env(_env, "ROCP_HSA_INTERCEPT");
}
if(_v.count("all") > 0 || _v.count("ompt") > 0)
remove_env(_env, "OMP_TOOL_LIBRARIES");
if(_v.count("all") > 0 || _v.count("kokkosp") > 0)
remove_env(_env, "KOKKOS_PROFILE_LIBRARY");
});
_add_separator("HARDWARE COUNTER OPTIONS", "");
@@ -626,7 +732,6 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
.add_argument({ "-C", "--cpu-events" },
"Set the CPU hardware counter events to record (ref: "
"`omnitrace-avail -H -c CPU`)")
.set_default(std::set<std::string>{})
.action([&](parser_t& p) {
auto _events =
join(array_config{ "," }, p.get<std::vector<std::string>>("cpu-events"));
@@ -638,7 +743,6 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
.add_argument({ "-G", "--gpu-events" },
"Set the GPU hardware counter events to record (ref: "
"`omnitrace-avail -H -c GPU`)")
.set_default(std::set<std::string>{})
.action([&](parser_t& p) {
auto _events =
join(array_config{ "," }, p.get<std::vector<std::string>>("gpu-events"));
@@ -695,5 +799,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
throw std::runtime_error(
"Error! '--profile' argument conflicts with '--flat-profile' argument");
free(_dl_libpath);
free(_omni_libpath);
return _outv;
}
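The `remove_env` hunk above drops every entry for a variable and then re-adds the value captured at startup, so deactivating a backend restores whatever the user had exported. Below is a minimal sketch of that remove-then-restore behaviour. It assumes `std::string` entries instead of the `strdup`'ed `char*` entries used in the commit, and the names `env_list`, `has_key`, and `remove_and_restore` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the commit's `_environ` / `original_envs`:
// each entry has the form "KEY=VALUE".
using env_list = std::vector<std::string>;

// True when `entry` begins with "<key>=".
inline bool
has_key(std::string_view entry, std::string_view key)
{
    return entry.size() > key.size() && entry.substr(0, key.size()) == key &&
           entry[key.size()] == '=';
}

// Drop every entry for `key`, then re-add the value (if any) that was
// present before the tool modified the environment.
inline void
remove_and_restore(env_list& env, const env_list& original, std::string_view key)
{
    env.erase(std::remove_if(env.begin(), env.end(),
                             [&](const std::string& e) { return has_key(e, key); }),
              env.end());
    for(const auto& e : original)
        if(has_key(e, key)) env.push_back(e);
}
```

Matching on `KEY=` rather than on `KEY` alone keeps a variable such as `HSA_TOOLS_LIB` from also matching `HSA_TOOLS_LIB_EXTRA`.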
+1 -9
@@ -52,15 +52,7 @@ main(int argc, char** argv)
_argv.emplace_back(argv[i]);
}
std::sort(_env.begin(), _env.end(), [](auto* _lhs, auto* _rhs) {
if(!_lhs) return false;
if(!_rhs) return true;
return std::string_view{ _lhs } < std::string_view{ _rhs };
});
for(auto* itr : _env)
if(itr != nullptr && std::string_view{ itr }.find("OMNITRACE") == 0)
std::cout << itr << "\n";
print_updated_environment(_env);
if(!_argv.empty())
{
+5
@@ -35,6 +35,8 @@ get_realpath(const std::string&);
void
print_command(const std::vector<char*>& _argv);
void print_updated_environment(std::vector<char*>);
std::vector<char*>
get_initial_environment();
@@ -45,5 +47,8 @@ template <typename Tp>
void
update_env(std::vector<char*>&, std::string_view, Tp&&, bool _append = false);
void
remove_env(std::vector<char*>&, std::string_view);
std::vector<char*>
parse_args(int argc, char** argv, std::vector<char*>&);
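The `_append` path of `update_env` now prepends a value to a `:`-separated variable only when that value is not already present. The following is a rough sketch of that idea, not the commit's implementation: it assumes `std::string` entries, and `prepend_unique` is a hypothetical name.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: prepend `value` to the ':'-separated list stored under
// `key` unless the existing entry already contains it. When the variable is
// not set at all, "key=value" is added as a new entry.
inline void
prepend_unique(std::vector<std::string>& env, std::string_view key,
               std::string_view value)
{
    const std::string prefix = std::string{ key } + "=";
    for(auto& entry : env)
    {
        // Skip entries that belong to a different variable.
        if(entry.compare(0, prefix.size(), prefix) != 0) continue;
        // Substring test, as in the hunk: only modify when the value is absent.
        if(entry.find(value) == std::string::npos)
            entry = prefix + std::string{ value } + ":" + entry.substr(prefix.size());
        return;
    }
    env.push_back(prefix + std::string{ value });
}
```

Calling it twice with the same library path leaves the variable unchanged the second time, which is the property the new `find(...) == npos` guard provides.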
+182 -6
@@ -4,10 +4,186 @@
.. toctree::
:glob:
:maxdepth: 3
setup
nomenclature
instrumenting
runtime
critical_trace
```
<style>
em { color: Green; }
</style>
## Nomenclature
The list provided below is intended to (A) provide a basic glossary for those who are not familiar with binary instrumentation, etc. and (B)
provide clarification to ambiguities when certain terms have different contextual meanings,
e.g., omnitrace's meaning of the term "module" when instrumenting Python.
- **Binary**
- File written in the Executable and Linkable Format (ELF)
- Standard file format for executable files, shared libraries, etc.
- **Binary Instrumentation**
- Inserting callbacks to instrumentation into an existing binary. This can be performed statically or dynamically
- **Static Binary Instrumentation**
- Loads an existing binary, determines instrumentation points, and generates a new binary with instrumentation directly embedded
- Applicable to executables and libraries but limited to only the functions defined in the binary
- Also known as: **Binary Rewrite**
- **Dynamic Binary Instrumentation**
- Loads an existing binary into memory, inserts instrumentation, executes binary
- Limited to executables but capable of instrumenting linked libraries
- Also known as: **Runtime Instrumentation**
- **Statistical Sampling**
- Also known as (simply) "sampling"
- At periodic intervals, the application is paused and the current call-stack of the CPU is recorded alongside various other metrics
- Uses timers that measure either (A) real clock time or (B) the CPU time used by the current thread and the CPU time expended on behalf of the thread by the system
- **Sampling Rate**
- The frequency at which (A) or (B) are triggered (in units of `# interrupts / second`)
- Higher values increase the number of samples
- **Sampling Delay**
- How long to wait before (A) and (B) begin triggering at their designated rate
- **Sampling Duration**
- The time (in realtime) after the start of the application to record samples. Once this time limit has been reached, no more samples will be recorded.
- **Process Sampling**
- At periodic (realtime) intervals, a background thread records global metrics without interrupting the current process. These metrics include, but are not limited to: CPU frequency,
CPU memory high-water mark (i.e. peak memory usage), GPU Temperature, GPU Power usage, etc.
- **Sampling Rate**
- The realtime period for recording metrics (in units of `# measurements / second`)
- Higher values increase the number of samples
- **Sampling Delay**
- How long to wait (in realtime) before recording samples
- **Sampling Duration**
- The time (in realtime) after the start of the application to record samples. Once this time limit has been reached, no more samples will be recorded.
- **Module**
- With respect to binary instrumentation, a module is defined as either the filename (e.g. `foo.c`) or library name (`libfoo.so`) which contains the definition of one or more functions
- With respect to Python instrumentation, a module is defined as the *file* which contains the definition of one or more functions.
- The full path to this file *typically* contains the name of the "Python module"
- **Basic Block**
- Straight-line code sequence with:
- No branches in (except for the entry)
- No branches out (except for the exit)
- **Address Range**
- The instructions for a function in a binary start at a certain address within the ELF file and end at a certain address; the range is `end - start`
- The address range is a decent approximation for the "cost" of a function, i.e., a larger address range approx. equates to more instructions
- **Instrumentation Traps**
- On the x86 architecture, because instructions are of variable size, the instruction at a point may be too small for Dyninst to replace it with the normal code sequence used to call instrumentation
- Also, when instrumentation is placed at points other than subroutine entry, exit, or call points, traps may be used to ensure the instrumentation fits
- By default, omnitrace avoids instrumentation which requires using a trap
- **Overlapping functions**
- Due to language constructs or compiler optimizations, it may be possible for multiple functions to overlap (that is, share part of the same function body) or for a single function to have multiple entry points
- In practice, it is impossible to determine the difference between multiple overlapping functions and a single function with multiple entry points
- By default, omnitrace avoids instrumenting overlapping functions
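To make the sampling rate, delay, and duration entries in the glossary above concrete, here is a small arithmetic sketch. It assumes that the duration is measured from application start (so the sampling window is `[delay, duration]`) and that a non-positive duration means "no limit". Both are assumptions made for this illustration, and the function names are hypothetical.

```cpp
#include <algorithm>

// Time between interrupts (seconds) for a rate given in interrupts/second.
inline double
sampling_interval(double rate_hz)
{
    return 1.0 / rate_hz;
}

// Rough estimate of the samples taken on one thread of an application that
// runs for `runtime` seconds. Assumption: sampling is active from `delay`
// until `duration` (both measured from application start); a non-positive
// `duration` means sampling continues until the application exits.
inline double
expected_samples(double rate_hz, double delay, double duration, double runtime)
{
    double start = std::min(delay, runtime);
    double stop  = (duration > 0.0) ? std::min(duration, runtime) : runtime;
    return (stop > start) ? rate_hz * (stop - start) : 0.0;
}
```

For instance, a rate of 100 interrupts per second with a 1 second delay and a 5 second duration gives at most 400 samples per thread, regardless of how much longer the application runs.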
## General Tips
- ***Use `omnitrace-avail` to look up configuration settings***, hardware counters, and data collection components
- Use `-d` flag for descriptions
- Generate a default configuration with `omnitrace-avail -G ${HOME}/.omnitrace.cfg` and tweak accordingly to the desired default behavior
- ***Decide whether binary instrumentation, statistical sampling, or both*** will provide the desired performance data (for non-Python applications)
- Compile code with optimization enabled (e.g. `-O2` or higher), disable asserts (i.e. `-DNDEBUG`), and include debug info (i.e. `-g1` at a minimum)
- NOTE: compiling with debug info does not slow down the code, it only increases compile time and the size of the binary
- In CMake, this is generally as easy as setting `CMAKE_BUILD_TYPE=RelWithDebInfo` or `CMAKE_BUILD_TYPE=Release` and `CMAKE_<LANG>_FLAGS=-g1`
- Use ***binary instrumentation for characterizing the performance of every invocation of specific functions***
- Use ***statistical sampling to characterize the performance of the entire application while minimizing overhead***
- Enable statistical sampling after binary instrumentation to help "fill in the gaps" between instrumented regions
- Use the user API to create custom regions and to enable/disable omnitrace for specific processes, threads, and/or regions
- Dynamic symbol interception, callback APIs, and the user API are always available with binary instrumentation and sampling
- Dynamic symbol interception and callback APIs are (generally) controlled through `OMNITRACE_USE_<API>` options, e.g. `OMNITRACE_USE_KOKKOSP`, `OMNITRACE_USE_OMPT` enable Kokkos-Tools and OpenMP-Tools callbacks, respectively
- When generically seeking regions for performance improvement:
- ***Start off collecting a flat profile***
- Look for functions with high call counts, large cumulative runtimes/values, and/or large standard deviations
- When call-counts are high, improving the performance of this function or "inlining" the function can be quick and easy performance improvements
- When the standard-deviation is high, collect a hierarchical profile and see if the high variation can be attributable to the calling context. In this scenario, consider creating a specialized version for the function for the longer running contexts
- ***Collect a hierarchical profile*** and, keeping the flat-profiling data in mind, verify the functions noted in the flat profile are part of the "critical path" of your application
- E.g. function(s) with high call counts, etc. which are part of a "setup" or "post-processing" phase that does not consume much time relative to the overall time are, generally, a lower priority for optimization
- ***Use the information from the profiles when analyzing detailed traces***
- When using binary instrumentation in the "trace" mode, the ***binary rewrites are preferable to runtime instrumentation***.
- Binary rewrites only instrument the functions defined in the target binary, whereas runtime instrumentation can/will instrument functions defined in the shared libraries which are linked into the target binary
- When using binary instrumentation with MPI, avoid runtime instrumentation
- Runtime instrumentation requires a fork + ptrace, which is generally incompatible with how MPI applications spawn their processes
- Binary rewrite the executable using MPI (and, optionally, libraries used by the executable) and execute the generated instrumented executable instead of the original, e.g. `mpirun -n 2 ./myexe` should be `mpirun -n 2 ./myexe.inst` where `myexe.inst` is the generated instrumented `myexe` executable.
## Data Collection Mode(s)
Omnitrace supports several modes of recording trace and profiling data for your application:
| Mode | Descriptions |
|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Binary Instrumentation | Locates functions (and loops, if desired) in binary and inserts snippets at the entry and exit |
| Statistical Sampling | Periodically pauses application at specified intervals and records various metrics for the given call-stack |
| Callback APIs | Parallelism frameworks such as ROCm, OpenMP, and Kokkos will make callbacks into omnitrace to provide information about the work the API is performing |
| Dynamic Symbol Interception | Wrap function symbols defined in position independent dynamic library/executable, e.g. `pthread_mutex_lock` in libpthread.so or `MPI_Init` in the MPI library |
| User API | User-defined regions and controls for omnitrace |
The two most generic, important modes are binary instrumentation and statistical sampling. It is important to understand the advantages and disadvantages.
Binary instrumentation and statistical sampling can be performed with the `omnitrace` executable but for statistical sampling, it is highly recommended to use the
`omnitrace-sample` executable instead if no binary instrumentation is required/desired. With either tool, the callback APIs and dynamic symbol interception can be
utilized.
### Binary Instrumentation
Binary instrumentation will allow one to deterministically record measurements for every single invocation of a given function.
Binary instrumentation effectively adds instructions to the target application to collect the required information and, thus, has the potential to cause performance changes which may,
in some cases, lead to inaccurate results. The effect depends on what information is being collected and which features are activated in omnitrace. For example, collecting only the wall-clock timing data
will have less effect than collecting the wall-clock timing, cpu-clock timing, memory usage, cache-misses, and number of instructions executed. Similarly, collecting a flat profile will have
less overhead than a hierarchical profile and collecting a trace OR a profile will have less overhead than collecting a trace AND a profile.
In omnitrace, the primary heuristic for controlling the overhead with binary instrumentation is the minimum number of instructions for selecting functions for instrumentation.
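The minimum-instruction heuristic can be pictured as a simple filter over the functions found in the binary. The sketch below is illustrative only: `function_info` and `select_for_instrumentation` are hypothetical names, the instruction counts would in practice come from the binary analysis, and the default threshold of 1024 is assumed here for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a function discovered in the binary.
struct function_info
{
    std::string name;
    std::size_t num_instructions;
};

// Keep only the functions that are large enough for the cost of the
// entry/exit snippets to be small relative to the function body.
inline std::vector<std::string>
select_for_instrumentation(const std::vector<function_info>& funcs,
                           std::size_t                       min_instructions = 1024)
{
    std::vector<std::string> selected;
    for(const auto& f : funcs)
        if(f.num_instructions >= min_instructions) selected.push_back(f.name);
    return selected;
}
```

Lowering the threshold selects more (smaller) functions at the price of higher overhead; raising it does the opposite.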
### Statistical Sampling
Statistical call-stack sampling periodically interrupts the application at regular intervals using operating system interrupts.
Sampling is typically less numerically accurate and specific, but allows the target program to run at near full speed.
In contrast to the data derived from binary instrumentation, the resulting data is not exact but, instead, a statistical approximation.
However, sampling often provides a more accurate picture of the application execution because it is less intrusive to the target application and has fewer
side effects on memory caches or instruction decoding pipelines. Furthermore, since sampling does not affect the execution speed as significantly, it is
relatively immune to over-evaluating the cost of small, frequently called functions or "tight" loops.
In omnitrace, the overhead for statistical sampling is a factor of the sampling rate and whether the samples are taken with respect to the CPU time and/or real time.
### Binary Instrumentation vs. Statistical Sampling Example
Consider the following code:
```cpp
#include <cstdio>
#include <cstdlib>

long fib(long n)
{
    if(n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

// i: iteration index (used only for printing), nfib: Fibonacci input
void run(long i, long nfib)
{
    long result = fib(nfib);
    printf("[%li] fibonacci(%li) = %li\n", i, nfib, result);
}

int main(int argc, char** argv)
{
    long nfib = 30;
    long nitr = 10;
    if(argc > 1) nfib = atol(argv[1]);
    if(argc > 2) nitr = atol(argv[2]);
    for(long i = 0; i < nitr; ++i)
        run(i, nfib);
    return 0;
}
```
Binary instrumentation of the `fib` function will record ***every single invocation*** of the function. For a very small function
such as `fib`, this results in *significant* overhead since this simple function tends to be less than 20 or so instructions, whereas the entry and
exit snippets are ~1024 instructions. Thus, ***we generally want to avoid instrumenting functions where the instrumented function has significantly fewer
instructions than entry + exit instrumentation*** (please note, however, that many of the instructions in the entry/exit functions are either logging functions or
depend on the runtime settings and thus may never be executed). Because of the number of potentially executed instructions in the entry/exit snippets,
the default behavior of omnitrace is to only instrument functions which contain at least 1024 instructions.
However, recording every single invocation of the function can be extremely useful for detecting anomalies: profiles will show min/max values much smaller/larger
than the average and/or high standard deviation and traces will allow you to identify exactly when and where those instances deviated from the norm.
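To get a feel for how many entry/exit snippets would execute when `fib` is instrumented, the number of invocations made by the naive recursion can be counted directly. This is a small sketch with a hypothetical helper name, `fib_call_count`. It relies only on the recurrence `calls(n) = 1 + calls(n-1) + calls(n-2)` with `calls(0) = calls(1) = 1`.

```cpp
#include <cstdint>

// Number of times `fib` is entered when evaluating fib(n) with the naive
// recursion shown in the example program.
inline std::uint64_t
fib_call_count(int n)
{
    std::uint64_t prev = 1;  // calls(0)
    std::uint64_t curr = 1;  // calls(1)
    for(int i = 2; i <= n; ++i)
    {
        std::uint64_t next = 1 + curr + prev;  // this call + both sub-calls
        prev               = curr;
        curr               = next;
    }
    return curr;
}
```

`fib_call_count(30)` is 2,692,537, so a single `run(30)` would execute the entry and exit snippets roughly 2.7 million times, while a sampler at a fixed rate collects a number of samples that depends only on the elapsed time.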
Consider the level of details in the following traces where, in the top image, every instance of the `fib` function was instrumented vs. the bottom image
where the `fib` call-stack was derived via sampling:
#### Binary Instrumentation of Fibonacci Function
![instrumented-fibonnaci-trace](images/fibonacci-instrumented.png)
#### Statistical Sampling of Fibonacci Function
![sampled-fibonnaci-trace](images/fibonacci-sampling.png)
Binary file not shown (size: 106 KiB).

Binary file not shown (size: 408 KiB).

+5
@@ -9,7 +9,12 @@
about
features
installation
setup
getting_started
runtime
sampling
instrumenting
critical_trace
output
user_api
python
+6 -2
@@ -1,4 +1,4 @@
# Instrumenting with Omnitrace
# Binary Instrumentation
```eval_rst
.. toctree::
@@ -8,9 +8,13 @@
## omnitrace Executable
> ***NOTE: With the introduction of `omnitrace-sample`, in future versions of omnitrace, the current `omnitrace` executable***
> ***noted below will likely be renamed to `omnitrace-instrument` and a new `omnitrace` executable will serve as a common***
> ***executable for multiple executables, e.g. `omnitrace sample ...`, `omnitrace run ...`, `omnitrace rewrite ...`, etc.***
Instrumentation is performed with the `omnitrace` executable. View the help menu with the `-h` / `--help` option:
```shell
```console
$ omnitrace --help
[omnitrace] Usage: omnitrace [ --help (count: 0, dtype: bool)
--debug (max: 1, dtype: bool)
-51
@@ -1,51 +0,0 @@
# Nomenclature
```eval_rst
.. toctree::
:glob:
:maxdepth: 3
```
The list provided below is intended to (A) provide a basic glossary for those who are not familiar with binary instrumentation and (B) provide clarification to ambiguities when certain terms
have different contextual meanings, e.g., omnitrace's meaning of the term "module" when instrumenting Python.
- **Binary**
- File written in the Executable and Linkable Format (ELF)
- Standard file format for executable files, shared libraries, etc.
- **Binary Instrumentation**
- Inserting callbacks to instrumentation into an existing binary. This can be performed statically or dynamically
- **Static Binary Instrumentation**
- Loads an existing binary, determines instrumentation points, and generates a new binary with instrumentation directly embedded
- Applicable to executables and libraries but limited to only the functions defined in the binary
- Also known as: **Binary Rewrite**
- **Dynamic Binary Instrumentation**
- Loads an existing binary into memory, inserts instrumentation, executes binary
- Limited to executables but capable of instrumenting linked libraries
- Also known as: **Runtime Instrumentation**
- **Sampling**
- At periodic intervals, the application is paused and the current call-stack of the CPU is recorded alongside various other metrics
- Uses timers that measure either (A) real clock time or (B) the CPU time used by the current thread and the CPU time expended on behalf of the thread by the system
- **Sampling Rate**
- The period at which (A) or (B) are triggered (in units of `# interrupts / second`)
- Higher values increase the number of samples
- **Sampling Delay**
- How long to wait before (A) and (B) begin triggering at their designated rate
- **Module**
- With respect to binary instrumentation, a module is defined as either the filename (e.g. `foo.c`) or library name (`libfoo.so`) which contains the definition of one or more functions
- With respect to Python instrumentation, a module is defined as the _file_ which contains the definition of one or more functions.
- The full path to this file _typically_ contains the name of the "Python module"
- **Basic Block**
- Straight-line code sequence with:
- No branches in (except for the entry)
- No branches out (except for the exit)
- **Address Range**
- The instructions for a function in a binary start at a certain address within the ELF file and end at a certain address; the range is `end - start`
- The address range is a decent approximation for the "cost" of a function, i.e., a larger address range approx. equates to more instructions
- **Instrumentation Traps**
- On the x86 architecture, because instructions are of variable size, the instruction at a point may be too small for Dyninst to replace it with the normal code sequence used to call instrumentation
- Also, when instrumentation is placed at points other than subroutine entry, exit, or call points, traps may be used to ensure the instrumentation fits
- By default, omnitrace avoids instrumentation which requires using a trap
- **Overlapping functions**
- Due to language constructs or compiler optimizations, it may be possible for multiple functions to overlap (that is, share part of the same function body) or for a single function to have multiple entry points
- In practice, it is impossible to determine the difference between multiple overlapping functions and a single function with multiple entry points
- By default, omnitrace avoids instrumenting overlapping functions
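The **Sampling**, **Sampling Rate**, and **Sampling Delay** entries can be illustrated with a small, self-contained Python sketch. It is not how omnitrace is implemented: it assumes a POSIX system and uses Python's `signal.setitimer` only to show how a real-clock timer (`ITIMER_REAL`) or a CPU-time timer (`ITIMER_PROF`) periodically interrupts a program so that its call-stack can be recorded. In Python these timers are process-wide rather than per-thread, and the names `sample_call_stacks` and `spin` are hypothetical.

```python
import signal
import time

# Signal delivered by each interval timer
_TIMER_SIGNALS = {
    signal.ITIMER_REAL: signal.SIGALRM,  # (A) real clock time
    signal.ITIMER_PROF: signal.SIGPROF,  # (B) CPU time used by the process and by the system on its behalf
}


def sample_call_stacks(workload, rate_hz=100.0, delay_s=0.01, timer=signal.ITIMER_PROF):
    """Run workload() and return the call-stacks recorded at each timer interrupt."""
    interval_s = 1.0 / rate_hz  # sampling rate (interrupts/second) -> time between samples
    samples = []

    def on_interrupt(_signum, frame):
        names = []
        while frame is not None:  # walk from the innermost frame outward
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        samples.append(tuple(reversed(names)))  # outermost caller first

    signum = _TIMER_SIGNALS[timer]
    previous = signal.signal(signum, on_interrupt)
    # A first expiry of zero would disarm the timer, so fall back to one interval
    signal.setitimer(timer, delay_s if delay_s > 0 else interval_s, interval_s)
    try:
        workload()
    finally:
        signal.setitimer(timer, 0.0)  # stop sampling
        signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return samples


def spin(cpu_seconds):
    """Burn roughly cpu_seconds of CPU time."""
    start = time.process_time()
    while time.process_time() - start < cpu_seconds:
        pass
```

Calling `sample_call_stacks(lambda: spin(0.2), rate_hz=100)` should collect on the order of 20 call-stacks; raising `rate_hz` increases the number of samples, while raising `delay_s` postpones the first one.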
+1 -1
@@ -1,4 +1,4 @@
# Customizing Omnitrace Runtime
# Configuring Omnitrace Runtime
```eval_rst
.. toctree::
+357
@@ -0,0 +1,357 @@
# Call-Stack Sampling
```eval_rst
.. toctree::
:glob:
:maxdepth: 4
```
> ***NOTE: Set `OMNITRACE_USE_SAMPLING=ON` to activate call-stack sampling when executing an instrumented binary***
Call-stack sampling can be activated either by running a binary instrumented via the `omnitrace` executable or by launching the application with the `omnitrace-sample` executable.
***Effectively***, all of the commands below are equivalent:
- Binary rewrite with only instrumentation necessary to start/stop sampling
```console
omnitrace -M sampling -o foo.inst -- foo
./foo.inst
```
- Runtime instrumentation with only instrumentation necessary to start/stop sampling
```console
omnitrace -M sampling -- foo
```
- No instrumentation required
```console
omnitrace-sample -- foo
```
All `omnitrace -M sampling` (referred to as "instrumented-sampling" henceforth) does is wrap the `main` of the executable with initialization
before `main` starts and finalization after `main` ends.
This can be easily accomplished without instrumentation via a `LD_PRELOAD` of a library containing a dynamic symbol wrapper around `__libc_start_main`.
Thus, whenever binary instrumentation is unnecessary, using `omnitrace-sample` is recommended over `omnitrace -M sampling` for several reasons:
1. `omnitrace-sample` provides command-line options for controlling features of omnitrace instead of *requiring* configuration files or environment variables
2. Although instrumented-sampling only requires inserting snippets around one function (`main`), Dyninst
does not provide a way to skip parsing and processing all the other symbols in the binary.
In the best case, when the target binary is relatively small, instrumented-sampling has a slightly slower launch time;
in the worst case, it requires a significant amount of time and memory to launch
3. `omnitrace-sample` is fully compatible with MPI, e.g. `mpirun -n 2 omnitrace-sample -- foo`, whereas `mpirun -n 2 omnitrace -M sampling -- foo`
is incompatible with some MPI distributions (particularly OpenMPI) because of MPI restrictions against forking within an MPI rank
- Recall that when MPI and binary instrumentation are involved, two steps are required: (1) do a binary rewrite of the executable
and (2) use the instrumented executable in lieu of the original executable. `omnitrace-sample` is thus much easier to use with MPI.
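Conceptually, a launcher that relies on `LD_PRELOAD` only has to extend the environment and then execute the target. The sketch below illustrates that general technique and is not the actual `omnitrace-sample` implementation; `preload_env` and `launch` are hypothetical helpers, and the library path in the usage note is borrowed from the example output later in this document.

```python
import os
import subprocess


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep anything the user already preloads; the glibc dynamic linker accepts
    # a colon- or space-separated list
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def launch(command, library):
    """Run `command` (a list of strings) with `library` preloaded; return its exit code."""
    return subprocess.run(command, env=preload_env(library), check=False).returncode
```

For instance, `launch(["./foo"], "/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1")` would start `foo` with the library loaded ahead of its other dependencies, with no binary rewrite and no fork from within an already-running MPI rank.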
## omnitrace-sample Executable
View the help menu of `omnitrace-sample` with the `-h` / `--help` option:
```console
$ omnitrace-sample --help
[omnitrace-sample] Usage: omnitrace-sample [ --help (count: 0, dtype: bool)
--monochrome (max: 1, dtype: bool)
--debug (max: 1, dtype: bool)
--verbose (count: 1)
--config (min: 0, dtype: filepath)
--output (min: 1)
--trace (max: 1, dtype: bool)
--profile (max: 1, dtype: bool)
--flat-profile (max: 1, dtype: bool)
--host (max: 1, dtype: bool)
--device (max: 1, dtype: bool)
--trace-file (count: 1, dtype: filepath)
--trace-buffer-size (count: 1, dtype: KB)
--trace-fill-policy (count: 1)
--profile-format (min: 1)
--profile-diff (min: 1)
--process-freq (count: 1)
--process-wait (count: 1)
--process-duration (count: 1)
--cpus (count: unlimited, dtype: int or range)
--gpus (count: unlimited, dtype: int or range)
--freq (count: 1)
--wait (count: 1)
--duration (count: 1)
--tids (min: 1)
--cputime (min: 0)
--realtime (min: 0)
--include (count: unlimited)
--exclude (count: unlimited)
--cpu-events (count: unlimited)
--gpu-events (count: unlimited)
--inlines (max: 1, dtype: bool)
--hsa-interrupt (count: 1, dtype: int)
]
Options:
-h, -?, --help Shows this page
[DEBUG OPTIONS]
--monochrome Disable colorized output
--debug Debug output
-v, --verbose Verbose output
[GENERAL OPTIONS]
-c, --config Configuration file
-o, --output Output path. Accepts 1-2 parameters corresponding to the output path and the output prefix
-T, --trace Generate a detailed trace (perfetto output)
-P, --profile Generate a call-stack-based profile (conflicts with --flat-profile)
-F, --flat-profile Generate a flat profile (conflicts with --profile)
-H, --host Enable sampling host-based metrics for the process. E.g. CPU frequency, memory usage, etc.
-D, --device Enable sampling device-based metrics for the process. E.g. GPU temperature, memory usage, etc.
[TRACING OPTIONS]
--trace-file Specify the trace output filename. Relative filepath will be with respect to output path and output prefix.
--trace-buffer-size Size limit for the trace output (in KB)
--trace-fill-policy [ discard | ring_buffer ]
Policy for new data when the buffer size limit is reached:
- discard : new data is ignored
- ring_buffer : new data overwrites oldest data
[PROFILE OPTIONS]
--profile-format [ console | json | text ]
Data formats for profiling results
--profile-diff Generate a diff output b/t the profile collected and an existing profile from another run Accepts 1-2 parameters
corresponding to the input path and the input prefix
[HOST/DEVICE (PROCESS SAMPLING) OPTIONS]
--process-freq Set the default host/device sampling frequency (number of interrupts per second)
--process-wait Set the default wait time (i.e. delay) before taking first host/device sample (in seconds of realtime)
--process-duration Set the duration of the host/device sampling (in seconds of realtime)
--cpus CPU IDs for frequency sampling. Supports integers and/or ranges
--gpus GPU IDs for SMI queries. Supports integers and/or ranges
[GENERAL SAMPLING OPTIONS]
-f, --freq Set the default sampling frequency (number of interrupts per second)
-w, --wait Set the default wait time (i.e. delay) before taking first sample (in seconds). This delay time is based on the clock
of the sampler, i.e., a delay of 1 second for CPU-clock sampler may not equal 1 second of realtime
-d, --duration Set the duration of the sampling (in seconds of realtime). I.e., it is possible (currently) to set a CPU-clock time
delay that exceeds the real-time duration... resulting in zero samples being taken
-t, --tids Specify the default thread IDs for sampling, where 0 (zero) is the main thread and each thread created by the target
application is assigned an atomically incrementing value.
[SAMPLING TIMER OPTIONS]
--cputime Sample based on a CPU-clock timer (default). Accepts zero or more arguments:
0. Enables sampling based on CPU-clock timer.
1. Interrupts per second. E.g., 100 == sample every 10 milliseconds of CPU-time.
2. Delay (in seconds of CPU-clock time). I.e., how long each thread should wait before taking first sample.
3+ Thread IDs to target for sampling, starting at 0 (the main thread).
May be specified as index or range, e.g., '0 2-4' will be interpreted as:
sample the main thread (0), do not sample the first child thread but sample the 2nd, 3rd, and 4th child threads
--realtime Sample based on a real-clock timer. Accepts zero or more arguments:
0. Enables sampling based on real-clock timer.
1. Interrupts per second. E.g., 100 == sample every 10 milliseconds of realtime.
2. Delay (in seconds of real-clock time). I.e., how long each thread should wait before taking first sample.
3+ Thread IDs to target for sampling, starting at 0 (the main thread).
May be specified as index or range, e.g., '0 2-4' will be interpreted as:
sample the main thread (0), do not sample the first child thread but sample the 2nd, 3rd, and 4th child threads
When sampling with a real-clock timer, please note that enabling this will cause threads which are typically "idle"
to consume more resources since, while idle, the real-clock time increases (and therefore triggers taking samples)
whereas the CPU-clock time does not.
[BACKEND OPTIONS] (These options control region information captured w/o sampling or instrumentation)
-I, --include [ all | kokkosp | mpip | mutex-locks | ompt | rcclp | rocm-smi | rocprofiler | roctracer | roctx | rw-locks | spin-locks ]
Include data from these backends
-E, --exclude [ all | kokkosp | mpip | mutex-locks | ompt | rcclp | rocm-smi | rocprofiler | roctracer | roctx | rw-locks | spin-locks ]
Exclude data from these backends
[HARDWARE COUNTER OPTIONS]
-C, --cpu-events Set the CPU hardware counter events to record (ref: `omnitrace-avail -H -c CPU`)
-G, --gpu-events Set the GPU hardware counter events to record (ref: `omnitrace-avail -H -c GPU`)
[MISCELLANEOUS OPTIONS]
-i, --inlines Include inline info in output when available
--hsa-interrupt [ 0 | 1 ] Set the value of the HSA_ENABLE_INTERRUPT environment variable.
ROCm version 5.2 and older have a bug which will cause a deadlock if a sample is taken while waiting for the signal
that a kernel completed -- which happens when sampling with a real-clock timer. We require this option to be set to
when --realtime is specified to make users aware that, while this may fix the bug, it can have a negative impact on
performance.
Values:
0 avoid triggering the bug, potentially at the cost of reduced performance
1 do not modify how ROCm is notified about kernel completion
```
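Two conventions in the help text above are easy to check with a few lines of Python: a frequency of N interrupts per second corresponds to one sample every 1000/N milliseconds, and a specification such as `'0 2-4'` expands to the IDs 0, 2, 3, and 4. The helpers below are hypothetical and only mirror the behavior described in the help text; they are not part of omnitrace.

```python
def sampling_interval_ms(interrupts_per_second):
    """Convert a sampling frequency into the time between samples, in milliseconds."""
    if interrupts_per_second <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / interrupts_per_second


def expand_ids(spec):
    """Expand a spec such as '0 2-4' into [0, 2, 3, 4] (ranges are inclusive)."""
    ids = []
    for token in spec.split():
        first, sep, last = token.partition("-")
        if sep:  # a range such as '2-4'
            ids.extend(range(int(first), int(last) + 1))
        else:  # a single index such as '0'
            ids.append(int(first))
    return ids
```

With these definitions, `sampling_interval_ms(100)` gives `10.0`, matching the "100 == sample every 10 milliseconds" note, and `expand_ids("0 2-4")` gives `[0, 2, 3, 4]`.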
The general syntax for separating omnitrace command line arguments from the application arguments
is consistent with the LLVM style of using a standalone double-hyphen (`--`). All arguments preceding the double-hyphen
are interpreted as belonging to omnitrace and all arguments following the double-hyphen are interpreted as the
application and its arguments. The double-hyphen is only necessary when passing command line arguments to the target
which also use hyphens. E.g. `omnitrace-sample ls` works but, in order to run `ls -la`, use `omnitrace-sample -- ls -la`.
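The double-hyphen convention can be sketched as follows. This is a minimal illustration of the splitting rule, assuming the separator is present; how `omnitrace-sample` locates the application when `--` is omitted is not modeled here, and `split_command_line` is a hypothetical name.

```python
def split_command_line(argv):
    """Split argv at the first standalone '--' into (tool_args, app_command)."""
    if "--" in argv:
        idx = argv.index("--")
        # Everything after the first '--' belongs to the application,
        # including any later '--' the application itself expects
        return argv[:idx], argv[idx + 1:]
    # Without a separator this sketch cannot tell where the application starts
    return list(argv), []
```

For example, `split_command_line(["-PTDH", "--", "ls", "-la"])` yields `(["-PTDH"], ["ls", "-la"])`, so `-la` is never mistaken for a tool option.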
[Configuring Omnitrace Runtime](runtime.md) establishes the precedence of environment variable values over values specified in the configuration files. This enables
the user to configure the omnitrace runtime to their preferred default behavior in a file such as `~/.omnitrace.cfg` and then easily override
those settings via something like `OMNITRACE_ENABLED=OFF omnitrace-sample -- foo`.
Similarly, the command line arguments passed to `omnitrace-sample` take precedence over environment variables.
All of the command-line options above correlate to one or more configuration settings, e.g. `--cpu-events` correlates to the `OMNITRACE_PAPI_EVENTS` configuration variable.
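The precedence order described above (command line over environment variables over configuration file) can be modeled as a layered lookup. The sketch below is only an illustration of that ordering using Python's `collections.ChainMap`; `resolve_settings` is a hypothetical helper and the setting values in the usage note are made up for the example.

```python
from collections import ChainMap


def resolve_settings(cli_args, environment, config_file):
    """Merge settings so CLI values override environment values, which override the config file."""
    # ChainMap returns the value from the first mapping that contains the key
    return dict(ChainMap(cli_args, environment, config_file))
```

Given a config file of `{"OMNITRACE_ENABLED": "ON", "OMNITRACE_USE_SAMPLING": "ON"}`, an environment of `{"OMNITRACE_ENABLED": "OFF"}`, and command-line settings of `{"OMNITRACE_PAPI_EVENTS": "PAPI_TOT_CYC"}`, the resolved value of `OMNITRACE_ENABLED` is `"OFF"` while `OMNITRACE_USE_SAMPLING` keeps its configuration-file value.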
After the command-line arguments to `omnitrace-sample` have been processed but before the target application is executed, `omnitrace-sample` emits a log
of the environment variables that were set and/or modified.
The snippet below shows the environment updates when `omnitrace-sample` is invoked with no arguments:
```console
$ omnitrace-sample -- ./parallel-overhead-locks 30 4 100
HSA_TOOLS_LIB=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
HSA_TOOLS_REPORT_LOAD_FAILURE=1
LD_PRELOAD=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
OMNITRACE_CRITICAL_TRACE=false
OMNITRACE_USE_PROCESS_SAMPLING=false
OMNITRACE_USE_SAMPLING=true
OMP_TOOL_LIBRARIES=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
ROCP_TOOL_LIB=/opt/omnitrace/lib/libomnitrace.so.1.7.1
...
```
The snippet below shows the environment updates when `omnitrace-sample` enables profiling, tracing, host process-sampling, device process-sampling, and all the available backends:
```console
$ omnitrace-sample -PTDH -I all -- ./parallel-overhead-locks 30 4 100
HSA_TOOLS_LIB=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
HSA_TOOLS_REPORT_LOAD_FAILURE=1
KOKKOS_PROFILE_LIBRARY=/opt/omnitrace/lib/libomnitrace.so.1.7.1
LD_PRELOAD=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
OMNITRACE_CPU_FREQ_ENABLED=true
OMNITRACE_CRITICAL_TRACE=false
OMNITRACE_TRACE_THREAD_LOCKS=true
OMNITRACE_TRACE_THREAD_RW_LOCKS=true
OMNITRACE_TRACE_THREAD_SPIN_LOCKS=true
OMNITRACE_USE_KOKKOSP=true
OMNITRACE_USE_MPIP=true
OMNITRACE_USE_OMPT=true
OMNITRACE_USE_PERFETTO=true
OMNITRACE_USE_PROCESS_SAMPLING=true
OMNITRACE_USE_RCCLP=true
OMNITRACE_USE_ROCM_SMI=true
OMNITRACE_USE_ROCPROFILER=true
OMNITRACE_USE_ROCTRACER=true
OMNITRACE_USE_ROCTX=true
OMNITRACE_USE_SAMPLING=true
OMNITRACE_USE_TIMEMORY=true
OMP_TOOL_LIBRARIES=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
ROCP_TOOL_LIB=/opt/omnitrace/lib/libomnitrace.so.1.7.1
...
```
The snippet below shows the environment updates when `omnitrace-sample` enables profiling, tracing, host process-sampling, device process-sampling,
sets the output path to `omnitrace-output`, the output prefix to `%tag%` and disables all the available backends:
```console
$ omnitrace-sample -PTDH -E all -o omnitrace-output %tag% -- ./parallel-overhead-locks 30 4 100
LD_PRELOAD=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
OMNITRACE_CPU_FREQ_ENABLED=true
OMNITRACE_CRITICAL_TRACE=false
OMNITRACE_OUTPUT_PATH=omnitrace-output
OMNITRACE_OUTPUT_PREFIX=%tag%
OMNITRACE_TRACE_THREAD_LOCKS=false
OMNITRACE_TRACE_THREAD_RW_LOCKS=false
OMNITRACE_TRACE_THREAD_SPIN_LOCKS=false
OMNITRACE_USE_KOKKOSP=false
OMNITRACE_USE_MPIP=false
OMNITRACE_USE_OMPT=false
OMNITRACE_USE_PERFETTO=true
OMNITRACE_USE_PROCESS_SAMPLING=true
OMNITRACE_USE_RCCLP=false
OMNITRACE_USE_ROCM_SMI=false
OMNITRACE_USE_ROCPROFILER=false
OMNITRACE_USE_ROCTRACER=false
OMNITRACE_USE_ROCTX=false
OMNITRACE_USE_SAMPLING=true
OMNITRACE_USE_TIMEMORY=true
...
```
## omnitrace-sample Example
```console
$ omnitrace-sample -PTDH -E all -o omnitrace-output %tag% -c -- ./parallel-overhead-locks 30 4 100
LD_PRELOAD=/opt/omnitrace/lib/libomnitrace-dl.so.1.7.1
OMNITRACE_CONFIG_FILE=
OMNITRACE_CPU_FREQ_ENABLED=true
OMNITRACE_CRITICAL_TRACE=false
OMNITRACE_OUTPUT_PATH=omnitrace-output
OMNITRACE_OUTPUT_PREFIX=%tag%
OMNITRACE_TRACE_THREAD_LOCKS=false
OMNITRACE_TRACE_THREAD_RW_LOCKS=false
OMNITRACE_TRACE_THREAD_SPIN_LOCKS=false
OMNITRACE_USE_KOKKOSP=false
OMNITRACE_USE_MPIP=false
OMNITRACE_USE_OMPT=false
OMNITRACE_USE_PERFETTO=true
OMNITRACE_USE_PROCESS_SAMPLING=true
OMNITRACE_USE_RCCLP=false
OMNITRACE_USE_ROCM_SMI=false
OMNITRACE_USE_ROCPROFILER=false
OMNITRACE_USE_ROCTRACER=false
OMNITRACE_USE_ROCTX=false
OMNITRACE_USE_SAMPLING=true
OMNITRACE_USE_TIMEMORY=true
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Sampling
______ .___ ___. .__ __. __ .___________..______ ___ ______ _______
/ __ \ | \/ | | \ | | | | | || _ \ / \ / || ____|
| | | | | \ / | | \| | | | `---| |----`| |_) | / ^ \ | ,----'| |__
| | | | | |\/| | | . ` | | | | | | / / /_\ \ | | | __|
| `--' | | | | | | |\ | | | | | | |\ \----./ _____ \ | `----.| |____
\______/ |__| |__| |__| \__| |__| |__| | _| `._____/__/ \__\ \______||_______|
[759.689] perfetto.cc:55903 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
[parallel-overhead-locks] Threads: 4
[parallel-overhead-locks] Iterations: 100
[parallel-overhead-locks] fibonacci(30)...
[1] number of iterations: 100
[2] number of iterations: 100
[3] number of iterations: 100
[4] number of iterations: 100
[parallel-overhead-locks] fibonacci(30) x 4 = 394644873
[parallel-overhead-locks] number of mutex locks = 400
[omnitrace][107157][0][omnitrace_finalize]
[omnitrace][107157][0][omnitrace_finalize] finalizing...
[omnitrace][107157][0][omnitrace_finalize]
[omnitrace][107157][0][omnitrace_finalize] omnitrace/process/107157 : 0.610427 sec wall_clock, 2.248 MB peak_rss, 2.265 MB page_rss, 2.560000 sec cpu_clock, 419.4 % cpu_util [laps: 1]
[omnitrace][107157][0][omnitrace_finalize] omnitrace/process/107157/thread/0 : 0.608866 sec wall_clock, 0.000677 sec thread_cpu_clock, 0.1 % thread_cpu_util, 2.248 MB peak_rss [laps: 1]
[omnitrace][107157][0][omnitrace_finalize] omnitrace/process/107157/thread/1 : 0.608237 sec wall_clock, 0.603553 sec thread_cpu_clock, 99.2 % thread_cpu_util, 2.204 MB peak_rss [laps: 1]
[omnitrace][107157][0][omnitrace_finalize] omnitrace/process/107157/thread/2 : 0.601430 sec wall_clock, 0.598378 sec thread_cpu_clock, 99.5 % thread_cpu_util, 1.156 MB peak_rss [laps: 1]
[omnitrace][107157][0][omnitrace_finalize] omnitrace/process/107157/thread/3 : 0.570223 sec wall_clock, 0.568713 sec thread_cpu_clock, 99.7 % thread_cpu_util, 0.772 MB peak_rss [laps: 1]
[omnitrace][107157][0][omnitrace_finalize] omnitrace/process/107157/thread/4 : 0.557637 sec wall_clock, 0.557198 sec thread_cpu_clock, 99.9 % thread_cpu_util, 0.156 MB peak_rss [laps: 1]
[omnitrace][107157][0][omnitrace_finalize]
[omnitrace][107157][0][omnitrace_finalize] Finalizing perfetto...
[omnitrace][107157][perfetto]> Outputting '/home/user/data/omnitrace-output/2022-10-19_02.46/parallel-overhead-locksperfetto-trace-107157.proto' (842.90 KB / 0.84 MB / 0.00 GB)... Done
[omnitrace][107157][trip_count]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockstrip_count-107157.json'
[omnitrace][107157][trip_count]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockstrip_count-107157.txt'
[omnitrace][107157][sampling_percent]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockssampling_percent-107157.json'
[omnitrace][107157][sampling_percent]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockssampling_percent-107157.txt'
[omnitrace][107157][sampling_cpu_clock]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockssampling_cpu_clock-107157.json'
[omnitrace][107157][sampling_cpu_clock]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockssampling_cpu_clock-107157.txt'
[omnitrace][107157][sampling_wall_clock]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockssampling_wall_clock-107157.json'
[omnitrace][107157][sampling_wall_clock]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockssampling_wall_clock-107157.txt'
[omnitrace][107157][wall_clock]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockswall_clock-107157.json'
[omnitrace][107157][wall_clock]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-lockswall_clock-107157.txt'
[omnitrace][107157][metadata]> Outputting 'omnitrace-output/2022-10-19_02.46/parallel-overhead-locksmetadata-107157.json' and 'omnitrace-output/2022-10-19_02.46/parallel-overhead-locksfunctions-107157.json'
[omnitrace][107157][0][omnitrace_finalize] Finalized
[761.584] perfetto.cc:57382 Tracing session 1 ended, total sessions:0
```
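The per-process and per-thread summary lines in the output above can be sanity-checked with a short script. The reported `cpu_util` and `thread_cpu_util` values appear consistent with CPU time divided by wall-clock time; that relationship is inferred from the numbers shown rather than taken from the omnitrace source, and `cpu_utilization` is a hypothetical helper.

```python
import re

# Matches both the process line (cpu_clock / cpu_util) and the
# thread lines (thread_cpu_clock / thread_cpu_util)
_SUMMARY = re.compile(
    r"(?P<label>\S+)\s*:\s*(?P<wall>[\d.]+)\s+sec\s+wall_clock"
    r".*?(?P<cpu>[\d.]+)\s+sec\s+(?:thread_)?cpu_clock"
    r",\s*(?P<util>[\d.]+)\s*%\s*(?:thread_)?cpu_util"
)


def cpu_utilization(line):
    """Return (label, reported %, recomputed %) for one finalization summary line."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError("not a summary line")
    wall, cpu = float(match["wall"]), float(match["cpu"])
    return match["label"], float(match["util"]), round(100.0 * cpu / wall, 1)
```

Applied to the process line, the recomputed value is 100 × 2.56 / 0.610427 ≈ 419.4 %, which agrees with the reported figure and reflects four worker threads each running at close to 100 %.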
+9 -3
@@ -1,4 +1,4 @@
# Setup
# Setup and Validation
```eval_rst
.. toctree::
@@ -8,13 +8,13 @@
## Configuring Environment
Source the `setup-env.sh` script to prefix the `PATH`, `LD_LIBRARY_PATH`, etc. environment variables:
Once omnitrace is installed, source the `setup-env.sh` script to prefix the `PATH`, `LD_LIBRARY_PATH`, etc. environment variables:
```bash
source /opt/omnitrace/share/omnitrace/setup-env.sh
```
Alternatively, if environment modules are supported, add the `<prefix>/share/modulefiles` directory to `MODULEPATH` via:
Alternatively, if environment modules are supported, add the `<prefix>/share/modulefiles` directory to `MODULEPATH`:
```bash
module use /opt/omnitrace/share/modulefiles
@@ -38,6 +38,12 @@ If all the following commands execute successfully with output, then you are rea
```bash
which omnitrace
which omnitrace-avail
which omnitrace-sample
omnitrace --help
omnitrace-avail --all
omnitrace-sample --help
# if built with python support
which omnitrace-python
omnitrace-python --help
```
+1 -1
@@ -580,7 +580,7 @@ omnitrace_finalize_hidden(void)
return;
}
OMNITRACE_VERBOSE_F(0, "\n");
if(get_verbose() >= 0 || get_debug()) fprintf(stderr, "\n");
OMNITRACE_VERBOSE_F(0, "finalizing...\n");
sampling::block_samples();