Update documentation (#53)

- updated info about OMNITRACE_USE_MPI
- removed wiki links
- info about metadata.json
- update HW counters and fix typos
- fix update-docs.sh

[ROCm/rocprofiler-systems commit: bab90baf0b]
Jonathan R. Madsen
2022-05-08 02:51:35 -05:00
committed by GitHub
parent 060da8159c
commit 0094a471fd
5 changed files with 571 additions and 187 deletions
@@ -149,9 +149,9 @@ source ${OMNITRACE_ROOT}/share/omnitrace/setup-env.sh
#### MPI Support within Omnitrace
[Omnitrace](https://github.com/AMDResearch/omnitrace) can have full (`OMNITRACE_USE_MPI=ON`) or partial (`OMNITRACE_USE_MPI_HEADERS=ON`) MPI support.
The only difference between these two modes is whether or not the results collected via timemory can be aggregated into one output file. The primary
benefits of partial or full MPI support are the automatic wrapping of MPI functions and the ability to label output with suffixes which correspond to the
`MPI_COMM_WORLD` rank ID instead of using the system process identifier (i.e. PID).
The only difference between these two modes is whether or not the results collected via timemory and/or perfetto can be aggregated into a single
output file during finalization. The primary benefits of partial or full MPI support are the automatic wrapping of MPI functions and the ability
to label output with suffixes which correspond to the `MPI_COMM_WORLD` rank ID instead of using the system process identifier (i.e. PID).
In general, it is recommended to use partial MPI support with the OpenMPI headers as this is the most portable configuration.
If full MPI support is selected, make sure your target application is built against the same MPI distribution as omnitrace,
e.g., do not build omnitrace with MPICH and use it on a target application built against OpenMPI.
@@ -49,9 +49,3 @@ have different contextual meanings, e.g., omnitrace's meaning of the term "modul
- Due to language constructs or compiler optimizations, it may be possible for multiple functions to overlap (that is, share part of the same function body) or for a single function to have multiple entry points
- In practice, it is impossible to determine the difference between multiple overlapping functions and a single function with multiple entry points
- By default, omnitrace avoids instrumenting overlapping functions
## Additional Notes
The ["Data granularity in profiler types"](https://en.wikipedia.org/wiki/Profiling_(computer_programming)#Data_granularity_in_profiler_types) section of
the Wikipedia ["Profiling (computer programming)"](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) page may be a useful reference in understanding
the different profiling modes and their trade-offs.
@@ -55,7 +55,114 @@ $ omnitrace -- ./foo
## Metadata
[Omnitrace](https://github.com/AMDResearch/omnitrace) will output a metadata.json file.
[Omnitrace](https://github.com/AMDResearch/omnitrace) will output a `metadata.json` file. This metadata file contains
the settings, environment variables, and output files, as well as information about the system and the run:
```json
{
    "omnitrace": {
        "metadata": {
            "info": {
                "HW_L1_CACHE_SIZE": 32768,
                "HW_L2_CACHE_SIZE": 524288,
                "SHELL": "/bin/bash",
                "HW_PHYSICAL_CPU": 12,
                "CPU_FEATURES": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es",
                "HW_CONCURRENCY": 24,
                "LAUNCH_TIME": "02:04",
                "CPU_MODEL": "AMD Ryzen Threadripper PRO 3945WX 12-Cores",
                "TIMEMORY_GIT_REVISION": "52e7034fd419ff296506cdef43084f6071dbaba1",
                "TIMEMORY_VERSION": "3.3.0rc4",
                "CPU_FREQUENCY": 2400,
                "TIMEMORY_API": "tim::project::timemory",
                "PWD": "/home/jrmadsen/devel/c++/AARInternal/hosttrace-dyninst/build-vscode",
                "HW_L3_CACHE_SIZE": 16777216,
                "USER": "jrmadsen",
                "HOME": "/home/jrmadsen",
                "TIMEMORY_GIT_DESCRIBE": "v3.2.0-263-g52e7034f",
                "LAUNCH_DATE": "05/08/22",
                "CPU_VENDOR": "AuthenticAMD"
            },
            "output": {
                "text": [
                    {
                        "value": [
                            "omnitrace-tests-output/parallel-overhead-binary-rewrite/roctracer.txt"
                        ],
                        "key": "roctracer"
                    },
                    {
                        "value": [
                            "omnitrace-tests-output/parallel-overhead-binary-rewrite/wall_clock.txt"
                        ],
                        "key": "wall_clock"
                    }
                ],
                "json": [
                    {
                        "value": [
                            "omnitrace-tests-output/parallel-overhead-binary-rewrite/roctracer.json",
                            "omnitrace-tests-output/parallel-overhead-binary-rewrite/roctracer.tree.json"
                        ],
                        "key": "roctracer"
                    },
                    {
                        "value": [
                            "omnitrace-tests-output/parallel-overhead-binary-rewrite/wall_clock.json",
                            "omnitrace-tests-output/parallel-overhead-binary-rewrite/wall_clock.tree.json"
                        ],
                        "key": "wall_clock"
                    }
                ]
            },
            "environment": [
                {
                    "value": "/home/jrmadsen",
                    "key": "HOME"
                },
                {
                    "value": "/bin/bash",
                    "key": "SHELL"
                },
                {
                    "value": "jrmadsen",
                    "key": "USER"
                },
                {
                    "value": "true",
                    "key": "... etc. ..."
                }
            ],
            "settings": {
                "OMNITRACE_JSON_OUTPUT": {
                    "count": -1,
                    "environ_updated": false,
                    "name": "json_output",
                    "data_type": "bool",
                    "initial": true,
                    "enabled": true,
                    "value": true,
                    "max_count": 1,
                    "cmdline": [
                        "--omnitrace-json-output"
                    ],
                    "environ": "OMNITRACE_JSON_OUTPUT",
                    "config_updated": false,
                    "categories": [
                        "io",
                        "json",
                        "native"
                    ],
                    "description": "Write json output files"
                },
                "... etc. ...": {
                    "etc.": true
                }
            }
        }
    }
}
```
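
The layout above can be post-processed with a few lines of Python. The following is only a minimal sketch, assuming the file follows the structure shown in the sample: the function name `summarize_metadata`, the abbreviated `SAMPLE` document, and the `out/...` paths are hypothetical and are not part of omnitrace.

```python
import json

# Abbreviated, hypothetical sample that mirrors the layout shown above.
# A real metadata.json contains many more entries; the "out/..." paths are made up.
SAMPLE = """
{
  "omnitrace": {
    "metadata": {
      "info": {"CPU_VENDOR": "AuthenticAMD", "HW_CONCURRENCY": 24},
      "output": {
        "text": [
          {"value": ["out/wall_clock.txt"], "key": "wall_clock"}
        ],
        "json": [
          {"value": ["out/wall_clock.json", "out/wall_clock.tree.json"],
           "key": "wall_clock"}
        ]
      },
      "environment": [
        {"value": "/bin/bash", "key": "SHELL"}
      ],
      "settings": {
        "OMNITRACE_JSON_OUTPUT": {"value": true, "data_type": "bool"}
      }
    }
  }
}
"""


def summarize_metadata(doc):
    """Flatten the nested metadata layout into plain dictionaries."""
    meta = doc["omnitrace"]["metadata"]
    # "output" maps a format name to a list of {"key": ..., "value": [paths]} records.
    outputs = {
        fmt: {rec["key"]: rec["value"] for rec in records}
        for fmt, records in meta.get("output", {}).items()
    }
    # "environment" is a list of {"key": ..., "value": ...} records.
    environment = {rec["key"]: rec["value"] for rec in meta.get("environment", [])}
    # "settings" maps a setting name to a record; keep only its current value.
    settings = {name: rec.get("value") for name, rec in meta.get("settings", {}).items()}
    return {
        "info": meta.get("info", {}),
        "outputs": outputs,
        "environment": environment,
        "settings": settings,
    }


summary = summarize_metadata(json.loads(SAMPLE))
for fmt, by_key in summary["outputs"].items():
    for key, paths in by_key.items():
        print(f"{fmt}:{key} -> {', '.join(paths)}")
```

To apply the same function to a real run, replace `json.loads(SAMPLE)` with `json.load(...)` on the opened `metadata.json` file from the output directory.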
## Configuring Output
@@ -227,120 +227,401 @@ $ omnitrace-avail -C -bd
```console
$ omnitrace-avail -H -bd
|---------------------|-------------------------------------------------|
| HARDWARE COUNTER | DESCRIPTION |
|---------------------|-------------------------------------------------|
| CPU | |
|---------------------|-------------------------------------------------|
| PAPI_L1_DCM | Level 1 data cache misses |
| PAPI_L1_ICM | Level 1 instruction cache misses |
| PAPI_L2_DCM | Level 2 data cache misses |
| PAPI_L2_ICM | Level 2 instruction cache misses |
| PAPI_L3_DCM | Level 3 data cache misses |
| PAPI_L3_ICM | Level 3 instruction cache misses |
| PAPI_L1_TCM | Level 1 cache misses |
| PAPI_L2_TCM | Level 2 cache misses |
| PAPI_L3_TCM | Level 3 cache misses |
| PAPI_CA_SNP | Requests for a snoop |
| PAPI_CA_SHR | Requests for exclusive access to shared cach... |
| PAPI_CA_CLN | Requests for exclusive access to clean cache... |
| PAPI_CA_INV | Requests for cache line invalidation |
| PAPI_CA_ITV | Requests for cache line intervention |
| PAPI_L3_LDM | Level 3 load misses |
| PAPI_L3_STM | Level 3 store misses |
| PAPI_BRU_IDL | Cycles branch units are idle |
| PAPI_FXU_IDL | Cycles integer units are idle |
| PAPI_FPU_IDL | Cycles floating point units are idle |
| PAPI_LSU_IDL | Cycles load/store units are idle |
| PAPI_TLB_DM | Data translation lookaside buffer misses |
| PAPI_TLB_IM | Instruction translation lookaside buffer misses |
| PAPI_TLB_TL | Total translation lookaside buffer misses |
| PAPI_L1_LDM | Level 1 load misses |
| PAPI_L1_STM | Level 1 store misses |
| PAPI_L2_LDM | Level 2 load misses |
| PAPI_L2_STM | Level 2 store misses |
| PAPI_BTAC_M | Branch target address cache misses |
| PAPI_PRF_DM | Data prefetch cache misses |
| PAPI_L3_DCH | Level 3 data cache hits |
| PAPI_TLB_SD | Translation lookaside buffer shootdowns |
| PAPI_CSR_FAL | Failed store conditional instructions |
| PAPI_CSR_SUC | Successful store conditional instructions |
| PAPI_CSR_TOT | Total store conditional instructions |
| PAPI_MEM_SCY | Cycles Stalled Waiting for memory accesses |
| PAPI_MEM_RCY | Cycles Stalled Waiting for memory reads |
| PAPI_MEM_WCY | Cycles Stalled Waiting for memory writes |
| PAPI_STL_ICY | Cycles with no instruction issue |
| PAPI_FUL_ICY | Cycles with maximum instruction issue |
| PAPI_STL_CCY | Cycles with no instructions completed |
| PAPI_FUL_CCY | Cycles with maximum instructions completed |
| PAPI_HW_INT | Hardware interrupts |
| PAPI_BR_UCN | Unconditional branch instructions |
| PAPI_BR_CN | Conditional branch instructions |
| PAPI_BR_TKN | Conditional branch instructions taken |
| PAPI_BR_NTK | Conditional branch instructions not taken |
| PAPI_BR_MSP | Conditional branch instructions mispredicted |
| PAPI_BR_PRC | Conditional branch instructions correctly pr... |
| PAPI_FMA_INS | FMA instructions completed |
| PAPI_TOT_IIS | Instructions issued |
| PAPI_TOT_INS | Instructions completed |
| PAPI_INT_INS | Integer instructions |
| PAPI_FP_INS | Floating point instructions |
| PAPI_LD_INS | Load instructions |
| PAPI_SR_INS | Store instructions |
| PAPI_BR_INS | Branch instructions |
| PAPI_VEC_INS | Vector/SIMD instructions (could include inte... |
| PAPI_RES_STL | Cycles stalled on any resource |
| PAPI_FP_STAL | Cycles the FP unit(s) are stalled |
| PAPI_TOT_CYC | Total cycles |
| PAPI_LST_INS | Load/store instructions completed |
| PAPI_SYC_INS | Synchronization instructions completed |
| PAPI_L1_DCH | Level 1 data cache hits |
| PAPI_L2_DCH | Level 2 data cache hits |
| PAPI_L1_DCA | Level 1 data cache accesses |
| PAPI_L2_DCA | Level 2 data cache accesses |
| PAPI_L3_DCA | Level 3 data cache accesses |
| PAPI_L1_DCR | Level 1 data cache reads |
| PAPI_L2_DCR | Level 2 data cache reads |
| PAPI_L3_DCR | Level 3 data cache reads |
| PAPI_L1_DCW | Level 1 data cache writes |
| PAPI_L2_DCW | Level 2 data cache writes |
| PAPI_L3_DCW | Level 3 data cache writes |
| PAPI_L1_ICH | Level 1 instruction cache hits |
| PAPI_L2_ICH | Level 2 instruction cache hits |
| PAPI_L3_ICH | Level 3 instruction cache hits |
| PAPI_L1_ICA | Level 1 instruction cache accesses |
| PAPI_L2_ICA | Level 2 instruction cache accesses |
| PAPI_L3_ICA | Level 3 instruction cache accesses |
| PAPI_L1_ICR | Level 1 instruction cache reads |
| PAPI_L2_ICR | Level 2 instruction cache reads |
| PAPI_L3_ICR | Level 3 instruction cache reads |
| PAPI_L1_ICW | Level 1 instruction cache writes |
| PAPI_L2_ICW | Level 2 instruction cache writes |
| PAPI_L3_ICW | Level 3 instruction cache writes |
| PAPI_L1_TCH | Level 1 total cache hits |
| PAPI_L2_TCH | Level 2 total cache hits |
| PAPI_L3_TCH | Level 3 total cache hits |
| PAPI_L1_TCA | Level 1 total cache accesses |
| PAPI_L2_TCA | Level 2 total cache accesses |
| PAPI_L3_TCA | Level 3 total cache accesses |
| PAPI_L1_TCR | Level 1 total cache reads |
| PAPI_L2_TCR | Level 2 total cache reads |
| PAPI_L3_TCR | Level 3 total cache reads |
| PAPI_L1_TCW | Level 1 total cache writes |
| PAPI_L2_TCW | Level 2 total cache writes |
| PAPI_L3_TCW | Level 3 total cache writes |
| PAPI_FML_INS | Floating point multiply instructions |
| PAPI_FAD_INS | Floating point add instructions |
| PAPI_FDV_INS | Floating point divide instructions |
| PAPI_FSQ_INS | Floating point square root instructions |
| PAPI_FNV_INS | Floating point inverse instructions |
| PAPI_FP_OPS | Floating point operations |
| PAPI_SP_OPS | Floating point operations; optimized to coun... |
| PAPI_DP_OPS | Floating point operations; optimized to coun... |
| PAPI_VEC_SP | Single precision vector/SIMD instructions |
| PAPI_VEC_DP | Double precision vector/SIMD instructions |
| PAPI_REF_CYC | Reference clock cycles |
|---------------------|-------------------------------------------------|
|---------------------------------------|---------------------------------------|
| HARDWARE COUNTER | DESCRIPTION |
|---------------------------------------|---------------------------------------|
| CPU | |
|---------------------------------------|---------------------------------------|
| PAPI_L1_DCM | Level 1 data cache misses |
| PAPI_L1_ICM | Level 1 instruction cache misses |
| PAPI_L2_DCM | Level 2 data cache misses |
| PAPI_L2_ICM | Level 2 instruction cache misses |
| PAPI_L3_DCM | Level 3 data cache misses |
| PAPI_L3_ICM | Level 3 instruction cache misses |
| PAPI_L1_TCM | Level 1 cache misses |
| PAPI_L2_TCM | Level 2 cache misses |
| PAPI_L3_TCM | Level 3 cache misses |
| PAPI_CA_SNP | Requests for a snoop |
| PAPI_CA_SHR | Requests for exclusive access to s... |
| PAPI_CA_CLN | Requests for exclusive access to c... |
| PAPI_CA_INV | Requests for cache line invalidation |
| PAPI_CA_ITV | Requests for cache line intervention |
| PAPI_L3_LDM | Level 3 load misses |
| PAPI_L3_STM | Level 3 store misses |
| PAPI_BRU_IDL | Cycles branch units are idle |
| PAPI_FXU_IDL | Cycles integer units are idle |
| PAPI_FPU_IDL | Cycles floating point units are idle |
| PAPI_LSU_IDL | Cycles load/store units are idle |
| PAPI_TLB_DM | Data translation lookaside buffer ... |
| PAPI_TLB_IM | Instruction translation lookaside ... |
| PAPI_TLB_TL | Total translation lookaside buffer... |
| PAPI_L1_LDM | Level 1 load misses |
| PAPI_L1_STM | Level 1 store misses |
| PAPI_L2_LDM | Level 2 load misses |
| PAPI_L2_STM | Level 2 store misses |
| PAPI_BTAC_M | Branch target address cache misses |
| PAPI_PRF_DM | Data prefetch cache misses |
| PAPI_L3_DCH | Level 3 data cache hits |
| PAPI_TLB_SD | Translation lookaside buffer shoot... |
| PAPI_CSR_FAL | Failed store conditional instructions |
| PAPI_CSR_SUC | Successful store conditional instr... |
| PAPI_CSR_TOT | Total store conditional instructions |
| PAPI_MEM_SCY | Cycles Stalled Waiting for memory ... |
| PAPI_MEM_RCY | Cycles Stalled Waiting for memory ... |
| PAPI_MEM_WCY | Cycles Stalled Waiting for memory ... |
| PAPI_STL_ICY | Cycles with no instruction issue |
| PAPI_FUL_ICY | Cycles with maximum instruction issue |
| PAPI_STL_CCY | Cycles with no instructions completed |
| PAPI_FUL_CCY | Cycles with maximum instructions c... |
| PAPI_HW_INT | Hardware interrupts |
| PAPI_BR_UCN | Unconditional branch instructions |
| PAPI_BR_CN | Conditional branch instructions |
| PAPI_BR_TKN | Conditional branch instructions taken |
| PAPI_BR_NTK | Conditional branch instructions no... |
| PAPI_BR_MSP | Conditional branch instructions mi... |
| PAPI_BR_PRC | Conditional branch instructions co... |
| PAPI_FMA_INS | FMA instructions completed |
| PAPI_TOT_IIS | Instructions issued |
| PAPI_TOT_INS | Instructions completed |
| PAPI_INT_INS | Integer instructions |
| PAPI_FP_INS | Floating point instructions |
| PAPI_LD_INS | Load instructions |
| PAPI_SR_INS | Store instructions |
| PAPI_BR_INS | Branch instructions |
| PAPI_VEC_INS | Vector/SIMD instructions (could in... |
| PAPI_RES_STL | Cycles stalled on any resource |
| PAPI_FP_STAL | Cycles the FP unit(s) are stalled |
| PAPI_TOT_CYC | Total cycles |
| PAPI_LST_INS | Load/store instructions completed |
| PAPI_SYC_INS | Synchronization instructions compl... |
| PAPI_L1_DCH | Level 1 data cache hits |
| PAPI_L2_DCH | Level 2 data cache hits |
| PAPI_L1_DCA | Level 1 data cache accesses |
| PAPI_L2_DCA | Level 2 data cache accesses |
| PAPI_L3_DCA | Level 3 data cache accesses |
| PAPI_L1_DCR | Level 1 data cache reads |
| PAPI_L2_DCR | Level 2 data cache reads |
| PAPI_L3_DCR | Level 3 data cache reads |
| PAPI_L1_DCW | Level 1 data cache writes |
| PAPI_L2_DCW | Level 2 data cache writes |
| PAPI_L3_DCW | Level 3 data cache writes |
| PAPI_L1_ICH | Level 1 instruction cache hits |
| PAPI_L2_ICH | Level 2 instruction cache hits |
| PAPI_L3_ICH | Level 3 instruction cache hits |
| PAPI_L1_ICA | Level 1 instruction cache accesses |
| PAPI_L2_ICA | Level 2 instruction cache accesses |
| PAPI_L3_ICA | Level 3 instruction cache accesses |
| PAPI_L1_ICR | Level 1 instruction cache reads |
| PAPI_L2_ICR | Level 2 instruction cache reads |
| PAPI_L3_ICR | Level 3 instruction cache reads |
| PAPI_L1_ICW | Level 1 instruction cache writes |
| PAPI_L2_ICW | Level 2 instruction cache writes |
| PAPI_L3_ICW | Level 3 instruction cache writes |
| PAPI_L1_TCH | Level 1 total cache hits |
| PAPI_L2_TCH | Level 2 total cache hits |
| PAPI_L3_TCH | Level 3 total cache hits |
| PAPI_L1_TCA | Level 1 total cache accesses |
| PAPI_L2_TCA | Level 2 total cache accesses |
| PAPI_L3_TCA | Level 3 total cache accesses |
| PAPI_L1_TCR | Level 1 total cache reads |
| PAPI_L2_TCR | Level 2 total cache reads |
| PAPI_L3_TCR | Level 3 total cache reads |
| PAPI_L1_TCW | Level 1 total cache writes |
| PAPI_L2_TCW | Level 2 total cache writes |
| PAPI_L3_TCW | Level 3 total cache writes |
| PAPI_FML_INS | Floating point multiply instructions |
| PAPI_FAD_INS | Floating point add instructions |
| PAPI_FDV_INS | Floating point divide instructions |
| PAPI_FSQ_INS | Floating point square root instruc... |
| PAPI_FNV_INS | Floating point inverse instructions |
| PAPI_FP_OPS | Floating point operations |
| PAPI_SP_OPS | Floating point operations; optimiz... |
| PAPI_DP_OPS | Floating point operations; optimiz... |
| PAPI_VEC_SP | Single precision vector/SIMD instr... |
| PAPI_VEC_DP | Double precision vector/SIMD instr... |
| PAPI_REF_CYC | Reference clock cycles |
| perf::PERF_COUNT_HW_CPU_CYCLES | PERF_COUNT_HW_CPU_CYCLES |
| perf::PERF_COUNT_HW_CPU_CYCLES:u=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... |
| perf::PERF_COUNT_HW_CPU_CYCLES:k=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... |
| perf::PERF_COUNT_HW_CPU_CYCLES:h=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... |
| perf::PERF_COUNT_HW_CPU_CYCLES:per... | perf::PERF_COUNT_HW_CPU_CYCLES + s... |
| perf::PERF_COUNT_HW_CPU_CYCLES:freq=0 | perf::PERF_COUNT_HW_CPU_CYCLES + s... |
| perf::PERF_COUNT_HW_CPU_CYCLES:pre... | perf::PERF_COUNT_HW_CPU_CYCLES + p... |
| perf::PERF_COUNT_HW_CPU_CYCLES:excl=0 | perf::PERF_COUNT_HW_CPU_CYCLES + e... |
| perf::PERF_COUNT_HW_CPU_CYCLES:mg=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... |
| perf::PERF_COUNT_HW_CPU_CYCLES:mh=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... |
| perf::PERF_COUNT_HW_CPU_CYCLES:cpu=0 | perf::PERF_COUNT_HW_CPU_CYCLES + C... |
| perf::PERF_COUNT_HW_CPU_CYCLES:pin... | perf::PERF_COUNT_HW_CPU_CYCLES + p... |
| perf::CYCLES | PERF_COUNT_HW_CPU_CYCLES |
| perf::CYCLES:u=0 | perf::CYCLES + monitor at user level |
| perf::CYCLES:k=0 | perf::CYCLES + monitor at kernel l... |
| perf::CYCLES:h=0 | perf::CYCLES + monitor at hypervis... |
| perf::CYCLES:period=0 | perf::CYCLES + sampling period |
| perf::CYCLES:freq=0 | perf::CYCLES + sampling frequency ... |
| perf::CYCLES:precise=0 | perf::CYCLES + precise event sampling |
| perf::CYCLES:excl=0 | perf::CYCLES + exclusive access |
| perf::CYCLES:mg=0 | perf::CYCLES + monitor guest execu... |
| perf::CYCLES:mh=0 | perf::CYCLES + monitor host execution |
| perf::CYCLES:cpu=0 | perf::CYCLES + CPU to program |
| perf::CYCLES:pinned=0 | perf::CYCLES + pin event to counters |
| perf::CPU-CYCLES | PERF_COUNT_HW_CPU_CYCLES |
| perf::CPU-CYCLES:u=0 | perf::CPU-CYCLES + monitor at user... |
| perf::CPU-CYCLES:k=0 | perf::CPU-CYCLES + monitor at kern... |
| perf::CPU-CYCLES:h=0 | perf::CPU-CYCLES + monitor at hype... |
| perf::CPU-CYCLES:period=0 | perf::CPU-CYCLES + sampling period |
| perf::CPU-CYCLES:freq=0 | perf::CPU-CYCLES + sampling freque... |
| perf::CPU-CYCLES:precise=0 | perf::CPU-CYCLES + precise event s... |
| perf::CPU-CYCLES:excl=0 | perf::CPU-CYCLES + exclusive access |
| perf::CPU-CYCLES:mg=0 | perf::CPU-CYCLES + monitor guest e... |
| perf::CPU-CYCLES:mh=0 | perf::CPU-CYCLES + monitor host ex... |
| perf::CPU-CYCLES:cpu=0 | perf::CPU-CYCLES + CPU to program |
| perf::CPU-CYCLES:pinned=0 | perf::CPU-CYCLES + pin event to co... |
| perf::PERF_COUNT_HW_INSTRUCTIONS | PERF_COUNT_HW_INSTRUCTIONS |
| perf::PERF_COUNT_HW_INSTRUCTIONS:u=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:k=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:h=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:p... | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:f... | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:p... | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:e... | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:mg=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:mh=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:c... | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| perf::PERF_COUNT_HW_INSTRUCTIONS:p... | perf::PERF_COUNT_HW_INSTRUCTIONS +... |
| ... etc. ... | |
| perf_raw::r0000 | perf_events raw event syntax: r[0-... |
| perf_raw::r0000:u=0 | perf_raw::r0000 + monitor at user ... |
| perf_raw::r0000:k=0 | perf_raw::r0000 + monitor at kerne... |
| perf_raw::r0000:h=0 | perf_raw::r0000 + monitor at hyper... |
| perf_raw::r0000:period=0 | perf_raw::r0000 + sampling period |
| perf_raw::r0000:freq=0 | perf_raw::r0000 + sampling frequen... |
| perf_raw::r0000:precise=0 | perf_raw::r0000 + precise event sa... |
| perf_raw::r0000:excl=0 | perf_raw::r0000 + exclusive access |
| perf_raw::r0000:mg=0 | perf_raw::r0000 + monitor guest ex... |
| perf_raw::r0000:mh=0 | perf_raw::r0000 + monitor host exe... |
| perf_raw::r0000:cpu=0 | perf_raw::r0000 + CPU to program |
| perf_raw::r0000:pinned=0 | perf_raw::r0000 + pin event to cou... |
| perf_raw::r0000:hw_smpl=0 | perf_raw::r0000 + enable hardware ... |
| L1_ITLB_MISS_L2_ITLB_HIT | Number of instruction fetches that... |
| L1_ITLB_MISS_L2_ITLB_HIT:e=0 | L1_ITLB_MISS_L2_ITLB_HIT + edge level |
| L1_ITLB_MISS_L2_ITLB_HIT:i=0 | L1_ITLB_MISS_L2_ITLB_HIT + invert |
| L1_ITLB_MISS_L2_ITLB_HIT:c=0 | L1_ITLB_MISS_L2_ITLB_HIT + counter... |
| L1_ITLB_MISS_L2_ITLB_HIT:g=0 | L1_ITLB_MISS_L2_ITLB_HIT + measure... |
| L1_ITLB_MISS_L2_ITLB_HIT:u=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... |
| L1_ITLB_MISS_L2_ITLB_HIT:k=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... |
| L1_ITLB_MISS_L2_ITLB_HIT:period=0 | L1_ITLB_MISS_L2_ITLB_HIT + samplin... |
| L1_ITLB_MISS_L2_ITLB_HIT:freq=0 | L1_ITLB_MISS_L2_ITLB_HIT + samplin... |
| L1_ITLB_MISS_L2_ITLB_HIT:excl=0 | L1_ITLB_MISS_L2_ITLB_HIT + exclusi... |
| L1_ITLB_MISS_L2_ITLB_HIT:mg=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... |
| L1_ITLB_MISS_L2_ITLB_HIT:mh=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... |
| L1_ITLB_MISS_L2_ITLB_HIT:cpu=0 | L1_ITLB_MISS_L2_ITLB_HIT + CPU to ... |
| L1_ITLB_MISS_L2_ITLB_HIT:pinned=0 | L1_ITLB_MISS_L2_ITLB_HIT + pin eve... |
| L1_ITLB_MISS_L2_ITLB_MISS | Number of instruction fetches that... |
| L1_ITLB_MISS_L2_ITLB_MISS:IF1G | L1_ITLB_MISS_L2_ITLB_MISS + Number... |
| L1_ITLB_MISS_L2_ITLB_MISS:IF2M | L1_ITLB_MISS_L2_ITLB_MISS + Number... |
| L1_ITLB_MISS_L2_ITLB_MISS:IF4K | L1_ITLB_MISS_L2_ITLB_MISS + Number... |
| L1_ITLB_MISS_L2_ITLB_MISS:e=0 | L1_ITLB_MISS_L2_ITLB_MISS + edge l... |
| L1_ITLB_MISS_L2_ITLB_MISS:i=0 | L1_ITLB_MISS_L2_ITLB_MISS + invert |
| L1_ITLB_MISS_L2_ITLB_MISS:c=0 | L1_ITLB_MISS_L2_ITLB_MISS + counte... |
| L1_ITLB_MISS_L2_ITLB_MISS:g=0 | L1_ITLB_MISS_L2_ITLB_MISS + measur... |
| L1_ITLB_MISS_L2_ITLB_MISS:u=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... |
| L1_ITLB_MISS_L2_ITLB_MISS:k=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... |
| L1_ITLB_MISS_L2_ITLB_MISS:period=0 | L1_ITLB_MISS_L2_ITLB_MISS + sampli... |
| L1_ITLB_MISS_L2_ITLB_MISS:freq=0 | L1_ITLB_MISS_L2_ITLB_MISS + sampli... |
| L1_ITLB_MISS_L2_ITLB_MISS:excl=0 | L1_ITLB_MISS_L2_ITLB_MISS + exclus... |
| L1_ITLB_MISS_L2_ITLB_MISS:mg=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... |
| L1_ITLB_MISS_L2_ITLB_MISS:mh=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... |
| L1_ITLB_MISS_L2_ITLB_MISS:cpu=0 | L1_ITLB_MISS_L2_ITLB_MISS + CPU to... |
| L1_ITLB_MISS_L2_ITLB_MISS:pinned=0 | L1_ITLB_MISS_L2_ITLB_MISS + pin ev... |
| RETIRED_SSE_AVX_FLOPS | This is a retire-based event. The ... |
| RETIRED_SSE_AVX_FLOPS:ADD_SUB_FLOPS | RETIRED_SSE_AVX_FLOPS + Addition/s... |
| RETIRED_SSE_AVX_FLOPS:MULT_FLOPS | RETIRED_SSE_AVX_FLOPS + Multiplica... |
| RETIRED_SSE_AVX_FLOPS:DIV_FLOPS | RETIRED_SSE_AVX_FLOPS + Division F... |
| RETIRED_SSE_AVX_FLOPS:MAC_FLOPS | RETIRED_SSE_AVX_FLOPS + Double pre... |
| RETIRED_SSE_AVX_FLOPS:ANY | RETIRED_SSE_AVX_FLOPS + Double pre... |
| RETIRED_SSE_AVX_FLOPS:e=0 | RETIRED_SSE_AVX_FLOPS + edge level |
| RETIRED_SSE_AVX_FLOPS:i=0 | RETIRED_SSE_AVX_FLOPS + invert |
| RETIRED_SSE_AVX_FLOPS:c=0 | RETIRED_SSE_AVX_FLOPS + counter-ma... |
| RETIRED_SSE_AVX_FLOPS:g=0 | RETIRED_SSE_AVX_FLOPS + measure in... |
| RETIRED_SSE_AVX_FLOPS:u=0 | RETIRED_SSE_AVX_FLOPS + monitor at... |
| RETIRED_SSE_AVX_FLOPS:k=0 | RETIRED_SSE_AVX_FLOPS + monitor at... |
| RETIRED_SSE_AVX_FLOPS:period=0 | RETIRED_SSE_AVX_FLOPS + sampling p... |
| RETIRED_SSE_AVX_FLOPS:freq=0 | RETIRED_SSE_AVX_FLOPS + sampling f... |
| RETIRED_SSE_AVX_FLOPS:excl=0 | RETIRED_SSE_AVX_FLOPS + exclusive ... |
| RETIRED_SSE_AVX_FLOPS:mg=0 | RETIRED_SSE_AVX_FLOPS + monitor gu... |
| RETIRED_SSE_AVX_FLOPS:mh=0 | RETIRED_SSE_AVX_FLOPS + monitor ho... |
| RETIRED_SSE_AVX_FLOPS:cpu=0 | RETIRED_SSE_AVX_FLOPS + CPU to pro... |
| RETIRED_SSE_AVX_FLOPS:pinned=0 | RETIRED_SSE_AVX_FLOPS + pin event ... |
| DIV_CYCLES_BUSY_COUNT | Number of cycles when the divider ... |
| DIV_CYCLES_BUSY_COUNT:e=0 | DIV_CYCLES_BUSY_COUNT + edge level |
| DIV_CYCLES_BUSY_COUNT:i=0 | DIV_CYCLES_BUSY_COUNT + invert |
| DIV_CYCLES_BUSY_COUNT:c=0 | DIV_CYCLES_BUSY_COUNT + counter-ma... |
| DIV_CYCLES_BUSY_COUNT:g=0 | DIV_CYCLES_BUSY_COUNT + measure in... |
| DIV_CYCLES_BUSY_COUNT:u=0 | DIV_CYCLES_BUSY_COUNT + monitor at... |
| DIV_CYCLES_BUSY_COUNT:k=0 | DIV_CYCLES_BUSY_COUNT + monitor at... |
| DIV_CYCLES_BUSY_COUNT:period=0 | DIV_CYCLES_BUSY_COUNT + sampling p... |
| DIV_CYCLES_BUSY_COUNT:freq=0 | DIV_CYCLES_BUSY_COUNT + sampling f... |
| DIV_CYCLES_BUSY_COUNT:excl=0 | DIV_CYCLES_BUSY_COUNT + exclusive ... |
| DIV_CYCLES_BUSY_COUNT:mg=0 | DIV_CYCLES_BUSY_COUNT + monitor gu... |
| DIV_CYCLES_BUSY_COUNT:mh=0 | DIV_CYCLES_BUSY_COUNT + monitor ho... |
| DIV_CYCLES_BUSY_COUNT:cpu=0 | DIV_CYCLES_BUSY_COUNT + CPU to pro... |
| DIV_CYCLES_BUSY_COUNT:pinned=0 | DIV_CYCLES_BUSY_COUNT + pin event ... |
| DIV_OP_COUNT | Number of divide uops. |
| DIV_OP_COUNT:e=0 | DIV_OP_COUNT + edge level |
| DIV_OP_COUNT:i=0 | DIV_OP_COUNT + invert |
| DIV_OP_COUNT:c=0 | DIV_OP_COUNT + counter-mask in ran... |
| DIV_OP_COUNT:g=0 | DIV_OP_COUNT + measure in guest |
| DIV_OP_COUNT:u=0 | DIV_OP_COUNT + monitor at user level |
| DIV_OP_COUNT:k=0 | DIV_OP_COUNT + monitor at kernel l... |
| DIV_OP_COUNT:period=0 | DIV_OP_COUNT + sampling period |
| DIV_OP_COUNT:freq=0 | DIV_OP_COUNT + sampling frequency ... |
| DIV_OP_COUNT:excl=0 | DIV_OP_COUNT + exclusive access |
| DIV_OP_COUNT:mg=0 | DIV_OP_COUNT + monitor guest execu... |
| DIV_OP_COUNT:mh=0 | DIV_OP_COUNT + monitor host execution |
| DIV_OP_COUNT:cpu=0 | DIV_OP_COUNT + CPU to program |
| DIV_OP_COUNT:pinned=0 | DIV_OP_COUNT + pin event to counters |
| ... etc. ... | |
| amd64_rapl::RAPL_ENERGY_PKG | Number of Joules consumed by all c... |
| amd64_rapl::RAPL_ENERGY_PKG:u=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... |
| amd64_rapl::RAPL_ENERGY_PKG:k=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... |
| amd64_rapl::RAPL_ENERGY_PKG:period=0 | amd64_rapl::RAPL_ENERGY_PKG + samp... |
| amd64_rapl::RAPL_ENERGY_PKG:freq=0 | amd64_rapl::RAPL_ENERGY_PKG + samp... |
| amd64_rapl::RAPL_ENERGY_PKG:excl=0 | amd64_rapl::RAPL_ENERGY_PKG + excl... |
| amd64_rapl::RAPL_ENERGY_PKG:mg=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... |
| amd64_rapl::RAPL_ENERGY_PKG:mh=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... |
| amd64_rapl::RAPL_ENERGY_PKG:cpu=0 | amd64_rapl::RAPL_ENERGY_PKG + CPU ... |
| amd64_rapl::RAPL_ENERGY_PKG:pinned=0 | amd64_rapl::RAPL_ENERGY_PKG + pin ... |
| appio:::READ_BYTES | Bytes read |
| appio:::READ_CALLS | Number of read calls |
| appio:::READ_ERR | Number of read calls that resulted... |
| appio:::READ_INTERRUPTED | Number of read calls that timed ou... |
| appio:::READ_WOULD_BLOCK | Number of read calls that would ha... |
| appio:::READ_SHORT | Number of read calls that returned... |
| appio:::READ_EOF | Number of read calls that returned... |
| appio:::READ_BLOCK_SIZE | Average block size of reads |
| appio:::READ_USEC | Real microseconds spent in reads |
| appio:::WRITE_BYTES | Bytes written |
| appio:::WRITE_CALLS | Number of write calls |
| appio:::WRITE_ERR | Number of write calls that resulte... |
| appio:::WRITE_SHORT | Number of write calls that wrote l... |
| appio:::WRITE_INTERRUPTED | Number of write calls that timed o... |
| appio:::WRITE_WOULD_BLOCK | Number of write calls that would h... |
| appio:::WRITE_BLOCK_SIZE | Mean block size of writes |
| appio:::WRITE_USEC | Real microseconds spent in writes |
| appio:::OPEN_CALLS | Number of open calls |
| appio:::OPEN_ERR | Number of open calls that resulted... |
| appio:::OPEN_FDS | Number of currently open descriptors |
| appio:::SELECT_USEC | Real microseconds spent in select ... |
| appio:::RECV_BYTES | Bytes read in recv/recvmsg/recvfrom |
| appio:::RECV_CALLS | Number of recv/recvmsg/recvfrom calls |
| appio:::RECV_ERR | Number of recv/recvmsg/recvfrom ca... |
| appio:::RECV_INTERRUPTED | Number of recv/recvmsg/recvfrom ca... |
| appio:::RECV_WOULD_BLOCK | Number of recv/recvmsg/recvfrom ca... |
| appio:::RECV_SHORT | Number of recv/recvmsg/recvfrom ca... |
| appio:::RECV_EOF | Number of recv/recvmsg/recvfrom ca... |
| appio:::RECV_BLOCK_SIZE | Average block size of recv/recvmsg... |
| appio:::RECV_USEC | Real microseconds spent in recv/re... |
| appio:::SOCK_READ_BYTES | Bytes read from socket |
| appio:::SOCK_READ_CALLS | Number of read calls on socket |
| appio:::SOCK_READ_ERR | Number of read calls on socket tha... |
| appio:::SOCK_READ_SHORT | Number of read calls on socket tha... |
| appio:::SOCK_READ_WOULD_BLOCK | Number of read calls on socket tha... |
| appio:::SOCK_READ_USEC | Real microseconds spent in read(s)... |
| appio:::SOCK_WRITE_BYTES | Bytes written to socket |
| appio:::SOCK_WRITE_CALLS | Number of write calls to socket |
| appio:::SOCK_WRITE_ERR | Number of write calls to socket th... |
| appio:::SOCK_WRITE_SHORT | Number of write calls to socket th... |
| appio:::SOCK_WRITE_WOULD_BLOCK | Number of write calls to socket th... |
| appio:::SOCK_WRITE_USEC | Real microseconds spent in write(s... |
| appio:::SEEK_CALLS | Number of seek calls |
| appio:::SEEK_ABS_STRIDE_SIZE | Average absolute stride size of seeks |
| appio:::SEEK_USEC | Real microseconds spent in seek calls |
| coretemp:::hwmon2:in0_input | V, amdgpu module, label vddgfx |
| coretemp:::hwmon2:temp1_input | degrees C, amdgpu module, label edge |
| coretemp:::hwmon2:temp2_input | degrees C, amdgpu module, label ju... |
| coretemp:::hwmon2:temp3_input | degrees C, amdgpu module, label mem |
| coretemp:::hwmon2:fan1_input | RPM, amdgpu module, label ? |
| coretemp:::hwmon0:temp1_input | degrees C, nvme module, label Comp... |
| coretemp:::hwmon0:temp2_input | degrees C, nvme module, label Sens... |
| coretemp:::hwmon0:temp3_input | degrees C, nvme module, label Sens... |
| coretemp:::hwmon3:temp1_input | degrees C, k10temp module, label Tctl |
| coretemp:::hwmon3:temp2_input | degrees C, k10temp module, label Tdie |
| coretemp:::hwmon3:temp5_input | degrees C, k10temp module, label T... |
| coretemp:::hwmon3:temp7_input | degrees C, k10temp module, label T... |
| coretemp:::hwmon1:temp1_input | degrees C, enp1s0 module, label PH... |
| coretemp:::hwmon1:temp2_input | degrees C, enp1s0 module, label MA... |
| io:::rchar | Characters read. |
| io:::wchar | Characters written. |
| io:::syscr | Characters read by system calls. |
| io:::syscw | Characters written by system calls. |
| io:::read_bytes | Binary bytes read. |
| io:::write_bytes | Binary bytes written. |
| io:::cancelled_write_bytes | Binary write bytes cancelled. |
| net:::lo:rx:bytes | lo receive bytes |
| net:::lo:rx:packets | lo receive packets |
| net:::lo:rx:errors | lo receive errors |
| net:::lo:rx:dropped | lo receive dropped |
| net:::lo:rx:fifo | lo receive fifo |
| net:::lo:rx:frame | lo receive frame |
| net:::lo:rx:compressed | lo receive compressed |
| net:::lo:rx:multicast | lo receive multicast |
| net:::lo:tx:bytes | lo transmit bytes |
| net:::lo:tx:packets | lo transmit packets |
| net:::lo:tx:errors | lo transmit errors |
| net:::lo:tx:dropped | lo transmit dropped |
| net:::lo:tx:fifo | lo transmit fifo |
| net:::lo:tx:colls | lo transmit colls |
| net:::lo:tx:carrier | lo transmit carrier |
| net:::lo:tx:compressed | lo transmit compressed |
| net:::enp1s0:rx:bytes | enp1s0 receive bytes |
| net:::enp1s0:rx:packets | enp1s0 receive packets |
| net:::enp1s0:rx:errors | enp1s0 receive errors |
| net:::enp1s0:rx:dropped | enp1s0 receive dropped |
| net:::enp1s0:rx:fifo | enp1s0 receive fifo |
| net:::enp1s0:rx:frame | enp1s0 receive frame |
| net:::enp1s0:rx:compressed | enp1s0 receive compressed |
| net:::enp1s0:rx:multicast | enp1s0 receive multicast |
| net:::enp1s0:tx:bytes | enp1s0 transmit bytes |
| net:::enp1s0:tx:packets | enp1s0 transmit packets |
| net:::enp1s0:tx:errors | enp1s0 transmit errors |
| net:::enp1s0:tx:dropped | enp1s0 transmit dropped |
| net:::enp1s0:tx:fifo | enp1s0 transmit fifo |
| net:::enp1s0:tx:colls | enp1s0 transmit colls |
| net:::enp1s0:tx:carrier | enp1s0 transmit carrier |
| net:::enp1s0:tx:compressed | enp1s0 transmit compressed |
| net:::vxlan.calico:rx:bytes | vxlan.calico receive bytes |
| net:::vxlan.calico:rx:packets | vxlan.calico receive packets |
| net:::vxlan.calico:rx:errors | vxlan.calico receive errors |
| net:::vxlan.calico:rx:dropped | vxlan.calico receive dropped |
| net:::vxlan.calico:rx:fifo | vxlan.calico receive fifo |
| net:::vxlan.calico:rx:frame | vxlan.calico receive frame |
| net:::vxlan.calico:rx:compressed | vxlan.calico receive compressed |
| net:::vxlan.calico:rx:multicast | vxlan.calico receive multicast |
| net:::vxlan.calico:tx:bytes | vxlan.calico transmit bytes |
| net:::vxlan.calico:tx:packets | vxlan.calico transmit packets |
| net:::vxlan.calico:tx:errors | vxlan.calico transmit errors |
| net:::vxlan.calico:tx:dropped | vxlan.calico transmit dropped |
| net:::vxlan.calico:tx:fifo | vxlan.calico transmit fifo |
| net:::vxlan.calico:tx:colls | vxlan.calico transmit colls |
| net:::vxlan.calico:tx:carrier | vxlan.calico transmit carrier |
| net:::vxlan.calico:tx:compressed | vxlan.calico transmit compressed |
| net:::cali59d6fabc2aa:rx:bytes | cali59d6fabc2aa receive bytes |
| net:::cali59d6fabc2aa:rx:packets | cali59d6fabc2aa receive packets |
| net:::cali59d6fabc2aa:rx:errors | cali59d6fabc2aa receive errors |
| net:::cali59d6fabc2aa:rx:dropped | cali59d6fabc2aa receive dropped |
| net:::cali59d6fabc2aa:rx:fifo | cali59d6fabc2aa receive fifo |
| net:::cali59d6fabc2aa:rx:frame | cali59d6fabc2aa receive frame |
| net:::cali59d6fabc2aa:rx:compressed | cali59d6fabc2aa receive compressed |
| net:::cali59d6fabc2aa:rx:multicast | cali59d6fabc2aa receive multicast |
| net:::cali59d6fabc2aa:tx:bytes | cali59d6fabc2aa transmit bytes |
| net:::cali59d6fabc2aa:tx:packets | cali59d6fabc2aa transmit packets |
| net:::cali59d6fabc2aa:tx:errors | cali59d6fabc2aa transmit errors |
| net:::cali59d6fabc2aa:tx:dropped | cali59d6fabc2aa transmit dropped |
| net:::cali59d6fabc2aa:tx:fifo | cali59d6fabc2aa transmit fifo |
| net:::cali59d6fabc2aa:tx:colls | cali59d6fabc2aa transmit colls |
| net:::cali59d6fabc2aa:tx:carrier | cali59d6fabc2aa transmit carrier |
| net:::cali59d6fabc2aa:tx:compressed | cali59d6fabc2aa transmit compressed |
|---------------------------------------|---------------------------------------|
```
## Creating a Configuration File
@@ -370,44 +651,92 @@ but do not override an existing value for the environment variable.
```shell
# lvals starting with $ are variables
$USE = ON
$ENABLE = ON
$SAMPLE = OFF
# use fields
OMNITRACE_USE_PERFETTO = $USE
OMNITRACE_USE_TIMEMORY = $USE
OMNITRACE_USE_SAMPLING = $USE
OMNITRACE_USE_PID = OFF
OMNITRACE_USE_PERFETTO = $ENABLE
OMNITRACE_USE_TIMEMORY = $ENABLE
OMNITRACE_USE_SAMPLING = $SAMPLE
OMNITRACE_USE_THREAD_SAMPLING = $SAMPLE
OMNITRACE_CRITICAL_TRACE = OFF
# debug
OMNITRACE_DEBUG = OFF
OMNITRACE_VERBOSE = 1
OMNITRACE_DL_VERBOSE = 1
# output fields
OMNITRACE_OUTPUT_PREFIX = %tag%-
OMNITRACE_OUTPUT_PATH = omnitrace-example-output
OMNITRACE_OUTPUT_PREFIX = %tag%/
OMNITRACE_TIME_OUTPUT = OFF
OMNITRACE_USE_PID = OFF
# timemory fields
OMNITRACE_PAPI_EVENTS = PAPI_TOT_INS PAPI_FP_INS
OMNITRACE_TIMEMORY_COMPONENTS = wall_clock trip_count
OMNITRACE_MEMORY_UNITS = MB
OMNITRACE_TIMING_UNITS = sec
# sampling fields
OMNITRACE_SAMPLING_FREQ = 10
# rocm-smi fields
OMNITRACE_ROCM_SMI_DEVICES = 1
OMNITRACE_ROCM_SMI_DEVICES = 0
# misc env variables
OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX = OFF
OMNITRACE_SAMPLING_KEEP_INTERNAL = OFF
```
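The `$`-prefixed names in the sample above behave like simple substitutions: a later right-hand side that names one of them receives its value. The following is a minimal Python sketch of that idea, not omnitrace's actual parser. `parse_config` is a hypothetical helper, and its handling of comments, whitespace, and whole-value-only substitution are assumptions made for illustration.

```python
def parse_config(text):
    """Parse `NAME = VALUE` lines, expanding `$VAR` references.

    Hypothetical helper for illustration only; omnitrace's real parser
    may treat comments, quoting, and partial substitutions differently.
    """
    variables = {}  # names starting with '$'
    settings = {}   # every other left-hand side
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without an assignment
        key, value = key.strip(), value.strip()
        # Assumption: only a right-hand side that is exactly a known
        # variable name is replaced by that variable's value.
        value = variables.get(value, value)
        if key.startswith("$"):
            variables[key] = value
        else:
            settings[key] = value
    return settings


example = """
# lvals starting with $ are variables
$ENABLE = ON
$SAMPLE = OFF
OMNITRACE_USE_PERFETTO = $ENABLE
OMNITRACE_USE_SAMPLING = $SAMPLE
OMNITRACE_SAMPLING_FREQ = 10
"""

settings = parse_config(example)
for name, value in settings.items():
    print(f"{name} = {value}")
```

In this sketch the variables themselves are not emitted as settings; they only exist to keep related fields in sync.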
### Sample JSON Configuration File
The full JSON specification for a configuration value contains a lot of information:
```json
{
"omnitrace": {
"settings": {
"OMNITRACE_ADD_SECONDARY": {
"count": -1,
"name": "add_secondary",
"data_type": "bool",
"initial": true,
"value": true,
"max_count": 1,
"cmdline": [
"--omnitrace-add-secondary"
],
"environ": "OMNITRACE_ADD_SECONDARY",
"cereal_class_version": 1,
"categories": [
"component",
"data",
"native"
],
"description": "Enable/disable components adding secondary (child) entries when available. E.g. suppress individual CUDA kernels, etc. when using Cupti components"
}
}
}
}
```
However, when writing a JSON configuration file, the following is minimally acceptable to set `OMNITRACE_ADD_SECONDARY=false`:
```json
{
"omnitrace": {
"settings": {
"OMNITRACE_ADD_SECONDARY": {
"value": false
}
}
}
}
```
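A minimal file of this shape can also be generated programmatically. The sketch below uses only Python's standard `json` module; `minimal_config` is a hypothetical helper, and the nesting (`omnitrace` > `settings` > setting name > `value`) is taken directly from the sample above rather than from a formal schema.

```python
import json


def minimal_config(name, value):
    """Build the smallest JSON structure shown above for one setting.

    Hypothetical helper: it mirrors the sample layout only and does not
    validate `name` against the settings omnitrace actually supports.
    """
    return {"omnitrace": {"settings": {name: {"value": value}}}}


# Python's False serializes to the JSON literal `false`.
config = minimal_config("OMNITRACE_ADD_SECONDARY", False)
text = json.dumps(config, indent=4)
print(text)
```

Using a serializer instead of hand-editing avoids mismatched braces and guarantees that boolean values are written as `true` or `false` rather than as quoted strings.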
### Sample XML Configuration File
The full XML specification for a configuration value contains
a lot of information:
The full XML specification for a configuration value contains the same information as the JSON specification:
```xml
<?xml version="1.0" encoding="utf-8"?>
@@ -424,7 +753,7 @@ a lot of information:
<count>-1</count>
<max_count>1</max_count>
<cmdline>
<value0>--timemory-add-secondary</value0>
<value0>--omnitrace-add-secondary</value0>
</cmdline>
<categories>
<value0>component</value0>
@@ -441,8 +770,7 @@ a lot of information:
</timemory_xml>
```
Howver when writing an XML configuration file, the following is perfectly acceptable
to set `OMNITRACE_ADD_SECONDARY=false`:
However, when writing an XML configuration file, the following is minimally acceptable to set `OMNITRACE_ADD_SECONDARY=false`:
```xml
<?xml version="1.0" encoding="utf-8"?>
@@ -456,51 +784,3 @@ to set `OMNITRACE_ADD_SECONDARY=false`:
</omnitrace>
</timemory_xml>
```
### Sample JSON Configuration File
The full JSON specification for a configuration value contains the same information as the XML:
```json
{
"omnitrace": {
"settings": {
"OMNITRACE_ADD_SECONDARY": {
"count": -1,
"name": "add_secondary",
"data_type": "bool",
"initial": true,
"value": true,
"max_count": 1,
"cmdline": [
"--timemory-add-secondary"
],
"environ": "OMNITRACE_ADD_SECONDARY",
"cereal_class_version": 1,
"categories": [
"component",
"data",
"native"
],
"description": "Enable/disable components adding secondary (child) entries when available. E.g. suppress individual CUDA kernels, etc. when using Cupti components"
}
}
}
}
```
Similarly, the
Howver when writing an XML configuration file, the following is perfectly acceptable
to set `OMNITRACE_ADD_SECONDARY=false`:
```json
{
"omnitrace": {
"settings": {
"OMNITRACE_ADD_SECONDARY": {
"value": true
}
}
}
}
```
@@ -25,8 +25,11 @@ make html
if [ -d ${SOURCE_DIR}/docs ]; then
message "Removing stale documentation in ${SOURCE_DIR}/docs/"
echo rm -rf ${SOURCE_DIR}/docs/*
rm -rf ${SOURCE_DIR}/docs/*
message "Adding nojekyll to docs/"
cp -r ${WORK_DIR}/.nojekyll ${SOURCE_DIR}/docs/.nojekyll
message "Copying source/docs/_build/html/* to docs/"
echo cp -r ${WORK_DIR}/_build/html/* ${SOURCE_DIR}/docs/
cp -r ${WORK_DIR}/_build/html/* ${SOURCE_DIR}/docs/
fi