## Motivation
Fix roctx range markers (Push/Pop, Start/Stop) not being displayed correctly in rocpd output. The Visualizer was showing only Stop/Pop events as instant markers instead of proper duration ranges with labels, while Perfetto output displayed them correctly.
## Technical Details
In `tool_tracing_callback_stop()`, the rocpd/database output was using `user_data->value` (timestamp of the Pop/Stop event) instead of `begin_ts` (corrected timestamp from the corresponding Push/Start event) when calling `cache_region()`.
The Perfetto output already used `begin_ts` correctly (line 818). This change aligns the rocpd output with the Perfetto behavior by using `begin_ts` instead of `user_data->value` (line 887).
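The timestamp pairing described above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical (`range_tracker`, `region_record`, `on_start`, `on_stop`); it is not the code from `tool_tracing_callback_stop()` and does not reproduce the real `cache_region()` signature. It only shows why the start of the cached region should come from the matching Push/Start timestamp rather than from the Pop/Stop timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a region row cached for the rocpd output.
struct region_record
{
    std::string   name;
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Hypothetical tracker: remembers when each range was opened so that the
// closing callback can emit a real duration instead of an instant marker.
class range_tracker
{
public:
    // Called for a Push/Start event.
    void on_start(std::uint64_t range_id, std::string name, std::uint64_t ts)
    {
        m_open[range_id] = { std::move(name), ts };
    }

    // Called for a Pop/Stop event.
    void on_stop(std::uint64_t range_id, std::uint64_t stop_ts)
    {
        auto itr = m_open.find(range_id);
        if(itr == m_open.end()) return;  // unmatched Pop/Stop: nothing to emit

        const std::uint64_t begin_ts = itr->second.begin_ts;
        assert(stop_ts >= begin_ts);

        // Passing stop_ts as the start would yield a zero-length region,
        // which a viewer would draw as an instant marker.
        m_regions.push_back({ itr->second.name, begin_ts, stop_ts });
        m_open.erase(itr);
    }

    const std::vector<region_record>& regions() const { return m_regions; }

private:
    struct open_range
    {
        std::string   name;
        std::uint64_t begin_ts;
    };

    std::unordered_map<std::uint64_t, open_range> m_open;
    std::vector<region_record>                    m_regions;
};
```

With this sketch, a range opened at timestamp 100 and closed at 250 is recorded as one region spanning 100 to 250, rather than as a point at 250.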
Updated rocpd validation rules
* fix: resolve crash when profiling TensorFlow GPU application
* incorporate review comments
* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
* Make cached Perfetto traces the default
* Improve cached data and Perfetto traces to align better with E2E tests
* Addressing PR comments and findings
* Force early instrumentation bundle instantiation
* Sync up instrumented containers with thread growth data
* Revert ompvv number of host threads to default 8
* Fixed counter track namings for amd-smi
* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
* Add XGMI and PCIe metrics to the profiling data
Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
* XGMI link width in bits
* XGMI link speed in GT/s
* Per-link read bandwidth (KB)
* Per-link write bandwidth (KB)
- Add new categories for PCIe metrics:
* PCIe link width
* PCIe link speed in GT/s
* Accumulated bandwidth (MB)
* Instantaneous bandwidth (MB/s)
* Fix VCN/JPEG insert logic
* Modify the gpu_metrics struct to accommodate XCP structure
* Add ctest automation for gpu interconnect metrics
* Refactor to move gpu_metrics struct and serialization to another file
* Possible fix for timeout in CI
Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.
* Change2: Address review comments
Change ctest sampling to avoid timeout
Change variable name and code structuring
* Add option in ctest to run rocprof-sys-run without rewrite
Run transferbench with rocprof-sys-run without sampling
* Change3: Fix sample insert bug and address review comments
xgmi and pci support check
renaming variables
additional hip_api validation in rocpd
* Reduce the load from the TransferBench sample
The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...
* Forward ctest labels from the execution test to the validation test.
* Adjust test validation parameters for amd_smi samples
The actual number of samples will vary depending on the GPU. This test
only validates the presence of the samples.
* Round the sum of percentages before validating to account for floating point errors
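The rounding step in the last item can be sketched as follows. This is only an illustration in C++ under stated assumptions: `percentages_sum_to_100` and its `decimals` parameter are hypothetical names, and the actual validation code may be written differently (possibly in another language). The point is that the sum is rounded to a fixed number of decimal places before it is compared with 100, so accumulated floating point error does not fail the check.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: returns true when the percentages add up to 100
// after rounding the sum to `decimals` decimal places.
inline bool
percentages_sum_to_100(const std::vector<double>& percentages, int decimals = 2)
{
    assert(decimals >= 0);

    double sum = 0.0;
    for(double value : percentages)
        sum += value;

    // Round first, then compare: a raw `sum == 100.0` comparison can fail
    // because of tiny accumulated representation errors.
    const double scale = std::pow(10.0, decimals);
    return std::round(sum * scale) / scale == 100.0;
}
```

For example, `percentages_sum_to_100({ 33.33, 33.33, 33.34 })` passes regardless of how the intermediate additions round, while `percentages_sum_to_100({ 50.0, 49.0 })` is rejected.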
---------
Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
- Integrate rocprofiler-systems with rocprofiler-sdk-rocpd to fetch schema
- If rocprofiler-sdk-rocpd is not available, use embedded schema files. This provides rocpd format support even when ROCm is not available
- Detect in CMake whether the rocprofiler-sdk-rocpd package is available (and valid), and build the database class accordingly
- Update embedded schema that is used as a fallback.
- Update some validation tests to account for schema changes.
* Check if test exists before adding validation
* Adjust validation parameters for rocpd_string
Signed-off-by: David Galiffi <David.Galiffi@amd.com>