a3819f09ad554fc0a945aaef197dfd7989edfc50
83 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8d7be2e4b4 |
SWDEV-483130: Replace calls to deprecated functions hipHostMalloc/hipHostFree (#1070)
* SWDEV-483130: Replace calls to deprecated functions hipHostMalloc/hipHostFree * SWDEV-483130: Replace calls to deprecated functions hipHostMalloc/hipHostFree. Moved definitions from lib/commons/defines.hpp to samples/common/defines.hpp and tests/common/defines.hpp * Updated comment for clarity * Update tests/rocprofv3/aborted-app/validate.py Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Formatting * Formatting * Updated CHANGELOG --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> |
||
|
|
93e82663d9 |
PC sampling: online partial PC sampling decoding (#1004)
* PC sampling: online partial PC sampling decoding PC sampling service decodes a PC sample partially by replacing the PC with an id of the loaded code object instance containing PC and the offset of the PC within that code object instance. * PC sampling: marker records removed * PC sampling parser: minor doc update in mock * PC sampling: introducing rocprofiler_pc_t * NULL value of the code object id introduced. * Clarifying documentation related to PC offset. * PC offset documentation improvement * PC sampling parser benchmark: Reducing the number of samples to recreate half of performance. |
||
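The partial decoding described in the commit above (replacing a raw PC with a code object id plus an offset) can be sketched as follows. This is a minimal illustration, not the rocprofiler-sdk implementation: `loaded_code_object`, `decoded_pc`, `decode_pc`, and the choice of `0` as the null code object id are all hypothetical names and assumptions made for this example. It also assumes that a PC outside every known code object keeps its raw address next to the null id.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the rocprofiler-sdk types.
struct loaded_code_object
{
    uint64_t id;         // id of the loaded code object instance
    uint64_t load_base;  // first address covered by the instance
    uint64_t load_size;  // number of bytes covered by the instance
};

struct decoded_pc
{
    uint64_t code_object_id;  // null id when no code object contains the PC
    uint64_t offset;          // offset within the code object (or the raw PC)
};

// Assumption: 0 denotes "no code object" in this sketch.
constexpr uint64_t null_code_object_id = 0;

// Replace a raw PC with (code object id, offset within that code object).
inline decoded_pc
decode_pc(const std::vector<loaded_code_object>& objects, uint64_t pc)
{
    for(const auto& obj : objects)
    {
        // The unsigned subtraction is only evaluated when pc >= load_base.
        if(pc >= obj.load_base && pc - obj.load_base < obj.load_size)
            return {obj.id, pc - obj.load_base};
    }
    // Assumption: an unmatched PC keeps its raw address alongside the null id.
    return {null_code_object_id, pc};
}
```

A linear scan keeps the sketch short; a tool tracking many code objects would more likely keep the ranges sorted and use a binary search.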
|
|
5d54682468 |
Misc cleanup and stale code removal (#1026)
* Remove custom allocators - remove unused lib/rocprofiler-sdk/allocator.* - remove unused lib/rocprofiler-sdk/context/allocator.hpp * Fix rocprofiler_strip_target (rocprofiler_utilities.cmake) * Remove old HSA_TOOLS_LIB support - remove OnLoad/OnUnload functions used by HSA_TOOLS_LIB env variable * Fix linter warnings + specific NOLINT exceptions - replace bare NOLINT with NOLINT(<warning-name>) |
||
|
|
bb25376480 |
Misc API cleanup and consistency fixes (#1023)
- ROCPROFILER_API after function - use rocprofiler_tracing_operation_t in lieu of uint32_t where appropriate - rocprofiler_tracing_operation_t is now int32_t typedef (formerly uint32_t) - use const T* instead of T* where appropriate |
||
|
|
20e07caad4 |
Reorganize thread trace codeobj headers (#1001)
* include/rocprofiler-sdk/cxx/codeobj - Relocated from include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj * Update include/rocprofiler-sdk/cxx - cmake updates - correct namespace rocprofiler::codeobj -> rocprofiler::sdk::codeobj * Update codeobj tests and samples |
||
|
|
0f89f0449d |
PC sampling: chiplet id + integration test fix (#983)
* PCS: show chiplet; cover loading/unloading in integration test * Use (code_object_id, pc_addr) pair as instruction id. |
||
|
|
1e49b43738 |
Miscellaneous updates (#959)
- missing-new-line CI job: ensures all source files end with new line - logging updates - add new line to the end of many files - fix header include ordering in misc places - transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies |
||
|
|
78fd8cb379 |
Returning code object id information in code_printing.cpp:Instruction (#965)
* Returning code object id information in code_printing.cpp:Instruction * Adding assertions * Simplifying decoder library |
||
|
|
a045947a89 | Removing cache of decoded lines and returning shared_ptr (#953) | ||
|
|
81d1407565 |
Incremental Counter Profile Creation (#933)
* Incremental Counter Profile Creation
Adds support for incremental counter creation. To enable this, the
behavior of rocprofiler_create_profile_config has been changed.
rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id,
rocprofiler_counter_id_t* counters_list,
size_t counters_count,
rocprofiler_profile_config_id_t* config_id)
The behavior of this function now allows an existing config_id to be
supplied via config_id. The counters contained in this config will be
copied over and used as a base for a new config along with any counters
supplied in counters_list. The new config id is returned via config_id
and can be used in future dispatch/agent counting sessions.
A new config is created rather than modifying an existing config since there
is no guarantee that the existing config isn't already in use. While we
could add locks (or other mutual exclusion properties) to check if it's
in use and reject an update, the benefit from doing so is minor in
comparison to just creating a new config. This also side steps a common
pattern a tool may use to add additional counters at some point later on
during execution. Now they can do that without destroying the existing
config.
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
||
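The copy-then-extend behavior described in the commit message above can be sketched with a small mock. This is not the rocprofiler-sdk implementation and does not call the real API: `config_registry`, `config_id`, `counter_id`, and the handle numbering are hypothetical stand-ins, and the sketch assumes that an unknown handle (here `0`) means "no existing config". It only illustrates why passing an existing id in and getting a new id out leaves the original config untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-ins for illustration; NOT the rocprofiler-sdk types.
using counter_id = uint64_t;

struct config_id
{
    uint64_t handle;  // assumption: an unknown handle means "no base config"
};

class config_registry
{
public:
    // Mirrors the described in/out semantics: if *id names an existing config,
    // its counters are copied as the base of a NEW config; the new id is
    // written back through id. The existing config is never modified.
    void create(const counter_id* counters, size_t count, config_id* id)
    {
        assert(id != nullptr);
        std::set<counter_id> merged;
        auto                 it = m_configs.find(id->handle);
        if(it != m_configs.end()) merged = it->second;  // copy, do not mutate
        if(count > 0) merged.insert(counters, counters + count);
        const uint64_t handle = ++m_next;
        m_configs.emplace(handle, std::move(merged));
        id->handle = handle;
    }

    // Read-only view of the counters held by a config.
    const std::set<counter_id>& counters(config_id id) const
    {
        return m_configs.at(id.handle);
    }

private:
    std::map<uint64_t, std::set<counter_id>> m_configs;
    uint64_t                                 m_next = 0;
};
```

Because every call produces a fresh config, a session already using the base config is unaffected, which is the trade-off the commit message argues for over locking and in-place updates.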
|
|
9676295d3d |
ATT API changes - add user_data field and separation of dispatch vs agent profiling (#893)
* DRM Issue Fix for SLES 15 (#897) * DRM Issue Fix * Formatting Fix * PC sampling: CID manager unit test (#898) * Adding per-dispatch userdata field to ATT * Clang tidy * Formatting * Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> * Adding dispatch_id, fixing user_data and update aql_profile_v2 * Formatting * Tidy fixes * Second fix for userdata * removing assert for union * Adding serialization. Created agent profiling-like thread trace * Implemented agent thread trace * Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> * Restructured thread trace packets * Added agent API tests * Fixing multigpu for agent test * Formatting * Formatting * Improving header locations * Fixing merge conflicts * Tidy * Tidy * Tidy --------- Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com> Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> |
||
|
|
c49719649b |
SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT (#910)
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT * Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Benjamin Welton <bewelton@amd.com> * Adding unit tests * Adding counters check for gfx9 and SQ block only * Addressing review comments * changing the struct size * fixing header includes --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Benjamin Welton <bewelton@amd.com> |
||
|
|
f5753d3ae3 |
Add dimension query to counter collection sample (#918)
Co-authored-by: Benjamin Welton <ben@amd.com> |
||
|
|
3d9a448797 |
Small change to sample for clarity (#913)
Co-authored-by: Benjamin Welton <ben@amd.com> |
||
|
|
b0c41827c3 |
PC sampling client: using raw pointers (#902)
* PC sampling client: using raw pointers to prevent premature destruction of buffers * PCS client: freeing buffer_ids |
||
|
|
0e43a30de0 | Update client.cpp (#900) | ||
|
|
a76f61a0a3 |
Migrate to rocprofiler-sdk:: namespace in CMake everywhere (#892)
- remove all usage/support for rocprofiler:: namespace |
||
|
|
5525b400c3 |
Miscellaneous AFAR 5 Updates (#891)
* Dispatch table copy/update uses ROCP_TRACE instead of ROCP_INFO * Update rocprofiler-sdk CMake config - rocprofiler::rocprofiler is alias to rocprofiler-sdk::rocprofiler-sdk instead of other way around * Prefer rocprofiler-sdk::rocprofiler-sdk over rocprofiler::rocprofiler * Fix WITH_UNWIND for glog - requires a value of "none" instead of boolean now * Update include/rocprofiler-sdk/registration.h - explicit struct names to permit forward decl * Update include/rocprofiler-sdk/cxx/serialization.hpp - ROCPROFILER_SDK_CEREAL_NAMESPACE_BEGIN and ROCPROFILER_SDK_CEREAL_NAMESPACE_END to enable customized namespace |
||
|
|
1b95089c28 |
Enable ATT continuous mode and code object tracing registration (#850)
* Adding ATT continuous mode and ATT code object tracking * Fixing aql_packet.cpp * Updating to aqlprofile codeobj changes * Removing kernel packet from ATT dispatch callback * Changing getSymbolMap() to return relative vaddr * Tidy fixes * Formatting * Fix shadowing * Fixing packet test * Updating tests * Simplifying multi-agent traces * Adding dynamic codeobj tracking * leftover book-keeping for codeobj markers * Formatting * Formatting * Temporarily removing codeobj marker * Formatting * Re-enabling codeobj tracking * Making copy of coreapi table * Fixing issues with toolData lifetime * Formatting * Fixing issues with ASAN * Improving memory profile * Removing misplaced annotation * Fixing queue type and allowing shared_locks in globalThreadTracer * Update logging * Changing ATT formats to be more in line with the SDK (#883) * Fixing some merge conflicts * Fixing cmakelists * Fixing merge conflicts * Formatting |
||
|
|
385980e279 |
Moving ATT to amd_detail (#885)
* Moving ATT to amd_detail * Formatting |
||
|
|
a84c9fa7d4 |
Removing code object static library (#865)
* Removing static library build for codeobj library * Moving codeobj library to amd_detail * Formatting * Formatting * Adding findDW * Adding libdw to common samples cmake |
||
|
|
987ae3cc47 |
PC Sampling Support (#715)
* cmake formatting (cmake-format) (#188) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#189) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets * source formatting (clang-format v11) (#191) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#192) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: shadow variable fix * pcs: fix for compiler errors reported by CI/CD * source formatting (clang-format v11) (#193) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: docs fix; samples uses rocprofiler::rocprofiler library * cmake formatting (cmake-format) (#195) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: client in samples folder fixed * pcs: client requires rocprofiler package as dependency * pcs: client uses single context * source formatting (clang-format v11) (#196) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: client using single buffer; no buffer destroy in client * pcs: client::setup explicitly called from the example * pcs: rocprofiler_pc_sample_record_t updated * pcs: fixed init of external correlation id * source formatting (clang-format v11) (#198) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: remove outdated files; update CMakeLists * cmake formatting (cmake-format) (#212) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: using rocprofiler_agent_id_t * pcs: Removing trailing whitespaces Co-authored-by: Jonathan R. 
Madsen <jrmadsen@users.noreply.github.com> * source formatting (clang-format v11) (#214) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: mapping agent_id to the agent * source formatting (clang-format v11) (#215) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: const while iterating over agents * source formatting (clang-format v11) (#216) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: calling get_buffer instead of get_buffers * pcs: workgroup typo * pcs: documentation for the public PC sampling API * pcs: queue_cb_t signature adaptation * pcs: mocks removed * pcs: updating HsaApiTable with HSA/ROCr PC sampling API * pcs: querying available PC sampling configs through IOCTL * pcs: create the PCS session in IOCTL * pcs: first actual PC samples delivered to the rocprofiler's client :) * pcs: works with marker packet too * pcs: using HSA table to call pc sampling related functions * pcs: using ioctl instead of kfd in naming * pcs: configuration service test fixed * pcs: sample processing test fixed * pcs: marker packet macro wrapper removed * pcs: marker packet is part of the rocprofiler_packet union * pcs: one fixme added * pcs: client that uses pc-sampling and code obj tracing * pcs: client that supprts PC sampling and code obj tracing refactored * pcs: show more info for each PC sample * pcs: hex output for the samples that do not belong to the matmul kernel * pcs: querying avail configuration happens immediately before configuring * pcs: hsa_ven_amd_pcs_create_from_id renamed * pcs: using hsa_stop; accessing a buffer by id from parser * pcs: includes reworked, tests returned to life * pcs: rocrofiler dir removed as outdated * cmake formatting (cmake-format) (#271) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#272) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: some warnings fixed * source formatting (clang-format v11) (#273) 
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#274) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: show MI200 relevant information in the sample * pcs: queue cb fixed; rocr.h include fixed * source formatting (clang-format v11) (#296) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: getting hsa_agent and the doorbell_id from hsa_queue * source formatting (clang-format v11) (#297) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: correlation ID logic fixed * source formatting (clang-format v11) (#303) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: pure pc sampling example fixed * source formatting (clang-format v11) (#307) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#308) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: interval value if the PC sampling is already configured * pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED New status code if another process configured PC sampling service with different configuration. Samples are extended to consider this case and retry if it happens. 
* pcs: hsa_amd_queue_get_info mocked in tests * source formatting (clang-format v11) (#328) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs (tests): query configs after configuring service * source formatting (clang-format v11) (#329) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: sample checks workgroup_id_* and wave_id * source formatting (clang-format v11) (#330) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs samples: running samples on the device 0 * pcs: kfd_ioctl updated * pcs: ioctl config struct changed fields names * pcs: status when PC sampling is configured by another process is renamed * pcs: HSA PC sampling API table fixed * pcs: tmp hack to be able to use HSA pc sampling table * source formatting (clang-format v11) (#443) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs service use CIDs generated by HIP API tracing service * source formatting (clang-format v11) (#455) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#456) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: CID manager * pcs: explicit flush with no delivered data executes retirement logic * source formatting (clang-format v11) (#464) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: rocprofiler_query_pc_sampling_agent_configurations docs update * source formatting (clang-format v11) (#465) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: rocprofiler_configure_pc_sampling_service docs update * pcs: explicit sync introduced in PCSCIDManager * pcs: new logic for retiring CIDs in PC sampling service documented * pcs: queue interception cb signature updated * source formatting (clang-format v11) (#471) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: if no agents supports PC sampling, fail gracefully * elaborating when KFD returns EBUSY and EEXIST * pcs: the second PC sampling examples fails 
gracefully * code samples use only single kernel for now * pcs: CID manager refactored * source formatting (clang-format v11) (#481) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: ioctl update * source formatting (clang-format v11) (#531) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs:code sample to test PC sampling applied on concurrent kernels * source formatting (clang-format v11) (#533) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: pc sampling strest test included * cmake formatting (cmake-format) (#539) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#540) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: standalone benchmark * cmake formatting (cmake-format) (#555) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: glance in external correlation IDs * source formatting (clang-format v11) (#557) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * another change in ioctl interface * pcs: update queue interceptor callbacks and samples accroding to the agent 0 version * source formatting (clang-format v11) (#611) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: avoid running problematic PC sampling test * pcs: guarding tests not to fail on architectures not supporting PC sampling * source formatting (clang-format v11) (#617) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: check IOCTL version prior to each KFD call * pcs: ioctl refactoring * pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch * cmake formatting (cmake-format) (#631) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#632) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: PC sampling service 
provides external correlation IDs * source formatting (clang-format v11) (#644) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: use rocprofiler_dim3_t for workgrou_ip * source formatting (clang-format v11) (#645) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: minor fixes * pcs: updating the documentation for the pc sampling API functions * pcs: api table and queue controller fix * pcs: don't generate marker packets for the agent if PC sampling is not configured on it * pcs: multi-GPU and single-GPU clients * source formatting (clang-format v11) (#700) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: warning and errors fixed * source formatting (clang-format v11) (#702) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: clang compiler errors and warnings fixed * source formatting (clang-format v11) (#716) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: const reference in cid manager * source formatting (clang-format v11) (#717) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: const & func in manager explicit * pcs: test to cover creating PC sampling service of agent that does not exist * pcs: generate marker packets if service is active * source formatting (clang-format v11) (#719) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: refactoring hsa_adapter; use the correlation_id->thread_idx * Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp * Update utils.cpp * moving pc-sampling 
tests and samples to pc-sampling label * Format fix * pcs: use configured instead of active service * Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp * pcs: ensure configuring PC sampling on the HSA level is called only once * pcs: minor fix * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * pcs: refactoring IOCTL integration * Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: reverting back what bot doubled * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: retesting the bot * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: why bot fails on this IOCTL status * pcs: why failing on <vector> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: returning 
commits removed by bot * pcs: formatting locally * pcs: clients are flushing buffers inside the tool_fini * pcs: sync function in public API * pcs: sync prior to unloading the code object * pcs: sync function requires context * pcs: client uses CID retirement service * pcs: test for flusing internal ROCr buffers * pcs: source formatting * Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: code samples refactoring * pcs: public API header refactored * pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too * pcs: remove unnecessary functions * pcs: do not call hsa's copytables * pcs: include reordering * pcs: using ROCP_ERROR inside PC sampling implementation * pcs: pc_sampling sample uses ostream instean of printfs * pcs: pc_sampling_codeobj tracing using ostream instead of prints * pcs: registering once for interceptor callbacks * pcs: do not generate internal CIDs if not in debug mode * pcs: rebasing fixed; missing external correlation IDs * pcs: code formatting * enable kernel tracing service to receive external correlation IDs * pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL * pcs: polishing parser * formatting * updating parser to use workgroup_id * kfd_ioctl.h extracted in details folder * refactoring * pcs: preparing to generate code object information * flush internal buffers prior to unloading code object * pcs: generating marker records * pcs: wrap code_object's shutdown function * ROCR_VISIBLE_DEVICES and HIP_VISISBLE_DEVICES unsupported at the moment * documenting the ignorance of ROCR/HIP_VISIBLE_DEVICES * pcs: separate structs for code object loading/unloading markers * pcs: inst_pkt_t changed the namespace * pcs: removing wrapper around the shutdown function * pcs: size in record field * pcs: documentation refactoring + typdefs * renaming PCSAgentConfig to PCSAgentSession * pcs: service does not keep a pointer to 
the context * pcs: static assertions related to the versioning * pcs: rocprofiler_pc_sampling_configuration_t size field * pcs: report API unimplemented unleass explicitly enabled * pcs: skip tests if KFD does not support PC sampling * pcs: if ROCr hides some devices, no PC samples will be delivered for it * pcs: hip error check after kernel launch * formatting * removing PCS info from agent.h * fix based on review * Update continuous integration workflow - use mi200 runner for code coverage (supports PC sampling) - split sanitizer jobs across navi3, vega20, and mi300 * Updating pc sampling test labels * ROCP_PC_SAMPLING_ENABLED env in CI * ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs * Rearrange sanitizer assignments * fixes according to review * removed unused functions * pcs: rocprofiler_agent_id_t instead of handle as a key in map * Update source/lib/rocprofiler-sdk/context/context.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * removing drm_fd from the agent.h * pcs: removing one sample due to complexity * pcs: refactoring sample * simplifying sample * new lines * Improve queue_control enable intercepter logic * Update lib/rocprofiler-sdk/hsa/types.hpp - handle amd_ext size for HSA 1.12.0 * ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED * Update hsa_adapter.cpp - anonymous namespace + remove debug * parser update * Apply suggestions from code review --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> Co-authored-by: vlaindic <vladimir.indic@amd.com> Co-authored-by: vlaindic <vlaindic@amd.com> Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
28e6430d04 |
[2/N] Agent Counter implementation with unit tests to check functionality (#846)
Agent Counter Collection API with tests and samples. --------- Co-authored-by: Benjamin Welton <ben@amd.com> |
||
|
|
358c599c3f | avoiding early destruction of code objects list (#847) | ||
|
|
099ac7c72d |
Gbaraldi/att tool (#766)
* Enabling codeobj and thread trace samples * Updating aqlprofile_v2 header * Codeobj and thread trace samples with output log files * Fixing clang format * Cmake formatting * Adding coverage to codeobj * Comment trace sample * Adding ATT Parser API * Fixing forwarding to aqlprofile * Clang formatting * Clang tidy * Adding option to print memory kernels * Clang format * Remove default from switch case * Separating client/main on codeobj sample for ASAn * Formatting * Gbaraldi/att tool rebase (#801) * Enabling codeobj and thread trace samples * Updating aqlprofile_v2 header * Codeobj and thread trace samples with output log files * Fixing clang format * Cmake formatting * Adding coverage to codeobj * Comment trace sample * Removing python from workflow * Adding ATT Parser API * Fixing forwarding to aqlprofile * Clang formatting * Clang tidy * Adding option to print memory kernels * Clang format * Remove default from switch case * Separating client/main on codeobj sample for ASAn * Formatting * Enabling codeobj and thread trace samples * Updating aqlprofile_v2 header * Codeobj and thread trace samples with output log files * Fixing clang format * Cmake formatting * Adding coverage to codeobj * Comment trace sample * Adding ATT Parser API * Fixing forwarding to aqlprofile * Clang formatting * Clang tidy * Adding option to print memory kernels * Clang format * Remove default from switch case * Separating client/main on codeobj sample for ASAn * Formatting * Fix codeobj library * Allow thread trace in parallel with other service * Zeroing the HSA signals * Adding exception wrappers in ATT sample * Removed force configure * Remove force configure from ISA decode * Removing codecov flag * Gbaraldi/att tool tests (#828) * Adding tests for codeobj ISA decode * Adding ATT tests * Adding ATT integration tests * Formatting * Changing codeobj binary extension * Renaming codeobj library spaces * Fixing samples * Formatting * Formatting * Fixing int test * Fixing linker error * 
Fixing memory fault * Moving kernel to inside namespace * ASAN linking fix * Removing unnecessary headers * Formatting * Fixing target_cu * Remove codeobj binary * Revert "Remove codeobj binary" This reverts commit 7d286f89d8096bc36925cd79cd742a5e6d10d179. * Enable memory snapshot * adding comgr --------- Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com> |
||
|
|
de13d2ac5d |
Public C++ header files and samples updates (#819)
* Public C++ header files (source/include/rocprofiler-sdk/cxx)
* Update samples/api_buffered_tracing
- scratch memory and page migration
- README
* Update samples/api_buffered_tracing
- page migration component in sample
* Update tests/page-migration/validate.py
- fix checks for page migration operation names
* Update tests/page-migration/validate.py
- fix get_allocated_pages
* Update scratch memory and page migration validations
* Fix include/rocprofiler-sdk/cxx installation
* Rework include/rocprofiler-sdk/cxx
- Improve name_info to support const char*, string_view, string
* Update samples/api_{buffered,callback}_tracing
* External correlation ID request sample
- includes correlation ID retirement demo
* Update samples/api_buffered_tracing/README.md
* Update lib/rocprofiler-sdk/hsa/queue.cpp
- generate correlation ID for kernel launch if one doesn't exist
* Remove priority check from tool libraries (samples/tests)
- if(priority > 0) return nullptr check in rocprofiler_configure has proliferated beyond its intended use
* Apply suggestions from code review
|
||
|
|
8c985543d9 |
Rework counter collection sample app (#822)
* Sync more often in counter collection samples * Update samples/counter_collection/main.cpp - support command line arguments - number of iterations - iterations per sync - number of devices to use |
||
|
|
b570ff5273 |
Update samples/intercept_table (#792)
- install function wrappers around HIP runtime API - easily correlated to the executable - safer than HSA runtime due to potential for HSA to get invoked after main returns |
||
|
|
edb1883a05 |
Modified hipMalloc size for main.cpp in sample (#786)
* Modified hipMalloc size for main.cpp in sample * Update samples/counter_collection/main.cpp --------- Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> |
||
|
|
c2f659ab5c |
Removal of HSA from counter collection (#697)
* Minor fix Removal of HSA from counter collection Tests for AQL Updated counter collection client to build profiles in tool init * Rebased * Debug printing * Formatting * More format * fix shadowing --------- Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> |
||
|
|
07537b6231 |
rocprofiler_kernel_dispatch_info_t + header record for buffered counter collection (#758)
* Update include/rocprofiler-sdk
- defines.h
- ROCPROFILER_VERSION_10_0 -> ROCPROFILER_SDK_VERSION_0_0
- fwd.h
- rocprofiler_counter_record_kind_t
- rocprofiler_kernel_dispatch_info_t
- rocprofiler_record_counter_t
- has dispatch id instead of correlation id
- rocprofiler_counter_info_v0_t
- added rocprofiler_counter_id_t field
- added is_constant field
- reordered for better packing
- dispatch_profile.h
- added rocprofiler_profile_counting_dispatch_record_t for use as a header record for rocprofiler_profile_counting_dispatch_data_t
- callback_tracing.h
- rocprofiler_callback_tracing_kernel_dispatch_data_t uses rocprofiler_kernel_dispatch_info_t
- buffer_tracing.h
- rocprofiler_buffer_tracing_kernel_dispatch_record_t uses rocprofiler_kernel_dispatch_info_t
* Update lib/rocprofiler-sdk/*
- transition to rocprofiler_kernel_dispatch_info_t
- set id and is_constant values for rocprofiler_counter_info_v0_t in rocprofiler_query_counter_info
* Update lib/rocprofiler-sdk-tool
- transition to rocprofiler_kernel_dispatch_info_t
* Update lib/rocprofiler-sdk/counters/tests/core.cpp
- transition to rocprofiler_kernel_dispatch_info_t
* Update samples
- transition to rocprofiler_kernel_dispatch_info_t
- transition to rocprofiler_counter_record_kind_t
* Update tests
- transition to rocprofiler_kernel_dispatch_info_t
- transition to rocprofiler_counter_record_kind_t
- improve integration test validation for counter-collection
- update serialization for new/additional types
* Fix tests/counter-collection/validate.py
- loosen restrictions on the length of counter description
* Update include/rocprofiler-sdk/buffer_tracing.h
- remove accidental packed attribute
* Update lib/rocprofiler-sdk/counters/xml/derived_counters.xml
- Add description for TCC_TAG_STALL_sum (reference: https://rocm.docs.amd.com/en/develop/conceptual/gpu-arch/mi300-mi200-performance-counters.html)
* Update tests/page-migration/validate.py
|
||
|
|
3eaa678054 |
CTest Environment Update (#756)
* Update test/tools/json-tool.cpp
- push/pop ppid as external correlation id instead of pid
* Update environment variables for tests and samples
* Revert to old CDash dashboard in run-ci.py
* Revert to new CDash dashboard in run-ci.py
|
||
|
|
0f5c575435 |
Fix code_object_operation_t and memory_copy_operation_t enums (#751)
- enums for operations should not contain callback/buffer tracing categorization
- e.g. ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_LOAD should be ROCPROFILER_CODE_OBJECT_LOAD
|
||
|
|
69b8a43dc6 |
Gbaraldi/threadtrace2 (#724)
* Added first ATT API
* Finalizing thread trace API
* Fixing more rebase conflicts
* Added codeobj disassembly sample
* Fixing merge issues with rebase [2]
* Adding ATT packets
* Implemented thread trace intercept
* Moved codeobj parser to same repo as rocprofiler
* Moved thread trace to new API
* Fixing merge conflicts
* Fixing more merge conflicts
* Adding thread trace packet reuse
* Merged aql_profile_v2 headers
* Linked ATT sample to aqlprofile
* Updated decoder to include non-loaded codeobjs
* Implemented ISA decoder into ATT sample
* Added marker_id to vaddr
* Updating aql_profile_v2 API to memcpy
* Updating thread trace API to include 64bit markers. Using the result of ISA matching.
* Added instruction type and cycles summary
* Updated sample with selection of kernel by kernel_object
* Added option to copy from memory kernels
* Moved tool_data in thread_trace to dynamic alloc
* Restoring hsa.cpp
* Fixed ATT sample crash. General improvements.
* Moved codeobj library to outside src/
* Updated license header
* Moved codeobj_capture to camelcase
* Solving some more merge conflicts
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/code_object_isa_decode/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt
* Removing unused parameter check
* Adding const to isEmpty
* Removing unused warning
* Adding libdw-dev to requirements
* Running clang-format
* Commenting out new aql calls
* Clang format
* Unused variable fix
* Adding codeobj-decoder coverage
* Commenting out threadtrace
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* P
* WOverloaded
* Addressing clang-tidy
* Virtual destructor on ttracer class
* Corr id
* Fixing code source format
* Update CMakeLists.txt
* Build fixes
* Update source/lib/rocprofiler-sdk-codeobj/code_object_track.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix shadowing
* Update CMakeLists.txt
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
|
||
|
|
939e23e9d1 |
Stop all client contexts prior to finalization (#721)
* Stop all client contexts prior to finalization
* Update lib/common/container/static_vector.hpp
- improve emplace_back for non-{move,copy}-assignable object
* Update samples/intercept_table/client.cpp
- improve robustness against static object destruction
* Update lib/rocprofiler-sdk/context/context.cpp
- change storage of registered context array
- stable_vector of optional contexts
- common::static_object wrapper around stable_vector
* Update samples/intercept_table/client.cpp
- use variable template for underlying function pointer
|
||
|
|
7b6d3c70bd |
Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin
* Renamed cmake project in tests/bin
* Update samples
- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages
* Update tests
- Use ROCPROFILER_DEFAULT_FAIL_REGEX
* Add tests/lib
- libraries with HIP code
* Update PTL submodule
- remove atexit delete of thread_id_map
* Update cmake/rocprofiler_options.cmake
- Set ROCPROFILER_DEFAULT_FAIL_REGEX
* Update common lib: env + logging
- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*
* Add lib/rocprofiler-sdk/shared_library.cpp
- shared library constructor
* Update lib/rocprofiler-sdk-tool/tool.cpp
- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging
* Add tests/bin/hip-in-libraries
- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
- used for testing deadlocking within __hipRegisterFatBinary
* Update bin/rocprofv3
- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1
* Add tests/rocprofv3/tracing-hip-in-libraries
- uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels
* Update bin/rocprofv3
- fix counter collection (no exec)
* Update lib/rocprofiler-sdk-tool/tool.cpp
- replace "Kernel-Name" with "Kernel_Name"
* Update lib/rocprofiler-sdk/registration.cpp
Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries
* Update tests/rocprofv3
- replace "Kernel-Name" with "Kernel_Name"
* Update tests
- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection
* Update samples/counter_collection
- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes
* Update bin/rocprofv3
- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace
* Update lib/rocprofiler-sdk-tool
- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime
* Update lib/rocprofiler-sdk/agent.cpp
- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)
* GPR (vector and scalar) info in kernel symbol data
- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info
* Header include order fix
- Include repo headers first
- Third party library headers next
- standard library headers last
* Update dispatch profiling public API
- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback
* Update tests/bin/CMakeLists.txt
- fix add_subdirectory(hip-in-libraries) order
* Update VERSION
- bump to 0.2.0 in prep for AFAR
|
||
|
|
1bb94add11 |
Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args
- `include/rocprofiler-sdk/hip/api_args.h`
- replace struct fields named "f" with "func"
- replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
- iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
- fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
- replace struct fields named "f" with "func"
- replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
- improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
- remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
- destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
- improve parallelism in tool_tracing_callback
- serialize the marker api args
- only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
- reduce timeout on tests to 120 seconds
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- disable dereference of double pointer in stringize_impl
* Update lib/common
- indirection_level in mpl.hpp
- stringize_arg.hpp
* Rework rocprofiler_iterate_callback_tracing_kind_operation_args
- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output parameters
|
||
|
|
a1267e1fd2 |
C compatibility for public headers (#566)
* C compatibility for public headers
- add tests/tools/c-tool.c
- builds a tool (which does nothing) with C language
- ensures that tool can be compiled in C
- add tests/c-tool/CMakeLists.txt
- ensures that tool library build from C is a valid tool
- rocprofiler_counter_info_v0_t is_derived is int instead of bool
- C does not have bool unless <stdbool.h> is included
- add `include/rocprofiler-sdk/hsa/api_trace_version.h`
- handles providing HSA_*_TABLE_(MAJOR|STEP)_VERSION values if compiled from C
- cmake define in version.h.in for ROCPROFILER_HSA_*_TABLE_(MAJOR|STEP)_VERSION
- HSA table versions compiled with
- use rocprofiler_(hsa|hip|marker)_api_no_args struct to handle incompatibility b/t empty structs in C vs. C++ (size of 0 vs. size of 1)
- extern "C" in include/rocprofiler-sdk/{hsa,hip,marker}/api_args.h
- fixed spelling error: derrived -> derived
- scope YY_NO_INPUT compile definition to lib/rocprofiler-sdk/counters/parser/*
* Revert CDash dashboard
|
||
|
|
875f53b608 |
Correlation ID Retirement + misc (#527)
* Correlation ID Retirement
- include/rocprofiler-sdk/buffer_tracing.h
- add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
- ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
- kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
- emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
- pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
- update correlation_id struct
- make ref_count private
- {get,add,sub}_ref_count() functions
- sub_ref_count() performs correlation id retirement
- use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
- migrate to new {get,add,sub}_ref_count() for correlation ids
- return in iterate_args
- handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
- migrate to new {get,add,sub}_ref_count() for correlation ids
- return in iterate_args
- handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
- migrate to new {get,add,sub}_ref_count() for correlation ids
- return in iterate_args
- handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
- migrate to new {get,add,sub}_ref_count() for correlation ids
- handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
- migrate to new {get,add,sub}_ref_count() for correlation ids
- tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
- check retired_correlation_ids
- tests/common/serialization.hpp
- support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
- check retired_correlation_ids
- tests/common/CMakeLists.txt
- perfetto external project
- tests/common/perfetto.hpp
- perfetto categories + aliases
- add_perfetto_annotation
- metaprogramming helpers
- tests/tools/CMakeLists.txt
- link to tests-perfetto
- tests/tools/json-tool.cpp
- demangling functions
- serialization of marker API callback args
- reduce parallel bottleneck in tool_tracing_callback
- support correlation id retirement
- Multiple threads for buffers
- Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
- write_perfetto() function
* Update tests/rocprofv3/tracing/validate.py
- tweak test_hsa_api_trace
* Update PTL submodule
- fixes for data race during destruction of task
* Update lib/rocprofiler-sdk/buffer.*
- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t
* Reduce timeouts in counter collection samples [skip ci]
* Update tests/tools/json-tool.cpp
- tweak demangle(string_view, int*) -> demangle(string_view, int&)
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more
|
||
|
|
0d939edbba |
Updates/fixes for CI, docs, tests, samples, and common library (#528)
- .github/workflows/continuous_integration.yml
- apt-get update before apt-get install
- remove libgtest-dev
- actions-comment-pull-request: v2.4.3 -> v2.5.0
- .github/workflows/formatting.yml
- create-pull-request: v5 -> v6
- cmake/rocprofiler_options.cmake
- remove unused ROCPROFILER_DEBUG_TRACE and ROCPROFILER_LD_AQLPROFILE options
- samples/counter_collection/callback_client.cpp
- corr_id field renamed to correlation_id
- samples/counter_collection/client.cpp
- corr_id field renamed to correlation_id
- include/rocprofiler-sdk/fwd.h
- In rocprofiler_record_counter_t: rename corr_id field to correlation_id
- doxygen fixes
- lib/common/utility.*
- remove get_accurate_clock_id_impl
- timestamp_ns() defaults to CLOCK_BOOTTIME
- lib/rocprofiler-sdk/counters/core.cpp
- fix spelling mistake: extrenal -> external
- corr_id field renamed to correlation_id
- lib/rocprofiler-sdk-tool/tool.cpp
- fix destruction of static tool::output_file before finalization
- scripts/update-docs.sh
- define PROJECT_NAME
- tests/async-copy-tracing/validate.py
- init_time and fini_time checks
- hip_api_traces, marker_api_tracing
- tests/common/serialization.hpp
- fix save function for rocprofiler_record_counter_t following rename of corr_id to correlation_id
- tests/kernel-tracing/validate.py
- init_time and fini_time checks
- relax test_total_runtime range
- tests/rocprofv3/tracing/CMakeLists.txt
- remove -M from rocprofv3-test-systrace-execute
- exclude test_hsa_api_trace in rocprofv3-test-systrace-validate due to HIP API tracing
- tests/rocprofv3/tracing/validate.py
- update test_kernel_trace to accept mangled or demangled
- tests/tools/json-tool.cpp
- remove use of GLOG
- include init_time and fini_time
- write_json(...) function
|
||
|
|
7adffd5b22 |
Add rocprofiler_query_counter_info function (#452)
* Add rocprofiler_query_counter_info function
Replaces rocprofiler_query_counter_name. Allows for querying other types of info from counters (such as description) and gives us some flexibility to add return data in the near future (if we have to).
* source formatting (clang-format v11) (#453)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Updated version fetching
* source formatting (clang-format v11) (#509)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Merged
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
|
||
|
|
3638351b4c |
Callback based handler for counter collection (#506)
* Callback based handler for counter collection
* source formatting (clang-format v11) (#507)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* cmake formatting (cmake-format) (#508)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Doc fix
* Minor doc fix
* More doc fixes
* More doc fixes
* More doc fixes
* Update CI
* Changes to the API per comments
* Mutex exception for HSA
* source formatting (clang-format v11) (#511)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Doc fix
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
|
||
|
|
3eb6a27bc6 |
Add support for AQL dimensions (#262)
* Add support for AQL dimension changes
Adds support for returning dimensions from AQLProfile through rocprofiler to tools. Includes a much larger expanded test suite that covers nearly all files in counter collection. Specific changes below:
samples/counter_collection/print_functional_counters: Modified to check the validity of dimensions returned in comparison to the actual underlying data obtained from a kernel execution.
rocprofiler-sdk/aql/helpers: adds function calls to support fetching dimension information from AQLProfile.
rocprofiler-sdk/aql/packet_construct: modified to allow for events to be exported to aid evaluate_ast in decoding the output buffer.
lib/rocprofiler-sdk/counters: Instance count now derived from dimension sizes. rocprofiler_query_counter_dimensions now moved to a callback format to improve usability.
rocprofiler-sdk/counters/core: Code migrations and exports of functions for testing.
rocprofiler-sdk/counters/dimensions: Generates a dimension cache to be used when querying dimension information for a counter id.
rocprofiler-sdk/counters/evaluate_ast: Modified to pass back correct dimension information and to check/determine output dimensions for derived counters.
rocprofiler-sdk/counters/id_decode: Modified to have a map between dimension name -> dimension along with a conversion from the aql profile id for a dimension (string) -> integer based id (happens only once during init).
rocprofiler-sdk/hsa/queue: Modified to allow for making testing easier. Specifically to allow Queue to now be mocked in unit tests for counter collection.
* Merge with changes for serialization
* Added suggestions
* source formatting (clang-format v11) (#457)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Minor fix
* Test change
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
|
||
|
|
8a25b239bc |
Fixing counter collection in tools and enabling tests (#436)
* Fixing counter collection in tools and enabling tests
* fixing tests
* improving coverage on test
* Adding vector operations app
* Fixing tools bug for counter collection
* removing roctx linking
|
||
|
|
3f39339926 |
API Tracing Overhaul (#437)
* Update include/rocprofiler-sdk/hsa/*
- split HSA API IDs into separate enumerations
- add support for finalize ext table
* Update include/rocprofiler-sdk/hip/*
- remove compiler_api_args.h
- rocprofiler_hip_api_args_t contains all for HIP runtime and HIP compiler
- ROCPROFILER_HIP_API_ID_ -> ROCPROFILER_HIP_RUNTIME_API_ID_
* Update include/rocprofiler-sdk/marker/table_api_id.h
- ROCPROFILER_MARKER_API_TABLE_ID_ -> ROCPROFILER_MARKER_TABLE_ID_
* Update include/rocprofiler-sdk/*/table_api_id.h
- table_api_id.h -> table_id.h
* Update include/rocprofiler-sdk/fwd.h
- ROCPROFILER_CALLBACK_TRACING_HSA_API split into 4 enum values:
- ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API
- ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API
- ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API
- ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_API split into 4 enum values:
- ROCPROFILER_BUFFER_TRACING_HSA_CORE_API
- ROCPROFILER_BUFFER_TRACING_HSA_AMD_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_IMAGE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_FINALIZE_EXT_API
- rocprofiler_callback_tracing_code_object_operation_t renamed to rocprofiler_code_object_operation_t (more consistent)
- doxygen updates
* Update include/rocprofiler-sdk/buffer_tracing.h
- improved doxygen comments
- removed unused rocprofiler_buffer_tracing_queue_scheduling_record_t
- removed unused rocprofiler_buffer_tracing_correlation_record_t
* Update include/rocprofiler-sdk/callback_tracing.h
- removed rocprofiler_callback_tracing_hip_compiler_api_data_t
- rocprofiler_hip_api_args_t and rocprofiler_hip_compiler_api_args_t were combined
- rocprofiler_hsa_api_retval_t and rocprofiler_hsa_compiler_api_retval_t were combined
* Update lib/rocprofiler-sdk/hsa/*
- utils.hpp
- formatters for hsa_ext_program_t and hsa_ext_control_directives_t
- defines.hpp
- removed variadic macros from lib/common/defines.hpp
- HSA_API_META_DEFINITION, HSA_API_INFO_DEFINITION_0, HSA_API_INFO_DEFINITION_V specialize on table id
- async_copy.cpp
- ROCPROFILER_HSA_API_ID_* -> ROCPROFILER_HSA_AMD_EXT_API_ID_*
- add table id to templates
- improve async_copy_fini
- hsa.hpp
- add hsa_table_id_lookup
- add hsa_domain_info
- add table id to templates
- add copy_table function
- hsa.cpp
- add table id to templates
- require hsa tables to be trivial and standard layout
- remove set_data_args specialization for hsa_amd_memory_async_copy_rect
- implement copy_table function
- hsa.def.cpp
- update enums
* Update lib/rocprofiler-sdk/hip/*
- defines.hpp
- use lib/common/defines.hpp
- add hip_table_id_lookup to HIP_API_TABLE_LOOKUP_DEFINITION
- hip.hpp
- hip_table_id_lookup
- template iterate_args on table id
- templated copy_table and update_table
- hip.cpp
- replaced api_id_bounds with hip_domain_info
- templated iterate_args on table id
- templated copy_table and update_table
* Update lib/rocprofiler-sdk/marker/*
- defines.hpp
- use lib/common/defines.hpp
- marker.cpp
- updated enums
- marker.def.cpp
- updated enums
* Update lib/rocprofiler-sdk/tests
- common.hpp
- ROCPROFILER_CALL_EXPECT
- callback_data_ext
- update get_callback_tracing_names with new enums
- update get_buffer_tracing_names with new enums
- external_correlation.cpp
- support new HSA API enums
- intercept_table.cpp
- use test/common.hpp
- update to new HSA API enums
- registration.cpp
- support new HSA API enums
- naming.cpp
- validation for all get_ids(), get_names(), name_by_id(), id_by_name(), etc.
* Update lib/common
- defines.hpp
- Move IMPL_DETAIL_FOR_EACH_NARG, GET_ADDR_MEMBER_FIELDS, and GET_NAMED_MEMBER_FIELDS here
- used by HSA, HIP, and Marker
- static_object.hpp
- is_trivial_standard_layout static constexpr member function
- suppress register_static_dtor when is_trivial_standard_layout
* Update lib/rocprofiler-sdk/hsa/code_object.*
- name_by_id
- id_by_name
- get_names
- get_ids
* Update lib/rocprofiler-sdk/registration.cpp
- Update rocprofiler_set_api_table for HSA
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Update for new HSA enums
- Rework to use switch statement
- rocprofiler_query_callback_tracing_kind_operation_name
- rocprofiler_iterate_callback_tracing_kind_operations
- rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Update for new HSA enums
- Rework to use switch statement
- rocprofiler_query_buffer_tracing_kind_operation_name
- rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk-tool
- helper.cpp
- update get_buffer_id_names with new enums
- update get_callback_id_names with new enums
- tools.cpp
- update to use new HSA enums
* Update samples/common
- added call_stack.hpp
- source_location struct
- call_stack_t alias
- print_call_stack function
- added name_info.hpp
- utils for getting buffer/callback domain and operation names
* Update samples/api_buffered_tracing/client.cpp
- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums
* Update samples/api_callback_tracing/client.cpp
- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums
* Update tests/tools/json-tool.cpp
- update for new HSA enums
* Update tests/rocprofv3/tracing/validate.py
- update for new HSA domain names
* Update samples/counter_collection/main.cpp
- reduce number of kernels to 50,000 since 200,000 causes issues with thread sanitizer
|
||
|
|
9efafc4d23 |
Split ROCTx API tables and update intercept table API (#421)
* Update include/rocprofiler-sdk
- buffer_tracing.h
- fix doxygen for rocprofiler_buffer_tracing_hip_api_record_t
- update doxygen for rocprofiler_buffer_tracing_marker_api_record_t
- remove unused marker_id field
- fwd.h
- Split ROCPROFILER_CALLBACK_TRACING_MARKER_API into ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API
- Split ROCPROFILER_BUFFER_TRACING_MARKER_API into ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API
- split rocprofiler_runtime_library_t into rocprofiler_runtime_library_t and rocprofiler_intercept_table_t
- after split of ROCTx into 3 tables, specifying rocprofiler_at_internal_thread_create became confusing
* Update include/rocprofiler-sdk-roctx/api_trace.h
- Split into three tables: core, control, and name
- core: what it sounds like
- control: functions for controlling the profiler
- name: functions for giving resources names
* Update lib/rocprofiler-sdk-roctx/roctx.cpp
- modifications following split into multiple tables
* Update lib/rocprofiler-sdk/marker/*
- modifications following split of ROCTx API into multiple intercept tables
* Update lib/rocprofiler-sdk/tests
- common.hpp
- add enums to get_callback_tracing_names() and get_buffer_tracing_names()
- intercept_table.cpp
- update test to use rocprofiler_intercept_table_t (and enums) instead of rocprofiler_runtime_library_t
- update OR combos tested
- roctx.cpp
- updates following split of ROCTx API table into multiple tables
- use simplified specification of control API
* Update lib/rocprofiler-sdk
- buffer_tracing.cpp
- Updates for ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- callback_tracing.cpp
- Updates for ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- intercept_table.hpp
- notify_runtime_api_registration -> notify_intercept_table_registration
- intercept_table.cpp
- updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- registration.cpp
- updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- updates for notify_runtime_api_registration -> notify_intercept_table_registration
* Update lib/rocprofiler-sdk-tool
- helper.cpp
- Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- tool.cpp
- migrate to new enums for split ROCTx tables
- use simplified split for control table vs. core+name tables
* Update samples/{api_callback_tracing,intercept_table}
- intercept_table/client.cpp
- rocprofiler_runtime_library_t -> rocprofiler_intercept_table_t
- api_callback_tracing/client.cpp
- Updates for new enums in get_callback_id_names()
- use simplified split for control table vs. core+name tables
- migrate to new enums for split ROCTx tables
* Update tests
- rocprofv3/tracing/validate.py
- handle new marker domain names
- tools/json-tool.cpp
- Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- use simplified split for control table vs. core+name tables
- migrate to new enums for split ROCTx tables
* Update tests/rocprofv3/tracing/CMakeLists.txt
- fix FAIL_REGULAR_EXPRESSION for rocprofv3-test-trace-execute
* Update lib/rocprofiler-sdk-tool/{output_file,tool}.*
- logging in output_file dtor
- support stdout/stderr
* Update lib/common/container/record_header_buffer.hpp
- reduce probability of is_empty() returning true while emplace is happening
* Update lib/rocprofiler-sdk-tool/tool.cpp
- logging for buffered_tracing_callback
- counter collection uses CSV encoder
* Update bin/rocprofv3
- remove -i flag from help menu
|
||
|
|
9a8b6f6b7b |
Counter API and Samples Updates (#410)
* Update include/rocprofiler-sdk/{counters,profile_config}.h
- use rocprofiler_agent_id_t instead of rocprofiler_agent_t
* Update samples
- use rocprofiler-sdk::rocprofiler-sdk instead of rocprofiler::rocprofiler in cmake
- api_callback_tracing sample roctxProfiler{Pause,Resume}
- api_callback_tracing sample uses ROCTx
- updates to use rocprofiler_agent_id_t
* Update run-ci.py
- exclude rocprofiler-sdk-tool from samples (no sample uses that code)
* Update lib/rocprofiler-sdk-tool/tool.cpp
- Update rocprofiler_iterate_agent_supported_counters to use agent ID
* Update lib/rocprofiler-sdk/counters/core.*
- profile_config has pointer to agent instead of copy
* Update lib/rocprofiler-sdk/agent.*
- provide get_agent(...) func via rocp agent id
* Update lib/rocprofiler-sdk/{buffer,callback}_tracing.cpp
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED for enums missing implementation
* Update lib/rocprofiler-sdk/counters.cpp
- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t
* Update lib/rocprofiler-sdk/profile_config.cpp
- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t
* Update source/docs
- requirements.txt + install reqs in cmake
* Bump version to 0.1.0
* Update samples/api_callback_tracing/CMakeLists.txt
- LD_LIBRARY_PATH for test
* Update test/rocprofv3/tracing/CMakeLists.txt
- reorder validation files so memory copy comes first
* Update lib/rocprofiler-sdk-tool/tool.cpp
- logging for flushing buffers
- variables for buffer_size and buffer_watermark
- increase the watermark to a full buffer
- use dedicated threads for each buffer
* Update lib/rocprofiler-sdk-tool/CMakeLists.txt
- test sets ROCPROF_LOG_LEVEL and ROCPROFILER_LOG_LEVEL to info
* Remove lib/rocprofiler-sdk-tool/trace_buffer.hpp
* Update lib/rocprofiler-sdk-tool/CMakeLists.txt
- drop log level to warning when leak sanitizer is enabled (produces small memory leak)
|
||
|
|
c641749fe6 |
HIP API Tracing (#357)
* Update include/rocprofiler-sdk/hip*
- updates for intercept table
* Update lib/common/units.hpp
- clang-tidy fixes
* Add lib/rocprofiler-sdk/hip
- tracing implementation for the HIP intercept table
* Update source/lib/rocprofiler-sdk/CMakeLists.txt
- add_subdirectory(hip)
* Update source/lib/rocprofiler-sdk/hsa
- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION
* Update lib/rocprofiler-sdk/hip
- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/tests/intercept_table.cpp
- remove failures for intercepting HIP API tables
* Update include/rocprofiler-sdk/fwd.h
- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/intercept_table.cpp
- support HipDispatchTable and HipCompilerDispatchTable
* Update lib/rocprofiler-sdk/internal_threading.cpp
- Support ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/registration.cpp
- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging
* Update samples/api_{buffered,callback}_tracing
- Modifications to demonstrate HIP API tracing
* Update tests/kernel-tracing
- Modifications to handle/test HIP API tracing
* Separate HIP tracing from HIP compiler tracing
* Fix installation of include/rocprofiler-sdk/hip/*
- add compiler and table headers to install
* Fixes to HIP interception
- hip_api_trace.hpp was updated a bit
- removed hipGetDeviceProperties (generic)
- added hipGetDevicePropertiesR0600
- added hipGetDevicePropertiesR0000
- removed hipRegisterTracerCallback
- reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
- added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers
* Update lib/rocprofiler-sdk/hip/hip.*
- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update lib/rocprofiler-sdk/hsa/hsa.*
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update test/kernel-tracing/validate.py
- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register
* Update tests/tools/json-tool.cpp
- fix context associated with "HIP_API_CALLBACK"
* Update external/CMakeLists.txt
- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
- BUILD_TESTING (OFF)
- BUILD_SHARED_LIBS (OFF)
- BUILD_OBJECT_LIBS (OFF)
- BUILD_STATIC_LIBS (ON)
- CMAKE_POSITION_INDEPENDENT_CODE (ON)
- CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
- CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog
* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt
- remove explicit setting of SKIP_BUILD_RPATH
* Update CMakeLists.txt
- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH
* Update tests/CMakeLists.txt
- include(GNUInstallDirs)
* Update samples/CMakeLists.txt
- include(GNUInstallDirs)
* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h
- remove extern "C" due to incompatibility between an empty struct in C (size 0) and an empty struct in C++ (size 1)
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- clang-tidy fixes
* Update cmake/rocprofiler_linting.cmake
- add a feature for clang tidy exe
* Update lib/rocprofiler-sdk/hip/hip.cpp
- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- fix merge
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- fix merge
* Update bin/rocprofv3
- args for marker, HIP runtime, and HIP compiler tracing
* Update tests/apps/simple-transpose
- use roctx
* Update tests/rocprofv3/tracing
- validate marker API data
* Update lib/rocprofiler-sdk-tool
- support for HIP runtime, HIP compiler, marker API
* Update queue/queue_controller/registration/utility
- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
- implements a thread yield + sleep
- add a sync function to Queue class
- add an iterate_queues member function to QueueController
- this is used to sync each queue during queue_controller_fini()
* Fix data races: queue/context/stable_vector
- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array
* Update lib/rocprofiler-sdk/hsa/hsa.*
- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables
* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp
- use HSA subtable accessors
* Update rocprofiler_memcheck and CI workflow
- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
- GCC 13 uses libtsan.so.2
* Update CI workflow
* Update lib/rocprofiler-sdk/counters/{metrics,counters}
- fix possibly dangling reference to a temporary from gcc-13
* Update thread-sanitizer-suppr.txt
- Ignore data races originating in hsa-runtime library
* Update cmake/rocprofiler_memcheck.cmake
- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library
* Update tests/rocprofv3/tracing/CMakeLists.txt
- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test
* Update lib/common/container/record_header_buffer.hpp
- fix data race identified by gcc v13 and libtsan.so.2
* Update hip API id, args, and def
- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)
* Update lib/common/container/record_header_buffer.hpp
- fix deadlock in save/read/reset
* Update source/docs/CMakeLists.txt
- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- remove overloads for HIP_MEMSET_NODE_PARAMS
* Update docs/CMakeLists.txt
- use find_program for shell instead of hardcoded /bin/bash
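
The lib/rocprofiler-sdk/hip/hip.cpp item above replaces a fold expression with recursion. The following is a minimal sketch of that general technique, not the actual rocprofiler-sdk code: `visit`, `visit_fold`, `visit_recursive`, and the summing helpers are hypothetical names chosen for illustration, and C++17 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical per-index action: stands in for "handle API id Idx".
template <std::size_t Idx>
void visit(std::size_t& sum)
{
    sum += Idx;
}

// Fold-expression form: a single expression expands into one call per index.
template <std::size_t... Idx>
void visit_fold(std::size_t& sum, std::index_sequence<Idx...>)
{
    (visit<Idx>(sum), ...);
}

// Recursive form: each instantiation handles one index and then
// instantiates the next one until Idx reaches N.
template <std::size_t Idx, std::size_t N>
void visit_recursive(std::size_t& sum)
{
    if constexpr(Idx < N)
    {
        visit<Idx>(sum);
        visit_recursive<Idx + 1, N>(sum);
    }
}

// Sum the indices 0..N-1 using the fold-expression form.
template <std::size_t N>
std::size_t sum_with_fold()
{
    std::size_t sum = 0;
    visit_fold(sum, std::make_index_sequence<N>{});
    return sum;
}

// Sum the indices 0..N-1 using the recursive form.
template <std::size_t N>
std::size_t sum_with_recursion()
{
    std::size_t sum = 0;
    visit_recursive<0, N>(sum);
    return sum;
}

// Sanity check: both forms visit the same indices.
inline void check_forms_agree()
{
    assert(sum_with_fold<5>() == sum_with_recursion<5>());
}
```

Both forms generate the same calls. The recursive form spreads them across separate instantiations instead of one deeply nested expression, which is presumably what avoided the clang-tidy nesting limit mentioned in the commit message.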
|
||
|
|
802e79b113 |
Tests for agent and aql packet generation (#365)
* Tests for agent and aql packet generation
Test for agent and fixing test problems with aql packet that caused test to not run.
* cmake formatting (cmake-format) (#366)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* source formatting (clang-format v11) (#367)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Minor tweak
* source formatting (clang-format v11) (#368)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Add gfx900 to basic_counters
* Update samples/counter_collection/client.cpp
- fix data race by flushing buffer during tool_fini
* Fix data race for output stream destruction
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
|
||
|
|
936816f762 |
Async memory copy tracing (#317)
* Update samples/api_buffered_tracing/client.cpp
- support ROCPROFILER_BUFFER_TRACING_MEMORY_COPY
* Update include/rocprofiler-sdk/{buffer_tracing,fwd}.h
- update rocprofiler_buffer_tracing_memory_copy_record_t
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_HOST_TO_HOST to rocprofiler_memory_copy_operation_t
* Update lib/rocprofiler-sdk/context/context.*
- get_registered_contexts functions (local copy)
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- include some memory allocations and memory copies for better testing
* Update tests/common/serialization.hpp
- update serialization save function for rocprofiler_buffer_tracing_memory_copy_record_t
* Update lib/rocprofiler-sdk/hsa/hsa.*
- remove stale set_callback / activity_functor_t code
- forward decl hsa_api_meta
- template struct hsa_api_func for getting function return type and args
* Update tests/kernel-tracing/validate.py
- enforce memory_copies data size
- test timestamps in memory copies data
- improve internal and external correlation id validation
* Update lib/rocprofiler-sdk/hsa/defines.hpp
- HSA_API_META_DEFINITION macro
* Update lib/rocprofiler/hsa/rocprofiler-sdk/hsa/hsa.def.cpp
- HSA_API_META_DEFINITION specializations for async copy functions
* Add lib/rocprofiler-sdk/hsa/async_copy.{hpp,cpp}
- implements buffer memory tracing
* Update lib/rocprofiler-sdk/registration.cpp
- invoke rocprofiler::hsa::async_copy_init
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging improvements
- improve hsa <-> rocp agent mapping
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- load original signal in async signal handler before store_screlease
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- use store_relaxed instead of store_screlease
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- return function pointer instead of lambda
* Update reproducible-runtime.cpp
- device sync
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- use *Async variants of hipMalloc and hipMemcpy
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- populate async data properly
* Update tests/kernel-tracing/validate.py
- verification of async copy direction
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- temporarily disable async memcpy functions
* Create tests/tools
- directory containing tool libraries used for collecting data in integration tests
* Update tests/kernel-tracing
- remove kernel-tracing-test-tool library (now rocprofiler-sdk-json-tool)
- update cmake, validate.py, conftest.py accordingly
* Add tests/async-copy-tracing
- integration test validating async copy tracing in transpose example
* Update tests/CMakeLists.txt
- updates for restructuring
* Revert tests/apps/reproducible-runtime
- restore code to semi-original state (no memory copying)
* Update tests/async-copy-tracing/validate.py
- fix comment in test_async_copy_direction
* Fix building tests against installation
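
The lib/rocprofiler-sdk/hsa/hsa.* item above mentions a template struct for getting a function's return type and arguments. A minimal sketch of that general technique follows, assuming C++17. It is not the real `hsa_api_func`: `func_traits` and `example_copy` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Primary template: intentionally left undefined so that unsupported
// types produce a compile error.
template <typename FuncT>
struct func_traits;

// Partial specialization for plain function pointers: pattern matching
// on RetT (*)(Args...) exposes the return type and the argument types.
template <typename RetT, typename... Args>
struct func_traits<RetT (*)(Args...)>
{
    using return_type = RetT;
    using args_type   = std::tuple<Args...>;

    static constexpr std::size_t arity = sizeof...(Args);
};

// Hypothetical stand-in for an API entry point such as an async copy.
inline int example_copy(void* dst, const void* src, std::size_t size)
{
    (void) dst;
    (void) src;
    (void) size;
    return 0;
}

using example_traits = func_traits<decltype(&example_copy)>;

// Compile-time checks that the deduced types match the declaration.
static_assert(std::is_same_v<example_traits::return_type, int>);
static_assert(std::is_same_v<example_traits::args_type,
                             std::tuple<void*, const void*, std::size_t>>);
static_assert(example_traits::arity == 3);
```

With traits like these, a wrapper can be written once for any intercepted function pointer and still forward arguments and return values with the correct types.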
|