* Adding changes to register and read symbols from the hip fat binary
* adding json output for host_functions
* added error handling
* adding json tool support
* Adding tests
* formatting changes
* Adding documentation
* refactoring as per amd-staging
* Adding initializers and changing macros
* Fix page-migration background thread on fork (#31)
* Fix page-migration background thread on fork
After falling off main in the forked child, all the children
try to join on the parent's monitoring thread. This results
in a deadlock. Parent is waiting for the child to exit, but
the child is trying to join the parent's thread which is
signaled from the parent's static destructors.
Even with just one parent and child, due to copy-on-write
semantics, a child signalling the background thread to join
will still block (thread's updated state is not visible
in the child).
This fix creates background threads on fork per-child with a
pthread_atfork handler, ensuring that each child has its own
monitoring thread.
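A minimal sketch of the per-child approach described above, assuming a
single background thread owned by a heap-allocated state object. All names
here (sketch::monitor, start_monitor, on_fork_child, ...) are hypothetical
and are not taken from the rocprofiler-sdk sources; only pthread_atfork,
fork, and std::thread are real APIs. The inherited state is leaked in the
child on purpose, because the parent's thread does not exist there and
joining it would block.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

namespace sketch
{
// Hypothetical monitoring state (illustrative only).
struct monitor
{
    std::atomic<bool> stop{false};
    std::thread       worker{};
    pid_t             owner = -1;
};

// Process-wide pointer to the monitor owned by *this* process.
inline monitor*& instance()
{
    static monitor* ptr = nullptr;
    return ptr;
}

// Start a background thread that polls until asked to stop.
inline void start_monitor()
{
    auto* m   = new monitor{};
    m->owner  = getpid();
    m->worker = std::thread{[m]() {
        while(!m->stop.load())
            std::this_thread::sleep_for(std::chrono::milliseconds{1});
    }};
    instance() = m;
}

// Signal and join the thread owned by this process (no-op if none).
inline void stop_monitor()
{
    auto* m = instance();
    if(!m) return;
    m->stop.store(true);
    if(m->worker.joinable()) m->worker.join();
    delete m;
    instance() = nullptr;
}

// Child-side fork handler: only the forking thread survives fork(), so the
// inherited std::thread handle refers to a thread that is not running here.
// Drop (leak) the inherited state instead of joining it, then start a new
// thread that belongs to the child.
inline void on_fork_child()
{
    instance() = nullptr;
    start_monitor();
}

// Register the handler once; returns true if registration succeeded.
inline bool install_fork_handler()
{
    static const bool installed =
        (pthread_atfork(nullptr, nullptr, &on_fork_child) == 0);
    return installed;
}
}  // namespace sketch
```

With this arrangement each process joins only the thread it created, so
neither the parent nor the child waits on a thread it cannot signal.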
* Formatting fixes
* Detach page-migration background thread and update test timeout
* Attach files with ctest
* Update corr-id assert
* Tweak on-fork, simplify background thread
* Revert thread detach
* Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9)
* Adding Trace Period feature to rocprofv3
* Adding feature documentation
* Update source/bin/rocprofv3.py
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fixing format
* Moving to Collection Period and changing the input params
* Format Fixes
* Fixing rebasing issues
* Removing atomic include from the tool
* Adding more options for units, optimizing the code
* Fixing rocprofv3.py
* Fixing time conversion & adding time controlled app
* Fixing format
* Changing to shared memory testing methodology
* Use shmem
* Fix include headers for transpose-time-controlled.cpp
* Format upload-image-to-github.py
* Removing shmem and using only env var to dump timestamps from the tool
* Tool Fixes + Test Config
* Adding Tests
* Fixing Review comments
* Update trace period implementation
* Update trace period tests
* check between start and stop timestamps
* Merge Fix
* Update validate.py
* Improve safety of rocprofiler_stop_context after finalization
* Pass context id to collection_period_cntrl by value
* Adding 20 us error margin
* Ensure log level for collection-period test is not more than warning
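The timestamp check with the 20 us error margin mentioned above could be
sketched roughly as follows. This is an assumption about what the check
looks like, not the test code from this change: the function name, the
nanosecond units, and applying the margin symmetrically at both ends of the
window are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
// Assumed tolerance: 20 us expressed in nanoseconds.
constexpr uint64_t margin_ns = 20'000;

// Returns true if `ts` lies within [start - margin, stop + margin].
// The lower bound is clamped at zero to avoid unsigned underflow.
constexpr bool
within_collection_window(uint64_t ts,
                         uint64_t start,
                         uint64_t stop,
                         uint64_t margin = margin_ns)
{
    const uint64_t lower = (start > margin) ? start - margin : 0;
    return ts >= lower && ts <= stop + margin;
}
}  // namespace sketch
```

A record stamped 10 us before the window opens would pass under these
assumptions, while one stamped 30 us before it would not.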
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- move error code check macros to implementation
- fix macros which check error code
- use constexpr values instead of #define
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- debugging for error that cannot be locally reproduced
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- improve error handling and logging
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- tweak to non-fatal logging messages
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- cleanup of logging messages
* Update host kernel symbol register data fields
* Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Update kfd ioctl header
- Adds new event for dropped events
- Mirrors kernel update by Philip Yang
* Add error code for page migration events
- Adds support for new error code field for page migration end events
- Page migration end event is now generated for migration failure
- Error code is zero for successful migration
* Add dropped event SMI event
- New event type indicates if events were dropped
- Events are dropped if the buffer is full
* Squashed commit of the following:
commit b76f2635f4b65599f03812a73d0cf410f5ada213
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Fri Apr 26 00:29:09 2024 +0000
Changed for PR feedback
commit bedb8ad566ff42fbf117b19202c26c507abcf8ac
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Apr 25 19:20:06 2024 -0500
Fix installation
commit a98f8a69459a1450a1be9c98e20b3c1e7f2568c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Apr 25 19:16:35 2024 -0500
Restructure the headers
commit 46489a020ffafdd5f4ce3f580469ff233ef67fe1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 23 23:31:10 2024 +0000
Update hsa include
commit 8e795282cce348fc6aa736b7857b21aeb32aa20a
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 23 23:02:32 2024 +0000
Report page migration events as start/end
* Updated tests accordingly
* Page migration events are reported independently
commit 8784e5ad4895a626a2a8e4ac12f8021b34172bd4
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 16 17:01:57 2024 +0000
Update handling of dropped page migration events
Previously, we dropped all locally buffered events when we detect that
KFD has dropped some events. This may drop too many pending events too eagerly.
When we receive an end event and cannot find the corresponding start,
we can be sure that KFD has dropped some events in the immediate past.
When this happens, we look through all locally buffered events and report
the start events that are older than 10s as partial events --- they have
no "end" information (we expect that the end events have been dropped).
We also set the polling timeout to 10s to prevent the local buffer from
getting too large with events waiting to be paired up.
Updated tests
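The pairing logic described in this commit can be sketched as below. This
is an illustration under stated assumptions, not the implementation: the
types (start_event, end_event, report, pairing_buffer) and the lookup key
are hypothetical, and the age of a pending start is measured against the
timestamp of the unmatched end event. The unmatched end event itself is
simply ignored in this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

namespace sketch
{
using nanoseconds = std::chrono::nanoseconds;

// Hypothetical record types (illustrative only).
struct start_event { uint64_t key; nanoseconds timestamp; };
struct end_event   { uint64_t key; nanoseconds timestamp; };

// A report with no `end` value is a partial event.
struct report
{
    uint64_t                   key;
    nanoseconds                start;
    std::optional<nanoseconds> end;
};

class pairing_buffer
{
public:
    explicit pairing_buffer(nanoseconds max_age = std::chrono::seconds{10})
    : m_max_age{max_age}
    {}

    // Buffer a start event until its end event arrives.
    void on_start(const start_event& ev) { m_pending[ev.key] = ev.timestamp; }

    // Pair an end event with its start. If no start is buffered, assume
    // events were dropped and flush starts older than max_age as partial.
    std::vector<report> on_end(const end_event& ev)
    {
        std::vector<report> out{};
        auto                itr = m_pending.find(ev.key);
        if(itr != m_pending.end())
        {
            out.push_back({ev.key, itr->second, ev.timestamp});
            m_pending.erase(itr);
            return out;
        }
        for(auto pitr = m_pending.begin(); pitr != m_pending.end();)
        {
            if(ev.timestamp - pitr->second > m_max_age)
            {
                out.push_back({pitr->first, pitr->second, std::nullopt});
                pitr = m_pending.erase(pitr);
            }
            else
            {
                ++pitr;
            }
        }
        return out;
    }

    std::size_t pending() const { return m_pending.size(); }

private:
    nanoseconds                               m_max_age;
    std::unordered_map<uint64_t, nanoseconds> m_pending{};
};
}  // namespace sketch
```

Compared with discarding the whole local buffer, only stale starts are
given up on, so recent starts can still be paired when their end events
arrive.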
commit 2e8e0b07eeda9b5990e1ae8d28dcd3a035ce38e1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 16 17:01:31 2024 +0000
Docs for triggers
* Fix page migration sample
* Fix hasher, kfd install
* Add hsa include
* Install KFD include dir
* Updates from code review
- single timestamp field
- node_id -> agent_id
- from_node -> from_agent
- to_node -> to_agent
* Misc revisions
* Remove page-migration install target
* Update page-migration pytest
* Tweak to serialization
* Address PR comments
* Update page-migration test
* Add cli args, update iterations
* Address PR comments
* Add abi.cpp for static_asserts
* Update page_migration gtest with only runtime tests
* Moved helpers into utils.hpp
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Change all rocprofiler-X target names to rocprofiler-sdk-X
* Update rocprofiler-sdk-config.cmake
- fix install tree target names
- simplify logic for using find w/ components and find w/o components
* Update rocprofiler-sdk-roctx-config.cmake
- simplify logic for using find w/ components and find w/o components
* Update samples/intercept_table/CMakeLists.txt
- demonstrate/test use of `find_package(rocprofiler-sdk ... COMPONENTS ...)`
- formerly, the rocprofiler_callback_tracing_record_t data was stored in itr["record"], e.g. itr["record"]["correlation_id"]
- dropped "record" key, e.g. itr["correlation_id"]
* Page migration reporting support
* Page migration: Update parser and reporting
Container does not have latest KFD header, so CI might fail
* Add kfd_ioctl.h
* Formatting
* Update get_key
- get_key was not used (and shouldn't be), so delete it
* clang-tidy fixes
* Tests for page migration
* Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update tests/bin/page-migration/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update page-migration test app
- add hipHostRegister to register mmap'ed allocation with HIP
- misc cleanup and reorg
- remove HSA_XNACK=1 from test env
* Update lib/rocprofiler-sdk/tests/page_migration.cpp
- fix compilation error
* Minor updates (reorg, rename)
* Page migration reporting support
* Page migration: Update parser and reporting
Container does not have latest KFD header, so CI might fail
* Update page migration tests, fix trigger types
* Page Migration Tracing Support Refactoring (#753)
* Reorganization
* Update page migration init/fini
* Formatting
* Update page_migration.cpp
- change logging severity
* Skip test if KFD does not support page migration reporting
* Rework skipping test if KFD does not support page migration
* Fix event trigger enum values
* Fix clang-diagnostic-unused-const-variable
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>